+1 for creating baseline alerts via API
Hey @magnus.kulke Thanks for the +1 - I’ll get that added for you.
@siddhant.agarwal - Sorry I missed your last comment - I don’t believe there’s a way around this with NRQL.
In general, if the ‘detailed’ setting is complicated, why not set it to the same default the GUI starts with, and allow users to adjust from there? For the second part, allow us to set ‘sensitivity’ separately, after we learn what the value should be from human experimentation. For example, I typically build our new alerts in the GUI. However, with multiple environments managed by different accounts, I need a way to take those ‘manually’ created entries and turn them into code so I can make sure every account’s alerts are built the same way. The idea of ‘deciding’ what to set should be able to be simplified to ‘default’, with a method to follow up and reset the specific value later, if we have a change to the default behavior.
We are experimenting with the baseline settings and have encountered situations where alerting is less than optimal. It would be nice to have API/user settings for the following:
1. Enable/disable the dynamic baseline calculation if a condition is met. If throughput falls off a cliff, don’t continue to adjust the baseline downward. This could be similar to “call count deviates from the baseline” > “for more than” > etc.
2. Fix/set the baseline value used in alert determination to a specified point in time, such as the value from one hour ago. We notice that a gradual decline in throughput results in a gradual baseline decline, and no notification. Adjusting the more/fewer violations slider still results in over/under alerting. If used in conjunction with the previous item, ideally an alert will trigger when throughput falls sharply, the baseline remains static, and that same baseline value is used to determine when the alert should clear.
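To make the two requests above concrete, here is a rough sketch of the behavior we have in mind. It is not anything New Relic provides: the class name `LaggedBaselineMonitor` and the parameters `lag` and `drop_fraction` are made up for illustration. The baseline is the value from `lag` samples ago, it stops moving once an alert opens, and the same frozen value decides when the alert clears.

```python
from collections import deque


class LaggedBaselineMonitor:
    """Hypothetical sketch: lagged baseline that freezes while an alert is open."""

    def __init__(self, lag=60, drop_fraction=0.5):
        # history holds the last `lag` samples; history[0] is the lagged baseline
        self.history = deque(maxlen=lag)
        self.drop_fraction = drop_fraction
        self.frozen_baseline = None  # set only while an alert is open

    def observe(self, value):
        """Feed one throughput sample; return True while the alert is open."""
        if self.frozen_baseline is not None:
            # Alert is open: compare against the same frozen baseline to clear.
            if value >= self.frozen_baseline * (1 - self.drop_fraction):
                self.frozen_baseline = None
                self.history.append(value)
                return False
            return True  # still alerting; baseline is not adjusted downward

        if len(self.history) == self.history.maxlen:
            baseline = self.history[0]  # value from `lag` samples ago
            if value < baseline * (1 - self.drop_fraction):
                self.frozen_baseline = baseline  # freeze it and open the alert
                return True

        self.history.append(value)
        return False
```

With one sample per minute, `lag=60` would correspond to "the value from one hour ago". The threshold here is a simple fractional drop, which is an assumption of this sketch and not how the built-in baseline works.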
We are considering using the API to get the throughput values within a time range, calculate the 1- or 3-hour average and deviation, save the threshold value (for determining when the alert should clear), and generate an alert via a synthetic.
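A minimal sketch of the calculation step in that workaround, assuming the throughput samples for the 1- or 3-hour window have already been fetched and are passed in as a plain list. The function names and the `num_deviations` parameter are our own placeholders, and fetching the data and raising the alert from the synthetic are left out.

```python
from statistics import mean, pstdev


def lower_threshold(samples, num_deviations=2.0):
    """Return mean - num_deviations * stddev for a window of throughput samples."""
    if not samples:
        raise ValueError("need at least one throughput sample")
    return mean(samples) - num_deviations * pstdev(samples)


def should_alert(current_throughput, threshold):
    """True when current throughput has fallen below the saved threshold."""
    return current_throughput < threshold
```

The idea would be to save the returned threshold when the alert opens and reuse that same value to decide when it clears, instead of recomputing it from the degraded window.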
Does this also mean that baseline alerts cannot be terraformed?
Hello all — I wanted to make note of the fact that NRQL baseline alert conditions have been made available through NerdGraph, our GraphQL-based API. Here’s what the schema looks like:
While application metric baselines are still unavailable through the API, I wanted to share that document as this could provide some of what you’re all looking for.
@soumya.jk — I’m not certain whether Terraform supports NRQL baselines at the moment. I believe it hooks into our REST API, which does not provide support for baseline alert conditions (only the aforementioned GraphQL API does). That may be a good question to run by Terraform, though.