All of the folks behind the scenes are hard at work on a slew of new features for Alerts and Applied Intelligence, and carefully determining which old features we will be sunsetting. Below, I’ll go through what exciting things to expect, what we’ll be end-of-lifing (EOL), and what actions you can take to prepare for the changes.
Note that this is an early announcement to allow you to plan any work you need to complete over the next 12-18 months. Not all the dates mentioned are locked in, but the rough timeline should help you with your planning.
Why are we doing this?
In short, we’re providing a whole new alerting lifecycle experience. The new experience provides faster time to resolution, reduced noise, and increased reliability.
Based on your feedback, we have designed an entirely new workflow for responding to incidents that will result in faster time-to-resolution with decreased noise. It involves correlating other, related anomalies with the core incident, to raise visibility of possible problem sources across your estate. Some of these changes are directly related to this new workflow.
In addition, we’re simplifying alert configuration and standardizing on one single type of condition so that we can provide deeper features and improve reliability. Some of the new changes relate to streamlining alert conditions and the process you use to create them.
Finally, we’re rolling out a brand new method of evaluation (sliding window aggregation) that will offer more flexibility and resiliency for thresholds that have been using the “Sum of query results” option.
Each of these initiatives involves introducing new features and EOLing old ones.
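To make the sliding window idea mentioned above a little more concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not how Alerts evaluates signals internally, and it does not model the “Sum of query results” option. The function names and the `window` and `slide_by` parameters are hypothetical. It compares fixed, non-overlapping windows with overlapping (sliding) windows over the same per-minute data:

```python
from typing import List


def tumbling_sums(values: List[float], window: int) -> List[float]:
    """Sum consecutive, non-overlapping windows of `window` data points."""
    return [
        sum(values[start:start + window])
        for start in range(0, len(values) - window + 1, window)
    ]


def sliding_sums(values: List[float], window: int, slide_by: int) -> List[float]:
    """Sum overlapping windows of `window` points, advancing by `slide_by` points."""
    return [
        sum(values[start:start + window])
        for start in range(0, len(values) - window + 1, slide_by)
    ]


# Hypothetical per-minute error counts; the spike straddles a window boundary.
per_minute_errors = [0, 0, 0, 5, 5, 0, 0, 0]
threshold = 8

fixed = tumbling_sums(per_minute_errors, window=4)
sliding = sliding_sums(per_minute_errors, window=4, slide_by=1)

print(fixed)    # [5, 5]
print(sliding)  # [5, 10, 10, 10, 5]
print(any(total > threshold for total in fixed))    # False
print(any(total > threshold for total in sliding))  # True
```

In this made-up data set the spike is split across two fixed windows and never crosses the threshold, while the overlapping windows capture it in full.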
What’s coming?
- A new incident response experience (January 2022)
  - The introduction of Issues and Incidents, which will replace Incidents and Violations, respectively
  - New Issues page, which will enrich your Issues with correlated details
  - New NerdGraph API functionality
- A new way to manage notifications (January 2022)
  - New name: Destinations/Workflows
  - Increased flexibility and configurability
  - Ability to notify on Warning thresholds
  - Configurable notification content
- Streamlining and simplifying alert configuration (October 2022)
  - Creating NRQL alert conditions for all entity types will be streamlined and simplified
  - Integrating alert creation throughout the platform
- Sliding window aggregation will allow for flexible averages to be evaluated (February 2022)
  - This will replace “Sum of query results” thresholds
  - This continues the movement toward giving you more control over the signals you’re monitoring with Alerts
What’s going away?
- Current incident response workflow (replaced by the new incident response experience) (October 2022)
  - Incident page
  - Violation details page
  - Incident, Violation and Alerting Events APIs
- Notification channels and their relationship to policies (replaced by Destinations and Workflows, a new way to manage notifications) (April 2023)
  - Some channel types (OpsGenie, VictorOps, XMatters) will not have a dedicated integration, and will instead use a webhook template
- All non-NRQL alert condition types (replaced by NRQL alert conditions with increased functionality) (2nd half of 2023)
  - One condition type will still be available in addition to NRQL conditions: Synthetics multi-location conditions (single-location will be going away)
- “Sum of query results” thresholds (replaced by sliding window aggregation) (June 2022)
  - More details in the “Sum of” EOL announcement at this link
- Outlier NRQL alert condition types (not being replaced by any new feature) (31 March 2022)
  - Due to minimal adoption, this feature will be retired.
  - More details in the Outliers EOL announcement here.
  - This has been completed – Outlier alert conditions have been retired.
What must you do to prepare?
This section outlines, at a high level, the changes that need to happen before the features mentioned above are EOL’d. If you are using the UI to make these changes, more information (and potentially tools) on how to carry them out will follow in the coming weeks.
- If you are using APIs related to Incident or Violation details, or to perform actions such as acknowledging or closing incidents, you will need to adjust your scripts to use the new NerdGraph endpoints for Incidents (formerly known as Violations) and Issues (formerly known as Incidents).
  - If you’re not using APIs for these functions, nothing needs to be done to prepare for the new incident workflow.
- Create new Destinations to replace your existing notification channels, and create Workflows to define when and how the new Issues will handle notifications.
- Migrate all non-NRQL conditions (exceptions mentioned above) to NRQL alert conditions.
  - We will be releasing tooling in the UI to help with this migration.
- Adjust your “Sum of query results” alert conditions to use sliding window aggregation instead.
  - You can do this conversion one at a time using the button in the condition creation UI.
  - Specific instructions on how to adjust these using the bulk conversion tool are documented here.
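For the first item in the list above (moving scripts to NerdGraph), the sketch below shows roughly what a script that talks to NerdGraph might look like in Python. Treat it as a starting point under stated assumptions rather than a reference implementation. The endpoint shown is the US-region NerdGraph URL and the `API-Key` header carries a user API key. The query itself, including the `aiIssues` field names, is an assumption for illustration and should be checked against the NerdGraph API explorer for your account. The account ID and key are placeholders.

```python
import json
import urllib.request

# US-region NerdGraph endpoint; other regions use a different host.
NERDGRAPH_URL = "https://api.newrelic.com/graphql"

# ASSUMPTION: the field names in this query are illustrative only.
# Verify the real schema in the NerdGraph API explorer before relying on them.
ISSUES_QUERY = """
query($accountId: Int!) {
  actor {
    account(id: $accountId) {
      aiIssues {
        issues {
          issues { issueId title state }
        }
      }
    }
  }
}
"""


def build_nerdgraph_request(api_key: str, query: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for NerdGraph."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        NERDGRAPH_URL,
        data=body,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )


def run_query(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder credentials and account ID; replace with your own values.
request = build_nerdgraph_request("YOUR_USER_API_KEY", ISSUES_QUERY, {"accountId": 1234567})
print(request.get_method(), request.full_url)
```

Calling `run_query(request)` with a real key would send the query. Building the request separately from sending it makes it easier to inspect or log the payload while you are adjusting existing scripts.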
How can you keep track of these changes?
Follow this thread – updates with further information and instructions will be posted here.
Where can you go for help?
You can reach out to your account team for any assistance. You’re also welcome to ask questions in this thread.