UPDATE 11/20/2020
If you are migrating your NRQL alert conditions from the old system to the new Streaming Alerts Platform, please be aware that an open-source tool is available to help reduce the toil involved.
-------
UPDATE 11/5/2020
As of 10/30, nearly all accounts have been enabled for New Relic Streaming Alerts. A very small number of accounts have not been enabled. Those accounts will see a banner that says "opt in". If you see that banner, talk with your New Relic admin about the agreed-upon timeline.
-------
New Relic is rolling out a new, unified streaming alerts platform for New Relic One. This new streaming alerts platform will power NRQL Alert Conditions, and over the next year, all alert condition types will be consolidated into NRQL conditions.
New Relic One Streaming Alerts delivers:
- More reliable alerting that is far less susceptible to data latency and processing lag.
- Increased accuracy of the data points that are being evaluated.
- Reduced time-to-detect through improvements in the streaming algorithm and configurable aggregation duration.
- Greater control over the signals being monitored. You can specify how to evaluate signal gaps, when to consider a signal as lost, and what actions should be taken.
- Consistent behavior and configuration of alert conditions regardless of the telemetry type, source of the signal being monitored, or specifics of your NRQL query.
- Increased scalability in the number of time series that an alert condition can monitor and in the total number of conditions that can be configured.
Opt-in Migration
When we roll out this new streaming platform, the way we process aggregation time windows that have no data changes. If you monitor for a signal going to "0" to determine whether an entity has stopped reporting, that approach will no longer work after the move to the new platform. To maintain this functionality and prevent false negatives, you must enable Loss of Signal detection on these conditions before your account is moved. You may opt in to this new platform now. Read more about the rollout plan in the FAQ section below.
Increased Reliability and Accuracy
This new streaming platform upgrades the streaming algorithm to an event-based mechanism that uses the incoming data points to move the streaming aggregation windows forward. The current model uses the clock on the server to trigger aggregation. With the new approach, an aggregation window waits until the related data points arrive, which greatly reduces the negative effects of lag in a data stream. This will also greatly reduce alert latency and improve accuracy for Cloud Integrations that use a polling-based integration.
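To make the difference from clock-driven aggregation concrete, here is a minimal sketch in Python. It is not New Relic's implementation: the function name, the fixed tumbling windows, the choice to sum values, and the handling of late points are all assumptions made for illustration. The key idea it shows is that a window is emitted only when a later data point arrives and moves the stream forward.

```python
def event_driven_windows(points, window_seconds=60):
    """Group (timestamp, value) points into fixed windows.

    A window is emitted only when a point with a later window start
    arrives, so the data itself (not a server clock) advances the stream.
    Hypothetical helper for illustration only.
    """
    emitted = []
    current_start = None
    current_values = []
    for ts, value in points:
        start = ts - (ts % window_seconds)  # start of the window this point falls in
        if current_start is None:
            current_start = start
        if start > current_start:
            # A newer point arrived: close the current window and move forward.
            emitted.append((current_start, sum(current_values)))
            current_start, current_values = start, []
        if start == current_start:
            current_values.append(value)
        # Points older than the current window are ignored in this sketch.
    return emitted


points = [(0, 1), (30, 2), (65, 5), (200, 7)]
print(event_driven_windows(points))  # [(0, 3), (60, 5)]
```

Two things are worth noticing in the output. The window starting at 120 seconds never appears, because no data arrived for it, so nothing reports a value of "0" for that window. The window starting at 180 seconds is still open, because no later point has arrived to close it.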
Configurable Gap Filling Strategies
Not all signals or time series that are being monitored have a consistent flow of data points. The streaming alerts platform evaluates time windows of a specified duration. In many cases, the telemetry signals you send to New Relic will have gaps, meaning that some time windows will not have data. With the new streaming platform, you can specify how we should evaluate those gaps. You can also set different gap filling strategies, sometimes called extrapolation strategies, for each alert condition.
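As a rough illustration of what a gap filling strategy does, the sketch below fills empty windows in a list of per-window values. The strategy names ("none", "static", "last_value") and the function itself are illustrative assumptions, not the platform's actual configuration options; refer to the documentation linked in the FAQ for the real settings.

```python
def fill_gaps(windows, strategy="none", static_value=0.0):
    """Fill empty windows (represented as None) according to a strategy.

    Hypothetical strategies for illustration:
      "none"       - leave gaps empty
      "static"     - replace each gap with static_value
      "last_value" - repeat the most recent value seen before the gap
    """
    if strategy not in ("none", "static", "last_value"):
        raise ValueError(f"unknown strategy: {strategy}")
    filled = []
    last = None
    for value in windows:
        if value is not None:
            last = value
            filled.append(value)
        elif strategy == "static":
            filled.append(static_value)
        elif strategy == "last_value":
            filled.append(last)  # stays None if no value has been seen yet
        else:
            filled.append(None)
    return filled


signal = [1.0, None, 3.0, None]
print(fill_gaps(signal, "none"))        # [1.0, None, 3.0, None]
print(fill_gaps(signal, "static"))      # [1.0, 0.0, 3.0, 0.0]
print(fill_gaps(signal, "last_value"))  # [1.0, 1.0, 3.0, 3.0]
```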
Loss Of Signal Detection
The NR One Streaming Alerts Platform now provides official support for Loss of Signal Detection. While there are workarounds to achieve this in the current platform, they are inconsistent, and the shift to an event-based streaming algorithm disables those workarounds. With configurable Loss of Signal Detection, on any NRQL Alert Condition, you simply specify how many seconds we should wait from the time we saw the last data point before we consider that signal to be lost. Once that time expires, you can choose to be notified of the Loss of Signal, or you can simply close any open violations if you expect the entity or signal to go away.
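The expiration logic described above can be sketched roughly as follows. The function name, parameter names, and action strings are hypothetical and chosen for illustration; they are not the platform's API.

```python
def loss_of_signal_actions(last_seen_ts, now_ts, expiration_seconds,
                           notify=True, close_open_violations=False):
    """Return the actions to take once a signal is considered lost.

    The signal is treated as lost when at least expiration_seconds have
    passed since the last data point. Hypothetical helper for illustration.
    """
    if now_ts - last_seen_ts < expiration_seconds:
        return []  # signal is still considered alive
    actions = []
    if close_open_violations:
        actions.append("close_open_violations")
    if notify:
        actions.append("notify_loss_of_signal")
    return actions


# Last data point at t=100, expiration of 120 seconds.
print(loss_of_signal_actions(100, 150, 120))  # []
print(loss_of_signal_actions(100, 220, 120))  # ['notify_loss_of_signal']
print(loss_of_signal_actions(100, 220, 120, notify=False,
                             close_open_violations=True))  # ['close_open_violations']
```

The last call models the case where an entity is expected to go away: no notification is sent, and any open violations are simply closed.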
Faster alerts (Sub-minute time-to-detect)
With the NR One Streaming Alerts Platform, all telemetry data can be evaluated in sub-minute timeframes. You can configure the aggregation duration to be as low as 5 seconds or as high as 15 minutes. This, combined with the benefits of the event-driven streaming algorithm, will allow you to achieve sub-minute time-to-detect while increasing both accuracy and reliability. Depending on your data configuration and the requirements of your scenario, you can achieve a time-to-detect as low as 10-15 seconds.
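If you generate condition settings from scripts, a small guard like the one below can catch out-of-range aggregation durations before they are submitted. The 5-second and 15-minute limits come from the paragraph above; the function and constant names are hypothetical.

```python
MIN_AGGREGATION_SECONDS = 5        # lower bound stated above
MAX_AGGREGATION_SECONDS = 15 * 60  # upper bound stated above (15 minutes)


def validate_aggregation_duration(seconds):
    """Return seconds unchanged if it is within the allowed range, else raise."""
    if not MIN_AGGREGATION_SECONDS <= seconds <= MAX_AGGREGATION_SECONDS:
        raise ValueError(
            f"aggregation duration must be between {MIN_AGGREGATION_SECONDS} "
            f"and {MAX_AGGREGATION_SECONDS} seconds, got {seconds}"
        )
    return seconds


print(validate_aggregation_duration(30))  # 30
```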
--------- Frequently Asked Questions -----------
Q: When is this available?
A: THIS IS NOW ENABLED ACROSS ALL ACCOUNTS
Q: How do I request to have our account(s) enabled?
A: If your account is not yet enabled, please talk with whoever manages your New Relic account in your organization.
Q: Is there any documentation?
A: Yes. An overview of Loss of Signal and Gap Filling Strategies, along with how to configure them in GraphQL, is documented here: https://docs.newrelic.com/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-nrql-alert-conditions
Q: How do I manage these features?
A: You can configure these features on NRQL Conditions using the UI, GraphQL API for NRQL Conditions, and the REST API for NRQL Conditions.
Q: Can I configure these settings before having the new streaming platform enabled?
A: Yes. If you are opting in before 10/5, we can enable the UI for you before you enable the account. This will allow you to update your NRQL conditions, if needed, before the features are enabled. After the week of 10/5, all accounts will have access to the UI and APIs. If your account is not enabled during that week, you can use the UI and API to update any alert conditions before having these new features enabled.
Q: Will the NR One Streaming Alerts Platform cover all alerting services?
A: Only NRQL Conditions will receive the full set of New Relic One Streaming Alerts functionality. APM, Infrastructure, and Synthetics alerts will be migrated to NRQL Conditions over the course of the year.
Q: Are all of the features mentioned above available?
A: Gap Filling, Signal Loss Detection, and configurable aggregation duration are available now. The event-based streaming algorithm will be released later in the year.
Q: Will this eliminate false positives?
A: No, but this should greatly reduce false positives. Eliminating false positives and false negatives is an audacious goal that all alerting engines continually pursue, and one we continue to work toward.
Additionally, Loss of Signal Detection monitors for the absence of data over a period of time. Whenever clock time is involved, there is a higher chance of false positives when there is significant disruption to the flow of data. If there is known latency within the New Relic platform, we take that into consideration, but that does not address all possible signal disruptions between the data collection source and the New Relic One Streaming Alerts Platform.
Q: I have more questions. How can I get answers?
A: Please reach out to your account team if you have questions or concerns.
Alternatively, you can ask questions in the discussion area below, and a New Relic community leader will answer. For a deeper dive into what is new, and how to best use these new features, sign up for New Relic Nerd Days on October 13, and check out the Alerts session at 2:00 PM PST. I will share the recording here afterwards.