New Relic is rolling out a new, unified streaming alerts platform for New Relic One. This new streaming alerts platform will power NRQL Alert Conditions, and over the next year, all alert condition types will be consolidated into NRQL conditions.
New Relic One Streaming Alerts delivers:
- More reliable alerting that is far less susceptible to data latency and processing lag.
- Increased accuracy of the data points that are being evaluated.
- Reduced time-to-detect through improvements in the streaming algorithm and configurable aggregation duration.
- Greater control over the signals being monitored. You can specify how to evaluate signal gaps, when to consider a signal as lost, and what actions should be taken.
- Consistent behavior and configuration of Alert conditions regardless of the telemetry type, source of the signal being monitored, or specifics of your NRQL query.
- Increased scalability in the number of time series that an Alert Condition can monitor and in the total number of conditions that can be configured.
When we roll out this new streaming platform, the way we process aggregation time windows that do not have data will change. If you are monitoring for when a signal goes to “0” in order to determine if an entity stops reporting, this approach will no longer work after moving to the new platform. To maintain this functionality and prevent false negatives, you must enable Loss of Signal detection on these conditions before your account is moved. You may opt in to this new platform now. Read more about the rollout plan in the FAQ section below.
Increased Reliability and Accuracy
This new streaming platform upgrades the streaming algorithm to an event-based mechanism that uses the incoming data points to move the streaming aggregation windows forward. The current model uses the clock on the server to trigger aggregation. With the new approach, an aggregation window will wait until the related data points arrive, greatly decreasing any negative effects caused by lag in a data stream. This will also greatly reduce alert latency and improve accuracy for Cloud Integrations that use a polling-based integration.
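To make the difference from a clock-triggered model concrete, here is a minimal sketch of an event-driven aggregator. It is only an illustration, not how the platform is implemented: the class name, the use of a sum as the aggregate, and the choice to ignore late points are all assumptions made for this example.

```python
class EventDrivenAggregator:
    """Hypothetical sketch: a window closes only when a data point
    belonging to a later window arrives, never on a server clock tick."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.current_start = None   # start time of the open window
        self.current_values = []    # values collected in the open window

    def add(self, timestamp, value):
        """Add a data point; return a list of (window_start, sum) closed by it."""
        start = int(timestamp // self.window_seconds) * self.window_seconds
        if self.current_start is None:
            self.current_start = start
        if start < self.current_start:
            # Late point for an already closed window; this sketch ignores it.
            return []
        closed = []
        if start > self.current_start:
            # The incoming point moves the stream forward and closes the old window.
            closed.append((self.current_start, sum(self.current_values)))
            self.current_start = start
            self.current_values = []
        self.current_values.append(value)
        return closed


agg = EventDrivenAggregator(window_seconds=60)
first = agg.add(5, 1)     # window starting at 0 stays open
second = agg.add(50, 2)   # still the same window
third = agg.add(65, 4)    # a point for the next window closes the first one
```

Note that a lagging stream simply delays the window close here instead of producing a window with missing points, which is the behavior the paragraph above describes.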
Configurable Gap Filling Strategies
Not all signals or time series that are being monitored have a consistent flow of data points. The streaming alerts platform evaluates time windows of a specified duration. In many cases, the telemetry signals you send to New Relic will have gaps, meaning that some time windows will not have data. With the new streaming platform, you can specify how we should evaluate those gaps. You can also set different gap filling strategies, sometimes called extrapolation strategies, for each alert condition.
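As a rough illustration of what a gap filling strategy does, the sketch below fills empty windows (represented as `None`) in a list of aggregated values. The strategy names `"none"`, `"static"`, and `"last_value"` and the function itself are assumptions for this example; check the NerdGraph documentation for the options the platform actually supports.

```python
def fill_gaps(windows, strategy="none", static_value=0.0):
    """Hypothetical sketch: fill empty aggregation windows (None entries).

    strategy "none"       -> leave gaps untouched
    strategy "static"     -> replace each gap with static_value
    strategy "last_value" -> repeat the most recent real value
    """
    if strategy not in ("none", "static", "last_value"):
        raise ValueError(f"unknown strategy: {strategy}")
    filled = []
    last_seen = None
    for value in windows:
        if value is not None:
            last_seen = value
            filled.append(value)
        elif strategy == "static":
            filled.append(static_value)
        elif strategy == "last_value":
            # Stays None until the first real value has been seen.
            filled.append(last_seen)
        else:
            filled.append(None)
    return filled


signal = [1.0, None, 3.0, None]
static_filled = fill_gaps(signal, "static", 0.0)
carried_forward = fill_gaps(signal, "last_value")
```

Which strategy fits depends on the signal: a static 0 suits counts where no data means nothing happened, while carrying the last value forward suits gauges that report infrequently.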
Loss Of Signal Detection
The NR One Streaming Alerts Platform now provides official support for Loss of Signal Detection. While there are workarounds to achieve this in the current platform, they are inconsistent, and the shift to an event-based streaming algorithm disables those workarounds. With configurable Loss of Signal Detection on any NRQL Alert Condition, you simply specify how many seconds we should wait from the time we saw the last data point before we consider that signal to be lost. Once that time expires, you can choose to be notified of the Loss of Signal, or you can simply close any open violations if you expect the entity or signal to go away.
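The expiration logic described above can be sketched roughly as follows. The class, field names, and action strings are hypothetical and chosen only for this example; they are not the platform's API.

```python
from dataclasses import dataclass


@dataclass
class LossOfSignalConfig:
    """Hypothetical settings for one NRQL condition."""
    expiration_seconds: int                       # wait time after the last data point
    open_violation_on_expiration: bool = False    # notify about the lost signal
    close_violations_on_expiration: bool = False  # close anything still open


def evaluate_loss_of_signal(last_point_time, now, config):
    """Return the actions to take, given times in seconds since the epoch."""
    if now - last_point_time < config.expiration_seconds:
        return []  # signal is still considered alive
    actions = []
    if config.close_violations_on_expiration:
        actions.append("close_open_violations")
    if config.open_violation_on_expiration:
        actions.append("open_loss_of_signal_violation")
    return actions


notify = LossOfSignalConfig(expiration_seconds=120, open_violation_on_expiration=True)
still_alive = evaluate_loss_of_signal(last_point_time=1000, now=1060, config=notify)
lost = evaluate_loss_of_signal(last_point_time=1000, now=1120, config=notify)
```

For an entity that is expected to go away, such as a short-lived host, you would set only the close option so the expiration quietly cleans up open violations.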
Faster Alerts (Sub-Minute Time-to-Detect)
With the NR One Streaming Alerts Platform, all telemetry data can be evaluated in sub-minute timeframes. We will allow you to configure the aggregation duration from as low as 5 seconds up to a maximum of 15 minutes. This, combined with the benefits of the event-driven streaming algorithm, will allow you to achieve sub-minute time-to-detect while increasing both accuracy and reliability. Depending on your data configuration and the requirements of your scenario, you can achieve a time-to-detect as low as 10-15 seconds.
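If you manage conditions through scripts, a small guard like the one below can catch out-of-range values before they reach the API. The function and constant names are assumptions for this example; only the 5 second and 15 minute bounds come from the description above.

```python
MIN_AGGREGATION_SECONDS = 5        # lower bound stated above
MAX_AGGREGATION_SECONDS = 15 * 60  # 15 minutes, upper bound stated above


def validate_aggregation_duration(seconds):
    """Hypothetical helper: return seconds if it is within the allowed range."""
    if not MIN_AGGREGATION_SECONDS <= seconds <= MAX_AGGREGATION_SECONDS:
        raise ValueError(
            f"aggregation duration must be between {MIN_AGGREGATION_SECONDS} "
            f"and {MAX_AGGREGATION_SECONDS} seconds, got {seconds}"
        )
    return seconds


shortest = validate_aggregation_duration(5)
longest = validate_aggregation_duration(900)
```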
Frequently Asked Questions
Q: When is this available?
A: You can opt in to enable New Relic Streaming Alerts on NRQL conditions now.
We plan to enable the majority of accounts the week of October 5th.
Accounts that have NRQL Conditions that may be monitoring for loss of signal will be enabled on October 28th. These are NRQL Conditions that either use the “Less Than” operator, or have an operator and threshold of “Equals 0”.
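If you want to review your own conditions against the rule above before your account is moved, a check along these lines may help. The dictionary keys `operator` and `threshold` are assumptions about how you have exported your conditions, not a New Relic data format.

```python
def may_monitor_loss_of_signal(condition):
    """Hypothetical check: flag conditions that use the "Less Than" operator,
    or that combine the "Equals" operator with a threshold of 0."""
    operator = condition["operator"].strip().lower()
    if operator == "less than":
        return True
    return operator == "equals" and condition["threshold"] == 0


conditions = [
    {"name": "Throughput dropped", "operator": "Less Than", "threshold": 10},
    {"name": "Host stopped reporting", "operator": "Equals", "threshold": 0},
    {"name": "High error rate", "operator": "Above", "threshold": 5},
]
flagged = [c["name"] for c in conditions if may_monitor_loss_of_signal(c)]
```

Any flagged condition is a candidate for enabling Loss of Signal Detection, though not every "Less Than" condition is actually watching for a lost signal.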
Q: How do I request to have our account(s) enabled?
A: Simply complete this form: https://sgnf.typeform.com/to/FkUEMwBP
We will be enabling accounts in batches on Tuesdays, Wednesdays, and Thursdays.
Please specify when you would like your accounts to be enabled, and let us know if you have questions. You may also discuss this with your account team.
Q: How will I know if my account has been enabled?
A: When we roll this out the week of 10/5, there will be a banner on the Policies page and the NRQL Condition create/edit page. If your account is not enabled, the banner will ask you to enable New Relic One Streaming Alerts, and link you back to this document.
Q: Is there any documentation?
A: Yes. An overview of Loss of Signal and Gap Filling Strategies, along with how to configure them in GraphQL, is documented here: NerdGraph API: Loss of signal and gap filling.
Additional documentation will be published very soon, and this section will be updated.
Q: How do I manage these features?
A: You can configure these features on NRQL Conditions using the UI, GraphQL API for NRQL Conditions, and the REST API for NRQL Conditions.
Q: Can I configure these settings before having the new streaming platform enabled?
A: Yes. If you are opting in before 10/5, we can enable the UI for you before you enable the account. This will allow you to update your NRQL conditions, if needed, before the features are enabled. After the week of 10/5, all accounts will have access to the UI and APIs. If your account is not enabled during that week, you can use the UI and API to update any alert conditions before having these new features enabled.
Q: Will the NR One Streaming Alerts Platform cover all alerting services?
A: Only NRQL Conditions will receive the full set of New Relic One Streaming Alerts functionality. APM, Infrastructure, and Synthetics alerts will be migrated to NRQL Conditions over the course of the year.
Q: Are all of the features mentioned above available?
A: Gap Filling and Loss of Signal Detection are available now. The remaining features, configurable aggregation duration and the event-based streaming algorithm, will be released incrementally throughout the rollout period.
Q: Will this eliminate false positives?
A: No, but this should greatly reduce false positives. Eliminating false positives and false negatives is an audacious goal for any alerting engine, and one we continue to work toward.
Additionally, Loss of Signal Detection is monitoring for the absence of data for a period of time. Whenever clock time is involved, there is a higher chance of false positives when there is significant disruption to the flow of data. If there is known latency within the New Relic platform, we take that into consideration, but that does not address all possible signal disruptions between the data collection source and the New Relic One Streaming Alert Platform.
Q: I have more questions. How can I get answers?
A: Please reach out to your account team if you have questions or concerns.
Alternatively, you can ask questions in the discussion area below, and a New Relic community leader will answer. For a deeper dive into what is new, and how to best use these new features, sign up for New Relic Nerd Days on October 13, and check out the Alerts session at 2:00 PM PST. I will share the recording here afterwards.