Migrate NRQL conditions to conform to native Loss of Signal Detection -
What is This App:
This is a bulk review and edit tool for NRQL conditions that may need Loss of Signal detection configured in order to prevent false negatives from occurring after the rollout of New Relic One Streaming Alerts.
Throughout October 2020, New Relic is rolling out “New Relic One Streaming Alerts” for NRQL Conditions. The new platform is intended to make alerts more accurate and reliable while significantly improving time-to-detect. Read about the details of this release here.
With this rollout, we will be making the entire streaming pipeline event-driven to improve reliability, delivering official support for loss of signal detection, and allowing you to specify which gap filling strategy you wish to use.
Critical Change in Behavior:
This rollout will change behavior that you may be relying on to detect when an entity or service goes offline. We will no longer insert a “0” into the alert evaluation stream when there are gaps in the data. A gap occurs when there is no data for a specific aggregation window. Therefore, if you currently monitor the uptime of an entity or service using an alert condition whose evaluation uses a “<” operator or “=0”, that condition will stop working as soon as the new streaming platform is enabled on your account. This will result in “false negatives” if the monitored service does go down.
To prevent this, you MUST update all such NRQL conditions to use the new loss of signal detection capability BEFORE, or immediately after, New Relic One Streaming Alerts is enabled for your account. If you are reading this after October 28, 2020, you can use this app to find the conditions that may no longer be working and need updating.
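The behavior change can be illustrated with a toy model. The sketch below is not New Relic's evaluation engine; it assumes a simplified condition that opens a violation when the value is below a threshold for a required number of consecutive aggregation windows, and the function name and parameters are hypothetical. Gaps are represented as `None`.

```python
from typing import Optional, Sequence


def violates_below(
    windows: Sequence[Optional[float]],
    threshold: float,
    required_windows: int,
    insert_zero_for_gaps: bool,
) -> bool:
    """Return True if `value < threshold` holds for enough consecutive windows.

    Simplified illustration only. `insert_zero_for_gaps=True` mimics the old
    behavior (gaps become 0); False mimics the new behavior (gaps stay empty
    and are not evaluated).
    """
    consecutive = 0
    for value in windows:
        if value is None:
            if insert_zero_for_gaps:
                value = 0.0  # old behavior: a "0" is inserted for the gap
            else:
                consecutive = 0  # new behavior: nothing to evaluate
                continue
        if value < threshold:
            consecutive += 1
            if consecutive >= required_windows:
                return True
        else:
            consecutive = 0
    return False


# A service reports for two windows and then goes silent.
windows = [12.0, 9.0, None, None, None, None]

old_behavior = violates_below(windows, 1.0, 3, insert_zero_for_gaps=True)
new_behavior = violates_below(windows, 1.0, 3, insert_zero_for_gaps=False)
print(old_behavior, new_behavior)
```

In this model the old behavior opens a violation once the service goes silent, while the new behavior never does, which is the false negative that loss of signal detection is meant to cover.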
How This App Works:
This app is a bulk review and editing tool for all NRQL conditions in a given account that use either the “<” operator with any threshold value, or the “=” operator with a threshold value of “0”. The key elements of each NRQL condition are listed as read-only in the left portion of a row, with the new settings for Loss of Signal and Gap Filling exposed on the right side. If values are already set for those fields, they will be displayed. You may edit and update conditions individually, or edit and update them in batches.
The “Auto-fill Suggestions” button is there to help if you are not sure what to enter for the Loss of Signal duration. When you select a set of NRQL condition rows and click the button, it fills in a duration and enables the “open violation on expiration” option. The duration is the sum of the evaluation duration and the evaluation offset.
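The selection rule described above can be sketched as a small predicate. This is an illustration only, not the app's actual code: the `Term` class and its field names are hypothetical, and operators are written symbolically as in the text rather than as they appear in any API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Term:
    """Hypothetical stand-in for one threshold term of an NRQL condition."""
    operator: str   # "<", "=", ">" (symbolic, as used in the text above)
    threshold: float


def needs_review(terms: Iterable[Term]) -> bool:
    """True if any term uses "<" with any threshold, or "=" with threshold 0."""
    return any(
        t.operator == "<" or (t.operator == "=" and t.threshold == 0)
        for t in terms
    )


print(needs_review([Term("<", 5.0)]))   # below any threshold: listed
print(needs_review([Term("=", 0.0)]))   # equals zero: listed
print(needs_review([Term("=", 3.0)]))   # equals non-zero: not listed
print(needs_review([Term(">", 0.0)]))   # above: not listed
```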
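As a rough sketch of the suggestion rule, the function below adds the two values and enables the open-violation option. It assumes both inputs are already expressed in seconds; the function and key names are hypothetical and are not taken from the app or from NerdGraph.

```python
def suggest_loss_of_signal(evaluation_duration_s: int, evaluation_offset_s: int) -> dict:
    """Suggest Loss of Signal settings for one condition.

    Assumption: both arguments are in seconds. If your condition stores the
    offset in another unit, convert it before calling this function.
    """
    return {
        "duration_seconds": evaluation_duration_s + evaluation_offset_s,
        "open_violation_on_expiration": True,
    }


# Example: a 5-minute evaluation duration with a 3-minute offset.
suggestion = suggest_loss_of_signal(300, 180)
print(suggestion)
```

For the example inputs the suggested duration is 480 seconds (8 minutes).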
What do the Settings Mean:
Loss of Signal
Duration: Listed as “expiration.duration” in NerdGraph, this is the clock time we wait, measured from when the alert service received the last data point, before we consider the signal lost.
Open Violation: When a signal is considered lost, you have the option to open a new violation, which will usually create a new notification. “Auto-fill Suggestions” will check this box.
Close Violations: When a signal is considered lost, you can choose to close all open violations related to that signal. This is useful for ephemeral services, or when the signal stopping would otherwise prevent a violation from closing. When both options are chosen, all violations for that signal are closed before the new “Loss of Signal” violation is opened.
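The interaction of the three Loss of Signal settings can be sketched as follows. This is a simplified model under stated assumptions, not the alert service's implementation: the function name and action strings are hypothetical, times are plain seconds, and treating the signal as lost exactly when the elapsed time reaches the duration is an assumption about the boundary.

```python
from typing import List


def loss_of_signal_actions(
    last_data_point_at: float,
    now: float,
    duration_s: float,
    open_violation: bool,
    close_violations: bool,
) -> List[str]:
    """Return the actions to take, in order, once a signal is considered lost."""
    if now - last_data_point_at < duration_s:
        return []  # the signal has not expired yet
    actions = []
    # Existing violations are closed first, then the new one is opened.
    if close_violations:
        actions.append("close_open_violations")
    if open_violation:
        actions.append("open_loss_of_signal_violation")
    return actions


# Last data point at t=0 with a 480-second duration.
print(loss_of_signal_actions(0, 300, 480, True, True))  # still within the duration
print(loss_of_signal_actions(0, 500, 480, True, True))  # expired, both options on
```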
Gap Filling Strategy
Fill Option: There are three fill options to choose from: None, Static, and Last Value.
- “None” is the default. When there is an aggregation window that has no value when being evaluated, the window will stay empty, and the evaluation duration timer will reset.
- “Last value” will carry the last seen value forward to fill that gap before being evaluated.
- “Static” will fill the gap with a static value that you specify in the “Fill Value” field. For most use cases, the value used here is “0”. The “Fill Value” field is only valid for the “static” fill option.
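The three fill options can be compared with a short sketch. It models aggregation windows as a list in which `None` marks a gap; the function name and the strategy strings are hypothetical labels for this example, not identifiers from the app or the API.

```python
from typing import List, Optional, Sequence


def fill_gaps(
    windows: Sequence[Optional[float]],
    strategy: str = "none",
    fill_value: Optional[float] = None,
) -> List[Optional[float]]:
    """Apply a gap filling strategy to a series of aggregation windows."""
    if strategy not in ("none", "static", "last_value"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "static" and fill_value is None:
        raise ValueError("the static strategy requires a fill value")
    if strategy != "static" and fill_value is not None:
        raise ValueError("a fill value is only valid for the static strategy")

    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in windows:
        if value is not None:
            last_seen = value
            filled.append(value)
        elif strategy == "static":
            filled.append(fill_value)
        elif strategy == "last_value":
            filled.append(last_seen)  # stays None if nothing has been seen yet
        else:
            filled.append(None)  # "none": the window stays empty
    return filled


series = [5.0, None, None, 7.0]
print(fill_gaps(series, "none"))
print(fill_gaps(series, "last_value"))
print(fill_gaps(series, "static", 0.0))
```

With “none”, the empty windows are passed through unchanged; the reset of the evaluation duration timer described above happens in the evaluation step and is not modeled here.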
Installation / Setup -
Log in to New Relic One, click on the New Relic One Catalog launcher, select Loss of Signal Alerts Migrator, and install/configure account-level access to the application.
Alternatively, use the NR1 CLI to download, install, and deploy it yourself.
nr1 nerdpack:clone -r https://github.com/newrelic/nr1-alerts-los-migrator.git
cd nr1-alerts-los-migrator
nr1 nerdpack:serve
Visit https://one.newrelic.com/?nerdpacks=local, navigate to the Nerdpack, and
This Nerdpack is supported by the developers here in this community thread. Alternatively, you can ask questions in the GitHub Issues page. If you can fix the issue yourself, please do submit a pull request.