@Fidelicatessen Thanks for the updates. Where can I find the details around the scripting changes that will be necessary?
We endeavor to keep our API docs up-to-date, and we have made the Terraform provider a top priority, so it should stay current as well.
All of these features are still months out, so it's not yet clear exactly what changes you'll need to make, but following these pages should help!
API docs for Alerts:
- Alerts conditions API field names | New Relic Documentation
- Nerdgraph has in-UI documentation, but here’s a Nerdgraph tutorial for creating NRQL alert conditions: NerdGraph tutorial: NRQL condition alerts | New Relic Documentation
Terraform provider for New Relic:
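For a rough sense of what alerts-as-code looks like with the provider, here is a minimal sketch using the newrelic_alert_policy and newrelic_nrql_alert_condition resources (the account ID, query, and thresholds below are placeholders for illustration, not recommendations):

resource "newrelic_alert_policy" "example" {
  name = "Example policy"
}

resource "newrelic_nrql_alert_condition" "error_count" {
  account_id = 1234567 # placeholder account ID
  policy_id  = newrelic_alert_policy.example.id
  type       = "static"
  name       = "High error count"
  enabled    = true

  nrql {
    # One signal (and one violation) per application, thanks to FACET
    query = "SELECT count(*) FROM TransactionError FACET appName"
  }

  critical {
    operator              = "above"
    threshold             = 10
    threshold_duration    = 300 # seconds
    threshold_occurrences = "all"
  }

  violation_time_limit_seconds = 259200
}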
@Fidelicatessen Love seeing NRQL alerts getting linked to their entities for health. I am seeing a little oddity, though: on the Explorer dashboard the entity is marked red during an incident, but when I go into the Browser UI there are no “Open Violations” that link back to the ‘incident’. I’ll open a support ticket on it, but wanted to provide that feedback here too.
Thanks for your response, much appreciated. I’ve delved into this a little more and found myself to be wrong: all FACET appName entities are now covered by their NRQL alert conditions. Absolutely stellar result!
Only thing remaining is that the NRQL alert conditions themselves are not rendered on the app’s APM > Alert conditions page, as shown in the attached screenshot. This seems like an anomaly, so I have raised ticket #474806 accordingly.
What will the recommended NRQL for HNR alerts be? We might want to get an early start using that, because the existing “Don’t trigger alerts for hosts that perform a clean shutdown” checkbox doesn’t seem to catch Amazon EKS nodes that scale down during auto-scaling or an EKS node refresh.
You can already get an effective HNR alert, assuming your entity is emitting telemetry data to New Relic. Here’s how I would do it if I had a host running Infrastructure:
SELECT count(*) FROM SystemSample FACET hostname
Set Loss of Signal to 5 minutes to match the standard 5-minute HNR violation, and set the threshold to look for values below 1 so that violations close on their own once the host starts reporting again.
This will also track each host separately, since the query uses FACET hostname.
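If you manage alerts through Terraform, the same setup maps onto the provider's loss-of-signal arguments; a minimal sketch, assuming a provider version with loss-of-signal support (the account and policy references are placeholders):

resource "newrelic_nrql_alert_condition" "host_not_reporting" {
  account_id = 1234567 # placeholder account ID
  policy_id  = newrelic_alert_policy.example.id
  type       = "static"
  name       = "Host not reporting"

  nrql {
    query = "SELECT count(*) FROM SystemSample FACET hostname"
  }

  critical {
    operator              = "below"
    threshold             = 1
    threshold_duration    = 300
    threshold_occurrences = "all"
  }

  # Loss of Signal: open a violation when a host has been silent for
  # 5 minutes, and close it automatically once it reports again.
  expiration_duration            = 300
  open_violation_on_expiration   = true
  close_violations_on_expiration = true
}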
Ok, I was expecting something new in NRQL land, but simplicity is often the best. Thanks for clarifying.
Would it not be above 1, instead of below?
No, because “above 1” indicates that the host is still reporting. A result below 1 would indicate that the host stopped reporting.
NOTE: In order to open a violation on this, however, you will also need to set up Loss of Signal, since (as this article explains) the result of a count(*) function in this case will never reach 0: when a host stops reporting, there are simply no events to count, so the query returns no data at all rather than a zero.
Are all the above features live already?
Hi folks!
We have revisited the work that we’re doing to improve your Alerts and Incident Response experience, and have revised the target dates in both the What's coming? and What's going away? sections. I have updated the original post in this thread to reflect the revised dates. Please have a look!
Hi. I still do not see any progress on this. Warning alerts are not being sent to our notification channel, but the errors are.
Hi @kostyantyn
You will need to create a Workflow that filters for Warning-level incidents (Workflows calls this priority High, as opposed to Critical). You can then specify exactly what to do with those incidents, such as sending a notification.
Documentation on Workflows can be found here: Workflows | New Relic Documentation
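For anyone doing this as code rather than in the UI, here's a minimal sketch using the newrelic_workflow resource that later versions of the Terraform provider expose (the destination channel reference is a placeholder):

resource "newrelic_workflow" "warning_notifications" {
  name                  = "Warning notifications"
  muting_rules_handling = "NOTIFY_ALL_ISSUES"

  issues_filter {
    name = "warning-only"
    type = "FILTER"

    # Warning-level incidents surface in Workflows as priority HIGH
    predicate {
      attribute = "priority"
      operator  = "EQUAL"
      values    = ["HIGH"]
    }
  }

  destination {
    channel_id = newrelic_notification_channel.example.id # placeholder channel
  }
}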
Have the Workflows & Destinations features been added to the Terraform provider? And if not, will they be?
Also, on my account I see something called Pathways where the docs show Workflows. Has it been renamed to Pathways, or is Workflows just not visible to me?
I just got in touch with the Product Manager who oversees Workflows to ask them about the Terraform provider – they may pop in here with more insight on that.
With regards to Pathways showing up in your UI: this is because you still have the old Incident Intelligence feature turned on (a feature that never made it out of early access). Since that feature and the new features cannot coexist, you’re seeing only Incident Intelligence. You will need to contact your account team to request that the switch be turned off on the accounts you’re concerned with.
Great to hear you’re in touch with the PM as well. At any large scale or enterprise level, Terraform becomes the de facto way to configure monitoring.
This is from a wee while back, and the team backed up those words with day-one Terraform support when the new alert types were released.
Totally understandable that it’s not easy to replicate that level of compatibility every time, but that definitely stood out as one of the most thought-out and well-executed releases.
Fingers crossed to see more of that with any new changes!
Oh okay, so there’s no point in having the existing alert policies with that ‘Connect to Incident Intelligence’ box checked? Is that effectively dead, or at least unrelated to the new Workflows and Destinations features?
Hi @ntkach ,
The somewhat poorly named “Connect to Incident Intelligence” checkbox you see on Alert Policies is an important setting that effectively adds that Policy to the “Sources” used for correlation by Decisions.
I know that was a mouthful and a lot of terminology, so I’ll use some pictures to illustrate.
We have the setting on the Policy for those who use Terraform and need to be able to create policies that send their incident events to the correlation engine, so Decisions can intelligently group events together and reduce noise.
I hope this was helpful.