We use NR extensively across our services and are looking to implement a simple SLO dashboard to help the service owner negotiate and monitor golden signal thresholds. I read https://blog.newrelic.com/engineering/best-practices-for-setting-slos-and-slis-for-modern-complex-systems/ but cannot find a way to combine SLIs into a simple red/green box or something to say if a service is in or out of compliance. It's frustrating because the response time and error rate data is in NR. Is there a plugin or something we can use, or do we need to extract the data via the API and build our own simple webapp?
Great question @feidhlim.oneill!
This might be a great use-case for dashboards. Are there specific queries which you are working on or charts which you are trying to create?
I would be happy to talk through any specific questions or queries you might have and figure out whether NRQL and Dashboards would suit your SLO dashboard needs.
I would also be interested to know if anyone else in the community has used Insights for a similar purpose?
The logic we want to implement is: if the 99th percentile response time is greater than a threshold OR the ratio of bad responses (HTTP response code >= 500 for now) to all requests is above a threshold, then the service is considered outside its SLO. We want the service owners to set the thresholds.
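To make the rule concrete, here is a minimal Python sketch of that compliance check. It assumes the two numbers (99th percentile response time in seconds and error percentage) have already been fetched from NR by some other means; the names SloThresholds and in_compliance are hypothetical and only illustrate the OR logic, they are not part of any NR API.

```python
from dataclasses import dataclass


@dataclass
class SloThresholds:
    """Owner-defined limits for one service (hypothetical structure)."""
    p99_seconds: float    # maximum acceptable 99th percentile response time
    error_percent: float  # maximum acceptable share of bad responses, in percent


def in_compliance(p99_seconds: float, error_percent: float,
                  thresholds: SloThresholds) -> bool:
    """Return True (green) only if neither signal exceeds its threshold."""
    too_slow = p99_seconds > thresholds.p99_seconds
    too_many_errors = error_percent > thresholds.error_percent
    # The service is outside its SLO if EITHER condition holds.
    return not (too_slow or too_many_errors)


# Example values, made up for illustration.
limits = SloThresholds(p99_seconds=0.3, error_percent=1.0)
status = "GREEN" if in_compliance(0.25, 0.4, limits) else "RED"
print(status)
```

The result is a single boolean, which is the shape a one-box red/green display needs.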
So we have both queries and can set limits in the widgets, but we want this to be super simple and to combine the queries.
Hey @feidhlim.oneill - I’m not 100% sure what you’re looking for is possible - certainly you can have a dashboard with multiple widgets (Response Times, Error Rate, etc…)
So long as your query returns a single numeric value, you can set a threshold on it. When breached, those thresholds show in the UI in yellow or red, depending on which threshold was breached.
In theory a query like below could return the data you need:
SELECT average(duration), percentage(count(*), WHERE httpResponseCode != '200' ) FROM Transaction SINCE 2 days ago
BUT - with that you are returning 2 metrics in one query, which is not supported for thresholds that lead to the coloured numbers
You’d need to break the query up into separate widgets. Alternatively, you could come up with a query that returns the data as a single numeric value, but I’m struggling to think of an appropriate query for that.
Hopefully others in the community may be able to help with that.
thanks - will look at that.
You’re welcome! Let us know how it goes
I’ve worked out the following gives us error ratio as a Percentage.
SELECT filter(count(*), WHERE httpResponseCode >= '500')/count(*) * 100 AS PERCENTAGE FROM Transaction WHERE appName = 'service1' SINCE 24 hours ago TIMESERIES
I also need the ‘slow response time’ ratio
How do I get the ratio of slow transactions for service1 (where slow means > 0.3 seconds or something) to all transactions within the 99th percentile - so ignoring the slowest 1%?
Is that possible?
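To pin down what that ratio means before looking for a query, here is a small Python sketch that computes it from a plain list of durations. It is only an offline illustration of the definition, not an NRQL answer: the function name slow_ratio_within_p99 is hypothetical, the 0.3 second cutoff is the example value from above, and "ignoring the slowest 1%" is interpreted as dropping the slowest 1% of samples (rounded down) before counting.

```python
def slow_ratio_within_p99(durations, slow_threshold=0.3):
    """Share of slow transactions after discarding the slowest 1% of samples.

    durations: response times in seconds.
    slow_threshold: a transaction counts as slow if its duration exceeds this.
    """
    if not durations:
        return 0.0
    ordered = sorted(durations)
    # Drop the slowest 1% (integer division, so fewer than 100 samples drop nothing).
    drop = len(ordered) // 100
    kept = ordered[:len(ordered) - drop]
    slow = sum(1 for d in kept if d > slow_threshold)
    return slow / len(kept)


# Made-up sample: 90 fast, 9 slow, 1 extreme outlier that gets trimmed.
sample = [0.1] * 90 + [0.5] * 9 + [10.0]
ratio = slow_ratio_within_p99(sample)
print(f"{ratio:.4f}")
```

For comparison, the untrimmed version of this ratio has the same shape as the percentage(count(*), WHERE ...) expression suggested earlier in the thread, with a condition on duration instead of httpResponseCode.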