Be careful of NRQL errors breaking metric mutations for "summary" data vs "uniqueCount" data?

NOTE: Read the bottom of this thread for the answer. The first few messages just give the history of the confusion that a bad NRQL query and a silent NerdGraph error can cause…

Hi - I’ve not had to play with converting events to metrics before, but as I need longer term access to data I’ve been told to try converting events to metrics.

However, I am a bit confused. I followed the examples on https://docs.newrelic.com/docs/accounts/accounts/data-management/introduction-events-metrics-service and created 2 metrics in one query, and the result in NerdGraph seemed to be a success.

However, when I query unique names in the Metric table, I can only see the first metric. Confused, I created another metric by itself, and it too is missing…

As I write this, I just noticed that the 2 missing metrics are both “summary” metrics rather than uniqueCount ones. Is it possible the summary metrics are broken, or that I need to do something special? And if I had done something wrong, why would NerdGraph report success?

I am very confused.

My script looks like the following:

mutation {
  eventsToMetricsCreateRule(rules: {
    name: "activity.pageView.* for production platforms",
    description: "Created 24/Sep/2020. Used for activity dashboard",
    nrql: "FROM PageView SELECT uniqueCount(session) as 'activity.pageView.uniqueCountSession', summary(session) as 'activity.pageView.summary' WHERE appName in ('audit.xxx.co.uk', 'go.xxx.com') FACET appName",
    accountId: xxxxx
  })
  {
    successes {
      id
      name
      nrql
      enabled
    }
    failures {
      submitted {
        name
        nrql
        accountId
      }
      errors {
        reason
        description
      }
    }
  }
}

So you don’t have anything with this?

Select count(activity.pageView.summary) from Metric since 1 week ago

No - I get zero.

And if I query the Metric table e.g. “SELECT uniques(metricName) FROM Metric”

I see a list of apm.XXX metrics, newrelic.timeslice.value and only activity.pageView.uniqueCountSession - not activity.pageView.summary, and not another “summary” metric I’ve tried to create?

I am also intrigued by the metric that is in place, as it only appears to have half the data in it. I’ve run a timeseries side by side with the original event data and my AU events match perfectly, but my UK events are either missing or, for the last few days that are present, seem to show different values?

In the picture above - where I am showing the AU value, it has the same number in the widget above it, but look at the purple UK values - they are clearly different in the same dataset - but both are using the same mutation - I don’t get it?

I’m seeing all kinds of oddities with New Relic One to be honest - and I know some of these have been fixed in the background, but others like this aren’t filling me with confidence… it’s like I’m using a different product to everyone else?

Thought it worth adding, that while I can’t see the rule that was supposedly created in the Metrics table, if I try and create it again I get the error:

“A rule has already been created with the input name app.signOff.* for production platforms. Run the command again with a different rule name.”,
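
For anyone hitting the same thing: one way to check whether a rule really was stored is to list the account's events-to-metrics rules through NerdGraph. This is only a sketch based on my reading of the NerdGraph events-to-metrics docs, so treat the `allRules` field as an assumption and confirm it in the NerdGraph explorer. The account ID 1234567 is a placeholder, not a real account.

```graphql
# Sketch: list the events-to-metrics rules stored for one account.
# 1234567 is a placeholder account ID (assumption); replace it with your own.
query {
  actor {
    account(id: 1234567) {
      eventsToMetrics {
        allRules {
          rules {
            id
            name
            enabled
            nrql
          }
        }
      }
    }
  }
}
```

A rule appearing in this list only shows that the rule was saved. It does not show that its NRQL is producing any metric data.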

Adding more detail to this thread - as I’m now reporting this to support@ as I just don’t understand why these metrics aren’t working.

I am also concerned about whether the metric that is working is working properly. I still can’t explain the missing UK data before Sep 24th (and the wrong data at the beginning of the 24th). I have also noticed that over the weekend the metric has continued to gather data for both the UK and AU. The AU data exactly matches between my raw events and the metric now being gathered, but for the UK the metric only follows a similar trend to the events (with different numbers), as the time periods being shown are different. The metric seems to get data at :45 minute intervals, whereas the event data is at :47 minute intervals (over the same time period). As mentioned, the AU data matches perfectly…

I think the UK data is probably correct - but given the earlier issues in the data, how can I be sure, and how can I trust that it’s working and that I’m not getting false results?

Just to update on this thread - there were several things at play that caught me out on this:

  1. The NerdGraph tool doesn’t report errors very well - so when it said it had created my metrics, it had actually created a rule but didn’t report some subtle errors in my NRQL queries
  2. Summary queries must have a numeric value - it makes sense when you think about it, but when you are dealing with “counts” (where you may not be counting numeric data, but simply occurrences) there is a trick in the small print: use “summary(1)” so you are summarizing a count of 1
  3. If overlaying data in a line graph, check all the queries (sometimes they aren’t visible, and you need to scroll down) - I had an event query and a metric query overlaid, and this is why it looked like my metric data wasn’t comparable. ALSO, when you create metrics, they don’t convert data back in time - it’s forwards only (for some reason I had misread this).
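
To make point 2 concrete, here is roughly how the summary rule could look with summary(1) in place of summary(session). It is a sketch only: the rule name, description and account ID 1234567 are placeholders I made up for illustration (the name has to differ from any rule that already exists), and everything else mirrors my original mutation.

```graphql
# Sketch: summary rule that counts page views by summarizing the constant 1.
# The name, description and accountId values are placeholders (assumptions).
mutation {
  eventsToMetricsCreateRule(rules: {
    name: "activity.pageView.summary for production platforms",
    description: "Counts page views with summary(1)",
    # summary() needs a numeric argument; session is not numeric, so use 1.
    nrql: "FROM PageView SELECT summary(1) AS 'activity.pageView.summary' WHERE appName IN ('audit.xxx.co.uk', 'go.xxx.com') FACET appName",
    accountId: 1234567
  })
  {
    successes {
      id
      name
      nrql
      enabled
    }
    failures {
      submitted {
        name
        nrql
        accountId
      }
      errors {
        reason
        description
      }
    }
  }
}
```

Because rules only convert data going forwards, a metric created this way will only have data from the time the rule was created.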

Hope this helps someone else in the future, and hopefully NerdGraph can be improved to better report errors, as I think the silent failure is very confusing. (Thanks to NR support for helping me diagnose this.)

@tim.mackinnon1 Thank you so much for providing such a detailed update! :star_struck: I’m sure it will help community members in the future.