Specifying which rows returned by LIMIT

The documentation for LIMIT in NRQL (https://docs.newrelic.com/docs/insights/nrql-new-relic-query-language/nrql-resources/nrql-syntax-components-functions#sel-limit) doesn’t describe how it determines which rows are selected to be included in the results.

My own experimentation (as well as the example) implies that the rows are returned in descending order by timestamp for non-facet queries, but descending order by the first aggregated column for facet queries.

Presumably if I’m interested in the “bottom 50” instead of “top 50” for a particular stat, I’d have to express the aggregation as a negative (e.g. “SELECT max(0 - retries) … FACET customer_id LIMIT 20” to get the customers with the fewest retries instead of the most.

Is this correct, and if so, can we rely on this going forward, or is it undocumented and subject to change?

Not sure I follow the negative number thing. I generally always keep my limit at 1000 and then manage the content with ranges etc. You also need to manage your time range and can use since and until with actual datetime.

Little hidden feature than works sometimes "ORDER BY… ". You can try asc and dsc. Last time I checked it wasn’t in the docs, but it has worked for some time.

2 Likes

Hey @sbroberg! Just checking in to see if @6MM was able to answer all your questions. If you were able to get sorted, please let us know! :blush:

The issue for us is that we’re collecting data from endpoints (several million), but if we’re looking for exceptional cases, we don’t want to see all 2 million, just the most extreme cases - either endpoints exhibiting a very large result for a particular metric, or a very small one.

The docs don’t describe whether LIMIT is deterministic or non-deterministic (which could be the case if the order of the rows being returned was based on how they were encountered when traversing an index, which may vary depending on how NR’s backend optimizer chooses to attack the query).

However, repeated use seems to indicate that LIMIT is deterministic, and is ordering rows from greatest to least according to what I mentioned above (first column for FACET, timestamp for everything else).

Since with a FACET result, sometimes you may consider the extreme case to be the smallest (for example, you might want to find the endpoints that had the least connection_time to the server instead of the most). Since LIMIT will give you the largest connection_times first, you’ll never be able to see who the smallest ones are unless you slog through 2 million rows to get to the end.

Making the results negative inverts this - a customer with -1 seconds of connection time is larger than one with -10000 seconds, so LIMIT now returns you the customers with the least connection time.

Anyway, I was talking to some NR tech folks last week and they confirmed this behavior for me (as well as confirming the “negative trick” is used by them as well, although they use X * -1 instead of 0 - X).

1 Like

Such great info—thanks for sharing, @sbroberg! Sounds like you have everything you need right now? Let us know how else we can help! :blush: