
March 2019 Coffee Chat: An AMA with a New Relic Software Engineer

#1

March Coffee Chat :coffee:

We are excited this month to host our first AMA (Ask Me Anything) in our #coffee-chat Slack channel! In honor of our Spring Cleaning initiative, let’s start fresh and tackle anything you have been wondering about or want to chat with our community about. This month our Coffee Chat will feature our own Jan Urbański, a Principal Software Engineer, and you can ask him any and all New Relic questions. This will be happening in a European timezone, so the one-and-only @RyanVeitch will host us.

What can you expect from a Coffee Chat? It’s an entire hour dedicated to learning from New Relic experts! We move a conversation topic to Slack and you can pepper our willing expert with questions in real time. I then take that conversation and publish it here, in the community! (Check out this one, and this one!)

Coffee Chats are a great opportunity to learn something new, share knowledge with each other in a casual setting, and share a virtual cup of coffee or tea with other New Relic Explorers.

Coffee chat with Jan Urbański - New Relic AMA

  • Topic: New Relic AMA Coffee Chat
  • Date: Wednesday, March 20, 2019
  • Time: 6 AM Pacific, 2 PM CET, 1 PM UTC
  • Location: Coffee Chat Slack Instance (#coffee-chat)

How to join the Coffee Chat Slack Channel

To get invited to our New Relic Users Slack workspace, follow the instructions below:

  1. Head over to the sign up page: https://newrelicusers-signup.herokuapp.com/
  2. Enter in your email address. Pretty soon you’ll receive an invite in your email inbox from Slack to join the New Relic Users workspace.
  3. Once you’re in the correct workspace, either through your browser or the Slack app, you can navigate to the #coffee-chat channel. You’re also welcome to come to #introduce-yourself to tell us a bit about you and what your New Relic specialty subjects are. At the time the Coffee Chat is due to begin, come on into the #coffee-chat channel to ask your questions.

Note that this workspace isn’t monitored outside of scheduled Coffee Chats. If you do have questions outside of the timeframe of the scheduled coffee chats, please post that question on the Explorers Hub.

See you on the 20th for some nerd-ing out on New Relic! :nerd_face:


#2

:coffee: :coffee: :coffee:

AMA Coffee Chat with Principal Software Engineer: Jan Urbański

March 20, 2019 conversation from #coffee-chat slack channel, New Relic Users.

@RyanVeitch Alrighty! It’s 1pm where I am, it’s 2pm where @Jan Urbański is, let’s start some real talk and forget timezones for now!

Jan Urbański [6:00 AM]
:confetti_ball:

Ryan [6:01 AM]
I can get started with a question for you, Jan.

As an engineer I often see involved in resolving service interruptions during EMEA hours, what advice do you have for ensuring Alert policies and notification channels are set up so that you only get woken up at 3am when you absolutely need to be woken up?

Jan Urbański [6:03 AM]
Ha! Still a tiny bit about timezones, isn’t it? :wink: But to answer truthfully, fine-tuning alert policies is an iterative process.
the important part is every time you get woken up at 3am, to revisit this the next day and decide: was this alert going to the correct policy? should I have been woken up?
we typically set up a policy that’s associated with a wake-me-up notification channel, and another that has a just-nag-me channel
sometimes we move conditions between the two, if we deem a certain condition is not critical enough to go to the wake-me-up policy

Ryan [6:05 AM]
I like the advice of a warning channel that you can check when you start working, but an absolutely critical channel that is set up to wake you if needed.

Jan Urbański [6:07 AM]
another idea is to have a slight delay between alert happening and page going off, but sending the notification to slack immediately
oftentimes I would see the slack message and ack the alert before a US-based teammate would be paged
(and they’d do the same for me)
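
A minimal sketch of the routing idea Jan describes: every alert goes to Slack right away, and only an unacknowledged alert on the wake-me-up policy pages someone after a grace period. This is not New Relic's Alerts API; the `Alert` class, the `notifications_due` function and the 5-minute `PAGE_DELAY_SECONDS` value are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical names throughout: this is not New Relic's Alerts API,
# only an illustration of the routing rule discussed in the chat.
PAGE_DELAY_SECONDS = 300  # assumed grace period before the pager fires


@dataclass
class Alert:
    condition: str
    policy: str             # "wake-me-up" or "just-nag-me"
    opened_at: float        # seconds since epoch
    acknowledged: bool = False


def notifications_due(alert: Alert, now: float) -> List[str]:
    """Return the channels that should have been notified by time `now`."""
    channels = ["slack"]  # the chat message goes out immediately for every alert
    if alert.policy != "wake-me-up":
        return channels
    waited = now - alert.opened_at
    # Page only if nobody acknowledged the alert during the grace period.
    if not alert.acknowledged and waited >= PAGE_DELAY_SECONDS:
        channels.append("pager")
    return channels
```

With this shape, a teammate in another timezone who acks the Slack message within the grace period keeps the pager silent, and moving a condition between policies is a one-field change.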

Ryan [6:07 AM]
Team work makes the dream work… Right?

I guess a good question to begin with would have been - Jan, who are you and what is your day to day job at New Relic? I might have jumped in to ask the alerting question a little too early. :smile:

Jan Urbański [6:08 AM]
haha, true :wink:
hey there everybody, thanks for coming! I’m Jan, and I’m a Principal Engineer at New Relic, working mostly with the Core Data Platform (i.e. backend stuff)
right now I’m on a team responsible for processing metric timeslice data sent in by APM agents, as well as Browser, Mobile and Plugins

Jan Urbański [6:10 AM]
and I’m based in Warsaw, Poland, which explains why I’m here at this time of day :wink:

Ryan [6:10 AM]
We need to get you a visit to the Dublin office some time

Jan Urbański [6:10 AM]
I’d love to do that!

Warren H. [6:11 AM]
Hi @Jan Urbański I’ve got some questions about the Mobile platform.

Jan Urbański [6:11 AM]
hi @Warren H., sure thing, bring it on

Warren H. [6:13 AM]
We have been using APM for a long time on our backend services. We are relatively new to utilizing NR Mobile. I am often asked about the concerancy of data being reported by mobile clients wrt to data from the client being delivered to NR.
What happens in cases where let’s say network connectivity of the client may disrupt Mobile events being reported. Is this done when connectivity is restored and if so how does it affect timeseries reporting for example?

Jan Urbański [6:16 AM]
I don’t have deep knowledge of how the Mobile agent works - when my systems see the data, it’s already inside our network, but AFAIK the Mobile agent deals with outages similarly to how the APM agents do - data gets buffered and aggregated for some time and when connectivity is restored, it gets sent up to NR
this is similar to what happens if an app instrumented with the APM agent loses Internet connectivity, the agent will hold on to the data for some time, retrying the connection to NR
in order to keep memory under control, the agent won’t keep raw data, but instead will keep aggregating new data into its existing buffer, so the timeseries reported might have “flat” regions once connectivity is back up

Ryan [6:17 AM]
I believe there’s a 5 minute buffer for Mobile data

Jan Urbański [6:19 AM]
important to note that this aggregation retains things like max and min, so even if the averages look flat, you still get precise min and max measurements
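
To make the aggregation point concrete, here is a rough sketch of a buffer that folds new measurements into running statistics instead of keeping raw samples. It is not the agent's actual code: the `TimesliceStats` class and its fields are hypothetical, and real agents likely track more than is shown here.

```python
from dataclasses import dataclass


@dataclass
class TimesliceStats:
    """Running aggregate for one metric; raw samples are never stored."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def record(self, value: float) -> None:
        """Fold a single measurement into the aggregate."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "TimesliceStats") -> None:
        """Fold a newer, unsent aggregate into this buffered one."""
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0
```

Memory stays constant no matter how long the outage lasts, which is why the average over the offline period comes out as one flat value while the minimum and maximum remain exact.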

Ryan [6:19 AM]
As an aside, since there’s no network connectivity at that time, network performance data isn’t kept in the buffer while the app is offline.

mikeburroughs1 [6:19 AM]
I actually have a question as well with regards to Synthetics. Maybe it’s more of a feature request. Currently, it appears to be a Curl request. Is there any plan in the future to add ICMP into Synthetics? And similarly to that, a more full-featured SNMP implementation than the current GitHub version into NR Infrastructure? Sorry, I know that’s a lot to unpack

Jan Urbański [6:20 AM]
does that answer your question @Warren H.?

Jan Urbański [6:21 AM]
@mikeburroughs1 well, that’s more of a product question, but I’d wager Synthetics is going to keep being an HTTP monitoring solution, with SNMP monitoring being done via NR Infrastructure

Warren H. [6:21 AM]
Thanks @Jan Urbański and @Ryan. I think I might need further clarification on the 5 minute buffer.

Jan Urbański [6:22 AM]
I don’t know enough about the Infrastructure team’s roadmap to know if they’re planning on building out a more featureful SNMP implementation

Ryan [6:23 AM]
As far as I know, this is what we have with regards to SNMP: https://github.com/newrelic/nri-snmp

mikeburroughs1 [6:23 AM]
@Warren H. I think what they’re trying to say is that the device that’s failing to connect will cache the analytics for up to 5 minutes while looking for the connection, then once it does, it sends all the data at once. But the manifest likely includes time codes so it doesn’t look like a spike

Jan Urbański [6:23 AM]
@Warren H. yup, that’s more or less it

mikeburroughs1 [6:25 AM]
Thanks on the Github link. That’s the one I’m playing with. It’s great, I’m just worried that it won’t evolve much as it seems to be more of a community effort. I’d eventually like to drop Zabbix in the future if I can

Warren H. [6:25 AM]
Great. So for tracking HTTP Error responses, the requests within that 5 minutes will be registered with the timestamp of the event and not all together. And beyond 5 minutes? Say using the app in a remote area. Is that data not transmitted (lost) or aggregated in some way?

[threaded conversation: Ryan [2 hours ago]
This is a rolling 5 minute buffer, the older data gets thrown out to avoid memory exhaustion, and the latest 5 minutes of data is cached.

Laura Dickey [2 hours ago]
Even for custom events? We use a lot of custom events in our app to track app activity, and our app can often be used while offline…

Warren H. [2 hours ago]
@Ryan ok that’s clear. thanks.

Jan Urbański [1 hour ago]
Yes, even for custom events - the agent tries very hard to avoid creating too much memory pressure on the host device, so the buffer is intentionally kept small.

Jan Urbański [1 hour ago]
@Laura Dickey in your case, you could keep the events in memory and use the agent API to send them out when connectivity is restored

mikeburroughs1 [1 hour ago]
Would you be able to maintain the integrity of the timecodes while doing that?

Jan Urbański [1 hour ago]
it’s more manual work, but the guiding principle for our agents is “first, do no harm”

Ryan [1 hour ago]
With all that said, events are a little different too, rather than aggregate data, there’s just a limit to the number of stored events, which is 1000 by default.
Here’s the doc for Android, though this is similar for iOS and the docs are available for that.
https://docs.newrelic.com/docs/mobile-monitoring/new-relic-mobile-android/android-sdk-api/set-max-event-pool-size

Jan Urbański [1 hour ago]
@mikeburroughs1 good question and if using the agent’s API, I think your events will all be timestamped to current time

Jan Urbański [1 hour ago]
the Custom Event API allows you to backdate events: https://docs.newrelic.com/docs/insights/insights-data-sources/custom-data/send-custom-events-event-api
]
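
One way to read the suggestion in the thread is for the app to keep its own bounded queue of events while offline, stamping each with the time it happened, and then build an Event API payload once connectivity returns. The sketch below is an assumption-heavy illustration: `OfflineEventQueue`, the `AppAction` event type in the usage note and the 1000-event cap are hypothetical app-side choices, not part of any New Relic SDK. The `eventType` and `timestamp` (Unix epoch) fields follow the Event API doc linked above, which is also the place to check how far back a timestamp may be.

```python
import json
import time
from collections import deque
from typing import Any, Deque, Dict, List, Optional

MAX_QUEUED_EVENTS = 1000  # assumption: mirrors the default pool size mentioned in the thread


class OfflineEventQueue:
    """Hypothetical app-side queue; not part of any New Relic SDK."""

    def __init__(self, max_events: int = MAX_QUEUED_EVENTS) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._events: Deque[Dict[str, Any]] = deque(maxlen=max_events)

    def record(self, event_type: str, attributes: Dict[str, Any],
               occurred_at: Optional[float] = None) -> None:
        event = dict(attributes)
        event["eventType"] = event_type
        # Store when the event happened, not when it is eventually sent.
        when = occurred_at if occurred_at is not None else time.time()
        event["timestamp"] = int(when)
        self._events.append(event)

    def drain_payload(self) -> str:
        """Return queued events as a JSON array and empty the queue."""
        batch: List[Dict[str, Any]] = list(self._events)
        self._events.clear()
        return json.dumps(batch)
```

For example, `queue.record("AppAction", {"name": "open"})` while offline, then POST the string from `queue.drain_payload()` to the Event API when the network is back. The bounded deque keeps the "first, do no harm" property: memory use is capped and the oldest events are the ones dropped.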

mikeburroughs1 [6:26 AM]
@Warren H. just out of curiosity, what kind of metrics are you expecting to get from a device that can’t connect to WAN?

Tim [6:30 AM]
Anyone here have any knowledge on getting the on host integration working for Kafka - we have it deployed but having trouble getting the consumer/producer data portion of the integration - as in we’re not seeing any of that data show up on the default dashboard that comes with the integration

Jan Urbański [6:33 AM]
@Tim hard to tell what might be going wrong: enabling the Infrastructure agent’s verbose logging would be a good start
https://docs.newrelic.com/docs/infrastructure/new-relic-infrastructure/troubleshooting/generate-logs-troubleshooting-infrastructure

Ryan [6:35 AM]
This may not be the solution to your problems @Tim, but we have seen others with a similar issue, where the cause turned out to be an access issue: NR Infra - Java: Kafka integration - issue related with fetching consumer/producer metrics over JMX

Tim [6:35 AM]
we do have a ticket open and we’ve been trying to work through it with support, but it’s kind of difficult to go back and forth with emails…would be nice to be able to have a call with someone…or at least would be nice to have a known working example of a config file to look at - I’m not a kafka expert, but have been tasked with deploying on host integrations - the kafka integration has been the most difficult

mikeburroughs1 [6:42 AM]
This one is more for Ryan as it’s not an immediate software or dev issue: I’m currently in the process of building a platform that would require tight NR integration as part of the NR partner program. When I get to the point where I need technical assistance, is there a resource that I could reach out to? I’m happy to discuss further in a PM

Ryan [6:43 AM]
I’m not 100% sure of who internally would be the best person, but if you can share more details in a DM thread I can do some sleuthing to find the right people for you to talk to @mikeburroughs1 :smile:

mikeburroughs1 [6:43 AM]
cool

Ryan [6:44 AM]
I’ve got another question for you @Jan Urbański… New Relic APM agents use different methods of naming transactions. Is there a simplified explanation of how this happens? Not focussed on one agent, but from a broader perspective: the general naming philosophy all agents follow.

Jan Urbański [6:46 AM]
right, so transactions are named to balance giving as much detail as possible, plus being as recognisable to the end user as possible, vs grouping together related operations in an application
for a concrete example, when I see a transaction in NR, I’d like to quickly be able to relate it to my app’s code
when a transaction is called Controller/users/signup and I see it’s taking 2x the time it used to, I immediately know where to look in my Rails app
that’s the ideal we’re shooting for
at the highest level, we divide transactions into “web” and “background” which is why you see that dropdown in APM allowing you to switch between them
after that, it gets a bit agent-specific, because of that concern I mentioned - wanting to look as natural as possible to the end user; a Rails person will understand what Controller is, so that’s why the Ruby agent names transactions like that
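
A toy sketch of that balance, assuming a Rails-like app: name the transaction after the code that handled it when that is known, and otherwise fall back to a URL with the variable parts collapsed so related requests group together. Both functions, the `Uri/` prefix and the `*` wildcard are illustrative assumptions, not how any particular agent is implemented.

```python
import re
from typing import Tuple

_ID_SEGMENT = re.compile(r"^\d+$")


def name_from_controller(controller: str, action: str,
                         is_web: bool = True) -> Tuple[str, str]:
    """Name a transaction after the code that handled it."""
    category = "web" if is_web else "background"
    return category, f"Controller/{controller}/{action}"


def name_from_url(path: str) -> str:
    """Fallback: collapse numeric path segments so related URLs share a name."""
    segments = [("*" if _ID_SEGMENT.match(s) else s)
                for s in path.strip("/").split("/")]
    return "Uri/" + "/".join(segments)
```

Here `name_from_controller("users", "signup")` gives the `Controller/users/signup` name from Jan's example, while `name_from_url` keeps `/users/123` and `/users/456` from becoming two separate transactions.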

Ryan [6:51 AM]
This is awesome! Thanks Jan. Of course Transaction naming isn’t perfect, we often see metric grouping issues, but the general philosophy is sound - and groupings can be resolved :smile:

Lar [6:52 AM]
I left for a meeting with our Director, so if we’re all in agreement on time zones now :slightly_smiling_face: i have a question about the Infrastructure on-host integration for Kubernetes

Jan Urbański [6:53 AM]
@Lar I’m all ears :slightly_smiling_face:

Lar [6:56 AM]
We are running OpenShift and we already had newrelic-infra running on each OpenShift node. We installed the prerequisite kube-state-metrics and then the playbook to do the integration…
That install created a pod for nr-kubernetes AND a pod for newrelic infra…
The pod with newrelic-infra in it can’t see everything on the OpenShift host, so it alerts on all kinds of things, because it can’t see outside its pod…
Should we be running newrelic-infra in a pod or just on the VM?

Lar [6:59 AM]
Or both? (Our architect thinks we should run both, but I think he’s smoking weed.)

mikeburroughs1 [7:00 AM]
man, some of the best engineers and architects are :slightly_smiling_face:

Lar [7:00 AM]
lol

Jan Urbański [7:01 AM]
the k8s infra agent integration reports data from kube-state-metrics
so your K8sPodSample events should contain data about all the pods… right?

Lar [7:01 AM]
yes
sort of
(i’m clicking madly in another window)
I see most things on the integration page, but “Namespaces per Cluster” is blank
this is also sparse on details…

Jan Urbański [7:06 AM]
ok, you did set CLUSTER_NAME in the yaml file, right?

Lar [7:06 AM]

I am going to say yes, but with the caveat that our SRE team did not do the actual installation.

Jan Urbański [7:07 AM]
you can also go to Insights and check for K8sNamespaceSample events, AFAIK they should be visible there

Lar [7:07 AM]
We had to pass that to the OpenShift team

Jan Urbański [7:08 AM]
if they’re not, the agent is not grabbing namespace data correctly (again, verbose logs usually help :wink:)
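
The Insights check can also be scripted. The sketch below only builds the request URL for a simple count of `K8sNamespaceSample` events; it assumes the Insights query API endpoint shape (`/v1/accounts/<id>/query?nrql=...`, sent with an `X-Query-Key` header), and the function name and 30-minute window are arbitrary choices for illustration.

```python
from urllib.parse import quote

# Assumed Insights query API host; verify against the current docs.
INSIGHTS_QUERY_HOST = "https://insights-api.newrelic.com"


def namespace_check_url(account_id: int) -> str:
    """Build a query URL that counts recent K8sNamespaceSample events."""
    nrql = "SELECT count(*) FROM K8sNamespaceSample SINCE 30 minutes ago"
    # quote() percent-encodes spaces and punctuation for the query string.
    return (f"{INSIGHTS_QUERY_HOST}/v1/accounts/{account_id}"
            f"/query?nrql={quote(nrql)}")
```

A count of zero matches what Lar reports next ("No events found in Insights") and points at the integration not collecting namespace data rather than at the dashboard.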

Lar [7:08 AM]
No events found in Insights
Well.
I’ll go loom over the desk of an OpenShift guy and tell him that you sent me.

Ryan [7:09 AM]
It looks like if we need logs, we may be better off handling this with the Support team. If you can’t get anywhere with the OpenShift team, feel free to get in touch with our Support team

Lar [7:10 AM]
I have a case open with Support.

Ryan [7:10 AM]
Oh great :smile:

Lar [7:10 AM]
I loom everywhere.

Ryan [7:10 AM]
We are over time, so we’ll wrap it up there. Thanks so much for joining us @Jan Urbański - I’m sure everyone has appreciated your time here.

Lar [7:10 AM]
Thanks, guys!

Jan Urbański [7:11 AM]
welp, it is over time! thanks, everyone, for your questions
it’s been great chatting with y’all!

Ryan [7:11 AM]
:smile: If anyone else has questions, feel free to post over on https://discuss.newrelic.com - our Support experts lurk in the forum and can help out there.

mikeburroughs1 [7:12 AM]
thanks, Jan!

Jan Urbański [7:12 AM]
we, too, loom everywhere :smile:

Ashley Pinner [7:15 AM]

We are over time
@Ryan Depending on your timezone? :wink:

Ryan [7:15 AM]
It always comes back to timezones!!

Ashley Pinner [7:22 AM]
timezones are very handy for working out who to prod about an event :slightly_smiling_face:

MonitoringLife [7:28 AM]
speaking of, was there any movement on auto setting the TZ for the NR accounts?
I believe they (at least used to) default to PST

Lar [7:30 AM]
For those of you following along at home, CLUSTER_NAME was set in the yaml.

Thanks to everyone who participated in our AMA with Jan! We will have many more Coffee Chats so be sure to join us next month!