Thread profiler not collecting app code while using gunicorn

threadprofiler
gunicorn

#1

The thread profiler only shows 100% of the time inside the gunicorn/workers/sync.py wait process (line 35).

Is there a way to collect the right data from inside the app code?
(I'm not using gevent.)


#2

@rodolfo At first glance it looks like you aren’t getting much traffic. Is that right? If you push more traffic through you might get more detail. If you push through more traffic and you still aren’t seeing the level of detail you expect, could you send me a permalink to your application and the thread profile you are having trouble with?


#3

I just ran a thread profile that collected 5,375 samples:

https://rpm.newrelic.com/accounts/910358/applications/38555243/profiles/3586275


#4

@rodolfo It looks like pushing through more traffic is working. I see the time spent in sync.py.accept (the last line in your screenshot) has increased from 0.1% to 6.3%. However, it looks like your workers are still sitting idle most of the time (about 93% of the time during that thread profile session), and that idle time is what dominates the stack: 93% of the samples show the code executing there. The thread profiler builds this stack from samples aggregated over time, so to see a fuller stack you'll really want to throw a lot more traffic at it.


#5

So I get the same behavior (100% of the time in bootstrap), and my website gets about 20M hits a day. I made four attempts to get this working; on the last one I switched to gevent. It seems to be instrumenting just the ‘accept’ thread.

Please let me know if there is any config to get this working. I'm using gunicorn, Python 2.7, and Django. Thanks!


#7

So after investigating, the profiler is definitely profiling the dispatch process rather than the workers, which are actually doing the work.


#8

@andrew.tso if you’re still having trouble with this, could you share a permalink for us to take a look?


#9

Is this what you mean?

https://rpm.newrelic.com/accounts/2344186/applications/160568972/profiles/4502346
https://rpm.newrelic.com/accounts/2344186/applications/160568972/profiles/4502339
https://rpm.newrelic.com/accounts/2344186/applications/160568972/profiles/4502342
https://rpm.newrelic.com/accounts/2344186/applications/160568972/profiles/4502343


#10

Yup, thanks! I discussed this with the Python agent engineering team. We were able to reproduce this for a sample gevent app. This behavior comes from how the Thread profiler tool works.

It works by periodically (100ms) capturing the stack trace of each thread for a specified duration. At the end of the specified duration, the stack traces are aggregated to build a tree.

Although the call tree cannot capture the entire execution, a large enough sample can be a good representation of the application behavior. This provides insights into the “hot” functions of the app where most of the time is spent.

The stack traces taken during sampling show this thread spends most of the time in the framework code, and the agent didn’t find any application specific code in those stack traces.
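For illustration, a minimal sketch of this kind of sampling profiler could look like the following. This is not the agent's actual code; the names sample_stacks and call_tree are made up, and a real agent does far more aggregation and pruning.

import sys
import time
import traceback
from collections import Counter

def sample_stacks(duration_s=5.0, interval_s=0.1):
    """Capture every thread's stack every interval_s seconds and
    aggregate the flattened stacks into a simple sample counter."""
    call_tree = Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            # Flatten the stack into (filename, line, function) tuples.
            stack = tuple(
                (f.filename, f.lineno, f.name)
                for f in traceback.extract_stack(frame)
            )
            call_tree[stack] += 1
        time.sleep(interval_s)
    return call_tree

if __name__ == "__main__":
    for stack, hits in sample_stacks(duration_s=2.0).most_common(3):
        print(hits, "samples ending in", stack[-1])

The key property this sketch shares with the real tool is that it only ever sees what each thread happens to be doing at each 100 ms tick, so idle threads dominate the resulting tree unless the app is under real load.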


#11

Hi Bruce, the profiles I linked above were taken on a webserver servicing about 1,000 requests per minute for 5-10 minutes, at 70% CPU load. I'm pretty sure the thread profiler is profiling the parent/dispatch process and NOT the forked child worker processes. The child processes are the ones actually servicing web requests. Gunicorn forks n workers with the --workers flag, e.g. gunicorn --workers 3 --worker-class gevent (this also doesn't work for gthread).

If you look closely at the stack trace and compare it against the gunicorn source, it’s definitely not profiling the worker processes.
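To make the pre-fork model being described here concrete, a toy sketch (a trivial echo service on POSIX, not gunicorn's code) looks like this: the master only forks and waits on its workers, while the forked children sit in the accept loop and do the actual request work.

import os
import socket

def serve(listener):
    # Worker process: accept and service connections (here, a trivial echo).
    while True:
        conn, _ = listener.accept()
        conn.sendall(conn.recv(1024))
        conn.close()

def main(workers=3):
    listener = socket.socket()
    listener.bind(("127.0.0.1", 8000))
    listener.listen(128)
    for _ in range(workers):
        if os.fork() == 0:          # child process: does the real work
            serve(listener)
            os._exit(0)
    # Master process: only forks and waits on its children; it never
    # executes application code, so a sampler running only here would
    # never see application frames.
    for _ in range(workers):
        os.wait()

if __name__ == "__main__":
    main()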


#12

I brought this to the Python agent engineering team. The stacks are obscured through the use of greenlet. The greenlet API doesn’t have an effective way of sharing these hidden stacks with the thread profiler, so the thread profiler is unable to support greenlet usage. Only the top levels of the gevent stack remain unchanged and visible to the thread profiler.
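To illustrate the limitation (a hedged sketch, not the agent's code, assuming the greenlet package is installed): a thread-based sampler sees only one frame chain per OS thread via sys._current_frames(), and under gevent that chain is usually the hub's wait loop. The suspended application stacks live on each greenlet's gr_frame, which a generic thread sampler has no supported way to enumerate.

import gc
import sys
import traceback

import greenlet

def thread_view():
    # What a thread-based sampler sees: one frame chain per OS thread.
    # Under gevent this is typically the hub's wait loop, not app code.
    for thread_id, frame in sys._current_frames().items():
        print("thread", thread_id)
        traceback.print_stack(frame)

def greenlet_view():
    # The suspended application stacks hang off each greenlet's gr_frame.
    # Reaching them means walking greenlet internals like this, which is
    # not something a generic thread profiler can rely on.
    for obj in gc.get_objects():
        if isinstance(obj, greenlet.greenlet) and obj.gr_frame is not None:
            print("greenlet", hex(id(obj)))
            traceback.print_stack(obj.gr_frame)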


#13

Hi Bruce, how can we get a thread profile using python+django? We are trialing specifically for this feature, with only a few days left, but haven't gotten it to work. I also have not been able to get this to work in gthread mode. Is there any configuration that you know of that will get us a useful trace?


#14

I shared this with the Python agent engineering team. Looking at the example from earlier, the python+django monitoring seems to work.

They noted that the threads spend a lot of time waiting. To increase the likelihood of the threads having a fuller stack when the thread profiler takes a sample, the threads need to be doing more work. That can be done by provisioning fewer threads or by giving the app more work.

The thread profiler should work with gthread, as that uses POSIX threads instead of greenlets. If you think there is an error, could you share an example profiling session?
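As a hedged illustration of the "provision fewer threads" suggestion, a gunicorn config file along these lines keeps the gthread pool small so each thread does enough work to show up in samples. The numbers are placeholders, not recommendations.

# gunicorn.conf.py
workers = 4                # e.g. roughly one per CPU core (illustrative)
worker_class = "gthread"   # POSIX threads, which the thread profiler supports
threads = 2                # a small pool keeps each thread busy enough to appear in samples

You would start it with something like gunicorn -c gunicorn.conf.py yourproject.wsgi, where the module path is a placeholder for your own WSGI module.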


#15

Hi Bruce, our servers are under heavy load; our site gets about 20M requests a day (look at our dashboard), with CPU at 80%, which is why we're trying to profile it. The New Relic profiling snapshot tells us we're in 93% wait and doing no work over 10 minutes.

Here’s a profile from today.
https://rpm.newrelic.com/accounts/2344186/applications/160568972/profiles/4502428

Our 14-day trial ends today, so we're in an awkward spot. We've run the thread profiler in dev; this is definitely not working.


#16

Thanks for that example.

Looking through, I do see one worker in 6.8% of the stack trace samples. The 80% CPU usage in APM would probably be one worker using one CPU core, not the entire CPU being used. More info is in the APM CPU metric doc. With 93% waiting time, the thread profiler suggests this app could run with fewer threads and save some memory.

The Python agent team was able to reproduce this. It comes from over-provisioning threads, which obscures the working threads behind the waiting threads. As more waiting threads are added, they make up a larger percentage of the sampled stack traces, shrinking the working threads' share. Any entry smaller than 0.05% is dropped from the thread profiler, which could be what is happening here. Said a different way from my last message, provisioning fewer threads would give the working threads a larger share and increase the likelihood of their stack traces showing up in the thread profiler.
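A back-of-the-envelope sketch of that dilution effect, under the simplifying assumption that every thread is captured on every sample tick:

def working_share(working_threads, idle_threads):
    # Share of samples attributable to the working threads, assuming
    # each thread contributes one stack per tick.
    return 100.0 * working_threads / (working_threads + idle_threads)

for idle in (10, 100, 1000, 2500):
    share = working_share(1, idle)
    flag = "(below the 0.05% cutoff)" if share < 0.05 else ""
    print(f"1 working / {idle} idle threads -> {share:.3f}% of samples {flag}")

With one busy thread against a few thousand idle ones, the busy thread's entries fall under the 0.05% cutoff and disappear from the tree entirely.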


#17

Bruce, if you look at my example, we're configured with gevent, which is event-based and doesn't use threads, so what you are saying doesn't really make sense here.


#18

Sorry, I didn’t check that since we were talking about gthread. Could you share where that 80% CPU usage stat comes from? I’m not seeing that in the APM overview during the thread profiling session.

With that, I’ve addressed the thread profiler’s functionality for both gevent and gthread, with a summary post coming up. Please let me know of any questions.


#19

Summary for thread profiling gunicorn using gthread or gevent

When using either of these worker classes, these are the constraints you inherit:

gevent

You can try adding this to newrelic.ini, but note that it is not supported:
debug.enable_coroutine_profiling = true
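For context, a hedged sketch of where that line would sit in a typical newrelic.ini, assuming the standard [newrelic] section layout (the license key and app name values are placeholders):

[newrelic]
license_key = YOUR_LICENSE_KEY
app_name = Your Python Application

# Experimental and unsupported: coroutine-aware profiling
debug.enable_coroutine_profiling = true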

gthread