
Thread profiler not collecting app code while using gunicorn



It only shows 100% of the time inside the gunicorn/workers/ :35 process:

Is there a way to collect the right data, inside the app code?
(I’m not using gevent)


@rodolfo At first glance it looks like you aren’t getting much traffic. Is that right? If you push more traffic through you might get more detail. If you push through more traffic and you still aren’t seeing the level of detail you expect, could you send me a permalink to your application and the thread profile you are having trouble with?


Here is what I get with a thread profile of 5,375 samples:


@rodolfo It looks like pushing through more traffic is working. I can see the time spent in (the last line in your screenshot) has increased from 0.1% to 6.3%. However, your workers still appear to be sitting idle most of the time (about 93% of the time during that thread profile session). Because the workers are mostly idle, that is what dominates the stack: 93% of the samples land where the idle workers are waiting. The thread profiler builds this stack by aggregating samples taken over time, so to see a fuller stack you’ll really want to throw a lot more traffic at it.
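As a rough back-of-the-envelope illustration of why the idle stack dominates (assuming roughly one sample per thread every 100 ms, and taking the percentages above at face value):

```python
# Rough numbers for the profile above; assumes ~100 ms between samples,
# which is the interval described later in this thread.
samples = 5375
interval_s = 0.1

print(samples * interval_s / 60)   # ~9 minutes of wall-clock profiling
print(round(samples * 0.93))       # ~5,000 samples caught the workers idle
print(round(samples * 0.063))      # only ~340 samples contained the app frame
```

With only a few hundred samples landing in application code, the aggregated tree has very little app-level detail to show.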


I get the same behavior (100% of the time in bootstrap), and my website gets about 20M hits a day. I’ve made 4 attempts to get this working; on the last one I switched to gevent. It seems to be instrumenting just the ‘accept’ thread.

Please let me know if there is any config to get this working. I’m using gunicorn, Python 2.7, and Django. Thanks!
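For reference, the standard way to hook the Python agent into a Django app served by gunicorn is to initialize the agent before the app is imported and wrap the WSGI entry point. A minimal sketch, with the config file path and module names as placeholders (this is the normal instrumentation path, not a fix for the thread profiler issue):

```python
# mysite/wsgi.py: a sketch; "newrelic.ini" and "mysite" are placeholder names.
import newrelic.agent
newrelic.agent.initialize('newrelic.ini')   # initialize before importing the app

from django.core.wsgi import get_wsgi_application

# Wrap the WSGI callable so the agent records each request handled by a worker.
application = newrelic.agent.WSGIApplicationWrapper(get_wsgi_application())
```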


After investigating, the profiler is definitely profiling the dispatch process rather than the workers, which are the ones actually doing the work.


@andrew.tso if you’re still having trouble with this, could you share a permalink for us to take a look?


Is this what you mean?


Yup, thanks! I discussed this with the Python agent engineering team. We were able to reproduce this for a sample gevent app. This behavior comes from how the Thread profiler tool works.

It works by periodically (every 100 ms) capturing the stack trace of each thread for a specified duration. At the end of that duration, the stack traces are aggregated to build a call tree.

Although the call tree cannot capture the entire execution, a large enough sample can be a good representation of the application behavior. This provides insights into the “hot” functions of the app where most of the time is spent.

The stack traces taken during sampling show that this thread spends most of its time in framework code, and the agent didn’t find any application-specific code in those stack traces.
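To make that mechanism concrete, here is a minimal sketch of that style of sampling profiler (an illustration only, not the agent’s actual code): walk every thread’s current stack on a fixed interval and count how often each call path shows up.

```python
import sys
import time
from collections import defaultdict

def sample_stacks(duration_s=60.0, interval_s=0.1):
    """Periodically capture each thread's stack and aggregate the call paths."""
    call_paths = defaultdict(int)                 # root-first call path -> sample count
    end = time.time() + duration_s
    while time.time() < end:
        for thread_id, frame in sys._current_frames().items():
            stack = []
            while frame is not None:              # walk from the innermost frame outwards
                code = frame.f_code
                stack.append((code.co_filename, code.co_name, frame.f_lineno))
                frame = frame.f_back
            call_paths[tuple(reversed(stack))] += 1
        time.sleep(interval_s)
    return call_paths                             # aggregated paths form the call tree
```

Run from a background thread, repeated samples of a mostly idle worker will nearly always catch it parked in framework wait code, which matches the stacks described earlier in this thread.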


Hi Bruce, the profiles I linked above were taken on a webserver servicing about 1,000 requests per minute for 5-10 minutes, at about 70% CPU load. I’m pretty sure the thread profiler is profiling the parent/dispatch process and NOT the forked child worker processes. The child processes are the ones actually servicing web requests. Gunicorn forks n workers with the --workers flag, e.g. gunicorn --workers 3 --worker-class gevent (this also doesn’t work with the gthread worker class).

If you look closely at the stack trace and compare it against the gunicorn source, it’s definitely not profiling the worker processes.
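One way to see the master/worker split being described here is a gunicorn config file that logs the PIDs as workers are forked. A sketch, assuming a config file like this is passed to gunicorn (the logging line is only for illustration):

```python
# gunicorn.conf.py: sketch of the pre-fork model described above.
import os

workers = 3
worker_class = "gevent"

def post_fork(server, worker):
    # Runs inside each freshly forked worker. The master process keeps its own
    # PID and only accepts/dispatches, so a profiler sampling the master never
    # sees the request-handling frames that live in these worker processes.
    server.log.info("worker pid %s forked from master pid %s", worker.pid, os.getppid())
```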


I brought this to the Python agent engineering team. The stacks are obscured through the use of greenlet. The greenlet API doesn’t have an effective way of sharing these hidden stacks with the thread profiler, so the thread profiler is unable to support greenlet usage. Only the top levels of the gevent stack remain unchanged and visible to the thread profiler.
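A rough way to see this from Python: sys._current_frames(), the per-thread view a stack sampler relies on, only contains whatever is running on each OS thread at that instant; the saved frames of a switched-out greenlet are not part of it. A minimal sketch, assuming gevent is installed:

```python
import sys
import traceback
import gevent

def app_work():
    gevent.sleep(10)          # stands in for application code blocked in a greenlet

gevent.spawn(app_work)
gevent.sleep(0.1)             # let the greenlet start and suspend itself

# The per-thread stack view used by samplers: one stack per OS thread.
for thread_id, frame in sys._current_frames().items():
    traceback.print_stack(frame)

# The printed stack only shows the code currently on the OS thread (here this
# loop; in a busy server, typically the gevent hub). app_work's suspended frame
# lives inside the switched-out greenlet and never appears in this view.
```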