Docker metrics not reported correctly for Google Container Engine

Please paste the permalink to the page in question below:

https://rpm.newrelic.com/accounts/1000653/servers/16457569/virtualizations?tw[end]=1459756318&tw[start]=1459754518

Please share your agent and other relevant versions below:

newrelic-sysmond-2.3.0.129

Please share your question/describe your issue below. Include any screenshots that may help us understand your question:

I followed the instructions on this page. The Docker menu now appears in the left-hand menu, but it doesn’t show any CPU or memory data. I have already checked on each GKE node that the memory cgroup is enabled and that the cgroup root is at the default location (/sys/fs/cgroup).

Running "docker stats" on a node shows CPU and memory usage correctly.

I have the same issue on Google Container Engine. Nodes have Ubuntu 14.04, nrsysmond version 2.3.0.129 and Docker version 1.9.1, build a34a1d5.

newrelic user has been added to the docker group and I see the containers in New Relic but no stats. docker stats works on the nodes.

Hi @chaiwat, @alan.hartless,

Could you check the result for the following commands and let us know:

cat /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/cpuacct.stat

and

cat /sys/fs/cgroup/memory/docker/$CONTAINER_ID/memory.stat

Note: you will need to replace $CONTAINER_ID with the real IDs for your containers.

Thanks,

Interesting! There are no /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/ or /sys/fs/cgroup/memory/docker/$CONTAINER_ID/ directories. Instead, the paths are /sys/fs/cgroup/cpuacct/$CONTAINER_ID/ and /sys/fs/cgroup/memory/$CONTAINER_ID/
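For anyone who wants to check which layout a node uses, here is a minimal Python sketch that looks for a container's stat file under both paths mentioned in this thread. It is not part of any New Relic tooling; the function name `find_stat_file` and its `root` parameter are made up for illustration.

```python
import os


def find_stat_file(container_id, controller, filename, root="/sys/fs/cgroup"):
    """Return the path of the first stat file that exists, or None.

    Two layouts are tried, both taken from this thread:
      <root>/<controller>/docker/<container_id>/<filename>  (Docker default)
      <root>/<controller>/<container_id>/<filename>         (seen on GKE nodes)
    """
    candidates = (
        os.path.join(root, controller, "docker", container_id, filename),
        os.path.join(root, controller, container_id, filename),
    )
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_stat_file(full_id, "cpuacct", "cpuacct.stat")` and `find_stat_file(full_id, "memory", "memory.stat")` on a node should show whether the files sit under the `docker` directory or directly under the controller directory.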

root@gke-dev-biller-receipt-containers-cf5e76ad-node-yubz:~# cat /sys/fs/cgroup/cpuacct/4ba181705796d732f3096c471277630f9468d4564fffd24b645563f2950b3913/cpuacct.stat
user 2606
system 346

root@gke-dev-biller-receipt-containers-cf5e76ad-node-yubz:~# cat /sys/fs/cgroup/memory/4ba181705796d732f3096c471277630f9468d4564fffd24b645563f2950b3913/memory.stat
cache 150765568
rss 289710080
rss_huge 0
mapped_file 25214976
writeback 0
swap 0
pgpgin 155340
pgpgout 47802
pgfault 115273
pgmajfault 355
inactive_anon 16384
active_anon 289804288
inactive_file 52289536
active_file 98365440
unevictable 0
hierarchical_memory_limit 18446744073709551615
hierarchical_memsw_limit 18446744073709551615
total_cache 150765568
total_rss 289710080
total_rss_huge 0
total_mapped_file 25214976
total_writeback 0
total_swap 0
total_pgpgin 155340
total_pgpgout 47802
total_pgfault 115273
total_pgmajfault 355
total_inactive_anon 16384
total_active_anon 289804288
total_inactive_file 52289536
total_active_file 98365440
total_unevictable 0
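To make output like the above easier to work with, here is a small Python sketch that parses the "key value" lines into a dictionary. The sample text is a few lines copied from the output above. The threshold used to treat a huge `hierarchical_memory_limit` as "no limit" is an assumption of this sketch, not a documented constant.

```python
# Assumption: limits this large are treated as "no limit set". This is a
# heuristic for the sketch, not a value taken from any specification.
UNLIMITED_THRESHOLD = 2 ** 62

SAMPLE = """cache 150765568
rss 289710080
swap 0
hierarchical_memory_limit 18446744073709551615
"""


def parse_stat(text):
    """Parse lines of the form 'name value' into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def memory_limit(stats):
    """Return the limit in bytes, or None if it looks unlimited or is absent."""
    limit = stats.get("hierarchical_memory_limit")
    if limit is None or limit >= UNLIMITED_THRESHOLD:
        return None
    return limit


stats = parse_stat(SAMPLE)
print("rss bytes:", stats["rss"])
print("limit:", memory_limit(stats))
```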

Same here, there is no docker directory in /sys/fs/cgroup/cpuacct. Not sure if $CONTAINER_ID is supposed to match what’s given by docker ps or not (sorry; relative noob to docker) but as with chaiwat, I found the file with an extended file name (docker ps gives 441ab57f5ecf)

/sys/fs/cgroup/cpuacct/441ab57f5ecfd1599f91cd6a968c6b44ec9092af1b8899218f737dc17deef92a and in /sys/fs/cgroup/memory/441ab57f5ecfd1599f91cd6a968c6b44ec9092af1b8899218f737dc17deef92a

cat /sys/fs/cgroup/cpuacct/441ab57f5ecfd1599f91cd6a968c6b44ec9092af1b8899218f737dc17deef92a/cpuacct.stat
user 2
system 2

cat /sys/fs/cgroup/memory/441ab57f5ecfd1599f91cd6a968c6b44ec9092af1b8899218f737dc17deef92a/memory.stat
cache 8192
rss 1650688
rss_huge 0
mapped_file 0
writeback 0
pgpgin 519
pgpgout 114
pgfault 530
pgmajfault 0
inactive_anon 4096
active_anon 1654784
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 18446744073709551615
total_cache 8192
total_rss 1650688
total_rss_huge 0
total_mapped_file 0
total_writeback 0
total_pgpgin 519
total_pgpgout 114
total_pgfault 530
total_pgmajfault 0
total_inactive_anon 4096
total_active_anon 1654784
total_inactive_file 0
total_active_file 0
total_unevictable 0
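On the question of whether $CONTAINER_ID matches what docker ps prints: by default docker ps shows only the first 12 characters of the full 64-character ID, and the cgroup directory is named with the full ID. A minimal Python sketch for expanding the short ID into the matching directory follows; `resolve_container_dir` is an illustrative name, not an existing tool.

```python
import os


def resolve_container_dir(short_id, controller_dir):
    """Return the one directory under controller_dir whose name starts with short_id.

    Raises LookupError if no directory or more than one directory matches.
    """
    matches = [
        name
        for name in os.listdir(controller_dir)
        if name.startswith(short_id)
        and os.path.isdir(os.path.join(controller_dir, name))
    ]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one match for %r, found %d" % (short_id, len(matches))
        )
    return os.path.join(controller_dir, matches[0])
```

For example, `resolve_container_dir("441ab57f5ecf", "/sys/fs/cgroup/cpuacct")` would be expected to return the long directory shown above on that node.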

I inquired about this with the Kubernetes guys and they stated that the stats will always be in /sys/fs/cgroup/cpu/$CONTAINER_ID/cpu.shares.

Is this something NR plans to fix in order to support Google Container Engine?

@alan.hartless, @chaiwat

Google Container Engine’s implementation of Docker isn’t currently supported by the LSM. This is because of the difference in the path to the .stat files. I have heard of folks using custom cgroups in their docker run command to revert the GCE behavior to default, and get the LSM working, but I have yet to see it in action. This might be worth investigating.

I have submitted support for GCE as a feature request (for both of you) to our product management team to be considered for a future release.

If it is implemented, you will be notified. While we can’t guarantee when or if this feature will be implemented, we take customer requests very seriously and use them to prioritize which features we implement next. Thanks for helping us improve the product!

Thanks for the update and submitting the feature request for us. Hopefully support gets implemented :slight_smile: I would imagine Google’s container engine is growing in popularity.

@mmayo Seems the kubernetes team is willing to work with you guys to get the agent to support GCE. https://github.com/kubernetes/kubernetes/issues/24418#issuecomment-211527905

My files are in the expected location.
But does cpuacct.stat show enough information? Is that a problem?

/sys/fs/cgroup/cpuacct/docker/16524216ebd71a0edc1b7baf744ebe518ebccbd7dfe47d7e4588780cb7d4a552/cpuacct.stat
user 5959
system 1983

/sys/fs/cgroup/memory/docker/16524216ebd71a0edc1b7baf744ebe518ebccbd7dfe47d7e4588780cb7d4a552/memory.stat
cache 42188800
rss 384827392
rss_huge 322961408
mapped_file 4374528
dirty 20480
writeback 0
pgpgin 123339
pgpgout 98803
pgfault 115457
pgmajfault 49
inactive_anon 24576
active_anon 384872448
inactive_file 24023040
active_file 18096128
unevictable 0
hierarchical_memory_limit 9223372036854771712
total_cache 42188800
total_rss 384827392
total_rss_huge 322961408
total_mapped_file 4374528
total_dirty 20480
total_writeback 0
total_pgpgin 123339
total_pgpgout 98803
total_pgfault 115457
total_pgmajfault 49
total_inactive_anon 24576
total_active_anon 384872448
total_inactive_file 24023040
total_active_file 18096128
total_unevictable 0
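On whether cpuacct.stat carries enough information: the user and system values are cumulative CPU time counted in clock ticks (USER_HZ), so a usage percentage can be derived by sampling the file twice and dividing the difference by the elapsed time. The Python sketch below shows the arithmetic only. The function names are illustrative, and whether the agent computes it this way is not something this thread confirms.

```python
import os


def total_ticks(cpuacct_stat_text):
    """Sum the 'user' and 'system' tick counters from cpuacct.stat content."""
    counters = {}
    for line in cpuacct_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters["user"] + counters["system"]


def cpu_percent(ticks_before, ticks_after, interval_seconds, ticks_per_second=None):
    """CPU usage over the interval, as a percentage of a single core."""
    if ticks_per_second is None:
        # Clock ticks per second as reported by the running system.
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / interval_seconds


# First sample uses the values posted above; the second sample is made up.
before = total_ticks("user 5959\nsystem 1983\n")
after = total_ticks("user 6039\nsystem 2003\n")
print(cpu_percent(before, after, interval_seconds=2.0, ticks_per_second=100))
```

Because the result is relative to one core, a container that keeps several cores busy can exceed 100.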

Are you running on GKE? Which version are your nodes on? I’m using 1.2.1 but plan to upgrade the nodes to 1.2.2 over this weekend.

Any update on this issue? I just upgraded my GKE nodes to the latest version (1.2.3) and also installed the latest New Relic agent. I still can’t monitor my Docker instances.

@chaiwat we do not have any further updates regarding support for Google’s Container Engine at this time. However, I’ve submitted another feature request on your behalf for support of this platform just to keep our product team aware of this need. We don’t have any timelines for when or if support will be implemented, but if we do add support for GCE, we’ll send you a notification.

Thanks for voicing your continued interest in our product on GCE!

Looking forward to NewRelic working correctly on GKE :+1:

@JeanMertz I’ve added you to the feature request! :smiley:

Is it just because GKE nodes use a different path under /sys from the one the New Relic agent wants to check/monitor?

Could you just allow us to customize the CPU/Memory monitoring path in the config file?

@chaiwat

The difference in cgroup path is the reason Google Container Engine is not compatible with LSM. Currently customizing the cgroup style is not an option, though it is a possible way to address the issue. The feature request I submitted for you outlines the reasons monitoring doesn’t work, and does include having the user set the cgroup path as a possible solution.
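To illustrate what a user-set cgroup path could look like, here is a purely hypothetical Python sketch. The `[docker]` section, the `cgroup_path_template` key, and the `{controller}` / `{container_id}` placeholders are all invented for this example and do not correspond to any existing nrsysmond setting.

```python
import configparser

# Docker's default layout, as described earlier in this thread.
DEFAULT_TEMPLATE = "/sys/fs/cgroup/{controller}/docker/{container_id}"


def load_template(config_text):
    """Read the hypothetical cgroup_path_template key, falling back to the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("docker", "cgroup_path_template", fallback=DEFAULT_TEMPLATE)


def stat_path(template, controller, container_id, filename):
    """Fill in the template and append the stat file name."""
    directory = template.format(controller=controller, container_id=container_id)
    return directory + "/" + filename


# Hypothetical config text selecting the layout seen on the GKE nodes here.
GKE_CONFIG = """[docker]
cgroup_path_template = /sys/fs/cgroup/{controller}/{container_id}
"""

print(stat_path(load_template(GKE_CONFIG), "memory", "$CONTAINER_ID", "memory.stat"))
```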

Any updates here? Has New Relic fixed the problem?

No news right now, @dimitrisredlink! Thanks for checking in—I will be sure and pass your need for this along to our product team. :thumbsup:

Any updates now? I am anxious to see this resolved. Right now I know that Stackdriver has the correct info, but I really like New Relic.