Is there a way to monitor control plane components such as the API server, controller manager, CoreDNS, scheduler, kubelets, and Calico? I would like to pull the metrics they expose on their Prometheus metrics endpoints into New Relic so we can make sure we have a healthy control plane. Right now we have to use Prometheus along with New Relic to get these metrics.
Hey @Mitchell.Maler, our integration should already pull most (if not all) of this data, though I’m unsure about Calico.
However, our integration does indeed pull a lot of the data you’re requesting:
Could you let us know if you’ve tried out the integration yet? And if so, what seems to be missing?
I have been running the Kubernetes integration. It does pull info about running pods, their CPU, memory, host process network usage, etc., but the control plane components give off a lot of other metrics that are not part of kube-state-metrics. I am proposing that the control plane metrics endpoints be used to pull this data. Just knowing that the pods/containers are running and using a reasonable amount of resources is not enough to know that my cluster is healthy. Most deployed system components give off a lot of different metrics on their health endpoint.
Is this something that can be pulled from kubectl, or is there a specific endpoint you are referring to that I can look into?
Each control plane component gives off metrics at the /metrics endpoint. I think the difficulty here is that these metrics are in Prometheus format and would need to be translated into New Relic metrics. Being able to pull all these metrics into New Relic would allow us to know our cluster's control plane is healthy. If any of these metrics show issues, it is very important that we can get alerted, or view the data over time, and act. Currently we have to run Prometheus alongside New Relic and manage monitoring on two fronts, because today these metrics cannot be pulled into New Relic.
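To make the translation step concrete, here is a minimal sketch of parsing the Prometheus text format that a /metrics endpoint returns into plain name/attributes/value records. The function name `parse_prometheus_text`, the record shape, and the sample payload are assumptions for illustration only. This is not New Relic's API, and actually fetching /metrics from a control plane component (which usually needs cluster credentials) is left out.

```python
import re

# One sample line looks like: metric_name{label="value",...} 12.5 [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+(?P<ts>-?\d+))?\s*$'             # optional timestamp (ignored here)
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Turn Prometheus text-format output into a list of dicts.

    Hypothetical record shape: {"name", "attributes", "value"}.
    Escape sequences inside label values are kept as-is, not unescaped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append({
            "name": match.group("name"),
            "attributes": labels,
            "value": float(match.group("value")),  # also accepts NaN, +Inf, -Inf
        })
    return samples


# Illustrative payload only; the values are made up.
SAMPLE = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 1234
process_cpu_seconds_total 12.5
"""

for record in parse_prometheus_text(SAMPLE):
    print(record)
```

Each record could then be forwarded by whatever ingestion path is available; the sketch only covers the parsing half of the problem.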
Hey @Mitchell.Maler thanks so much for those specific component endpoints! Super valuable…
That said, I don’t think there’s currently a way to collect that data. I’m going to tag this post as a feature idea so our admins can review it and the community can vote on it!