Is there a way to pull in etcd metrics to make sure our control plane is running correctly? With Prometheus we can see whether a leader is missing, how often elections happen, the database size, latency between members, and so on. I would like to pull these types of metrics into New Relic so that we don’t need to run Prometheus as well just to monitor these components.
I don’t believe we collect etcd metrics. You can find the available metrics that we do collect here:
I’ll go ahead and create a feature request for etcd metrics.
Since etcd is one of the most crucial parts of running a cluster, monitoring it is very important for making sure our clusters are healthy. Specifically, we need Raft metrics (leader, members, elections, etc.), disk I/O and latency, and network latency. Without these, it is difficult to switch completely to the New Relic Kubernetes integration for Kubernetes monitoring.
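For reference, the signals listed above are exposed by etcd itself in Prometheus text format on its `/metrics` endpoint. Which URL and port serve it, and whether client TLS certificates are required, depends on how the cluster was set up. The sketch below is a minimal illustration, not New Relic or etcd tooling. It parses that text format and flags a missing leader, frequent elections, and a large backend database. The metric names are standard etcd metrics. The sample input values, the helper names (`parse_samples`, `summarize`, `health_issues`) and the thresholds are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# One Prometheus text-format sample: name, optional {labels}, value (optional timestamp ignored).
# Simplified: assumes label values do not contain "}".
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

# etcd metrics covering leader presence, elections, DB size, disk and peer latency.
WATCHED = {
    "etcd_server_has_leader",
    "etcd_server_leader_changes_seen_total",
    "etcd_mvcc_db_total_size_in_bytes",
    "etcd_disk_wal_fsync_duration_seconds_sum",
    "etcd_network_peer_round_trip_time_seconds_sum",
}

# Illustrative input only; real text would be fetched from etcd's /metrics endpoint.
SAMPLE = """\
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 5
etcd_mvcc_db_total_size_in_bytes 2.097152e+07
etcd_network_peer_round_trip_time_seconds_sum{To="peer-a"} 0.25
"""


def parse_samples(text: str) -> List[Tuple[str, str, float]]:
    """Return (name, labels, value) for every sample line, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE_RE.match(line)
        if m:
            samples.append((m.group(1), m.group(2) or "", float(m.group(3))))
    return samples


def summarize(text: str) -> Dict[str, float]:
    """Sum each watched metric across its label sets (collapses per-peer series)."""
    out: Dict[str, float] = {}
    for name, _labels, value in parse_samples(text):
        if name in WATCHED:
            out[name] = out.get(name, 0.0) + value
    return out


def health_issues(summary: Dict[str, float], max_leader_changes: float,
                  max_db_bytes: float) -> List[str]:
    """Crude checks; a real monitor would alert on the *rate* of leader changes."""
    issues = []
    if summary.get("etcd_server_has_leader", 0.0) < 1.0:
        issues.append("no leader")
    if summary.get("etcd_server_leader_changes_seen_total", 0.0) > max_leader_changes:
        issues.append("frequent leader elections")
    if summary.get("etcd_mvcc_db_total_size_in_bytes", 0.0) > max_db_bytes:
        issues.append("db size above threshold")
    return issues


if __name__ == "__main__":
    summary = summarize(SAMPLE)
    print(summary)
    print(health_issues(summary, max_leader_changes=3, max_db_bytes=2 * 1024**3))
```

Because `etcd_server_leader_changes_seen_total` is a counter since process start, comparing it to a fixed threshold is only a rough stand-in for the election-frequency view a Prometheus `rate()` query would give.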