Infrastructure is a great product for monitoring all aspects of your servers. However, ensuring that your alert conditions are set up properly is the key to getting notified when something goes wrong.
Let’s take disk alert conditions as an example. Sometimes an alert condition is set up to monitor disk fullness, but it doesn’t open a violation when a disk on one of the targeted hosts crosses the threshold.
Why is this?
When setting up disk alert conditions, check whether you’re targeting Storage Metrics or System Metrics, because the two evaluate disk fullness differently.
System Metrics focuses on overall system health. If you target it in your disk fullness alert condition, the condition will not fire until the combined usage of all storage on a given host reaches the specified threshold.
Here’s an example of what I mean. Let’s say you have a host containing 3 disks: an 80GB disk and two 10GB disks. If you’re using System Metrics for your disk fullness alert condition, our alerts evaluation system will treat this as a single 100GB system. If your threshold is set to alert you when disk fullness reaches 90%, then 90GB of disk space would have to be in use before the alert condition opens a violation. Even if both of the 10GB disks were completely full (20GB used), the alert condition would not open a violation until the 80GB disk had 70GB used.
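To make the arithmetic concrete, here is a minimal Python sketch of the combined calculation described above. It is only an illustration of the math, not how the alerts evaluation system is implemented; the disk names, the `combined_used_percent` helper, and the choice of `>=` for the threshold comparison are all assumptions made for this example.

```python
# Hypothetical host from the example: (capacity_gb, used_gb) for each disk.
disks = {
    "disk_a": (80, 60),  # the 80GB disk, 60GB used
    "disk_b": (10, 10),  # a 10GB disk, completely full
    "disk_c": (10, 10),  # a 10GB disk, completely full
}


def combined_used_percent(disks):
    """Return fullness as a percentage of all storage on the host combined."""
    total_gb = sum(capacity for capacity, _ in disks.values())
    used_gb = sum(used for _, used in disks.values())
    return 100 * used_gb / total_gb


THRESHOLD = 90  # percent; assumed to trigger at or above this value

percent = combined_used_percent(disks)
print(percent, percent >= THRESHOLD)
```

With two disks completely full, the host as a whole is only 80% full, so a 90% threshold evaluated this way is not met. Raising the 80GB disk to 70GB used brings the combined figure to 90%.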
OK, so how should I set up my disk fullness alert conditions?
If you scope your alert condition to Storage Metrics instead, you can still filter the list of hosts being targeted, and you can also filter down to the individual disks attached to those hosts. An alert condition scoped to Storage Metrics will then open a violation whenever any single disk crosses the threshold you specify.
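The per-disk behavior can be sketched the same way. Again, this is only an illustration under assumed names (`disks_over_threshold` and the disk labels are hypothetical), not the product's implementation: each disk is compared against the threshold on its own.

```python
# Hypothetical host: (capacity_gb, used_gb) for each disk.
disks = {
    "disk_a": (80, 60),  # 75% full
    "disk_b": (10, 10),  # 100% full
    "disk_c": (10, 10),  # 100% full
}


def disks_over_threshold(disks, threshold):
    """Return the names of disks whose own fullness meets the threshold."""
    return [
        name
        for name, (capacity, used) in disks.items()
        if 100 * used / capacity >= threshold
    ]


print(disks_over_threshold(disks, 90))
```

Here the two full 10GB disks are flagged individually, even though the host as a whole is well under 90% full.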
So I should always use Storage Metrics for disk alert conditions?
It depends on your use case. Sometimes it is useful to know about disk usage at the host level, and that’s when you use System Metrics. At other times, you will want to scope to individual disks using Storage Metrics. Whatever your use case, now you know the difference and can set up alert conditions on your account that better suit your needs.