One we see pretty commonly is that transient network issues between our infrastructure and yours can lead to data not being sent up by the agent or not being received in a timely manner. This can set off a cascade of false-positive alarms for customers, and when you’ve got thousands of servers, owned by dozens of teams, in a handful of countries, you’re starting to talk about getting a lot of people out of bed.
Your customers’ apps running in The Cloud rely on dozens of external services: payment, social media, file hosting, weather APIs, geographic IP lookups, email blasting, and on and on and on.
If there is a network connectivity problem between their servers and yours, then there’s a problem connecting to those other services, too. The lack of data from the agent might be the very first detectable sign that something’s wrong, and I’d rather get woken up for nothing once or twice a year than find out three hours later that our code for retrying credit card authorization doesn’t actually retry and we’ve lost revenue.
On Legacy Alerts, there have been (knock on wood) like three big false alarm storms in the past two years. Pretty sure all of them were during North America business hours, too. If that’s the failure rate in this hypothetical new service, I’d be a pretty damn happy customer.
As it stands now, with no ability to trigger this kind of alert, and an increasingly abandoned legacy product, I am an incredibly unhappy one.