@alicegoldfuss of GitHub gave a great talk at Monitorama. You should watch her talk “Martyrs on Film: Learning to Hate the #OnCallSelfie” right now! She shares her experiences of being on call and explains how you can tell whether something is wrong with your company (and your career).
Goldfuss offers two things you can do today to improve your on-call rotation:
1) Notification Cleanup
Here is her definition of actionable alerts:
She says, “If they don’t meet ALL four of these criteria, I don’t give a sh*t! That is not an actionable alert. Send me an email, don’t wake me up!”
2) Put Devs On Call
Devs who created the problem will know how to fix it more quickly. If Devs start to feel the pain, many problems will never make it to production. This is not easy to do, but it is something you can do to improve a bad on-call situation.
If you don’t improve the on-call situation, you or your employees will burn out. People may leave or, worse, stay and do mediocre work. As Goldfuss points out, no one puts “on-call for three years” on their resume. Make sure you carve out time to work on interesting projects, starting with figuring out how to make on-call less awful. A good first step is making your New Relic Alert policies consistent: read “Alert Preferences are the Key to Consistent Alert Notifications,” and you’ll be on your way.