
Nightmare on Deploy Street: True Confessions at FutureStack16


#1

Halloween is the celebration of all things frightful. And since New Relic’s FutureStack16 comes to San Francisco shortly after the haunted holiday, we thought we’d serve up some scary stories—and the lessons learned (beyond “don’t go into the dark basement”)—to help attendees put their own sites’ issues into perspective.

A seriously scary story

To help get you in the mood, imagine this sinister scenario, adapted from the nightmare situation of an anonymous company described in Lee Atchison’s new book Architecting for Scale:

“We were wondering how changing a setting on our MySQL database might impact our website performance, but we were worried that the change might cause our production database to fail. Since we didn’t want to bring down production, we decided to make the change to our backup (replica) database instead. After all, it wasn’t being used for anything…”

[Cue eerie music!]

Then, one day, we deployed an update. The production database failed. Not that scary, right? We’ve got a backup database.

The backup database initially did what it was supposed to do. It took over the job of the primary database. Except it really couldn’t.

[Cue more eerie music!]

The settings on the backup database had wandered so far away from those required by the primary database that it could no longer reliably handle the same traffic load that the primary database handled.

[Cue “witch cackle” sound effects.]

Horrors! The backup database slowly failed … and the site went down.
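The lesson? A replica you never exercise (and never audit) will drift. As a purely illustrative sketch, and not the anonymous company’s actual tooling, a small script run on a schedule could diff the runtime settings of the primary and the replica and flag drift before a failover finds it for you. The host names, credentials, and the pymysql dependency below are assumptions made for the example:

# Hypothetical sketch: diff runtime settings between a MySQL primary and its
# replica to catch configuration drift before a failover exposes it.
# Host names, credentials, and the pymysql dependency are assumptions,
# not details from the story above.
import pymysql

def fetch_variables(host, user, password):
    """Return the server's runtime settings as a {name: value} dict."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW GLOBAL VARIABLES")
            return dict(cur.fetchall())
    finally:
        conn.close()

def report_drift(primary, replica):
    """Print every setting whose value differs between the two servers."""
    for name in sorted(set(primary) | set(replica)):
        p_val = primary.get(name, "<missing>")
        r_val = replica.get(name, "<missing>")
        if p_val != r_val:
            print(f"{name}: primary={p_val} replica={r_val}")

if __name__ == "__main__":
    primary_vars = fetch_variables("db-primary.example.com", "monitor", "secret")
    replica_vars = fetch_variables("db-replica.example.com", "monitor", "secret")
    report_drift(primary_vars, replica_vars)

A few settings (server_id, for instance) are supposed to differ between a primary and its replica, so in practice you’d keep a short allowlist of expected differences. The point is simply that drift should be something you discover on purpose, not something a failover discovers for you.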

What’s your story?

In our world, one of the scariest things is a software deployment that goes bad. Combine human fallibility with lots of variables, and you will get failure.

But while bad deployments can be a scary nightmare, they can also provide some great lessons. At FutureStack16 in San Francisco, November 16 and 17, we are planning a session called “True Confessions: My Worst Nightmare Deployment.” The idea is to invite New Relic customers to share a deployment that went bad and the lessons they learned from it.

This is designed to be a fun session in the spirit of “we’ve all been there,” but just to make sure there are no unexpected repercussions, presenters will stand behind a screen to preserve their anonymity.

Are you game? Share a bad deployment story for the edification of your peers, and maybe a little closure for yourself. To apply, send an email to community-team@newrelic.com or just tweet at @NewRelic with the hashtag #ScaryDeploy.

Note: Event dates, participants, and topics are subject to change without notice.


#2

Ooh! Ooh! I have one…

When I worked in tech support (at a different company, not New Relic), one of our customers called just before 5 PM. One of the drives in their RAID array had failed, so their production database went offline. The customer had replaced the drive, but when he went to restore the database from a backup, the restore failed. As did the backup from the previous day. And the day before that. The most recent backup he could get to restore was over two weeks old.

He was packing up his desk when he called us; he totally expected to get fired. But he wasn’t! We helped him restore the two-week-old backup, then sent the failed drive to a data recovery service to recover the most recent data.
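The takeaway from that story is an old one: a backup you’ve never restored is just a hope. As a rough, hypothetical sketch (the paths, the scratch host, and the MySQL-flavored details are assumptions, not anything from that support call), a nightly “restore drill” could load the newest dump onto a throwaway server and run a basic sanity check, so a broken backup gets noticed the next morning instead of two weeks later:

# Hypothetical sketch: nightly "restore drill" that loads the newest dump into a
# scratch database and runs a basic sanity check. Paths, the scratch host, the
# drill_db schema, and the choice of MySQL dumps are illustrative assumptions.
import glob
import os
import subprocess

DUMP_DIR = "/backups/mysql"   # where nightly dumps land (assumed)
SCRATCH_HOST = "db-scratch"   # throwaway server used only for drills (assumed)

def newest_dump():
    """Find the most recently written dump file."""
    dumps = glob.glob(os.path.join(DUMP_DIR, "*.sql.gz"))
    if not dumps:
        raise RuntimeError("no dumps found -- that is itself an alert")
    return max(dumps, key=os.path.getmtime)

def restore_and_check(dump_path):
    # Load the dump into the scratch server; any SQL error fails the drill.
    # Assumes drill_db exists there and mysql credentials come from ~/.my.cnf.
    restore = f"gunzip -c {dump_path} | mysql -h {SCRATCH_HOST} drill_db"
    subprocess.run(restore, shell=True, check=True)
    # Minimal sanity check: the most important table should not be empty.
    count = subprocess.run(
        ["mysql", "-h", SCRATCH_HOST, "-N", "-e",
         "SELECT COUNT(*) FROM drill_db.orders"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if int(count) == 0:
        raise RuntimeError(f"{dump_path} restored but looks empty")
    print(f"{dump_path} restored OK, orders rows: {count}")

if __name__ == "__main__":
    restore_and_check(newest_dump())

Wire something like that into whatever alerting you already have, and a string of failed restores can’t pile up silently for two weeks.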


#3

That’s a great story, Phil. We’ve all been there…

Let’s see, there was the time when I was a newbie sys admin and saw there was lots of extra space available, so I moved a bunch of user files there. It didn’t break anything, but it was really poor form.

When I was working at the phone company, the switch supporting an ATM network for a bank crashed, and they didn’t have a backup. A fellow sys admin flew there to help out. They had been using three tapes as backups, reusing them over and over without ever sending one offsite. He had to rebuild the switch from scratch. The 48-hour ATM outage made the news.


#4

The empty space was in /usr