Amazon Simple Storage Service (S3) is a well-known storage service from Amazon Web Services. It’s a frequently used component in many kinds of services and in web and mobile applications.
But what do you do when an S3 region goes down? For additional redundancy in case S3 objects become unavailable across an entire region, Amazon S3 Cross-Region Replication may be a solution. Here’s a high-level overview of how it works.
Amazon S3 Cross-Region Replication works on individual S3 buckets and requires versioning to be enabled on both the source and destination buckets. After replication is enabled, new objects uploaded to a bucket in one region (“the source bucket”) are asynchronously replicated to a bucket in another region (“the destination bucket”). Replication can apply to all objects in a bucket or only to those matching a prefix filter.
See the AWS documentation for full details on what is and is not replicated and on the necessary ACLs.
How to enable cross-region replication
As described in AWS’s official documentation, cross-region replication builds on S3 bucket versioning, and (at the time of writing) the contents of a source bucket can be replicated to only one other region. AWS also provides a walkthrough for the case where both buckets are owned by the same account, along with full documentation for enabling replication through the AWS Console.
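The console steps can also be scripted. Below is a minimal sketch in Python, assuming boto3-style S3 clients (one per region) are passed in and that an IAM role S3 can assume for replication already exists; the bucket names, role ARN, and rule ID are hypothetical. It enables versioning on both buckets, then attaches one replication rule to the source bucket whose prefix filter selects which objects are copied (an empty prefix matches every object).

```python
import json


def build_replication_config(role_arn, dest_bucket, prefix=""):
    """Return a ReplicationConfiguration dict for put_bucket_replication.

    Uses the rule schema with a Filter element; an empty prefix matches
    every object in the source bucket.
    """
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects (assumed to exist)
        "Rules": [
            {
                "ID": "replicate-to-secondary",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Required whenever a rule specifies a Filter element.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


def enable_replication(source_client, dest_client, source_bucket,
                       dest_bucket, role_arn, prefix=""):
    """Enable versioning on both buckets, then attach the replication rule."""
    # Replication requires versioning on the source AND destination buckets.
    for client, bucket in ((source_client, source_bucket),
                           (dest_client, dest_bucket)):
        client.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )
    config = build_replication_config(role_arn, dest_bucket, prefix)
    # The replication configuration is attached to the SOURCE bucket.
    source_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )
    return config


if __name__ == "__main__":
    # Print the generated configuration without calling AWS.
    cfg = build_replication_config(
        "arn:aws:iam::123456789012:role/example-replication-role",  # hypothetical
        "example-destination-bucket",  # hypothetical
        prefix="logs/",
    )
    print(json.dumps(cfg, indent=2))
```

With boto3 installed, the clients could be created with `boto3.client("s3", region_name=...)` for each region. Keep in mind that a rule like this covers objects written after it is in place; objects already in the bucket are not copied by the rule alone.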
What happens when a region becomes unavailable
For Amazon S3 buckets that power a static website, Cross-Region Replication lets you use DNS to point traffic at the destination bucket if the source bucket becomes unavailable.
For more complex services and applications that do more than serve static files, handling failover is more difficult. Here are some questions to ask when designing for S3 region failover with replication enabled:
- Is it possible to offer a degraded, read-only S3 service when a region becomes unavailable?
- What are the DNS TTLs and how quickly can a failover to the destination replication region occur? Has this been tested in gameday scenarios?
- What happens to changes in the source bucket that haven’t been replicated? Is any data loss acceptable during failover?
- Are the necessary bucket ACLs in place to allow failover? Have they been tested?
- If writes are critical to an S3 bucket and failover occurs, how will replication be re-enabled once service is restored to all regions?
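Two of these questions, offering a degraded read-only service and finding data that hasn’t replicated yet, can be prototyped in code. The following is a minimal Python sketch, assuming boto3-style clients that expose `get_object` and `head_object`; the function names are hypothetical. `read_with_fallback` serves reads from the replica when the primary fails, and `unreplicated_keys` uses the `ReplicationStatus` that S3 reports on source objects to list keys still `PENDING` or `FAILED`, i.e. data that could be missing from the replica after a failover.

```python
def read_with_fallback(primary_client, primary_bucket,
                       replica_client, replica_bucket, key,
                       errors=(Exception,)):
    """Read an object from the primary bucket, falling back to the replica.

    Returns (body_bytes, served_from). In real code, narrow `errors` to
    specific failures (e.g. botocore's ClientError, EndpointConnectionError).
    """
    try:
        resp = primary_client.get_object(Bucket=primary_bucket, Key=key)
        return resp["Body"].read(), "primary"
    except errors:
        # Degraded, read-only path: the replica may lag behind the primary.
        resp = replica_client.get_object(Bucket=replica_bucket, Key=key)
        return resp["Body"].read(), "replica"


def unreplicated_keys(source_client, bucket, keys):
    """Return keys whose source objects report PENDING or FAILED replication.

    Objects not covered by a replication rule carry no ReplicationStatus
    and are skipped.
    """
    lagging = []
    for key in keys:
        head = source_client.head_object(Bucket=bucket, Key=key)
        if head.get("ReplicationStatus") in ("PENDING", "FAILED"):
            lagging.append(key)
    return lagging
```

Because replication is asynchronous, anything `unreplicated_keys` reports at the moment of failover is a candidate for data loss, which helps put a concrete number on how much loss is acceptable.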
It’s often hard to keep track of ACLs and bucket configuration settings across regions. At New Relic, we track bucket configuration, including versioning and replication settings, using our AWS S3 Integration.
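For teams that want a quick scripted check of their own, here is a hypothetical Python sketch (not how the New Relic integration works) that summarizes one bucket’s versioning and replication settings. It assumes a boto3-style client exposing `get_bucket_versioning` and `get_bucket_replication`; S3 signals “no replication rules” with the `ReplicationConfigurationNotFoundError` error code, and any other error is re-raised so that problems like denied access aren’t hidden.

```python
def audit_bucket(client, bucket):
    """Summarize versioning status and enabled replication destinations."""
    # Status is absent if versioning has never been configured on the bucket.
    versioning = client.get_bucket_versioning(Bucket=bucket).get(
        "Status", "Never enabled")
    try:
        resp = client.get_bucket_replication(Bucket=bucket)
        rules = resp["ReplicationConfiguration"]["Rules"]
    except Exception as exc:  # botocore raises ClientError here
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code != "ReplicationConfigurationNotFoundError":
            raise  # e.g. AccessDenied: don't hide real problems
        rules = []
    destinations = [r["Destination"]["Bucket"]
                    for r in rules if r.get("Status") == "Enabled"]
    return {"bucket": bucket, "versioning": versioning,
            "replicates_to": destinations}
```

Running this across every bucket on a schedule and alerting when `versioning` isn’t `Enabled` or `replicates_to` is unexpectedly empty is one way to catch configuration drift before a regional outage does.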
Recent events have made it clear that we’re still learning best practices for operating S3 in high-availability settings. Other comments, suggestions, or considerations for using S3 are welcome in the comments.