Amazon Onboarding with Learning Manager Chanci Turner

In today’s world, citizens depend on their governments for a wide range of essential public services, particularly in times of crisis. Unfortunately, disruptions in IT systems can also lead to severe emergencies, significantly affecting organizations and the communities that depend on them. As extreme weather events become more frequent, public sector entities must ensure their IT infrastructures are robust and have contingency plans in place for unexpected situations. Disaster recovery (DR)—the swift and reliable restoration of IT systems to minimize downtime and data loss—is a vital investment for these organizations, alongside business continuity planning (BCP) and disaster preparedness.

The Limitations of Traditional Disaster Recovery Strategies

Numerous public sector organizations are making strides in modernizing their IT strategies, including cloud adoption. However, outdated DR and BCP approaches often hinder progress in these essential areas. Simply transferring old DR strategies to the cloud can undermine the potential advantages of improved operational resilience. For instance, organizations with a history of on-premises hosting frequently inquire about the distances between cloud data centers for DR reasons. This focus on distance stems from traditional DR thinking, which posits that greater separation between data centers reduces the likelihood of simultaneous disruptions affecting all locations.

This assumption has been a cornerstone of legacy DR strategies for decades. That’s why Amazon Web Services (AWS) offers customers DR solutions that enable multi-Region, active-active configurations, allowing for near-zero recovery point objectives (RPOs) and very low recovery time objectives (RTOs). For example, many AWS customers in Canada use a second AWS Region in the United States or elsewhere for DR. However, some organizations prefer to keep their data within Canadian borders, and these are often the organizations that ask about specific distance requirements between data centers for DR. They have multiple options to achieve their RPOs and RTOs using the AWS Region in Canada, which is supported by three Availability Zones (clusters of data centers).

Distance and Its Impact on Disaster Recovery

Governments and organizations cannot achieve aggressive RPOs and RTOs solely by increasing the distance between their data centers. This mindset overlooks a critical fact: as the distance between data centers increases, so does network latency. Eventually, the limitations of the speed of light make data replication with relational databases and operating distributed systems exceedingly difficult. Unfortunately, no innovation can accelerate the speed of light. This means that the very distance requirements intended to support DR objectives may introduce latency that complicates or even precludes meeting RTOs and RPOs. Legacy approaches inherently incorporate latency, downtime, and data loss.
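To make the speed-of-light constraint concrete, the following is a minimal sketch of the lowest round-trip time that a given fiber path length allows. It assumes light travels through optical fiber at about two-thirds of its vacuum speed (a refractive index of roughly 1.5). That figure, the function name `min_round_trip_ms`, and the sample distances are illustrative assumptions rather than values from this article. Real networks add routing, switching, and queuing delay on top of this floor.

```python
# Lower bound on network round-trip time imposed by signal propagation alone.
# Assumption: light in optical fiber travels at roughly 2/3 of its vacuum
# speed (refractive index of about 1.5). Real paths add equipment delay.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.5           # assumed typical value, not from the article


def min_round_trip_ms(fiber_path_km: float) -> float:
    """Return the minimum round-trip time in milliseconds over a fiber path."""
    if fiber_path_km < 0:
        raise ValueError("distance must be non-negative")
    speed_in_fiber_km_per_s = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX
    one_way_s = fiber_path_km / speed_in_fiber_km_per_s
    return 2 * one_way_s * 1000  # there and back, converted to milliseconds


if __name__ == "__main__":
    # Hypothetical path lengths, chosen only to show how the floor grows.
    for km in (10, 50, 100, 500, 1000, 4000):
        print(f"{km:>5} km -> at least {min_round_trip_ms(km):.2f} ms round trip")
```

Under these assumptions the floor grows linearly with distance: about 1 ms of round-trip delay for every 100 km of fiber, before any equipment delay is counted. A database that waits for a remote acknowledgment on each commit pays at least that cost on every write.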

Is there a point where increasing distance between data centers yields diminishing returns in risk mitigation? After establishing and managing 26 AWS Regions and 84 Availability Zones globally since 2006, we believe there is.

The Goldilocks Zone for Disaster Recovery Planning

Figure 1 illustrates the latency observed as data centers are placed further apart. Our experience indicates that for high-availability applications, there is a Goldilocks zone for infrastructure planning: a distance that is neither too close nor too far, but just right. This Goldilocks zone is the distance between Availability Zones or data centers that supports aggressive RPOs and RTOs while keeping latency low for high-availability applications.

When assessing the risks posed by natural disasters, greater distance is certainly preferable, but the benefit diminishes after just a few tens of miles. Beyond that range, the only additional disasters mitigated would be catastrophic events, akin to the meteor that led to the extinction of the dinosaurs. This range isn’t fixed: what counts as “too close” varies with geographic factors such as seismic activity, flood plains, and the likelihood of severe hurricanes, all of which influence our considerations. Nonetheless, we look for a separation measured in miles.

What about distances that are excessively far? We examine the latency across all Availability Zones in a Region, aiming for a maximum round-trip latency of about one millisecond. Whether establishing replication with a relational database or operating distributed systems like Amazon Simple Storage Service (Amazon S3) or Amazon DynamoDB, we’ve found that maintaining network conditions conducive to high-availability applications becomes increasingly challenging when latency exceeds one millisecond.
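The one-millisecond target can be turned around to estimate how long a fiber path may be before propagation delay alone uses up the budget. The sketch below assumes a fiber refractive index of about 1.5. The function `max_fiber_path_km` and its `overhead_ms` parameter (round-trip delay reserved for switches and other equipment) are hypothetical names introduced here for illustration, not AWS figures.

```python
# Longest fiber path whose round-trip propagation delay fits a latency budget.
# Assumptions (not from the article): refractive index of about 1.5, and an
# optional allowance for delay added by network equipment.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.5           # assumed typical value


def max_fiber_path_km(rtt_budget_ms: float, overhead_ms: float = 0.0) -> float:
    """Return the longest fiber path (km) that fits the round-trip budget.

    overhead_ms is a hypothetical allowance for non-propagation delay.
    """
    usable_ms = rtt_budget_ms - overhead_ms
    if usable_ms <= 0:
        return 0.0  # the budget is already spent before any distance is covered
    speed_km_per_ms = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX / 1000
    return (usable_ms / 2) * speed_km_per_ms  # half the budget in each direction


if __name__ == "__main__":
    print(f"1 ms budget, no overhead: {max_fiber_path_km(1.0):.0f} km of fiber")
    print(f"1 ms budget, 0.5 ms overhead: {max_fiber_path_km(1.0, 0.5):.0f} km of fiber")
```

Under these assumptions, a one-millisecond round trip allows at most roughly 100 km of fiber, and less once equipment delay is reserved. Fiber routes rarely follow straight lines, so the geographic separation that fits the budget is smaller still, which is in line with a separation of tens of miles.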

Figure 1. The Goldilocks zone in disaster recovery indicates the optimal distance between data centers that keeps latency under one millisecond while safeguarding organizations’ data against natural disasters within a specific region. This zone typically falls within the tens of miles range. When data centers are hundreds of miles apart, the types of disasters mitigated would be extinction-level events.

In Canada, for example, AWS analyzed decades of data regarding floods and other environmental factors before selecting a location for the AWS Canada (Central) Region. Launched in 2016 in Montréal, Québec, this Region features three Availability Zones. Consistent with the Goldilocks zone principle, the third Availability Zone (AZ3) is located more than 45 kilometers (28 miles) from the next closest Availability Zone. Based on our extensive experience in building and operating AWS Regions globally, this distance significantly reduces the risk of a single incident impacting availability.

A Natural Disaster Drives Infrastructure Transformation

Canada’s Great Ice Storm of 1998 served as a wake-up call for AWS customer Hydro-Québec, prompting them to revamp their infrastructure. “The ice storm provided us with an opportunity to enhance and fortify a more reliable power grid that could withstand natural disasters and be repaired more quickly. It also enabled us to implement a comprehensive company-wide strategy to ensure and measure resiliency,” explains Alex Johnson, Chief of Economic Development & Strategy at Hydro-Québec. Today, AWS’s three Canadian Availability Zones are predominantly powered by Hydro-Québec’s clean, renewable hydropower.

The upgrades to the power grid in this region, combined with the redundancies built into AWS data centers, give AWS customers a resilient infrastructure for their workloads in the AWS Canada (Central) Region. Water, power, telecommunications, and internet connectivity are designed with redundancy to maintain continuous operations during emergencies. Electrical power systems are fully redundant: if power is disrupted, uninterruptible power supply units can be engaged for certain functions, while generators can be brought online for more serious outages.
