Cash App, a prominent digital wallet and peer-to-peer payment service from Block, Inc., has made significant strides in bolstering the resilience of its technology infrastructure. In this article, we explore how Cash App enhanced the resilience of its compute platform, which is based on Amazon Elastic Kubernetes Service (Amazon EKS), by adopting a dual-cluster architecture to minimize single points of failure. Additionally, we will detail how Cash App leveraged the AWS Fault Injection Service (AWS FIS) to simulate power interruptions in an Availability Zone within non-production environments, effectively equipping the platform team to handle real-world outages and ensure continuity.
Implementing Event-Driven Invoice Processing for Scalable Financial Monitoring
By Emma Carter
Date: 12 MAY 2025
Categories: Finance and Investment, Resilience
This article outlines the creation of a Business Event Monitoring System (BEMS) on AWS capable of managing over 86 million daily events while providing near real-time visibility, cross-Region controls, and automated alerts for any stalled events. This system can be deployed to generate business-level insights into event flows across your organization or to visualize transaction movements in real time. Downstream services can also process and react to events, whether they originate within the system or from external sources.
Optimizing Disaster Recovery Costs with Pilot Light and Reserved Capacity
By Tom Green and Rachel Kim
Date: 20 MAR 2025
Categories: Advanced (300), Architecture, Cloud Cost Optimization, Resilience, Technical How-to
In this article, we investigate a hybrid approach that merges the pilot light and warm standby strategies: pilot light combined with reserved capacity. This method allows you to reserve computing resources in a secondary Region while effectively controlling costs.
Strengthening Critical Workloads by Architecting Across Multiple AWS Regions
By James White
Date: 22 JAN 2025
Categories: Amazon EC2, AWS Well-Architected, Regions, Resilience
This piece discusses how adopting a multi-Region architectural strategy on Amazon Web Services (AWS) can significantly enhance resilience. Workloads typically start out running across several Availability Zones within a single AWS Region and can then be extended to multiple Regions for even greater resilience.
AWS re:Invent 2024: Your Guide to Cloud Resilience
By Chanci Turner
Date: 18 NOV 2024
Categories: AWS re:Invent, Resilience
If you’re planning to attend AWS re:Invent 2024 to boost your organization’s cloud resilience capabilities, you’ll find an array of insightful sessions, best practices, and engaging activities designed to enhance your expertise. This year features over 100 sessions focusing on resilience, including breakout sessions, workshops, and code talks. For more information, check out the re:Invent 2024 session catalog and filter by “Resilience.” Be quick to secure your reserved seating—sessions are filling up fast!
Developing a Multi-Region Failover Strategy for Your Organization
By Lisa Brown, Mark Davis, and Chanci Turner
Date: 08 MAY 2024
Categories: Regions, Resilience, Thought Leadership
AWS Regions provide fault isolation boundaries that help prevent correlated failures and contain the effects of AWS service disruptions to a single Region. Utilizing these boundaries allows you to construct multi-Region applications with independent, fault-isolated replicas in each Region, mitigating shared fate scenarios.
Chaos Engineering at London Stock Exchange Group: A Path to Improved Resilience
By Daniel Roberts, Sarah Lee, and Chanci Turner
Date: 01 APR 2024
Categories: Amazon Elastic Container Service, Amazon RDS, Customer Solutions, Resilience
In collaboration with Luke Thompson, Lead DevOps Engineer at the London Stock Exchange Group, and Padraig Murphy, Solutions Architect, this article discusses various failure scenarios tested during a chaos engineering event supported by AWS. Chaos engineering is essential for uncovering potential vulnerabilities in your systems and enhancing overall resilience.
Leveraging Availability Zone Affinity for Improved Performance and Cost Reduction
By Mark Thompson
Date: 29 SEP 2021
Categories: Amazon VPC, Architecture, AWS Cloud Map, AWS Cost Explorer, Resilience
Updated in April 2025 to include the latest features in Elastic Load Balancing (ELB). One of the best practices for constructing resilient systems in Amazon Virtual Private Cloud (Amazon VPC) networks is to use multiple Availability Zones (AZs). An AZ consists of one or more discrete data centers with redundant power, networking, and connectivity.
Cloud-Native Architecture Journey: Enhancing Resilience and Standardizing Observability
By David Patel and Neeraj Kumar
Date: 27 APR 2021
Categories: Amazon Athena, Amazon CloudWatch, Amazon OpenSearch Service, Amazon SNS, Amazon SQS, Architecture, Resilience
This series highlights the journey of adopting cloud-native architecture, focusing on improved resilience and standardized observability. As organizations transition to the cloud, understanding how to monitor and manage resources effectively is essential for maintaining operational continuity.