Disaster Recovery for 5G Core Networks on AWS | Amazon Onboarding with Learning Manager Chanci Turner


In the telecommunications sector, Communication Service Providers (CSPs) are increasingly exploring new use cases for the public cloud. Deploying 5G core networks on AWS is gaining traction, particularly in practical implementations such as private enterprise networks and newly built 5G infrastructure. The AWS white paper on 5G network evolution highlights how the global AWS cloud infrastructure, comprising AWS Regions, Availability Zones (AZs), AWS Local Zones, and AWS Outposts, can offer a robust and flexible environment for hosting 5G core networks according to the requirements of each network function (NF). For instance, the User Plane Function (UPF) can be placed in AWS Local Zones or on AWS Outposts to ensure low-latency processing.

Among the various potential applications, one compelling option for CSPs with existing 5G core networks is to build a disaster recovery (DR) solution, or a more disaster-resilient network, on AWS. Such a DR network is designed to respond immediately and at scale to a 5G NF failure, a complete data center outage, or a maintenance window. Because this additional environment is needed only during unforeseen failures or planned maintenance, it should be designed to keep resource costs low in normal operation and to scale out rapidly when required. Compared to constructing a redundant network within a traditional telecom data center, AWS enables CSPs to reduce expenses and energy consumption during regular operations while still adapting quickly to network demands such as traffic surges and maintenance activities.

This article discusses how AWS can serve as an alternative virtual data center environment for achieving "disaster resiliency" and "disaster recovery" objectives within 5G networks. It focuses on applying 3GPP high-availability concepts within AWS, together with related capabilities such as autoscaling, automation tools, and cost management strategies. Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling, along with the horizontal pod autoscaling and cluster autoscaling capabilities available on Amazon Elastic Kubernetes Service (Amazon EKS), can keep the footprint of Container-based Network Functions (CNFs) in the DR VPC small during normal operation and scale it out rapidly when traffic surges unexpectedly.
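To make the horizontal pod autoscaling step concrete, the following minimal sketch reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas x current metric / target metric), with a default tolerance of 0.1). The function name, the replica limits, and the SMF traffic figures are assumptions made for illustration; they do not come from this article.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=100, tolerance=0.1):
    """Replica count following the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # The autoscaler skips scaling when the ratio is within the tolerance of 1.0.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical values here).
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical warm-standby SMF deployment: 2 pods with a 60% CPU target.
# Swing-over traffic pushes average utilization to 270% of the pod request.
print(hpa_desired_replicas(2, 270, 60))  # 9
```

The calculation itself is immediate; the time it takes for the new pods (and any new worker nodes) to become ready is a separate matter that depends on the cluster.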

To further improve cost efficiency and energy savings, AWS Graviton instances can host the 5G core NFs that handle swing-over traffic (traffic that has been redirected to the AWS Cloud from its original on-premises destination). This post describes the DR models and strategies that apply to general applications on AWS and explains their relevance to 5G networks. It outlines how the 3GPP architecture can be leveraged to meet DR goals, supported by AWS capabilities such as Amazon EC2 Auto Scaling and cluster autoscaling, and by additional functionality illustrated with examples from the open-source community.

DR Model for 5G Core Networks in AWS

As outlined in various DR publications, two primary objectives guide DR efforts: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the maximum acceptable delay between service interruption and restoration, while RPO is the maximum acceptable time since the last data recovery point, which determines how much data loss is tolerable. For typical applications running on AWS, established DR services include AWS Elastic Disaster Recovery (AWS DRS) and the Amazon Route 53 Application Recovery Controller (Route 53 ARC).
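As a small worked example of these two definitions, the sketch below measures the recovery time and the data-loss window for a single incident and compares them with the objectives. All timestamps, objective values, and function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def measure_recovery(last_recovery_point, outage_start, service_restored):
    """Return (recovery_time, data_loss_window) for one incident."""
    recovery_time = service_restored - outage_start        # compared against RTO
    data_loss_window = outage_start - last_recovery_point  # compared against RPO
    return recovery_time, data_loss_window


def meets_objectives(recovery_time, data_loss_window, rto, rpo):
    """True only if both the RTO and the RPO are satisfied."""
    return recovery_time <= rto and data_loss_window <= rpo


# Hypothetical incident: last successful sync at 11:58, outage at 12:00,
# service restored at 12:04:30.
last_sync = datetime(2024, 1, 1, 11, 58, 0)
outage = datetime(2024, 1, 1, 12, 0, 0)
restored = datetime(2024, 1, 1, 12, 4, 30)

rt, dl = measure_recovery(last_sync, outage, restored)
print(rt, dl)  # 0:04:30 0:02:00

# The 5-minute RTO is met, but the 1-minute RPO is missed.
print(meets_objectives(rt, dl, rto=timedelta(minutes=5), rpo=timedelta(minutes=1)))  # False
```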

However, the 5G core network applications discussed here place heavier demands on networking interfaces and protocols, as required by 3GPP standards, so these services do not apply to every core network component. While they can be used for specific NF elements, this post aims to provide a broader overview of how AWS services can support DR implementation within the 3GPP standard architecture.

In the context of 5G NFs, components such as the Access and Mobility Management Function (AMF), the Session Management Function (SMF), and the UPF are driven mainly by RTO, since they are integral to the rapid recovery and restoration of 5G voice and data services. The Unified Data Management (UDM) function, by contrast, is sensitive to both RPO and RTO because it manages subscriber profiles and information. Since each NF has a different focus, different DR strategies can be adopted for different NFs. The accompanying figure illustrates four DR strategies and shows how they differ in terms of RTO and RPO. Because telco 5G core NFs are mission-critical, these applications must meet stringent RTO requirements.

For instance, as mentioned above, UDM requires both RTO and RPO to be near real time. When establishing a DR site for UDM on AWS, it may therefore be necessary to keep a UDM instance running on AWS and continuously synchronized with the UDM in the legacy data center. In this scenario, a Hot-standby (Active-Active) strategy would be ideal.

Alternatively, Warm-standby, Pilot Light, and Backup & Restore strategies can be employed for other NFs based on their specific use cases and characteristics. The Backup & Restore method is suitable for non-critical applications where RTO constraints are less stringent. Once a direct connection is established between the data center and AWS, for example with AWS Direct Connect (or alternatively with AWS Site-to-Site VPN, albeit with limitations), tools such as AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), and AWS CodePipeline can instantiate NFs on demand, taking advantage of Infrastructure as Code (IaC). For more detail on this DR strategy, refer to the post on DR Architecture on AWS, Part II. Additionally, for building a continuous integration/continuous delivery (CI/CD) pipeline for 5G NF deployment on AWS that supports swift service recovery, see the AWS white paper on CI/CD for 5G Networks on AWS.

A Cold-standby approach may also be a cost-efficient option for non-mission-critical 5G network applications. In this strategy, all EC2 instances are preconfigured but kept in a stopped state, which enables faster activation than the Backup & Restore approach while costing less than a Warm-standby deployment. For mission-critical telecom services, however, Warm-standby is the most practical way to build a DR 5G network on AWS: a reduced deployment footprint stays running and can handle a minimal level of traffic, and it then grows according to predefined scaling policies as traffic swings over.
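The mapping described in the preceding paragraphs can be summarized as a small decision sketch. This is only one illustrative reading of the text, not a prescribed method: the profile fields, the function name, and the "reporting-tool" entry are hypothetical, and Pilot Light is left out because the text does not state a criterion that separates it from the other options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfProfile:
    name: str
    near_real_time_rpo: bool     # state must stay continuously synchronized
    mission_critical: bool       # stringent RTO
    preconfigured: bool = False  # stopped but preconfigured instances exist


def choose_dr_strategy(nf: NfProfile) -> str:
    """Pick a DR strategy following the reasoning in the text above."""
    if nf.near_real_time_rpo:
        return "Hot-standby (Active-Active)"
    if nf.mission_critical:
        return "Warm-standby"
    if nf.preconfigured:
        return "Cold-standby"
    return "Backup & Restore"


profiles = [
    NfProfile("UDM", near_real_time_rpo=True, mission_critical=True),
    NfProfile("AMF", near_real_time_rpo=False, mission_critical=True),
    NfProfile("SMF", near_real_time_rpo=False, mission_critical=True),
    NfProfile("UPF", near_real_time_rpo=False, mission_critical=True),
    # Hypothetical non-critical application, not named in the article.
    NfProfile("reporting-tool", near_real_time_rpo=False, mission_critical=False),
]

for p in profiles:
    print(f"{p.name}: {choose_dr_strategy(p)}")
```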

Given that 5G NFs may be deployed on Amazon EKS, conventional autoscaling alone may not be enough to absorb a sudden traffic surge, because the required RTO can be shorter than the typical response time of Kubernetes autoscaling. Effective implementation strategies are therefore essential to achieving the desired disaster recovery outcomes.
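A rough back-of-the-envelope model shows why the autoscaling response time matters. The sketch below assumes that new pods start in parallel and that any new worker nodes are provisioned in parallel. The 15-second HPA sync period is the Kubernetes default; the node provisioning and pod start times are placeholder values that would need to be measured in a real environment, and the function and parameter names are hypothetical.

```python
def estimated_scale_out_seconds(current_replicas, required_replicas,
                                spare_pod_slots, hpa_sync_s=15.0,
                                node_provision_s=120.0, pod_start_s=30.0):
    """Rough time until required_replicas pods are serving traffic.

    node_provision_s and pod_start_s are placeholders, not measured values.
    """
    extra_pods = max(0, required_replicas - current_replicas)
    if extra_pods == 0:
        return 0.0
    total = hpa_sync_s + pod_start_s
    if extra_pods > spare_pod_slots:
        # Some pods cannot be scheduled until new nodes have joined the cluster.
        total += node_provision_s
    return total


rto_s = 60.0  # hypothetical RTO for a mission-critical NF

# Scaling from 2 to 9 pods with room for only 3 more pods on existing nodes.
tight = estimated_scale_out_seconds(2, 9, spare_pod_slots=3)
print(tight, tight <= rto_s)  # 165.0 False

# Same scale-out with enough pre-provisioned headroom for all 7 new pods.
roomy = estimated_scale_out_seconds(2, 9, spare_pod_slots=7)
print(roomy, roomy <= rto_s)  # 45.0 True
```

Under these assumed numbers, the node provisioning term dominates, which is why the size of the standing footprint and the available headroom interact directly with the RTO.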

