We had the pleasure of inviting Jamie Carter, Platform Architect at APN Partner Polystream, to contribute this guest post. Read on to learn how Polystream delivers 3D interactivity at scale using AWS services such as Amazon Elastic Compute Cloud (Amazon EC2).
The landscape of cloud gaming is advancing rapidly, with 3D interactive applications like car configurators and collaborative development tools emerging. However, there remains a challenge in delivering these experiences at scale. For instance, how can a cloud-first game accommodate a Fortnite-like surge of 10 million concurrent users when current technology limits us to just a few thousand users in the cloud?
The viability of mass-market content, particularly in cloud gaming, hinges on the ability to achieve large-scale delivery. Yet, true scalability remains elusive. Traditional methods that rely on cloud-based GPUs fall short of the flexibility and elasticity that cloud solutions promise for streaming 3D interactive content. The scarcity of suitable GPUs and prohibitive costs further complicate matters. Polystream, with the backing of AWS Game Tech, is pioneering a new approach to deliver 3D interactivity at scale.
Current solutions typically depend on VMs with GPUs to stream video to each user. In contrast, Polystream’s Command Streaming technology eliminates this dependency on physical cloud hardware by implementing a software-defined architecture that transmits graphics commands instead of video. By harnessing the cloud’s computing power and connecting it to the billions of GPUs embedded in gaming consoles, smartphones, and computers, our innovative technology achieves levels of scale that were previously thought impossible.
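To make that distinction concrete, here is a deliberately simplified Python sketch of the idea of sending commands rather than pixels. It is illustrative only, not Polystream's implementation; the command format, field names, and framing are assumptions made for the example.

```python
import json
import struct

# Purely illustrative sketch of the command-streaming idea: rather than
# rendering on a cloud GPU and encoding the pixels as video, the server
# serializes the graphics commands for a frame and the client replays them
# on its own local GPU.

def encode_commands(commands: list) -> bytes:
    """Server side: serialize one frame's draw commands as a length-prefixed packet."""
    payload = json.dumps(commands).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload

def decode_commands(packet: bytes) -> list:
    """Client side: unpack the commands so the local GPU can execute them."""
    (length,) = struct.unpack("!I", packet[:4])
    return json.loads(packet[4:4 + length].decode("utf-8"))

if __name__ == "__main__":
    frame = [
        {"op": "bind_texture", "id": 7},
        {"op": "set_transform", "matrix": [1, 0, 0, 1]},
        {"op": "draw_indexed", "count": 36},
    ]
    packet = encode_commands(frame)
    print(len(packet), "bytes;", decode_commands(packet)[0]["op"])
```

Because the server never rasterizes a frame itself, the instance hosting the stream does not need a GPU at all, which is what removes the dependency on scarce cloud GPU hardware.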
Underpinning Polystream’s revolutionary Command Streaming technology is our globally distributed, multi-cloud service known as the Polystream Platform. This platform is engineered to leverage our key differentiator: the ability to operate without being constrained by cloud-based GPUs. It can provision, manage, and orchestrate vast numbers of interactive streams across any cloud provider worldwide.
In November 2019, we aimed to showcase Command Streaming’s capability to deliver scale in user concurrency. Our objective was to achieve a level of concurrent usage that would be challenging with typical GPU video streaming. We meticulously managed elastic provisioning and deployment, operational monitoring, intelligence gathering, and the graceful teardown of tens of thousands of interactive streams within a single day.
To facilitate this concurrency scaling test, we worked closely with AWS Game Tech, which played an instrumental role in the trial and provided most of the streaming compute resources, delivering the virtual machine capacity through Amazon EC2 that we needed to support our anticipated levels of concurrent usage.
Prior to the large-scale test, we conducted several smaller tests to iron out any potential provisioning and deployment issues. We started with 1,000 concurrent users (CCU), then scaled up to 5,000, followed by 10,000, ultimately leading to our largest test aimed at achieving our CCU target.
Our provisioning method involved running pre-built Docker images containing the Terraform runtime, which interacts with AWS provisioning APIs to establish the virtual infrastructure and initiate the interactive streams setup. This approach enabled us to dynamically scale out before and during the tests, and scale in afterward. As a result, we could provision a substantial number of virtual machines rapidly in an automated manner, incorporating monitoring and allowing for automatic retries if any virtual machine encountered issues during provisioning or setup.
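As a rough sketch of that loop, the snippet below shows how one queued request might invoke the Terraform runtime inside a container and retry automatically on failure. The image name, workspace layout, and retry policy are placeholders, not our production tooling.

```python
import subprocess
import time

# Hypothetical sketch of the provisioning loop described above: each queued
# request runs a pre-built Docker image containing the Terraform runtime,
# which applies the infrastructure definition for one interactive stream host.

TERRAFORM_IMAGE = "example/stream-provisioner:latest"  # placeholder image name
MAX_ATTEMPTS = 3

def provision_stream_host(workspace: str, region: str) -> bool:
    """Apply the Terraform workspace for one stream host, retrying on failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{workspace}:/workspace",
             "-e", f"AWS_REGION={region}",
             TERRAFORM_IMAGE,
             "apply", "-auto-approve", f"-var=region={region}"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return True
        # Report the failure to monitoring, back off, and retry automatically.
        print(f"attempt {attempt} failed in {region}: {result.stderr.strip()[:200]}")
        time.sleep(30 * attempt)
    return False
```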
CCU Testing
After these successful preliminary tests, we prepared to reach our target synthetic stream concurrency. The application we chose to stream required two vCPUs and minimal RAM, so we selected t3.micro instances for the interactive streams. Previous tests had shown we could run around 1,000 synthetic clients on a specially configured VM, so we pre-provisioned 40 of these client machines, four in each AWS region.
Having encountered configuration issues that slowed provisioning in specific regions during earlier tests, we decided to over-provision against our target of 23,000 concurrent streams, aiming for 4,000 virtual machines in each of 10 AWS regions. This ensured we could still meet our target in a reasonable timeframe even if some regions took longer to provision.
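Expressed against the EC2 API, that over-provisioning across regions looks roughly like the sketch below. The region list, AMI IDs, and counts are placeholders rather than the values used in the test, and real launches would be issued in many smaller batches with monitoring around each one.

```python
import boto3

# Illustrative sketch only: spreading t3.micro requests across several AWS
# regions with headroom, so one slow region cannot block the overall target.

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]   # placeholder subset
AMI_BY_REGION = {region: "ami-placeholder" for region in REGIONS}
INSTANCES_PER_REGION = 100                                # placeholder count

def request_stream_instances() -> None:
    """Ask each region for its share of t3.micro stream hosts."""
    for region in REGIONS:
        ec2 = boto3.client("ec2", region_name=region)
        ec2.run_instances(
            ImageId=AMI_BY_REGION[region],
            InstanceType="t3.micro",
            MinCount=1,                      # accept partial fulfilment
            MaxCount=INSTANCES_PER_REGION,   # request up to the per-region share
        )

if __name__ == "__main__":
    request_stream_instances()
```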
We queued 40,000 provisioning requests, scheduled to start at 4 a.m. Our previous testing suggested this timing would yield a considerable number of interactive streams by the time the team arrived at the office, allowing us to start running synthetic clients promptly.
Our objective extended beyond simply provisioning a large infrastructure for a few hours; we aimed to demonstrate our capability to operate at that level of interactive streams. To achieve this, we began running a few thousand client sessions, terminating them after a period before launching another batch of clients. This ensured that our interactive streams could recover from previous sessions and be ready for new ones.
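Conceptually, each cycle of that exercise looked something like the following sketch. The platform client, batch size, and timings are hypothetical stand-ins rather than our actual API.

```python
import time

# Illustrative session-churn loop: launch a batch of synthetic clients, let
# them stream for a while, tear them down, and confirm the interactive
# streams are ready again before starting the next batch.

BATCH_SIZE = 2_000          # placeholder batch size
SESSION_MINUTES = 15        # placeholder session length

def run_churn_cycles(platform, cycles: int) -> None:
    """`platform` is a hypothetical client for starting and counting streams."""
    for cycle in range(cycles):
        sessions = [platform.start_synthetic_client() for _ in range(BATCH_SIZE)]
        time.sleep(SESSION_MINUTES * 60)       # clients interact with their streams
        for session in sessions:
            session.terminate()                # graceful client shutdown
        # Streams must recover and be ready to accept the next batch of clients.
        ready = platform.ready_stream_count()
        print(f"cycle {cycle}: {ready} streams ready for reuse")
```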
Once we were confident we could manage and terminate large numbers of short-lived sessions, we ramped the synthetic clients up toward our goal, and within two hours, by midday, we had nearly reached our original target.
The configuration issues from earlier tests had been resolved and almost every provisioning request was fulfilled, which gave us ample capacity to exceed our target. We had initially planned to conclude testing at 5 p.m., but seeing that we were approaching full capacity, we queued an additional 2,000 t3.micro requests.
As we neared that deadline, we surpassed the 40,000 CCU milestone. At that point we had saturated not only our interactive stream capacity but also the machines running our synthetic clients. With the test drawing to a close and our target exceeded by approximately 45%, we began the graceful process of ending all client sessions and terminating our virtual machines.
The earliest streaming sessions had successfully delivered interactive 3D content for over four hours, proving that we could not only launch tens of thousands of sessions but also maintain that many streaming sessions over an extended period.
Business Intelligence
The overall scope and performance of the test were meticulously recorded and reported through our business intelligence platform. This ensured that we were not only assessing our provisioning and streaming capabilities but also load testing our telemetry pipelines and ancillary services, including integrations with third-party providers.
These pipelines collected telemetry data and metrics from every interactive stream, supporting service, and piece of infrastructure, routing each event in real time to Logz.io, Grafana Cloud, or Power BI and SQL Server according to the routing rules configured for it.
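The routing itself can be pictured as a simple per-event rule match. The rule shapes and sink names below are illustrative assumptions, not our production configuration.

```python
# Illustrative per-event routing: each telemetry event is matched against
# configured rules and forwarded to the appropriate sink (log analytics,
# metrics dashboards, or BI storage).

ROUTING_RULES = [
    {"event_prefix": "stream.log",    "sink": "logzio"},
    {"event_prefix": "stream.metric", "sink": "grafana_cloud"},
    {"event_prefix": "session.",      "sink": "bi_warehouse"},  # Power BI / SQL Server
]

def route_event(event: dict) -> str:
    """Return the sink an event should be forwarded to, defaulting to BI storage."""
    for rule in ROUTING_RULES:
        if event["name"].startswith(rule["event_prefix"]):
            return rule["sink"]
    return "bi_warehouse"

if __name__ == "__main__":
    print(route_event({"name": "stream.metric.fps", "value": 60}))  # grafana_cloud
```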
Upon concluding all client sessions and confirming that all telemetry had been received and verified, we reflected on how this testing not only showcased our technology but also highlighted the potential for future advancements in cloud gaming.