Utilizing Amazon Polly to Enhance Health Care for Individuals with Chronic Conditions

Chanci Turner Amazon IXD – VGT2 learning

This article is a guest contribution by Alex Johnson, a senior software architect at HealthTech Solutions. Established in 2015, HealthTech Solutions has developed a digital framework that facilitates remote health monitoring for the entire population of the UK.

With a rapidly aging demographic, the landscape of healthcare is undergoing a significant transformation. Are we adequately prepared? What economical technologies can we implement to address the growing demands on health services?

With the right technology, many healthcare needs can be met remotely. The National Health Service (NHS) in the UK already uses this approach. Remote healthcare is not yet commonplace, but innovative organizations are finding that low-cost digital health services can deliver substantial efficiencies at scale.

Automated telephony might seem outdated, but it is an excellent channel for reaching people at scale because nearly everyone can use it, including people without internet access or a smartphone. For many elderly people, the telephone is a familiar and comfortable technology.

In this article, we explore how HealthTech Solutions has enabled NHS providers to use Amazon Polly for remote communications. We show how Amazon Polly is used at design time in our call script design tools to help write and simulate automated calls. We also show how protocols are integrated into automated call scripts and how calls are executed, with speech synthesized by Amazon Polly streamed over the telephone line.

HealthTech Solutions offers a digital health platform that specializes in out-of-hospital care in the UK. The platform integrates with established healthcare software systems and supports the modeling, creation, testing, execution, and monitoring of clinical protocols and pathways. A key part of delivering services remotely is choosing the right communication channel. Apps, wearables, and web access work for some users, but many people find these technologies difficult to use. Simpler alternatives such as text messaging or automated telephony often prove more effective. As a platform provider, we support all of these channels, but this article focuses on how we use Amazon Polly in automated telephony.

Interactive Voice Response (IVR)

IVR systems have been around for decades, so almost everyone knows how to use them. Whether it's the speaking clock or a nuisance call about an accident you never had, most people have encountered IVR. This familiarity is crucial for delivering healthcare services on a national scale, where services must stay simple and inclusive. IVR supports two-way communication: the computer conveys information in a synthesized voice, and the user responds by pressing keys on their keypad, which send dual-tone multi-frequency (DTMF) codes.
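To make "dual-tone" concrete: each key press is sent as two simultaneous tones, one from a low-frequency row group and one from a high-frequency column group. The Java sketch below encodes the standard DTMF keypad table and is purely illustrative. In a deployment like the one described here, DTMF detection would typically be handled by the telephony layer (for example, Asterisk), and the class name `Dtmf` is hypothetical.

```java
// Illustrative only: the standard DTMF keypad. Each key is the sum of one
// low-group (row) tone and one high-group (column) tone.
final class Dtmf {
    private static final char[][] KEYS = {
        {'1', '2', '3', 'A'},
        {'4', '5', '6', 'B'},
        {'7', '8', '9', 'C'},
        {'*', '0', '#', 'D'}
    };
    private static final int[] ROW_HZ = {697, 770, 852, 941};     // low group
    private static final int[] COL_HZ = {1209, 1336, 1477, 1633}; // high group

    /** Returns {lowHz, highHz} for a keypad character; throws if it is not a DTMF key. */
    static int[] tonesFor(char key) {
        for (int r = 0; r < KEYS.length; r++) {
            for (int c = 0; c < KEYS[r].length; c++) {
                if (KEYS[r][c] == key) {
                    return new int[] {ROW_HZ[r], COL_HZ[c]};
                }
            }
        }
        throw new IllegalArgumentException("Not a DTMF key: " + key);
    }

    public static void main(String[] args) {
        int[] t = tonesFor('5');
        System.out.println("5 -> " + t[0] + " Hz + " + t[1] + " Hz"); // prints "5 -> 770 Hz + 1336 Hz"
    }
}
```

A "press 1 for yes" prompt therefore waits for the 697 Hz + 1209 Hz pair, which the telephony layer reports as the digit `1`.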

How It Operates

The HealthTech Solutions platform includes a digital pathway engine that autonomously manages and coordinates remote communications. The integrated development environment (IDE) provides the tools to design and build clinical pathways and protocols, which are then published to the digital pathway engine. The call script designer, a component of the IDE, is used to build automated telephone calls.

When a clinical protocol published to the digital pathway engine calls for it, the engine dispatches a message to the Voice Messaging System (VMS), a microservice responsible for managing phone calls. Calls can last from a few seconds to several minutes, depending on the complexity of the script. The VMS interprets the call script, tracks the call's status, and reports back to the digital pathway engine. As the call progresses, the VMS queues commands for the Telephony Interface Manager (TIM) to execute. The first command places the call using Asterisk, an open-source private branch exchange (PBX) configured to connect to a remote SIP trunk provider. SIP (Session Initiation Protocol) is a widely used signaling protocol for voice calls.
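The hand-off between the VMS and the TIM is essentially a producer/consumer queue. The following is a minimal sketch under stated assumptions: the command types (`PLACE_CALL`, `SAY`, `COLLECT_DIGIT`, `HANG_UP`) and the class name are hypothetical, since the article doesn't describe the real message format, and a real TIM would drive Asterisk rather than record strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the VMS (producer) queues commands and the TIM
// (consumer) executes them in order. Command names are illustrative only.
final class TimQueueSketch {
    enum Type { PLACE_CALL, SAY, COLLECT_DIGIT, HANG_UP }

    static final class Command {
        final Type type;
        final String arg;
        Command(Type type, String arg) { this.type = type; this.arg = arg; }
        @Override public String toString() { return type + (arg.isEmpty() ? "" : "(" + arg + ")"); }
    }

    /** Runs one VMS producer and one TIM consumer; the script must end with HANG_UP. */
    static List<String> run(List<Command> script) {
        BlockingQueue<Command> queue = new LinkedBlockingQueue<>();
        List<String> executed = new ArrayList<>(); // only touched by the TIM thread until join()
        Thread tim = new Thread(() -> {
            try {
                while (true) {
                    Command cmd = queue.take();           // blocks until the VMS queues a command
                    executed.add(cmd.toString());         // a real TIM would drive Asterisk here
                    if (cmd.type == Type.HANG_UP) return; // end of call
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        tim.start();
        for (Command c : script) queue.add(c);            // VMS side: queue commands as the call progresses
        try {
            tim.join();                                   // join() makes the TIM's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Command> script = List.of(
            new Command(Type.PLACE_CALL, "+447700900000"),
            new Command(Type.SAY, "Hello, this is your health check call."),
            new Command(Type.COLLECT_DIGIT, ""),
            new Command(Type.HANG_UP, ""));
        System.out.println(run(script));
    }
}
```

Decoupling the two sides this way lets the VMS keep tracking call state while the TIM works through telephony actions at its own pace.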

Once the call is connected, the VMS steps through the call script. Information is delivered as speech synthesized by Amazon Polly, and the recipient's responses are captured as keypad presses (DTMF codes). For the interaction to feel natural, Amazon Polly must return audio promptly; delays frustrate callers and make them more likely to hang up.
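Stepping through a call script can be modeled as a small state machine in which each step speaks a prompt and the caller's key press selects the next step. This Java sketch assumes a hypothetical script representation (`Step`, with a map from key to next step id); the format the call script designer actually produces is not described in the article. Unrecognized keys repeat the current prompt, which is one reasonable choice among several.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical call-script model: each step speaks a prompt and maps the
// caller's DTMF key press to the id of the next step.
final class CallScriptSketch {
    static final class Step {
        final String prompt;
        final Map<Character, String> next; // key press -> next step id (empty map = end of call)
        Step(String prompt, Map<Character, String> next) { this.prompt = prompt; this.next = next; }
    }

    /** Walks the script from startId using simulated key presses; returns the prompts spoken. */
    static List<String> walk(Map<String, Step> script, String startId, Iterator<Character> keys) {
        List<String> spoken = new ArrayList<>();
        String id = startId;
        while (true) {
            Step step = script.get(id);
            spoken.add(step.prompt);                  // in production this text would be sent to Amazon Polly
            if (step.next.isEmpty() || !keys.hasNext()) break;
            String target = step.next.get(keys.next());
            if (target != null) id = target;          // otherwise stay here and repeat the prompt
        }
        return spoken;
    }

    public static void main(String[] args) {
        Map<String, Step> script = Map.of(
            "start", new Step("Have you taken your blood pressure today? Press 1 for yes, 2 for no.",
                              Map.of('1', "thanks", '2', "remind")),
            "thanks", new Step("Thank you. Goodbye.", Map.of()),
            "remind", new Step("Please take a reading now. Goodbye.", Map.of()));
        System.out.println(walk(script, "start", List.of('2').iterator()));
    }
}
```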

Before adopting Amazon Polly, we used a locally hosted text-to-speech (TTS) engine. We were initially concerned that Amazon Polly might not respond quickly enough, but in practice we have found its latency to be very low. Amazon Polly is also cost-efficient: a locally hosted TTS engine can consume substantial CPU and RAM, whereas Amazon Polly takes that load off our servers and uses simple pay-as-you-go pricing based on the number of characters synthesized. A caching strategy reduces costs further: we split text into sentences and serve previously synthesized sentences directly from a local cache. Our cache hit rate currently exceeds 80%.
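A sentence-level cache along these lines might look like the following sketch. The `synthesize` function is a stand-in for a real Amazon Polly request, and the storage (an unbounded in-memory map), the cache key (the exact sentence text), and the use of the JDK's `BreakIterator` for sentence splitting are all assumptions; the article doesn't say how its cache is keyed, stored, or evicted.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Minimal sketch of a sentence-level TTS cache; `synthesize` stands in for Amazon Polly.
final class SentenceCache {
    private final Function<String, byte[]> synthesize;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>(); // unbounded, no eviction
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    SentenceCache(Function<String, byte[]> synthesize) { this.synthesize = synthesize; }

    /** Splits text into trimmed sentences using the JDK's sentence BreakIterator. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.UK);
        it.setText(text);
        List<String> out = new ArrayList<>();
        for (int start = it.first(), end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    /** Returns audio per sentence, calling synthesize only for sentences not already cached. */
    List<byte[]> speak(String text) {
        List<byte[]> audio = new ArrayList<>();
        for (String s : sentences(text)) {
            byte[] cached = cache.get(s);
            if (cached != null) {
                hits.incrementAndGet();
                audio.add(cached);
            } else {
                // Two threads may occasionally synthesize the same sentence at once; harmless for a cache.
                misses.incrementAndGet();
                byte[] fresh = synthesize.apply(s);
                cache.put(s, fresh);
                audio.add(fresh);
            }
        }
        return audio;
    }

    /** Fraction of sentences served from the cache so far. */
    double hitRate() {
        long total = hits.get() + misses.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }
}
```

Caching per sentence rather than per call script pays off because scripts reuse fixed phrases ("Press 1 for yes") even when other sentences are personalized. A real cache key would also need to include anything else that changes the audio, such as the voice and output format.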

Monitoring

Amazon Polly publishes its metrics to Amazon CloudWatch, which makes it easy to set up monitors and alarms to track performance. However, these metrics cover only the service side of each request. We also run our own monitoring, built on the Coda Hale Metrics library, to track measures such as full round-trip time and cache hits. We report these metrics to New Relic, although they could equally be sent to Amazon CloudWatch. As illustrated in the accompanying graph, Amazon Polly latency generally stays around 50 ms.
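The Coda Hale (now Dropwizard) Metrics library provides timers and meters for this kind of measurement. To keep the example dependency-free, the sketch below is a standard-library stand-in rather than Metrics itself: it records round-trip latency and cache hits using `LongAdder` and `LongAccumulator`. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Stdlib stand-in for a Metrics Timer/Meter: round-trip latency plus cache-hit ratio.
final class TtsMetrics {
    private final LongAdder requests = new LongAdder();
    private final LongAdder cacheHits = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final LongAccumulator maxNanos = new LongAccumulator(Math::max, 0L);

    /** Times one full round trip (text in, audio out), whether served from cache or Amazon Polly. */
    <T> T timeRoundTrip(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            requests.increment();
            totalNanos.add(elapsed);
            maxNanos.accumulate(elapsed);
        }
    }

    /** Call when a request is answered from the local cache. */
    void recordCacheHit() { cacheHits.increment(); }

    long requestCount() { return requests.sum(); }

    double meanMillis() {
        long n = requests.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / (double) n / 1_000_000.0;
    }

    double maxMillis() { return maxNanos.get() / 1_000_000.0; }

    /** Cache hits as a fraction of all timed requests. */
    double cacheHitRatio() {
        long n = requests.sum();
        return n == 0 ? 0.0 : cacheHits.sum() / (double) n;
    }
}
```

Measuring on the client side captures network time and cache behavior that service-side CloudWatch metrics cannot see, which is why the two views complement each other.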

Throttling

Amazon Polly throttles both the number of concurrent requests and the number of requests per second; requests that exceed these limits fail with an exception. To stay within them, we make the size of the thread pool that handles speech synthesis configurable. The system can handle many concurrent calls, but all speech synthesis is delegated to this small, fixed pool of threads fed by an in-memory blocking queue.

Thread Pool for TTS

The Java code below caps the number of concurrent requests to Amazon Polly at 10:

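import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
private final int workerThreads = 10; // max concurrent Amazon Polly requests; extra tasks wait in the pool's in-memory queue
private final ExecutorService executorService = Executors.newFixedThreadPool(workerThreads);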
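To see the effect of the fixed pool, the following sketch submits many simulated synthesis requests, as concurrent calls would, and records the highest number running at once. `Thread.sleep(5)` stands in for a real Amazon Polly request, and the class name is hypothetical; everything else uses only `java.util.concurrent`. Because `Executors.newFixedThreadPool` queues excess tasks in an unbounded `LinkedBlockingQueue`, the peak should never exceed the pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class TtsPoolDemo {
    /** Submits `requests` simulated synthesis tasks to `workers` threads; returns peak concurrency. */
    static int peakConcurrency(int workers, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            futures.add(pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);  // remember the highest concurrency seen
                try {
                    Thread.sleep(5);                    // stand-in for an Amazon Polly round trip
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            }));
        }
        try {
            for (Future<?> f : futures) f.get();        // wait for every queued task to finish
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent requests: " + peakConcurrency(10, 100));
    }
}
```

The unbounded queue absorbs bursts of calls without rejecting work, at the cost of growing without limit under sustained overload; a bounded queue with a rejection policy is the alternative if that matters.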

To stay within the overall request-rate limit, we use the RateLimiter class from Google's Guava library.

Rate Limiting

The Java code below limits requests to Amazon Polly to 20 per second:

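import com.google.common.util.concurrent.RateLimiter;
private final double maxRatePerSecond = 20.0;
private final RateLimiter rateLimiter = RateLimiter.create(maxRatePerSecond); // at most 20 permits per second
// Before each Amazon Polly request: blocks until a permit is available and
// returns the seconds spent waiting (0.0 if no wait was needed).
double waitedSeconds = rateLimiter.acquire();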
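Guava's RateLimiter is a sensible production choice. To show what `acquire()` does, here is a simplified, dependency-free limiter that spaces permits evenly at `1 / rate` seconds apart and, like Guava's, returns the time spent waiting. It is a deliberate simplification, not Guava's algorithm: Guava's default limiter can store unused permits for short bursts, whereas this one never does. The class name `SimpleRateLimiter` is hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Simplified even-spacing rate limiter (not Guava's implementation).
final class SimpleRateLimiter {
    private final long intervalNanos; // minimum spacing between permits
    private long nextFreeNanos;       // earliest time the next permit may be granted

    SimpleRateLimiter(double permitsPerSecond) {
        if (!(permitsPerSecond > 0)) throw new IllegalArgumentException("rate must be positive");
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = System.nanoTime();
    }

    /** Blocks until a permit is available; returns the seconds spent waiting. */
    double acquire() {
        long waitNanos;
        synchronized (this) {
            long now = System.nanoTime();
            long grantAt = Math.max(now, nextFreeNanos); // idle time is not banked for bursts
            nextFreeNanos = grantAt + intervalNanos;     // reserve this slot before sleeping
            waitNanos = grantAt - now;
        }
        long deadline = System.nanoTime() + waitNanos;
        long left;
        while ((left = deadline - System.nanoTime()) > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(left);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // preserve status and stop waiting early
                break;
            }
        }
        return waitNanos / 1e9;
    }
}
```

At 20 permits per second, consecutive requests are spaced at least 50 ms apart, so the limit holds however many call threads share the limiter.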

Speech Synthesis Markup Language (SSML)

Amazon Polly accepts input as either plain text or SSML. We prefer SSML because it gives us more control over speech synthesis. We currently use only a few of its features, but we expect to use more of them in the future.
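As an illustration of the kind of control SSML offers, the sketch below builds a prompt that pauses briefly and then reads a reference number digit by digit. `<speak>`, `<break time="...">`, and `<say-as interpret-as="digits">` are SSML tags supported by Amazon Polly; the helper names are hypothetical. Escaping is applied because caller-supplied text containing `&` or `<` would otherwise produce invalid SSML.

```java
// Minimal sketch of building an SSML string for Amazon Polly; helper names are hypothetical.
final class Ssml {
    /** Escapes the five XML special characters so arbitrary text cannot break the markup. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Builds a prompt that pauses, then reads the reference digit by digit. */
    static String referencePrompt(String intro, String reference, int pauseMillis) {
        return "<speak>" + escape(intro)
            + "<break time=\"" + pauseMillis + "ms\"/>"
            + "<say-as interpret-as=\"digits\">" + escape(reference) + "</say-as>"
            + "</speak>";
    }

    public static void main(String[] args) {
        System.out.println(referencePrompt("Your reference number is", "4071", 300));
    }
}
```

Without the `say-as` tag, "4071" would typically be read as a number ("four thousand seventy-one"), which is harder to note down over the phone. The resulting string would be sent as the text of a speech synthesis request with the text type set to SSML.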

