Public health organizations have access to a vast array of data concerning various diseases, health trends, and associated risk factors. Traditionally, staff have relied on statistical models and regression analyses to make critical decisions, such as identifying populations at higher risk and forecasting disease outbreaks.
When public health challenges arise, the volume of data can surge, complicating data management and analysis. The urgency of the situation demands swift and agile analysis, yet that surge in volume can slow it down, hindering timely and effective health responses.
Common Inquiries During Crises
Common inquiries public health organizations encounter during crises include:
- Will there be adequate therapeutics available in specific areas?
- What risk factors influence health outcomes?
- Which populations are more susceptible to reinfection?
Addressing these questions necessitates an understanding of complex interrelations among various factors—often dynamic and shifting. Machine learning (ML) is a powerful tool for analyzing, predicting, and resolving these intricate quantitative issues. We have witnessed ML’s application in tackling challenging health-related problems, such as classifying brain tumors through image analysis and predicting the need for mental health interventions for early response programs.
However, a significant challenge emerges when public health organizations lack the necessary skills to implement ML solutions. This gap can hinder the application of these potent quantitative tools to confront pressing issues.
To eliminate these obstacles, we must democratize ML, enabling a broader range of health professionals with deep domain expertise to utilize it effectively.
Amazon SageMaker Canvas serves as a no-code ML platform that equips public health professionals—including epidemiologists, informaticians, and biostatisticians—to apply ML to their inquiries without requiring a background in data science or ML expertise. This allows them to focus on data, apply their knowledge, swiftly test hypotheses, and derive insights. Canvas fosters equity in public health by democratizing ML, enabling health experts to analyze extensive datasets and gain advanced insights.
In this article, we illustrate how public health experts can forecast the demand for a specific therapeutic over the next 30 days using Canvas. The platform offers a user-friendly interface that allows for the generation of accurate ML predictions without the need for coding experience.
Solution Overview
Imagine we are working with data collected from various states across the US. We might hypothesize that a certain state will experience a shortage of therapeutics in the upcoming weeks. How can we efficiently and accurately test this assumption?
For this demonstration, we utilize a publicly available dataset from the US Department of Health and Human Services, which includes state-aggregated time series data related to COVID-19, covering aspects like hospital utilization and the availability of certain therapeutics. The dataset (COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries (RAW)) is accessible via healthdata.gov and comprises 135 columns and over 60,000 rows, with periodic updates.
Subsequent sections will guide you through exploratory data analysis and preparation, building the ML forecasting model, and generating predictions using Canvas.
Conducting Exploratory Data Analysis and Preparation
For time series forecasting in Canvas, we must reduce the number of features or columns to comply with service quotas. Initially, we narrow our focus to 12 columns deemed most relevant. For instance, we eliminate age-specific columns as we aim to forecast overall demand. We also discard columns with redundant data. In future iterations, it may be beneficial to experiment with retaining additional columns and employ Canvas’s feature explainability to evaluate their importance.
Reviewing the dataset, we opt to exclude all rows from 2020 due to limited therapeutic availability at that time. This step enhances the data quality for our ML model.
Reducing the columns can be accomplished in several ways, such as editing the dataset in a spreadsheet or directly within Canvas.
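Outside of Canvas, the same preparation can be scripted with pandas. The sketch below uses a tiny synthetic stand-in for the HHS dataset; the column names (`state`, `date`, `on_hand_supply_therapeutic_a_courses`, and the age-specific variant) are illustrative assumptions, not the dataset's actual headers.

```python
import pandas as pd

# Hypothetical miniature of the HHS dataset; real column names differ.
df = pd.DataFrame({
    "state": ["CA", "CA", "NY"],
    "date": ["2020-12-15", "2021-01-15", "2021-01-15"],
    "on_hand_supply_therapeutic_a_courses": [100, 250, 180],
    # Age-specific breakdown: not needed for forecasting overall demand
    "on_hand_supply_therapeutic_a_courses_18_29": [20, 60, 40],
})

# Keep only the columns relevant to overall demand (drop age-specific ones)
keep = ["state", "date", "on_hand_supply_therapeutic_a_courses"]
df = df[keep]

# Exclude all rows from 2020, when therapeutic availability was limited
df["date"] = pd.to_datetime(df["date"])
df = df[df["date"].dt.year >= 2021].reset_index(drop=True)
print(len(df))  # 2 rows remain
```

The same two steps (column selection, then a date filter) apply regardless of whether the file is edited in a spreadsheet, in pandas, or in Canvas itself.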
Data can be imported into Canvas from many sources, including local files, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Athena, Snowflake (see Prepare training and validation dataset for facies classification using Snowflake integration and train using Amazon SageMaker Canvas), and over 40 additional data sources.
Once the data is imported, we can visualize it through scatterplots or bar charts for additional insights. We also examine the correlation among features to ensure we have selected the most relevant ones.
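The correlation check can also be reproduced in code. This is a minimal sketch with made-up values and assumed column names, showing the pairwise Pearson correlations one would inspect before committing to a feature set.

```python
import pandas as pd

# Hypothetical sample; columns are illustrative, not the dataset's actual names.
df = pd.DataFrame({
    "on_hand_courses": [100, 120, 90, 150, 130],
    "hospital_utilization": [0.60, 0.72, 0.55, 0.80, 0.75],
    "courses_distributed": [40, 50, 35, 65, 55],
})

# Pairwise Pearson correlations between candidate features
corr = df.corr(numeric_only=True)
print(corr.round(2))
```

Features that are highly correlated with each other carry redundant information, which is one reason to drop duplicative columns before modeling.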
Building the ML Forecasting Model
Now, we can create our model with just a few clicks. We identify on-hand therapeutics as our target column. Canvas automatically recognizes our task as a time series forecast based on the selected target, allowing us to configure the necessary parameters.
We set the item_id, the unique identifier, as location, given that our dataset is organized by US states. To create a time series forecast, we select the timestamp, which corresponds to the date in our dataset. Lastly, we specify a 30-day forecast period. Canvas also allows for the inclusion of a holiday schedule to enhance accuracy; in this instance, we utilize US holidays relevant to our dataset.
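A time series forecast configured this way implicitly requires that each (item_id, timestamp) pair is unique, i.e., one row per location per date. The sketch below, using an assumed slice of the prepared data, shows a quick sanity check before handing the dataset to the forecaster.

```python
import pandas as pd

# Illustrative slice of the prepared dataset (column names assumed)
df = pd.DataFrame({
    "location": ["CA", "CA", "NY", "NY"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02",
                            "2021-01-01", "2021-01-02"]),
    "on_hand_courses": [100, 110, 80, 85],
})

# Each (item_id, timestamp) pair must be unique for time series forecasting
dupes = df.duplicated(subset=["location", "date"]).sum()
assert dupes == 0, "duplicate location/date rows would break the forecast setup"

# For a 30-day horizon, confirm each location has a reasonable span of history
history = df.groupby("location")["date"].agg(["min", "max"])
print(history)
```

If duplicates appear, they typically come from revisions in the periodically updated source file and should be deduplicated (e.g., keeping the latest revision) before modeling.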
With Canvas, we can derive insights from the data before building a model by selecting Preview model. This feature saves time and costs by avoiding model construction if results are unlikely to meet expectations. Upon previewing our model, we notice that some columns have low impact, prompting us to remove them and observe an improvement in the estimated quality metric.
When it comes to model building, we have two options: Quick build and Standard build. Quick build generates a trained model in under 20 minutes, prioritizing speed over accuracy, making it ideal for experimentation. Standard build takes under 4 hours, prioritizing accuracy and iterating through various model configurations to select the best one.
Initially, we test our model with Quick build to confirm our preview. Once satisfied, we opt for Standard build to ensure Canvas develops the most effective model for our dataset.