Machine learning (ML) models are transforming industries, but their effectiveness depends not only on choosing the right training data and model, but also on tuning the configuration values known as hyperparameters. So what exactly does this term mean? The parameters learned during training make up the model, whereas hyperparameters control how that training happens. They include settings such as the number of epochs, the learning rate, and the maximum depth of a decision tree. Because these values strongly influence a model’s performance, adjusting them carefully is essential.
Much like tuning a radio to find the perfect frequency, optimizing hyperparameters is vital for achieving optimal performance. This meticulous process of searching for the best values is known as hyperparameter tuning or hyperparameter optimization (HPO), ultimately leading to a model that produces accurate predictions.
In this article, we will initiate our first HPO job using Amazon SageMaker Automatic Model Tuning (AMT). We will explore the available methods for result analysis and create insightful visualizations of our HPO trials.
Amazon SageMaker Automatic Model Tuning
As an ML practitioner utilizing SageMaker AMT, your main responsibilities include:
- Setting up a training job
- Defining an appropriate objective metric for your task
- Specifying the hyperparameter search space
SageMaker AMT manages the rest, eliminating the need for you to worry about infrastructure, orchestrating training jobs, or selecting hyperparameters.
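To make these responsibilities concrete, here is a minimal sketch of an AMT setup using the SageMaker Python SDK. The estimator and the train/validation input channels are assumed to already exist (we construct similar objects later in this post), and the metric, ranges, and job counts are illustrative choices rather than recommendations:
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,                        # a SageMaker estimator, e.g. the XGBoost one below
    objective_metric_name='validation:merror',  # metric emitted by the built-in algorithm
    objective_type='Minimize',
    hyperparameter_ranges={                     # the search space AMT explores
        'eta': ContinuousParameter(0.1, 0.5),
        'max_depth': IntegerParameter(2, 10),
    },
    max_jobs=20,                                # total training jobs AMT may launch
    max_parallel_jobs=4,                        # how many of them run at the same time
)
tuner.fit({'train': train_input, 'validation': valid_input})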
Let’s dive into our first HPO job, which involves training and tuning an XGBoost algorithm. To ensure a hands-on experience, we’ve shared an example in our GitHub repository, specifically covering the 1_tuning_of_builtin_xgboost.ipynb notebook.
In a future post, we’ll discuss not just finding the best hyperparameters but also understanding the search space and the sensitivity of models to various hyperparameter ranges. We will also illustrate how to transform a one-time tuning activity into an ongoing dialogue with the ML practitioner, fostering collaborative learning. So, stay tuned (pun intended)!
Prerequisites
This content is designed for those curious about HPO, and prior knowledge isn’t necessary. However, a basic understanding of ML concepts and proficiency in Python programming will enhance your experience. For optimal learning, we recommend running each step in the notebook as you read through this article. At the end of the notebook, you can also explore an interactive visualization that brings the tuning results to life.
Solution Overview
We will construct a complete setup to execute our first HPO job using SageMaker AMT. Upon completion of the tuning job, we will examine methods for exploring results through the AWS Management Console and programmatically via AWS SDKs and APIs.
Initially, we’ll get acquainted with the environment and SageMaker Training by executing a standalone training job without any tuning. We’ll utilize the XGBoost algorithm, one of many built-in options provided by SageMaker (no training script is needed!).
We’ll observe how SageMaker Training operates by:
- Starting and stopping an instance
- Provisioning the necessary container
- Uploading the training and validation data to the instance
- Running the training
- Collecting metrics and logs
- Storing the trained model
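As a quick illustration of the last two points, once a training job completes you can inspect where the model artifact was stored and pull the collected metrics. The sketch below assumes an estimator whose fit() call has already finished:
# S3 location of the trained model artifact (model.tar.gz)
print(estimator.model_data)

# Metrics that SageMaker collected from the training job's logs
metrics_df = estimator.training_job_analytics.dataframe()
print(metrics_df)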
Next, we will proceed to AMT and initiate an HPO job:
- We’ll set up and launch our tuning job with AMT
- We will investigate available methods to extract detailed performance metrics and metadata for each training job, enabling us to understand the optimal values in our hyperparameter space
- We will demonstrate how to view the trial results
- Finally, we will provide tools to visualize data through a series of charts that yield valuable insights into our hyperparameter space
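As a preview of the programmatic route, HyperparameterTuningJobAnalytics returns one row per training job with its hyperparameter values and final objective value; the tuning job name used here is hypothetical:
import boto3
from sagemaker.analytics import HyperparameterTuningJobAnalytics

tuning_job_name = 'amt-visualize-demo-xgb'   # hypothetical name; use your own tuning job

# One row per training job: hyperparameter values, status, and FinalObjectiveValue
results_df = HyperparameterTuningJobAnalytics(tuning_job_name).dataframe()
print(results_df.sort_values('FinalObjectiveValue').head())

# The same metadata is available through the low-level API
sm_client = boto3.client('sagemaker')
best_job = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name
)['BestTrainingJob']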
Training a SageMaker Built-in XGBoost Algorithm
The process begins with training a single model, which gives us a feel for how SageMaker Training works. We intend to leverage the speed and convenience of SageMaker’s built-in algorithms. The following steps outline how to get started:
- Prepare and upload the data – We will download and organize our dataset as input for XGBoost and upload it to our Amazon Simple Storage Service (Amazon S3) bucket.
- Select the built-in algorithm’s image URI – SageMaker uses this URI to retrieve our training container, which has a ready-to-use XGBoost training script.
- Define the hyperparameters – SageMaker provides an interface to specify the hyperparameters for our built-in algorithm, which align with those used in the open-source version.
- Construct the estimator – Here, we will define training parameters like instance type and number of instances.
- Call the fit() function – This initiates our training job.
The diagram below illustrates how these steps interconnect.
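In code, steps 2 through 4 condense to roughly the following sketch. The sm_sess session and output_path location used here are created in the next section, and the instance type, container version, and hyperparameter values are illustrative choices rather than settings prescribed by the notebook:
import sagemaker

# Step 2: look up the image URI of the built-in XGBoost container for our Region
image_uri = sagemaker.image_uris.retrieve(
    'xgboost', region=sm_sess.boto_region_name, version='1.5-1'
)

# Step 4: construct the estimator with instance type, instance count, and output location
estimator = sagemaker.estimator.Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=output_path,
    sagemaker_session=sm_sess,
)

# Step 3: define hyperparameters for the built-in algorithm (same names as open-source XGBoost)
estimator.set_hyperparameters(
    num_round=100,
    objective='multi:softmax',
    num_class=10,
    max_depth=5,
    eta=0.3,
)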
Provide the Data
To conduct ML training, we must supply data. We will share our training and validation data with SageMaker via Amazon S3. For simplicity, we will utilize the SageMaker default bucket to store our data. However, feel free to modify these values per your preferences:
import sagemaker

# Use the SageMaker default bucket so the example works without extra setup
sm_sess = sagemaker.session.Session([..])
BUCKET = sm_sess.default_bucket()
PREFIX = 'amt-visualize-demo'
output_path = f's3://{BUCKET}/{PREFIX}/output'
In the notebook, we utilize a public dataset and save the data locally in the data directory. Afterward, we upload our training and validation data to Amazon S3. We will then define pointers to these locations for SageMaker Training.
import os

# acquire and prepare the data (not shown here)
# store the data locally
[..]
train_data.to_csv('data/train.csv', index=False, header=False)
valid_data.to_csv('data/valid.csv', index=False, header=False)
[..]
# upload the local files to S3 (boto_sess is the boto3 session backing sm_sess)
boto_sess = sm_sess.boto_session
boto_sess.resource('s3').Bucket(BUCKET).Object(os.path.join(PREFIX, 'data/train/train.csv')).upload_file('data/train.csv')
boto_sess.resource('s3').Bucket(BUCKET).Object(os.path.join(PREFIX, 'data/valid/valid.csv')).upload_file('data/valid.csv')
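With the data in Amazon S3, the pointers mentioned above can be expressed as TrainingInput channels, and (assuming the estimator sketched earlier) the fit() call from step 5 then starts the training job:
from sagemaker.inputs import TrainingInput

# Pointers to the train/validation prefixes in S3, read as headerless CSV
train_input = TrainingInput(f's3://{BUCKET}/{PREFIX}/data/train/', content_type='text/csv')
valid_input = TrainingInput(f's3://{BUCKET}/{PREFIX}/data/valid/', content_type='text/csv')

# Step 5: start the training job with the two data channels
estimator.fit({'train': train_input, 'validation': valid_input})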
In this post, we focus on introducing HPO. For illustration, we utilize a specific dataset and task to obtain measurements of objective metrics used to optimize hyperparameter selection. However, the actual data or task is not crucial to understanding the overall concept. For completeness, we will train an XGBoost model to classify handwritten digits from the Optical Recognition of Handwritten Digits Data Set via Scikit-learn. XGBoost is a robust algorithm for structured data and is well-suited for the Digits dataset. The values are represented as 8×8 images, as shown in the examples of a 0, a 5, and a 4.
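For reference, here is one way the Digits data could be prepared before the upload step above; the notebook’s own preparation code is not reproduced here, and the split ratio is an illustrative choice. Note that the built-in XGBoost algorithm expects CSV input without a header and with the label in the first column, which is why the earlier to_csv() calls pass header=False:
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 8x8 digit images as flattened feature vectors plus their labels
digits = load_digits()
df = pd.DataFrame(digits.data)
df.insert(0, 'label', digits.target)   # label in the first column, as XGBoost's CSV format expects

# Split into training and validation sets
train_data, valid_data = train_test_split(df, test_size=0.2, random_state=42)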