In recent years, the rapid advancement of deep learning has enabled remarkable applications, such as skin cancer detection (SkinVision) and the development of self-driving vehicles (TuSimple). Neural networks have demonstrated an extraordinary capability to identify and model complex patterns from extensive unstructured data, including images, videos, and free-form text.
However, training these neural networks necessitates substantial computing power. Graphics Processing Units (GPUs) have consistently proven their effectiveness for this purpose, and AWS users have quickly recognized the benefits of utilizing Amazon Elastic Compute Cloud (Amazon EC2) P2 and P3 instances for model training, especially within Amazon SageMaker, our fully-managed machine learning service.
Today, I am pleased to announce that the largest P3 instance, the p3dn.24xlarge, is now available for model training in Amazon SageMaker. Launched last year, this instance is designed to accelerate large, complex distributed training jobs. It features twice the GPU memory of other P3 instances, 50% more vCPUs, fast local NVMe storage, and 100 Gbit networking.
Want to give it a shot on Amazon SageMaker?
Introducing EC2 P3dn Instances on Amazon SageMaker
Let’s start with a notebook that uses the built-in image classification algorithm to train a model on the Caltech-256 dataset. To use a p3dn.24xlarge instance on Amazon SageMaker, simply set train_instance_type to 'ml.p3dn.24xlarge' and begin training!
import sagemaker

# Built-in image classification estimator running on a single ml.p3dn.24xlarge instance
ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=1,
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='File',
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)
...
ic.fit(...)
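For completeness, here is a minimal sketch of what the input channels passed to fit() could look like for the Caltech-256 notebook; the bucket name and S3 prefixes are hypothetical placeholders, and the RecordIO content type is an assumption based on the built-in algorithm's usual input format.

# Minimal sketch (hypothetical bucket and prefixes): define the input channels and start training.
train_data = sagemaker.session.s3_input('s3://my-bucket/caltech-256/train/',
                                        content_type='application/x-recordio')
validation_data = sagemaker.session.s3_input('s3://my-bucket/caltech-256/validation/',
                                             content_type='application/x-recordio')

ic.fit({'train': train_data, 'validation': validation_data})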
After conducting some preliminary tests with this notebook, I observed a notable 20% increase in training speed right off the bat (your results may vary!). Using 'File' mode means that the entire dataset is copied to the training instance: the faster network (100 Gbit, up from 25 Gbit) and local NVMe storage (instead of Amazon EBS) certainly help!
When dealing with large datasets, you can maximize the benefits of 100 Gbit networking by streaming data from Amazon Simple Storage Service (Amazon S3) using Pipe Mode, or by storing it in Amazon Elastic File System (Amazon EFS) or Amazon FSx for Lustre. This setup will also facilitate distributed training (perhaps utilizing Horovod), enabling instances to exchange parameter updates more swiftly.
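To make this concrete, here is a rough sketch of the same estimator switched to Pipe Mode and scaled out to two instances; the instance count, S3 prefix, and sharding strategy are illustrative assumptions, not settings from my tests.

# Rough sketch: stream the dataset from S3 with Pipe Mode and scale out to two instances.
ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=2,          # distributed training
                                   train_instance_type='ml.p3dn.24xlarge',
                                   input_mode='Pipe',               # stream from S3 instead of copying the dataset
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)

# Shard the training data so each instance streams its own subset (prefix is a placeholder).
train_data = sagemaker.session.s3_input('s3://my-bucket/caltech-256/train/',
                                        distribution='ShardedByS3Key',
                                        content_type='application/x-recordio')
ic.fit({'train': train_data, 'validation': validation_data})

Sharding by S3 key keeps each instance's stream lean, which is exactly where the 100 Gbit network pays off; for Horovod-based jobs, the framework estimators expose similar scaling options.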
In summary, the collaboration between Amazon SageMaker and P3dn instances is a powerful combination, promising significant performance enhancements for large-scale deep learning tasks.
Now Available!
P3dn instances are available on Amazon SageMaker in the US East (N. Virginia) and US West (Oregon) regions. If you’re ready to get started, reach out to your AWS account team or visit the Contact Us page to make a request.
We value your feedback, whether on the AWS Forum for Amazon SageMaker or through your usual AWS contacts.
Jordan Blake
As an Artificial Intelligence & Machine Learning Advocate for EMEA, Jordan is committed to helping developers and enterprises bring their ideas to life.