Learn About Amazon VGT2 Learning Manager Chanci Turner
Image classification and object detection make it possible to build scalable models for business applications such as visual search, product recommendations, and content moderation. Services like Amazon Rekognition provide APIs for image analysis and object recognition. However, some use cases need custom image classification models trained on specialized datasets. Such datasets must fit the specific use case and contain enough image samples to train a machine learning model effectively.
In this article, I outline how to use Shutterstock’s image datasets, available on AWS Marketplace, to train computer vision models. Shutterstock has a library of more than 370 million images. It offers curated collections you can subscribe to, in categories such as Food & Beverage, Clothing, and Hospitality. The Shutterstock Data Exchange team can also build custom image collections to your requirements. Each image comes with a descriptive title of up to 200 characters and 7 to 50 relevant keywords.
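Before training, it can help to check that each metadata record respects these limits. The following is a minimal sketch. It assumes each record has already been parsed into a hypothetical `ImageMetadata` object holding a title and a keyword list. The file format of the Metadata revision isn't described here, so parsing is left out.

```python
from dataclasses import dataclass, field

# Limits taken from the dataset description: titles up to 200 characters,
# and 7 to 50 keywords per image.
MAX_TITLE_CHARS = 200
MIN_KEYWORDS, MAX_KEYWORDS = 7, 50


@dataclass
class ImageMetadata:
    """Hypothetical in-memory shape for one image's metadata record."""
    image_id: str
    title: str
    keywords: list[str] = field(default_factory=list)


def validate(record: ImageMetadata) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    elif len(record.title) > MAX_TITLE_CHARS:
        problems.append(f"title longer than {MAX_TITLE_CHARS} characters")
    # Normalize case and drop duplicates/blanks before counting keywords.
    unique = {k.strip().lower() for k in record.keywords if k.strip()}
    if not MIN_KEYWORDS <= len(unique) <= MAX_KEYWORDS:
        problems.append(
            f"{len(unique)} keywords (expected {MIN_KEYWORDS}-{MAX_KEYWORDS})"
        )
    return problems
```

Records that fail validation can be logged and excluded before building training labels. Keywords are lower-cased so that "Grocery" and "grocery" count as one label.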
To illustrate this process, I will be utilizing the Free Sample: Images & Metadata of “Whole Foods” Shoppers dataset from Shutterstock’s offerings. This dataset consists of images depicting Whole Foods shoppers, each tagged with descriptive keywords. For example, one image captures a store employee assisting a customer, while another showcases a couple selecting fresh produce.
To build my image classification model, I use the Amazon SageMaker image classification algorithm. It is a supervised learning algorithm that supports multi-label classification. The algorithm uses a convolutional neural network (ResNet) that can be trained from scratch or fine-tuned with transfer learning. Transfer learning is particularly useful when a large number of training images isn’t available.
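Multi-label training needs a multi-hot label vector for each image. The sketch below assumes the tab-separated `.lst` layout that SageMaker's multi-label examples use with `multi_label=1`: an image index, one 0/1 column per class, then the relative image path. The target classes are a hypothetical, hand-picked list of keywords. Confirm the exact layout against the algorithm's documentation before training.

```python
def to_lst_rows(images, classes):
    """Build multi-hot .lst lines from (relative_path, keywords) pairs.

    `classes` is a hypothetical, fixed list of keywords chosen as labels;
    its order defines the column order of the label vector.
    """
    rows = []
    for path, keywords in images:
        present = {k.strip().lower() for k in keywords}
        vector = [1 if c.lower() in present else 0 for c in classes]
        if not any(vector):
            # No target class on this image: skip it rather than emit an
            # all-zero label row that carries no training signal.
            continue
        index = len(rows)  # contiguous index over the rows actually kept
        rows.append("\t".join([str(index), *map(str, vector), path]))
    return rows
```

The rows can then be written out with `"\n".join(rows) + "\n"` to a file such as `train.lst` (a hypothetical name) and uploaded next to the images in S3.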
Solution Overview
The architecture for this solution includes several key components:
- The Shutterstock Free Sample dataset for demonstration purposes.
- An Amazon S3 bucket for storing the image training dataset.
- A SageMaker notebook for coding, training, and evaluating the ML model.
- A SageMaker model endpoint for real-time inference (SageMaker batch transform can be used instead for offline batch predictions).
As shown in the architecture diagram, images are exported from AWS Data Exchange to the S3 bucket. A SageMaker notebook then prepares the image data, trains the model, and deploys an endpoint. Finally, the model is evaluated on test images, returning predicted labels and confidence scores.
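After deployment, you can send a test image with the `sagemaker-runtime` client's `invoke_endpoint` call, for example with `ContentType="application/x-image"`. For this algorithm, the response body is expected to be a JSON array of per-class probabilities. The sketch below covers only the post-processing step. It maps those probabilities back to class names and keeps the labels at or above a threshold. The 0.5 threshold and the class list are illustrative assumptions.

```python
import json


def top_labels(response_body, classes, threshold=0.5):
    """Turn a JSON array of per-class probabilities into (label, score) pairs.

    Assumes the array order matches the class order used at training time.
    """
    probs = json.loads(response_body)
    if len(probs) != len(classes):
        raise ValueError(f"expected {len(classes)} scores, got {len(probs)}")
    hits = [(c, float(p)) for c, p in zip(classes, probs) if p >= threshold]
    # Highest-confidence labels first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For multi-label output, a per-class threshold is usually more useful than taking only the single highest score, because one image can show several classes at once.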
Prerequisites
To implement this solution, ensure you have:
- An AWS account. If you don’t have one, follow the AWS signup process.
Solution Walkthrough
Step 1: Export the Shutterstock Whole Foods Image Dataset to an S3 Bucket
Begin by subscribing to the dataset and exporting it to your S3 bucket:
- If you don’t already have an S3 bucket, open the S3 console and choose “Create bucket.” Create it in the US East (Ohio) Region to avoid cross-Region charges.
- Subscribe to Shutterstock’s dataset through AWS Marketplace.
- Access the AWS Data Exchange Console. Under “My subscriptions,” select “Entitled Data.”
- Find the “Whole Foods” dataset and export both the “Images” and “Metadata” revisions to your S3 bucket.
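The export steps above can also be scripted. This is a minimal sketch that assumes a boto3 `dataexchange` client. The data set and revision IDs are placeholders for the values shown under your entitled data. The sketch builds an `EXPORT_REVISIONS_TO_S3` job request and then starts the job. The key pattern, which places each revision's assets under its own prefix, is an illustrative choice.

```python
def build_export_request(data_set_id, revision_ids, bucket,
                         key_pattern="${Revision.Id}/${Asset.Name}"):
    """Build kwargs for dataexchange.create_job to export revisions to S3."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                # One destination per revision (e.g. Images and Metadata).
                "RevisionDestinations": [
                    {"Bucket": bucket, "KeyPattern": key_pattern, "RevisionId": rid}
                    for rid in revision_ids
                ],
            }
        },
    }


def start_export(client, request):
    """Create the job, start it, and return its ID.

    Example (not run here): client = boto3.client("dataexchange")
    """
    job = client.create_job(**request)
    client.start_job(JobId=job["Id"])
    return job["Id"]
```

Creating and starting the job are separate API calls. A job created without `start_job` stays in the `WAITING` state.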
Monitor the export job until it shows as “Completed.”
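Monitoring can also be automated with a simple polling loop. This is a sketch, assuming the state is read with something like `lambda: client.get_job(JobId=job_id)["State"]` on a boto3 `dataexchange` client. The state fetcher, sleep, and clock are injected so that the loop can run without AWS. The 10-second interval and 30-minute timeout are arbitrary defaults.

```python
import time

# Data Exchange job states after which polling should stop.
TERMINAL_STATES = {"COMPLETED", "ERROR", "CANCELLED", "TIMED_OUT"}


def wait_for_job(get_state, poll_seconds=10.0, timeout_seconds=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until a terminal state is reached or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout_seconds}s")
        sleep(poll_seconds)
```

The caller should treat any returned state other than `COMPLETED` as a failed export.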
Step 2: Train, Test, and Export Your Model Using SageMaker Notebooks
Next, use an Amazon SageMaker notebook instance for model training:
- Open the Amazon SageMaker console.
- Under “Notebook,” select “Notebook instances” and create a new instance.
- Name your instance and select “ml.t2.medium” as the instance type.
- Create a new IAM role that grants access to your S3 bucket.
- Clone the public Git repository “AWS Data Exchange Shutterstock Image Datasets” by entering its URL.
- When the instance status changes to “InService” (this takes a few minutes), open Jupyter and navigate to the classification notebook.
Follow the notebook’s instructions to build your image classification model. Make sure to update the dataset path so that it points to the data you exported to your S3 bucket.
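The notebook's training code isn't reproduced here. This is a hedged sketch of two steps such a notebook typically performs. It splits `.lst` rows into training and validation sets, then assembles hyperparameters for the SageMaker image classification algorithm. The hyperparameter names belong to the algorithm. The values (18 layers, 224x224 input, 10 epochs, and so on) are illustrative assumptions, not tuned settings.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows deterministically and split into (train, validation)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * validation_fraction)) if rows else 0
    return rows[n_val:], rows[:n_val]


def hyperparameters(num_classes, num_training_samples, epochs=10):
    """Illustrative settings for multi-label transfer learning."""
    return {
        "num_classes": num_classes,
        # Must equal the number of rows in the training .lst file.
        "num_training_samples": num_training_samples,
        "multi_label": 1,
        "use_pretrained_model": 1,  # start from pretrained ResNet weights
        "num_layers": 18,
        "image_shape": "3,224,224",
        "epochs": epochs,
        "mini_batch_size": 8,
        "learning_rate": 0.001,
    }
```

With the SageMaker Python SDK, the dictionary can be passed to an `Estimator` with `estimator.set_hyperparameters(**hp)`. Setting `use_pretrained_model=1` enables transfer learning, which matters for a small sample dataset like this one.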