Accelerate Your Path to Production-Grade Data with Amazon SageMaker Ground Truth Plus

Chanci Turner, Amazon IXD – VGT2 Learning Manager

Amazon SageMaker Ground Truth Plus, launched at AWS re:Invent 2021, simplifies the creation of high-quality training datasets by eliminating the complexities involved in building data labeling applications and managing the labeling workforce. Simply share your data and labeling requirements, and Ground Truth Plus takes care of setting up and managing your data labeling workflow. No extensive machine learning expertise or knowledge of workflow design is needed to make use of Ground Truth Plus.

We are thrilled to announce new built-in interfaces for Ground Truth Plus that let multiple users within a single AWS account create projects and batches, share data, and manage data through self-service interfaces. This enhancement significantly reduces project setup time, so you can develop high-quality training datasets faster. You can also fine-tune access to your data by adjusting the permissions of your AWS Identity and Access Management (IAM) role to match your Amazon Simple Storage Service (Amazon S3) access needs, and you retain the ability to revoke access to specific buckets.

Previously, starting a new data labeling project or batch meant contacting your Ground Truth Plus operations program manager (OPM), which had several limitations. Only one user could request new projects and batches, and the manual steps and potential troubleshooting delayed the start of labeling. In addition, all projects shared the same IAM role for data access, so you depended on the Ground Truth Plus OPM for the specific S3 policies, which then had to be applied manually to your S3 buckets. This manual work added unnecessary operational overhead.

This article will guide you through the process of creating new projects and batches, sharing data, and receiving data using the new self-service interfaces, allowing you to efficiently initiate the labeling process. It assumes a basic familiarity with Ground Truth Plus.

Solution Overview

We will cover the following:

  • Updating existing projects
  • Requesting a new project
  • Setting up a project team
  • Creating a batch

Prerequisites

Before you begin, ensure you have the following:

  • An AWS account
  • An IAM user with permissions to create IAM roles
  • The Amazon S3 URI of the bucket where your labeling objects are stored
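
The IAM policy used later in this post expects bucket ARNs, while the prerequisite above is an S3 URI. The following is a minimal sketch of the conversion. It assumes the standard `aws` partition, and the helper name `s3_uri_to_arns` and the bucket name in the usage note are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def s3_uri_to_arns(s3_uri: str) -> List[str]:
    """Return the bucket ARN and the object-level ARN pattern for an S3 URI.

    Assumes the standard "aws" partition; other partitions (for example
    GovCloud) use a different ARN prefix.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    bucket_arn = f"arn:aws:s3:::{parsed.netloc}"
    # The bare ARN covers bucket-level actions such as s3:ListBucket;
    # the "/*" form covers object-level actions such as s3:GetObject.
    return [bucket_arn, f"{bucket_arn}/*"]
```

For example, `s3_uri_to_arns("s3://my-labeling-data/images/")` yields the two entries you would place in the policy's Resource list for that bucket.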

Update Existing Projects

If your Ground Truth Plus project predates the rollout of the new features (December 9, 2022), you will need to create and share an IAM role to leverage these enhancements. New users can skip this section.

To create an IAM role, follow these steps:

  1. Go to the IAM console and select “Create role.”
  2. Choose “Custom trust policy.”
  3. Specify the following trust relationship for the role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker-ground-truth-plus.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  4. Click “Next.”
  5. Select “Create policy.”
  6. In the JSON tab, input the following policy, making sure to update the Resource property to include your bucket ARNs:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": [
                "<your-input-s3-arn>",
                "<your-input-s3-arn>/*",
                "<your-output-s3-arn>",
                "<your-output-s3-arn>/*"
            ]
        }
    ]
}
  7. Click “Next: Tags” and then “Next: Review.”
  8. Enter a name and an optional description for the policy.
  9. Click “Create policy,” then return to the previous tab to create your role.

On the “Add permissions” tab, you should see your new policy (refresh if it doesn’t appear).

  10. Select the newly created policy and click “Next.”
  11. Name the role (e.g., GTPlusExecutionRole) and provide an optional description.
  12. Click “Create role.”
  13. Forward the role ARN to your Ground Truth Plus OPM, who will update your existing project with this role.
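
If you prefer scripting to the console, the same role can be sketched with boto3. This is an illustrative sketch rather than an official procedure: the function names are hypothetical, the role and policy names are up to you, and it assumes your credentials are allowed to create IAM roles and policies. The two builder functions reproduce the trust policy and S3 policy shown above and can be inspected without any AWS access.

```python
import json
from typing import List

GT_PLUS_PRINCIPAL = "sagemaker-ground-truth-plus.amazonaws.com"
S3_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:GetBucketLocation", "s3:ListBucket"]


def build_trust_policy() -> dict:
    """Trust policy that lets Ground Truth Plus assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": GT_PLUS_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_s3_policy(bucket_arns: List[str]) -> dict:
    """Permissions policy scoped to the given bucket ARNs and their objects."""
    resources = []
    for arn in bucket_arns:
        resources.extend([arn, f"{arn}/*"])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": S3_ACTIONS, "Resource": resources}
        ],
    }


def create_gt_plus_role(role_name: str, policy_name: str, bucket_arns: List[str]) -> str:
    """Create the role, create and attach the policy, and return the role ARN."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
        Description="Execution role for SageMaker Ground Truth Plus",
    )
    policy = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_s3_policy(bucket_arns)),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy["Policy"]["Arn"])
    return role["Role"]["Arn"]
```

Calling `create_gt_plus_role("GTPlusExecutionRole", "GTPlusS3Access", [...])` with your input and output bucket ARNs returns the role ARN to forward to your OPM. The policy name `GTPlusS3Access` is only an example.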

Request a New Project

To request a new project, follow these steps:

  1. Open the Ground Truth Plus console and navigate to the Projects section where all your projects are listed.
  2. Click “Request project.”
  3. On the Request project page, provide necessary details to schedule an initial consultation and set up your project.
  4. Specify project name, description, task type, and whether it contains personally identifiable information (PII).
  5. Ground Truth Plus requires temporary access to your raw data in an S3 bucket for labeling. After labeling is complete, Ground Truth Plus returns the output to your S3 bucket through an IAM role. You can create a new role with the built-in tool, or use a role you created manually by following the instructions in the previous section.
  6. If you created a role manually, select “Enter a custom IAM role ARN” and enter your IAM role ARN in the format arn:aws:iam::<account-id>:role/<role-name>. To use the built-in tool instead, select “Create a new role” from the IAM Role dropdown.
  7. Specify the bucket location of your labeling data. If unsure, select “Any S3 bucket” for access to all your account’s buckets.
  8. Click “Create” to finalize the role.
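
Before pasting a custom role ARN into the form, it can help to check its shape. The sketch below assumes the standard `aws` partition, where a role ARN consists of `arn:aws:iam::`, a 12-digit account ID, `:role/`, and the role name (optionally preceded by a path). The function name `parse_role_arn` is hypothetical, and the check only validates the format, not whether the role exists.

```python
import re
from typing import Tuple

# 12-digit account ID, then a role name made of the characters IAM allows;
# "/" is included so that roles created with a path are accepted too.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws:iam::(\d{12}):role/([A-Za-z0-9+=,.@_/-]+)$")


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Split a role ARN into (account_id, role_path_and_name) or raise ValueError."""
    match = ROLE_ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"Not a well-formed IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)
```

An ARN with a missing account ID or role name, such as `arn:aws:iam:::role/`, is rejected with a `ValueError`.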

Your IAM role enables Ground Truth Plus, represented as sagemaker-ground-truth-plus.amazonaws.com in the role’s trust policy, to perform the following actions on your S3 buckets:

[
    "s3:GetObject",
    "s3:PutObject",
    "s3:GetBucketLocation",
    "s3:ListBucket"
]
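
To confirm that a hand-written policy grants all four of these actions, a small local check is enough. The following sketch is an assumption-laden helper, not an AWS tool: `missing_actions` is a hypothetical name, it only compares literal action strings in Allow statements, and it does not expand wildcards such as `s3:*` or evaluate Deny statements, conditions, or resources.

```python
from typing import Set

REQUIRED_ACTIONS = {
    "s3:GetObject",
    "s3:PutObject",
    "s3:GetBucketLocation",
    "s3:ListBucket",
}


def missing_actions(policy_document: dict) -> Set[str]:
    """Return the required actions not literally listed in any Allow statement."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

An empty result means every required action appears in the policy. A non-empty result names the actions to add before sharing the role.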

