Artificial intelligence (AI) has made practical many operations that were once unimaginable. In many scenarios, you need to understand the contents of your image archive, or the dialogue in audio or video files, to identify material that may be harmful, inappropriate, or simply out of context for specific audiences. This article shows how to extract that information using AI services available through Amazon Web Services (AWS).
At times, it becomes necessary to scrutinize content that could be offensive or damaging to your brand. Here, we demonstrate how to detect such material using Amazon Transcribe and Amazon Rekognition.
Introduced at AWS re:Invent 2017, Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for AWS users to add speech-to-text capabilities to their applications. The media and entertainment sector frequently uses Amazon Transcribe to turn media content into accessible, searchable text files. Common applications include subtitling, transcription, and content moderation. Operations teams also use Amazon Transcribe for quality assurance, for example to check that audio and video are synchronized by using the timestamps in the extracted text.
Amazon Transcribe facilitates the automatic and programmatic filtering of unwanted content. You can conceal or eliminate words that you prefer not to have appear in your transcription results through vocabulary filtering. For instance, this feature can be utilized to prevent the display of offensive or profane terms, enabling the creation of captions for a TV show or transcripts for conferences that are suitable for all audiences. Vocabulary filtering is available for both real-time streaming and batch processing.
Prerequisites
To begin, you will need:
- An AWS account. If you don’t yet have one, please refer to “How do I create and activate a new AWS account?”
- Access to and familiarity with the AWS Command Line Interface (AWS CLI), a unified tool for managing your AWS services.
Masking Unwanted Words in Audio Files
In this section, we will analyze an audio file, extracting the text (transcription) and employing a masking technique on specific words deemed unsuitable for our context. For our example, we will mask certain words found in a text file that we will upload to AWS.
We will examine an audio file taken from the AWS video “What is AWS,” available on the official AWS YouTube channel. You can download the audio file (right-click and save file as). Our audio file is named whatisaws.mp3.
Next, we upload the audio clip to a bucket in Amazon Simple Storage Service (Amazon S3), an object storage service designed to store and retrieve any amount of data from anywhere:
$ aws s3 cp whatisaws.mp3 s3://change-to-yourbucket/
Following this, we start a transcription job on the uploaded file. We use the StartTranscriptionJob API, which starts an asynchronous job that transcribes speech to text and runs in the background.
Here’s how to start this job using the AWS CLI:
$ aws transcribe start-transcription-job --transcription-job-name testwords --language-code en-US --media MediaFileUri=s3://change-to-yourbucket/whatisaws.mp3
Once the asynchronous job is initiated, we need to verify whether it is still running or if it has completed. For this check, we utilize the GetTranscriptionJob API.
This API provides information about the transcription job. To determine the job status, inspect the TranscriptionJobStatus field. If the status reads COMPLETED, the job has concluded, and you can access the results at the location specified in the TranscriptFileUri field. To check the job status, use the following command:
$ aws transcribe get-transcription-job --transcription-job-name testwords
{
    "TranscriptionJob": {
        "TranscriptionJobName": "testwords",
        "TranscriptionJobStatus": "COMPLETED",
        "LanguageCode": "en-US",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "s3://change-to-yourbucket/whatisaws.mp3"
        },
        "Transcript": {
            "TranscriptFileUri": "https://s3.eu-west-1.amazonaws.com/aws-transcribe-eu-west-1-prod/xxxxxxxxxxxxx/testwords/9332f738-196d-4e83-ba41-4c9014b0ad9b/asrOutp..........
Upon job completion, the TranscriptionJobStatus field reads COMPLETED. You can download the transcription results from the pre-signed URL provided in the TranscriptFileUri field.
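Rather than re-running the status command by hand, the check can be wrapped in a small polling loop. The following is a minimal sketch and not part of the original walkthrough: the function name wait_for_transcription_job and the POLL_SECONDS variable are hypothetical, and the sketch relies on the AWS CLI's global --query and --output options to extract the TranscriptionJobStatus field.

```shell
# Hypothetical helper: poll a transcription job until it finishes.
# Assumes the AWS CLI is installed and configured. POLL_SECONDS is an
# illustrative variable (defaults to 10 seconds between checks).
wait_for_transcription_job() {
  job_name="$1"
  while :; do
    # --query extracts a single field; --output text prints it unquoted.
    status=$(aws transcribe get-transcription-job \
      --transcription-job-name "$job_name" \
      --query 'TranscriptionJob.TranscriptionJobStatus' \
      --output text) || return 2
    case "$status" in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED)    echo "$status"; return 1 ;;
    esac
    # Any other status (for example QUEUED or IN_PROGRESS): wait and retry.
    sleep "${POLL_SECONDS:-10}"
  done
}
```

Calling wait_for_transcription_job testwords blocks until the job leaves the queued or in-progress state, and returns a non-zero exit code if the job failed or the CLI call itself did not succeed.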
Now that we’ve covered the transcription of audio content, we will show how to instruct Amazon Transcribe to tag, remove, or, in our case, mask specific words (defined in a vocabulary filter) that are considered inappropriate.
You can define a vocabulary filter in two ways: using a text file stored in an Amazon S3 bucket or a list of words specified in the command line. You can download a simple text file here. The text file contains these three words:
- AWS
- infrastructure
- applications
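Both ways of supplying the word list can be sketched as small shell helpers. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the file is written with one word per line, and the inline variant uses the --words option of create-vocabulary-filter in place of a file in Amazon S3.

```shell
# Hypothetical helper: write the word list to a local text file,
# one entry per line.
write_filter_file() {
  printf '%s\n' "$@" > vocabulary-filter-example.txt
}

# Hypothetical helper: upload the file so it can be referenced by S3 URI.
# The bucket name is passed in by the caller (change-to-yourbucket is the
# placeholder used in this article).
upload_filter_file() {
  aws s3 cp vocabulary-filter-example.txt "s3://$1/"
}

# Alternative: pass the words inline with --words instead of pointing at
# a file in Amazon S3. The first argument is the filter name.
create_filter_from_words() {
  filter_name="$1"; shift
  aws transcribe create-vocabulary-filter \
    --vocabulary-filter-name "$filter_name" \
    --language-code en-US \
    --words "$@"
}
```

For example, write_filter_file AWS infrastructure applications followed by upload_filter_file change-to-yourbucket prepares the file-based option, while create_filter_from_words vocabularyfiltertest AWS infrastructure applications creates the filter without any file.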
Next, after uploading the text file to the S3 bucket, we create a vocabulary filter with the CreateVocabularyFilter API by specifying a name, a language code, and the S3 URI of the uploaded file:
$ aws transcribe create-vocabulary-filter --vocabulary-filter-name vocabularyfiltertest --language-code en-US --vocabulary-filter-file-uri s3://change-to-yourbucket/vocabulary-filter-example.txt
In this instance, we aim to filter the words “AWS,” “infrastructure,” and “applications.” Remember that words in a vocabulary filter are not case sensitive (for example, “AWS” and “aws” are treated the same). Amazon Transcribe will only filter words that precisely match those in the filter, and it does not filter words contained within other words.
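To make these matching rules concrete, here is a purely local illustration written with awk. It is not how Amazon Transcribe is implemented and it makes no AWS calls; the function name mask_words is hypothetical, and the sketch only handles trailing punctuation. It replaces whole words that match the list, ignoring case, with three asterisks, and leaves longer words that merely contain a listed word untouched.

```shell
# Hypothetical local illustration of the matching rules (no AWS calls).
# Usage: mask_words "some text" word1 word2 ...
mask_words() {
  text="$1"; shift
  printf '%s\n' "$text" | awk -v words="$*" '
    BEGIN {
      # Build a lowercase lookup table from the filter words.
      n = split(tolower(words), w, " ")
      for (i = 1; i <= n; i++) filter[w[i]] = 1
    }
    {
      for (i = 1; i <= NF; i++) {
        core = $i
        sub(/[^A-Za-z0-9]+$/, "", core)      # drop trailing punctuation
        tail = substr($i, length(core) + 1)  # keep it to re-attach later
        # Whole-word, case-insensitive comparison only.
        if (tolower(core) in filter) $i = "***" tail
      }
      print
    }'
}
```

With the three words from the example file, "AWS" and "Applications" are both masked regardless of case, while a longer word such as "subinfrastructure" passes through unchanged because it is not an exact match.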
We can now start a new job that uses the vocabulary filter with the mask method:
$ aws transcribe start-transcription-job --transcription-job-name testwords2 --language-code en-US --media MediaFileUri=s3://change-to-yourbucket/whatisaws.mp3 --settings VocabularyFilterName=vocabularyfiltertest,VocabularyFilterMethod=mask
After a short while, we can check the job’s progress:
$ aws transcribe get-transcription-job --transcription-job-name testwords2
{
    "TranscriptionJob": {
        "TranscriptionJobName": "testwords2",
        "TranscriptionJobStatus": "COMPLETED",
        "LanguageCode": "en-US",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "s3://change-to-yourbucket/whatisaws.mp3"
        },
        "Transcript": {
            "TranscriptFileUri": "https://s3.eu-west-1.amazonaws.com/aws-transcribe-eu-west-1-prod/xxxxxxxxxxxxx/testwords2/9332f738-196d-4e83-ba41-4c9014b0ad9b/asrOutp..........
To automate the entire process, you can add a notification to your workflow using Amazon Simple Notification Service (Amazon SNS), a fully managed messaging service. You can also react to an event from Amazon EventBridge, a serverless event bus designed to facilitate the development of event-driven applications at scale.
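One possible way to wire this up is an EventBridge rule that matches the job state change event and forwards it to an SNS topic. The sketch below is an assumption-laden example and not part of the original walkthrough: the function name, the rule name, and the topic ARN are hypothetical values supplied by the caller, and the topic must already exist with an access policy that lets EventBridge publish to it.

```shell
# Hypothetical helper: create an EventBridge rule that matches finished
# transcription jobs and sends the event to an existing SNS topic.
# $1 = rule name, $2 = SNS topic ARN (both chosen by the caller).
notify_on_job_completion() {
  rule_name="$1"
  topic_arn="$2"
  # Match Amazon Transcribe job state changes that reached a final status.
  aws events put-rule \
    --name "$rule_name" \
    --event-pattern '{"source":["aws.transcribe"],"detail-type":["Transcribe Job State Change"],"detail":{"TranscriptionJobStatus":["COMPLETED","FAILED"]}}' &&
  # Attach the SNS topic as the rule target.
  aws events put-targets \
    --rule "$rule_name" \
    --targets "Id=1,Arn=$topic_arn"
}
```

With a rule like this in place, subscribers to the topic are notified when a job completes or fails, so no polling is needed.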
You can download the transcription results from the URI specified in the TranscriptFileUri field. By examining the file, you will see how Amazon Transcribe has masked the unwanted words specified in the vocabulary filter.
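The download and a quick check for masked words can be scripted as follows. This is a rough sketch under a few assumptions: the function name count_masked_words is hypothetical, curl is available, and masked words appear in the transcript as three asterisks. The count covers every occurrence in the file, so a masked word may be counted more than once if the file repeats it in per-word entries.

```shell
# Hypothetical helper: download the transcript of a completed job and
# count how many times the mask (three asterisks) appears in it.
count_masked_words() {
  job_name="$1"
  uri=$(aws transcribe get-transcription-job \
    --transcription-job-name "$job_name" \
    --query 'TranscriptionJob.Transcript.TranscriptFileUri' \
    --output text) || return 1
  # The URI is a pre-signed HTTPS URL, so a plain GET is enough.
  curl -s "$uri" -o "$job_name.json" || return 1
  # -F treats the asterisks literally; -o prints one line per occurrence.
  grep -o -F '***' "$job_name.json" | wc -l | tr -d ' '
}
```

Running count_masked_words testwords2 saves the transcript as testwords2.json in the current directory and prints the number of masked occurrences it found.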
That concludes our overview. You can utilize the same option via the AWS Management Console, which is an excellent resource for further exploration.