Unlock Insights from Audio Transcripts Using Amazon Transcribe and Amazon Bedrock

Generative AI continues to expand the boundaries of what's achievable. One particularly noteworthy application is analyzing audio and video transcripts, which improves our ability to extract meaningful insights from content stored in audio and video files. Speech data presents unique complexities that make analysis and insight extraction challenging, and manually transcribing and reviewing it is both time-consuming and resource-intensive.

Current techniques for extracting insights from speech data often require laborious manual transcription and review. Although automatic speech recognition tools can convert audio and video to text, manual effort is still needed to pinpoint specific insights and data points or to summarize the content. As organizations accumulate vast quantities of this content, the need for a more efficient and insightful solution becomes increasingly urgent. There is considerable potential for business value in the extensive data organizations retain in these formats, along with valuable insights that might otherwise remain undiscovered. Here are some of the insights and capabilities that large language models (LLMs) can unlock from audio transcripts:

  • LLMs can interpret and grasp the context of a conversation, including not just the spoken words but also implied meanings, intentions, and emotions. This previously required significant human interpretation.
  • Advanced sentiment analysis can now be performed by LLMs. While traditional sentiment analysis captures basic emotions, LLMs can recognize more nuanced feelings such as sarcasm, ambivalence, or mixed emotions by comprehending the conversation's context, as the prompt sketch after this list illustrates.
  • LLMs can produce concise summaries, not merely by extracting content but by understanding the context of the dialogue.
  • Users can pose complex, natural language questions and receive insightful answers.
  • LLMs can deduce personas or roles within a conversation, leading to targeted insights and actions.
  • LLMs can aid in generating new content based on audio assets or discussions, following predefined templates or flows.
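
To make these capabilities concrete, the following is a hypothetical prompt template (the wording is illustrative, not taken from any specific product) showing how several of these insights can be requested from an LLM in a single pass:

# Hypothetical prompt template; {transcript} is filled in after transcription
insight_prompt = """Here is a conversation transcript:
{transcript}

Based on this transcript:
1. Summarize the discussion in three sentences.
2. Describe each speaker's sentiment, noting any sarcasm or mixed emotions.
3. Infer each speaker's likely role (for example, customer or support agent).
4. List the main pain points raised."""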

In this article, we explore how to create business value through speech analytics, with examples focusing on:

  • Automatically summarizing, categorizing, and analyzing marketing materials such as podcasts, recorded interviews, or videos, and developing new marketing content based on those assets.
  • Automatically extracting key points, summaries, and sentiment from recorded meetings, such as earnings calls.
  • Transcribing and analyzing contact center conversations to enhance customer experience.

The first step in gaining insights from audio data is transcribing the audio file with Amazon Transcribe, a managed machine learning (ML) service that automatically converts speech to text and lets developers easily add speech-to-text capabilities to their applications. It can also distinguish multiple speakers, automatically redact personally identifiable information (PII), and improve transcription accuracy with custom vocabularies tailored to specific industries or use cases, or with custom language models.
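
For example, if your recordings contain sensitive data, you can enable PII redaction when starting a transcription job. The following is a minimal sketch using the Amazon Transcribe ContentRedaction option; the job name and S3 locations are placeholders:

# Sketch: start a transcription job with automatic PII redaction enabled
# (the S3 locations below are placeholders)
import boto3
import time

transcribe = boto3.client('transcribe')
transcribe.start_transcription_job(
    TranscriptionJobName=f"redacted-transcription-{int(time.time())}",
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={'MediaFileUri': '<S3 URI of the media file>'},
    OutputBucketName='<name of the output S3 bucket>',
    ContentRedaction={
        'RedactionType': 'PII',
        'RedactionOutput': 'redacted'  # or 'redacted_and_unredacted'
    }
)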

The next step is to use foundation models (FMs) in Amazon Bedrock to summarize the content, identify topics, and draw conclusions, extracting valuable insights that can guide strategic decisions and innovation. Automatically generating new content from these assets adds further value, boosting creativity and productivity.

Generative AI is transforming how we analyze audio transcripts, enabling the extraction of insights such as customer sentiment, pain points, recurrent themes, risk mitigation paths, and more—insights that were previously hidden.

Use Case Overview

In this article, we walk through three detailed use cases. The code samples are in Python, and we ran the snippets in a Jupyter notebook. You can follow along by creating and running a notebook in Amazon SageMaker Studio.

Audio Summarization and Insights, and Automated Content Generation with Amazon Transcribe and Amazon Bedrock

In this use case, we illustrate how to convert an existing marketing asset (a video) into a new blog post to announce its launch, produce an abstract, and extract key topics along with SEO keywords for documentation and categorization purposes.

Transcribing Audio with Amazon Transcribe

For this demonstration, we used a technical talk from AWS re:Invent 2023 as our sample. We downloaded the MP4 file of the recording and stored it in an Amazon Simple Storage Service (Amazon S3) bucket.
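
If you are following along, a minimal sketch for uploading the downloaded recording with boto3 looks like this (the file name, bucket, and key are placeholders):

# Upload the downloaded recording to S3 (names below are placeholders)
import boto3

s3 = boto3.client('s3')
s3.upload_file('reinvent-talk.mp4', '<name of your S3 bucket>', 'transcribe_bedrock_blog/input-files/reinvent-talk.mp4')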

The first task is to transcribe the audio file using Amazon Transcribe:

# Create an Amazon Transcribe transcription job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"podcast-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp4',  # matches the MP4 recording uploaded to Amazon S3
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)
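
# Poll the job status every 10 seconds (up to 10 minutes) until it completes or fails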
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript fromn"
                f"t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to finish.

Once the job is complete, you can review the transcription output and inspect the generated plain text transcript (the following excerpt has been shortened for brevity):

# Get the Transcribe output JSON file
import json

s3 = boto3.client('s3')
# The TranscriptFileUri is a path-style URL (https://s3.<region>.amazonaws.com/<bucket>/<key>),
# so the bucket name and object key can be parsed out of it
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/', 2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/', 2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

…Once the alert comes, how do you correlate these alerts, not just through text signals and text relaying, but by understanding the topology, the infrastructural topology that supports that application or business service. It is the topology that ultimately provides the source of truth when an alert arises, right. That’s what we mean by a correlation that is assisted with topology in the thing that ultimately results in finding a probable root cause. And once…
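
With the plain text transcript available, the next step described earlier is to pass it to a foundation model in Amazon Bedrock. The following is a minimal sketch, assuming access to an Anthropic Claude model in Amazon Bedrock; the model ID and prompt wording are illustrative choices, not the only options:

# Sketch: summarize the transcript and extract topics and SEO keywords with Amazon Bedrock
# (assumes model access is enabled; the model ID and prompt are illustrative)
import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

transcript_text = transcription_output['results']['transcripts'][0]['transcript']
prompt = (
    "You are given the transcript of a technical talk.\n"
    "1. Write a one-paragraph abstract.\n"
    "2. List the five main topics covered.\n"
    "3. Suggest ten SEO keywords for categorization.\n\n"
    f"Transcript:\n{transcript_text}"
)

response = bedrock_runtime.invoke_model(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}]
    })
)
result = json.loads(response['body'].read())
print(result['content'][0]['text'])

The same pattern extends to the other use cases in the overview: changing the prompt turns the transcript into a draft blog post announcing the talk, key points and sentiment from an earnings call, or a quality review of a contact center conversation.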
