Overview
Managing a rapidly evolving film library can be quite challenging, especially when numerous assets are added in a short timeframe. Utilizing a NoSQL database for centralized access to asset information such as title, location, and metadata can significantly enhance efficiency. Amazon DynamoDB, a fully managed, serverless key-value NoSQL database designed to operate high-performance applications at scale, is an ideal solution for this scenario.
To automate the analysis of each asset, this guide outlines the steps to invoke the automatic extraction of media asset metadata using ffprobe (part of the FFmpeg project) alongside various AWS services:
- Amazon DynamoDB to store asset details
- AWS Lambda—a serverless, event-driven compute service—for executing ffprobe on the media file and updating the Amazon DynamoDB record
- Amazon S3—an object storage service known for its scalability and performance—for storing the asset files
Each asset detail, including identifier (ID), title, and location, is recorded in an Amazon DynamoDB table. The addition of a new asset to the table triggers a Lambda function that reads the media file. This process is referred to as the “analysis,” and the results are stored in the DynamoDB table. The Lambda function operates using the Python 3.8 runtime.
IMPORTANT LEGAL NOTICE: Before proceeding, make sure you understand the FFmpeg licensing terms and the associated legal considerations. The FFmpeg static build used in this demonstration is licensed under the GNU General Public License version 3 (GPLv3).
Prerequisites
To follow this guide, you will need access to the following:
- A Linux system for executing commands (Shell and Python)
- AWS Lambda
- Amazon DynamoDB
- Amazon S3
- AWS Identity and Access Management (AWS IAM) for securely managing access to AWS services and resources
Getting Started
FFmpeg
First, download the FFmpeg project, opting for a static build to ensure all libraries are included. Create a ZIP file containing the ffprobe binary with the following commands:
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz.md5
md5sum -c ffmpeg-release-amd64-static.tar.xz.md5 && mkdir ffmpeg-release-amd64-static
tar xvf ffmpeg-release-amd64-static.tar.xz -C ffmpeg-release-amd64-static
mkdir -p ffprobe/bin
cp ffmpeg-release-amd64-static/*/ffprobe ffprobe/bin/
cd ffprobe
zip -9 -r ../ffprobe.zip .
This process can be conducted on a local Linux system or an Amazon Elastic Compute Cloud (Amazon EC2) instance, which provides secure, resizable compute capacity in the cloud. The resulting ZIP archive will be used to create an AWS Lambda layer.
AWS IAM
An AWS Identity and Access Management (IAM) user is required to insert each asset's movie ID and title, along with its Amazon S3 bucket and object names, into Amazon DynamoDB; these inserts are what later trigger the Lambda function. To add an entry to the my_movies table (created in the next section), consider the following example.
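If you script the insert, a minimal boto3 sketch might look like the following (the title, bucket name, and object key are placeholder values; the attribute names must match what the Lambda function reads later):

import boto3

dynamodb_client = boto3.client('dynamodb')

# Placeholder values: replace the title, bucket, and key with your own asset details
dynamodb_client.put_item(
    TableName='my_movies',
    Item={
        'movie_id': {'N': '1'},
        'title': {'S': 'My first movie'},
        'S3_bucket': {'S': 'my-media-bucket'},
        'S3_object': {'S': 'my_first_movie.mp4'}
    }
)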
The Lambda function must have permissions to manage resources associated with the Amazon DynamoDB stream, write logs to Amazon CloudWatch Logs (a monitoring and observability service), and read Amazon S3 objects. If you need help creating policies or roles, refer to the AWS IAM documentation.
Add the following permissions to your function’s execution role (referred to as “lambda_media_ddb” in this scenario):
- dynamodb:DescribeStream
- dynamodb:GetRecords
- dynamodb:GetShardIterator
- dynamodb:ListStreams
- dynamodb:UpdateItem
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- s3:GetObject
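A minimal inline policy carrying these actions could be attached to the role with boto3, as in the sketch below (the wildcard resources and the policy name are placeholders; scope them to your table stream, log group, and bucket in practice):

import json
import boto3

iam_client = boto3.client('iam')

# Placeholder policy: tighten each "Resource" to your stream, log group, and bucket ARNs
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["dynamodb:DescribeStream", "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator", "dynamodb:ListStreams",
                    "dynamodb:UpdateItem"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "*"},
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
    ]
}

iam_client.put_role_policy(
    RoleName='lambda_media_ddb',
    PolicyName='lambda_media_ddb_permissions',
    PolicyDocument=json.dumps(policy_document)
)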
Amazon DynamoDB
Create an Amazon DynamoDB table named “my_movies.” Choose a primary key; in this example, we’ll use “movie_id” with a type set to Number.
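If you prefer to create the table from code, a minimal boto3 sketch is shown below (on-demand billing is an assumption; any capacity mode works):

import boto3

dynamodb_client = boto3.client('dynamodb')

dynamodb_client.create_table(
    TableName='my_movies',
    AttributeDefinitions=[{'AttributeName': 'movie_id', 'AttributeType': 'N'}],
    KeySchema=[{'AttributeName': 'movie_id', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST'  # assumption: on-demand capacity keeps the example simple
)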
Amazon DynamoDB streams capture a time-ordered sequence of item-level modifications in any DynamoDB table. Applications can access this log in near real-time to view data items as they were before and after being modified. We will use this log to trigger our Lambda function, utilizing the log data as input.
Enable the stream on the table, selecting the type “New and old images” in the stream management settings.
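The same setting can be applied from code; the sketch below enables the stream on the existing table and reads back the stream ARN needed in the next step:

import boto3

dynamodb_client = boto3.client('dynamodb')

# Capture both new and old images of each modified item
dynamodb_client.update_table(
    TableName='my_movies',
    StreamSpecification={'StreamEnabled': True, 'StreamViewType': 'NEW_AND_OLD_IMAGES'}
)

# The latest stream ARN is what the Lambda trigger will use
stream_arn = dynamodb_client.describe_table(TableName='my_movies')['Table']['LatestStreamArn']
print(stream_arn)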
From the DynamoDB stream details section, copy the latest stream ARN (Amazon Resource Name), which is needed to trigger the Lambda function. After uploading an asset to your Amazon S3 bucket, manually insert a new row into your DynamoDB table, my_movies, following the put_item example shown earlier.
Lambda Setup
Create a Lambda layer and import ffprobe.zip into it, following the setup instructions. Then, create a Lambda function using Python 3.8.
Our test asset is 34 MB, so a large amount of memory isn't necessary; larger files may require more. Attach the appropriate role (lambda_media_ddb).
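These console steps can also be scripted; the sketch below publishes ffprobe.zip as a layer and creates the function (the function name, deployment package name, and role ARN account ID are placeholders for this walkthrough):

import boto3

lambda_client = boto3.client('lambda')

# Publish the layer built earlier from ffprobe.zip
with open('ffprobe.zip', 'rb') as f:
    layer = lambda_client.publish_layer_version(
        LayerName='ffprobe',
        Content={'ZipFile': f.read()},
        CompatibleRuntimes=['python3.8'])

# Create the function; lambda_function.zip is assumed to contain the handler shown later
with open('lambda_function.zip', 'rb') as f:
    lambda_client.create_function(
        FunctionName='media_analysis',   # placeholder function name
        Runtime='python3.8',
        Role='arn:aws:iam::123456789012:role/lambda_media_ddb',  # placeholder account ID
        Handler='lambda_function.lambda_handler',
        Code={'ZipFile': f.read()},
        Layers=[layer['LayerVersionArn']])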
Once the function is created, in the Designer section, follow these steps:
- Click on Layers.
- Click Add a layer.
- Choose Custom layers, select ffprobe and the correct version, and click Add.
- In Designer, click Add trigger.
- Select DynamoDB, paste the latest stream ARN, and validate by clicking Add.
The designer setup is now complete.
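The trigger can also be created programmatically; the sketch below maps the DynamoDB stream to the function created earlier (the function name and batch size are assumptions, and the stream ARN is a placeholder for the one you copied):

import boto3

lambda_client = boto3.client('lambda')

# Placeholder: paste the latest stream ARN copied from the my_movies table details
stream_arn = 'arn:aws:dynamodb:us-east-1:123456789012:table/my_movies/stream/2021-01-01T00:00:00.000'

lambda_client.create_event_source_mapping(
    EventSourceArn=stream_arn,
    FunctionName='media_analysis',   # assumption: the placeholder function name used earlier
    StartingPosition='LATEST',
    BatchSize=100
)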
In the Edit basic settings menu, specify a timeout that allows sufficient time for ffprobe to retrieve the file, analyze it, and update the DynamoDB table entry. For files smaller than 790 MB, set the timeout to 1 second and memory to 200 MB. For larger files, ensure these values meet your needs.
Note that our tests (which included files under 790 MB) took less than 1 second and required less than 200 MB of memory. Additional insights on memory and duration considerations can be found later in this post.
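The same basic settings can be applied with a short boto3 call; the values below mirror the tests above for files under 790 MB and should be raised for larger assets (the function name is the placeholder used earlier):

import boto3

lambda_client = boto3.client('lambda')

# Values based on the tests above (files under 790 MB); raise both for larger assets
lambda_client.update_function_configuration(
    FunctionName='media_analysis',
    Timeout=1,
    MemorySize=200
)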
Now copy and paste the following Python code into your Lambda function and deploy it. The handler presigns the S3 object so ffprobe can read it over HTTPS, runs ffprobe from the layer at /opt/bin/ffprobe, and writes the JSON report into an analysis attribute on the item that triggered the function (the attribute name is a convention you can change):

import json
import subprocess
import boto3

SIGNED_URL_TIMEOUT = 60

def lambda_handler(event, context):
    dynamodb_client = boto3.client('dynamodb')
    s3_client = boto3.client('s3')
    for record in event['Records']:
        # Only newly inserted items trigger an analysis
        if record['eventName'] != 'INSERT':
            print('Not an insert, skipping')
            continue
        movie_id = record['dynamodb']['NewImage']['movie_id']['N']
        s3_source_bucket = record['dynamodb']['NewImage']['S3_bucket']['S']
        s3_source_key = record['dynamodb']['NewImage']['S3_object']['S']
        # Presign the object so ffprobe can read it without downloading it first
        s3_source_signed_url = s3_client.generate_presigned_url(
            'get_object',
            Params={'Bucket': s3_source_bucket, 'Key': s3_source_key},
            ExpiresIn=SIGNED_URL_TIMEOUT)
        # Run ffprobe from the layer and capture its JSON report
        ffprobe_output = subprocess.check_output(
            ['/opt/bin/ffprobe', '-v', 'quiet', '-print_format', 'json',
             '-show_format', '-show_streams', s3_source_signed_url])
        # Store the report in an "analysis" attribute on the same item
        dynamodb_client.update_item(
            TableName='my_movies',
            Key={'movie_id': {'N': movie_id}},
            UpdateExpression='SET analysis = :a',
            ExpressionAttributeValues={':a': {'S': ffprobe_output.decode('utf-8')}})
    return {'statusCode': 200, 'body': json.dumps('Analysis complete')}
This guide provides a comprehensive approach to analyzing media files using ffprobe in AWS Lambda, and with the right tools and setup, you can streamline your media asset management effectively.