Hosting Amazon Managed Workflows for Apache Airflow (MWAA) Local-Runner on Amazon ECS Fargate for Development and Testing

Introduction


Apache Airflow has emerged as a premier open-source solution for building data pipelines, thanks to its engaged community, user-friendly Python-based Directed Acyclic Graph (DAG) workflows, and a rich library of integrations. Amazon Managed Workflows for Apache Airflow (MWAA) simplifies running Airflow on AWS by removing the need to manage the underlying infrastructure.

While business requirements often emphasize scalability, availability, and security, Airflow development typically does not require a full production-grade environment. Developers frequently write DAGs locally and need assurance that these workflows will behave as intended once deployed to production. To address this, the MWAA team has developed an open-source local-runner that mirrors many of the library versions and runtime configurations found in MWAA. It runs in a local Docker environment and provides utilities for testing and packaging Python dependencies.

In some cases, a complete MWAA setup isn’t essential, yet local Docker containers may lack access to the AWS resources necessary for comprehensive workflow development and testing. Running the local-runner in a container on AWS can provide a lightweight development environment that closely resembles your production MWAA setup. This article discusses how to deploy MWAA local-runner containers on Amazon Elastic Container Service (ECS) Fargate.

Prerequisites

This guide assumes you already have an Amazon MWAA environment and wish to create a development container with similar configurations. If you do not have an MWAA environment yet, you can refer to the quick start documentation to get started. Additionally, you will need:

  • Docker installed on your local machine.
  • AWS Command Line Interface (AWS CLI).
  • Terraform CLI (if using Terraform).

Walkthrough

1. Clone the local-runner repository, set environment variables, and build the image

Start by pulling the latest version of the Amazon MWAA local-runner to your machine.
Note: Replace <your_region> with your actual region and <airflow_version> with the version specified in the documentation.

git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner

export ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export REGION=<your_region>
export AIRFLOW_VERSION=<airflow_version>

./mwaa-local-env build-image

Note: We are specifically using the latest version of the Amazon MWAA local-runner as it provides the necessary features for this tutorial.
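The build and push steps depend on ACCOUNT_ID, REGION, and AIRFLOW_VERSION being set, and an empty value tends to surface later as a confusing Docker or ECR error. The following is a minimal sketch of an early check; require_vars is a hypothetical helper name and is not part of the local-runner repository.

```shell
# Hypothetical helper (not part of aws-mwaa-local-runner): return non-zero
# if any of the named environment variables is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup through eval keeps this portable to POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is to gate the build on the check: require_vars ACCOUNT_ID REGION AIRFLOW_VERSION && ./mwaa-local-env build-image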

2. Push your local-runner image to Amazon ECR

Log in to Amazon ECR and create a repository for the image:

aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
aws ecr create-repository --repository-name mwaa-local-runner --region $REGION 

Next, tag and push your Docker image:

export AIRFLOW_IMAGE=$(docker image ls | grep amazon/mwaa-local | grep $AIRFLOW_VERSION | awk '{ print $3 }') 
docker tag $AIRFLOW_IMAGE $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/mwaa-local-runner
docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/mwaa-local-runner
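The commands above tag the image without an explicit tag, so Docker applies the default tag, latest. If you would rather keep one image per Airflow version, a version tag can be appended to the repository URI. The sketch below shows one way to compose that URI; ecr_image_uri is a hypothetical helper, and using AIRFLOW_VERSION as the tag is an assumption rather than something the sample templates require.

```shell
# Hypothetical helper: compose a fully qualified ECR image URI.
# $1 = account ID, $2 = region, $3 = repository name, $4 = image tag
ecr_image_uri() {
    printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Example usage (commented out because it needs Docker and ECR access):
# IMAGE_URI=$(ecr_image_uri "$ACCOUNT_ID" "$REGION" mwaa-local-runner "$AIRFLOW_VERSION")
# docker tag "$AIRFLOW_IMAGE" "$IMAGE_URI"
# docker push "$IMAGE_URI"
```

If you push a versioned tag this way, whatever later references the image (for example, an ECS task definition) needs to use the same tag.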

3. Modify the MWAA execution role

This example demonstrates how to enable an existing MWAA role for use with Amazon ECS Fargate. Alternatively, you may create a new task execution role.

Open the Amazon MWAA console, select the desired environment, and navigate to the Permissions section to find its execution role. Edit the role's trust relationship so that ecs-tasks.amazonaws.com is listed as a trusted service:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ecs-tasks.amazonaws.com",
                    "airflow.amazonaws.com",
                    "airflow-env.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
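If you prefer the command line to the console, the same trust policy can be applied with the AWS CLI. The sketch below writes the policy shown above to a file and passes it to aws iam update-assume-role-policy. The function names and the role name are placeholders chosen for illustration; substitute the name of your environment's execution role.

```shell
# Write the trust policy shown above to the file named by $1.
write_trust_policy() {
    cat > "$1" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "ecs-tasks.amazonaws.com",
                    "airflow.amazonaws.com",
                    "airflow-env.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF
}

# Replace the trust policy of an IAM role with the contents of a file.
# $1 = role name (placeholder; use your execution role), $2 = policy file path
apply_trust_policy() {
    aws iam update-assume-role-policy \
        --role-name "$1" \
        --policy-document "file://$2"
}

# Example usage (commented out because it changes a real IAM role):
# write_trust_policy trust-policy.json
# apply_trust_policy my-mwaa-execution-role trust-policy.json
```

Note that update-assume-role-policy replaces the whole trust policy, so the file should contain every service principal the role still needs.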

After updating the trust policy, extend the role's permissions to include log management and command execution capabilities. Command execution requires that your private subnets can reach AWS Systems Manager (SSM), either through an Internet Gateway or through a PrivateLink endpoint for the ssmmessages service in your region (for example, “com.amazonaws.us-east-1.ssmmessages” in us-east-1).
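As a rough illustration of what such permissions might look like, the sketch below prints an IAM policy document with two statements: one for writing to a CloudWatch Logs log group and one with the ssmmessages actions that ECS command execution relies on. This is an assumption-based example rather than the exact policy used by the sample templates; the function name and the log group name you pass in are hypothetical.

```shell
# Print an IAM policy document granting log and command-execution permissions.
# $1 = region, $2 = account ID, $3 = log group name (hypothetical; use your own)
exec_and_logs_policy() {
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:$1:$2:log-group:$3:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

# Example usage (commented out because it changes a real IAM role):
# exec_and_logs_policy "$REGION" "$ACCOUNT_ID" /ecs/my-log-group > policy.json
# aws iam put-role-policy --role-name my-mwaa-execution-role \
#     --policy-name ecs-local-runner --policy-document file://policy.json
```

Scoping the log statement to a single log group keeps the grant narrow; the ssmmessages actions do not support resource-level restrictions, which is why that statement uses a wildcard resource.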

Following these modifications, create the required Amazon Aurora PostgreSQL Serverless instance and Amazon ECS resources using AWS CloudFormation or Terraform. Clone the aws-samples/amazon-mwaa-examples repository for the templates and additional resources.

git clone https://github.com/aws-samples/amazon-mwaa-examples.git

Collect the necessary variables from your existing MWAA environment, including security groups, subnet IDs, and VPC ID.

export MWAAENV=test-MwaaEnvironment
aws mwaa get-environment --name $MWAAENV --query 'Environment.NetworkConfiguration' --region $REGION
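The command above prints the whole network configuration as JSON. To feed individual values into CloudFormation or Terraform, it can help to capture them in shell variables. The sketch below narrows the --query expression to one field at a time and looks up the VPC ID from a subnet, since the MWAA network configuration lists subnets and security groups only. The function and variable names are hypothetical.

```shell
# Print one field of the environment's NetworkConfiguration as tab-separated text.
# $1 = environment name, $2 = region, $3 = field (SubnetIds or SecurityGroupIds)
mwaa_network_field() {
    aws mwaa get-environment --name "$1" --region "$2" \
        --query "Environment.NetworkConfiguration.$3" --output text
}

# Print the ID of the VPC that contains a subnet.
# $1 = subnet ID, $2 = region
subnet_vpc_id() {
    aws ec2 describe-subnets --subnet-ids "$1" --region "$2" \
        --query 'Subnets[0].VpcId' --output text
}

# Example usage (commented out because it needs AWS credentials):
# SUBNET_IDS=$(mwaa_network_field "$MWAAENV" "$REGION" SubnetIds)
# SECURITY_GROUP_IDS=$(mwaa_network_field "$MWAAENV" "$REGION" SecurityGroupIds)
# FIRST_SUBNET=$(printf '%s\n' "$SUBNET_IDS" | cut -f1)
# VPC_ID=$(subnet_vpc_id "$FIRST_SUBNET" "$REGION")
```

With --output text, list values come back tab-separated on one line, which is why cut -f1 is enough to pick the first subnet.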


Conclusion

By following these steps, you can successfully host the Amazon Managed Workflows for Apache Airflow (MWAA) local-runner on Amazon ECS Fargate, enhancing your development and testing processes.

Chanci Turner