Introduction
Apache Airflow has emerged as a premier open-source solution for building data pipelines, thanks to its engaged community, user-friendly Python-based Directed Acyclic Graph (DAG) workflows, and a rich library of integrations. Amazon Managed Workflows for Apache Airflow (MWAA) simplifies running Airflow on AWS, alleviating the need to manage the underlying infrastructure.
While business requirements often emphasize scalability, availability, and security, developing Airflow workflows typically does not require a full production environment. Developers frequently write DAGs locally and need assurance that these workflows will behave as intended when deployed to production. To address this, the MWAA team has developed an open-source local-runner that mirrors many of the library versions and runtime configurations found in MWAA. It runs in a local Docker environment and provides utilities for testing and packaging Python dependencies.
In some cases, a complete MWAA setup isn’t essential, yet local Docker containers may lack access to the AWS resources necessary for comprehensive workflow development and testing. Running the local-runner in a container on AWS can provide a lightweight development environment that closely resembles your production MWAA setup. This article discusses how to deploy MWAA local-runner containers on Amazon Elastic Container Service (ECS) Fargate.
Prerequisites
This guide assumes you already have an Amazon MWAA environment and wish to create a development container with similar configurations. If you do not have an MWAA environment yet, you can refer to the quick start documentation to get started. Additionally, you will need:
- Docker installed on your local machine.
- AWS Command Line Interface (AWS CLI).
- Terraform CLI (if using Terraform).
Walkthrough
1. Clone the local-runner repository, set environment variables, and build the image
Start by pulling the latest version of the Amazon MWAA local-runner to your machine.
Note: Replace <your_region> with your actual region and <airflow_version> with the version specified in the documentation.
git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
export ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export REGION=<your_region>
export AIRFLOW_VERSION=<airflow_version>
./mwaa-local-env build-image
Note: We are specifically using the latest version of the Amazon MWAA local-runner as it provides the necessary features for this tutorial.
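Because the later steps interpolate REGION and AIRFLOW_VERSION into repository URIs, it can help to confirm they hold real values before building. The following is a minimal sketch, not part of the local-runner tooling; the helper name require_value is hypothetical.

```shell
# Hypothetical helper (not part of aws-mwaa-local-runner): return non-zero
# if a value is empty or still looks like an unreplaced <placeholder>.
require_value() {
  # $1 = variable name (used only in the message), $2 = its current value
  case "$2" in
    ""|"<"*">")
      echo "error: $1 is not set to a real value" >&2
      return 1
      ;;
  esac
  return 0
}

# Possible usage before building the image:
#   require_value REGION "$REGION" \
#     && require_value AIRFLOW_VERSION "$AIRFLOW_VERSION" \
#     && ./mwaa-local-env build-image
```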
2. Push your local-runner image to Amazon ECR
Log in to Amazon ECR and create a repository for the image:
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
aws ecr create-repository --repository-name mwaa-local-runner --region $REGION
Next, tag and push your Docker image:
export AIRFLOW_IMAGE=$(docker image ls | grep amazon/mwaa-local | grep $AIRFLOW_VERSION | awk '{ print $3 }')
docker tag $AIRFLOW_IMAGE $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/mwaa-local-runner
docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/mwaa-local-runner
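The commands above push the image without an explicit tag, so Docker applies latest. If you would rather pin a tag, one option is to build the full image URI in a single place. This is a sketch under assumptions: ecr_image_uri is a hypothetical helper, and the account ID and tag in the example are made-up values.

```shell
# Hypothetical helper: compose a fully qualified Amazon ECR image URI.
ecr_image_uri() {
  # $1 = account ID, $2 = region, $3 = repository name,
  # $4 = image tag (optional, defaults to "latest")
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "${4:-latest}"
}

# Example with made-up values; prints
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/mwaa-local-runner:2_8_1
ecr_image_uri 123456789012 us-east-1 mwaa-local-runner 2_8_1
```

With this in place, the tag and push commands could both take "$(ecr_image_uri "$ACCOUNT_ID" "$REGION" mwaa-local-runner)" as their target. If you choose a tag other than latest, any template that references the image would need to use the same tag.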
3. Modify the MWAA execution role
This example demonstrates how to enable an existing MWAA role for use with Amazon ECS Fargate. Alternatively, you may create a new task execution role.
Open the Amazon MWAA console, select the desired environment, and go to the Permissions section to find its execution role. Edit the role's Trust relationships to include ecs-tasks.amazonaws.com as a trusted service:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com",
          "airflow.amazonaws.com",
          "airflow-env.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
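If you prefer the command line to the console, the same trust policy can be applied with the AWS CLI. The sketch below writes the policy to a local file and wraps the update in a function; the file name trust-policy.json and the function name apply_trust_policy are illustrative choices, and the role name must be your own.

```shell
# Write the trust policy shown above to a local file.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com",
          "airflow.amazonaws.com",
          "airflow-env.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Hypothetical wrapper: replaces the role's trust policy with the file above.
# $1 = name of the MWAA execution role (not its ARN)
apply_trust_policy() {
  aws iam update-assume-role-policy \
    --role-name "$1" \
    --policy-document file://trust-policy.json
}

# Possible usage (the role name is a placeholder):
#   apply_trust_policy my-mwaa-execution-role
```

Note that update-assume-role-policy replaces the whole trust policy, so the file should list every service principal the role still needs.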
After updating the trust policy, modify the role's permissions to allow log management and command execution. Ensure that your private subnets can reach AWS Systems Manager (SSM), either via an Internet Gateway route or via a PrivateLink endpoint for “com.amazonaws.us-east-1.ssmmessages” (substitute your own Region in the service name). Command execution depends on this connectivity.
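As a rough illustration of what those added permissions might look like, the sketch below generates an inline policy with one statement for CloudWatch Logs and one for the SSM message channels that ECS command execution relies on. The helper name, the output file name, and the log group name /ecs/mwaa-local-runner are assumptions, as is the exact set of logs actions; adjust them to match your task definition.

```shell
# Hypothetical helper: write an inline policy for log delivery and
# command execution to exec-policy.json in the current directory.
write_exec_policy() {
  # $1 = region, $2 = account ID, $3 = CloudWatch log group name (assumed)
  cat > exec-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:$1:$2:log-group:$3:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Example with made-up account ID and log group name.
write_exec_policy us-east-1 123456789012 /ecs/mwaa-local-runner
```

The generated file could then be attached to the role with aws iam put-role-policy, passing --role-name, --policy-name, and --policy-document file://exec-policy.json.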
Following these modifications, create the required Aurora PostgreSQL Serverless instance and Amazon ECS resources using AWS CloudFormation or Terraform. Clone the aws-samples/amazon-mwaa-examples repository for additional resources.
git clone https://github.com/aws-samples/amazon-mwaa-examples.git
Collect the necessary variables from your existing MWAA environment, including security groups, subnet IDs, and VPC ID.
export MWAAENV=test-MwaaEnvironment
aws mwaa get-environment --name $MWAAENV --query 'Environment.NetworkConfiguration' --region $REGION
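The command above returns the whole network configuration as JSON. To capture individual values for use in a template, narrower queries may be more convenient. The function names below are hypothetical. The network configuration lists subnets and security groups but not the VPC, so this sketch looks the VPC ID up from one of the subnets.

```shell
# Hypothetical helpers around the AWS CLI; each prints plain text.

# $1 = MWAA environment name, $2 = region
mwaa_subnet_ids() {
  aws mwaa get-environment --name "$1" --region "$2" \
    --query 'Environment.NetworkConfiguration.SubnetIds' --output text
}

# $1 = MWAA environment name, $2 = region
mwaa_security_group_ids() {
  aws mwaa get-environment --name "$1" --region "$2" \
    --query 'Environment.NetworkConfiguration.SecurityGroupIds' --output text
}

# $1 = a subnet ID, $2 = region
subnet_vpc_id() {
  aws ec2 describe-subnets --subnet-ids "$1" --region "$2" \
    --query 'Subnets[0].VpcId' --output text
}

# Possible usage:
#   SUBNETS=$(mwaa_subnet_ids "$MWAAENV" "$REGION")
#   SECURITY_GROUPS=$(mwaa_security_group_ids "$MWAAENV" "$REGION")
#   VPC_ID=$(subnet_vpc_id "${SUBNETS%%[[:space:]]*}" "$REGION")
```

With --output text, multiple IDs come back separated by whitespace on one line, which is why the usage example trims SUBNETS down to its first entry before the VPC lookup.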
Conclusion
By following these steps, you can successfully host the Amazon Managed Workflows for Apache Airflow (MWAA) local-runner on Amazon ECS Fargate, enhancing your development and testing processes.