Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that allows users to operate within a familiar Apache Airflow environment while benefiting from enhanced scalability, availability, and security, all without the operational complexities of managing the underlying infrastructure.
In April 2023, Amazon MWAA introduced support for shell launch scripts in environments running Apache Airflow 2.x and later. This feature lets you tailor your Apache Airflow environment by executing a custom shell launch script at startup, which is particularly useful for integrating with existing infrastructure and meeting compliance requirements. The shell launch script can install specific Linux runtimes, configure environment variables, and modify configuration files. The script runs at startup on every individual Apache Airflow component (workers, schedulers, and web servers) before requirements are installed and the Apache Airflow process is initialized.
In this article, we will outline the features of this new addition, discuss relevant use cases, detail the steps for implementation, and provide further insights into the capabilities of the shell launch script.
Overview of the Solution
To run Apache Airflow, Amazon MWAA uses Amazon Elastic Container Registry (Amazon ECR) images that package Apache Airflow releases along with commonly used binaries and Python libraries. These images are deployed by AWS Fargate containers within the Amazon MWAA environment. Additional libraries can be incorporated through the requirements.txt and plugins.zip files, with their Amazon Simple Storage Service (Amazon S3) paths provided as parameters during environment creation or updates.
However, these package installation methods did not cover every use case for customizing Apache Airflow environments. Customers asked for a way to specify custom libraries, runtimes, and supporting files within the Apache Airflow container images.
Relevant Use Cases
The newly introduced feature provides the ability to personalize your Apache Airflow image by executing a specified shell launch script at startup. This shell launch script can perform tasks such as:
- Installing runtimes: Install or update the Linux runtimes required by your workflows and connections, for example installing libaio as a custom library for Oracle.
- Configuring environment variables: Set environment variables for the Apache Airflow scheduler, web server, and worker components. You can overwrite standard variables such as PATH, PYTHONPATH, and LD_LIBRARY_PATH; for instance, setting LD_LIBRARY_PATH directs Python to search for binaries in the paths you specify.
- Managing keys and tokens: Pass access tokens for private PyPI/PEP-503 compliant repositories to requirements.txt and set up security keys.
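As an illustration, a single startup script can combine these use cases. The following sketch is hypothetical: the libaio package follows the Oracle example above, while the secret name, repository URL, and variable values are placeholders, and it assumes the AWS CLI is available in the image and that the environment's execution role can read the secret.

```bash
#!/bin/sh
# Install a Linux runtime required by a connection (the Oracle example above).
sudo yum -y install libaio

# Set a custom environment variable for all Apache Airflow components.
export ENVIRONMENT_STAGE="development"

# Hypothetical: fetch a private PyPI token from AWS Secrets Manager and
# expose it to pip so requirements.txt can resolve private packages.
PYPI_TOKEN=$(aws secretsmanager get-secret-value \
    --secret-id my-private-pypi-token \
    --query SecretString --output text)
export PIP_INDEX_URL="https://aws:${PYPI_TOKEN}@my-repo.example.com/simple/"
```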
Operational Mechanics
The shell script runs Bash commands at startup, allowing installation via tools like yum, similar to the user data and shell script support provided by Amazon Elastic Compute Cloud (Amazon EC2). You can create a custom shell script with a .sh extension and store it in the same S3 bucket that contains requirements.txt and plugins.zip. An S3 file version of the shell script can be specified during environment creation or update through the Amazon MWAA console, API, or AWS Command Line Interface (AWS CLI). For detailed instructions on configuring the startup script, refer to Using a startup script with Amazon MWAA.
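For example, assuming a versioning-enabled bucket named my-mwaa-bucket and an environment named my-mwaa-environment (both hypothetical), the AWS CLI steps would look like the following:

```bash
# Upload the startup script to the same S3 bucket as requirements.txt and plugins.zip.
aws s3 cp startup.sh s3://my-mwaa-bucket/startup.sh

# Look up the version ID of the object that was just uploaded.
aws s3api list-object-versions --bucket my-mwaa-bucket --prefix startup.sh \
    --query 'Versions[?IsLatest].[VersionId]' --output text

# Point the environment at that script version; this restarts all components.
aws mwaa update-environment --name my-mwaa-environment \
    --startup-script-s3-path startup.sh \
    --startup-script-s3-object-version "EXAMPLE_VERSION_ID"
```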
During environment creation or update, Amazon MWAA transfers the plugins.zip, requirements.txt, shell script, and your Apache Airflow Directed Acyclic Graphs (DAGs) to the container images on the underlying Amazon Elastic Container Service (Amazon ECS) Fargate clusters. The Amazon MWAA instance extracts these files and executes the designated startup script, which runs from the /usr/local/airflow/startup directory as the airflow user. Upon completion, the setup process installs the requirements.txt and plugins.zip files and then starts the Apache Airflow process for the container.
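To see this in action, a few diagnostic lines at the top of your script can confirm the execution context described above; their output appears in the startup log stream discussed next:

```bash
#!/bin/sh
# Diagnostics: confirm the user, directory, and component the script runs under.
echo "Running as user: $(whoami)"            # expected: airflow
echo "Working directory: $(pwd)"             # expected: /usr/local/airflow/startup
echo "Component: ${MWAA_AIRFLOW_COMPONENT}"  # worker, scheduler, or webserver
```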
For monitoring, you can view the script’s output in your Amazon MWAA environment’s Amazon CloudWatch log groups. To view logs, ensure logging is enabled for the log group. When enabled, Amazon MWAA creates a new log stream prefixed with startup_script_execution_ip. You can retrieve these log events to confirm the script’s success.
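For instance, you could retrieve the script output for a worker with the AWS CLI. The log group name below follows the standard Amazon MWAA convention of airflow-<environment-name>-<component>; the environment and stream names are placeholders:

```bash
# List the startup script log streams in the worker log group.
aws logs describe-log-streams \
    --log-group-name airflow-my-mwaa-environment-Worker \
    --log-stream-name-prefix startup_script_execution_ip

# Fetch the script output from one of the streams returned above.
aws logs get-log-events \
    --log-group-name airflow-my-mwaa-environment-Worker \
    --log-stream-name startup_script_execution_ip-10-0-0-5
```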
Additionally, you can use the Amazon MWAA local-runner to test this feature in your local development environment by placing your custom startup script in the local-runner’s startup_script directory. It is advisable to test your script locally before applying changes to your Amazon MWAA setup.
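A minimal local test loop might look like the following, using the helper commands documented in the aws-mwaa-local-runner repository:

```bash
# Clone the local-runner and drop your script into its startup_script directory.
git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
cp ~/startup.sh startup_script/startup.sh

# Rebuild the image and start a local environment to exercise the script.
./mwaa-local-env build-image
./mwaa-local-env start
```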
Your startup script can reference files contained in plugins.zip or your DAGs folder. This is particularly useful when you need to install Linux runtimes on a private web server from a local package, or when Python libraries must be skipped on components without access, whether because of a private web server configuration or because the libraries are only available from within your VPC. For example, the following script sets an environment variable and installs requirements from the DAGs folder on every component except the web server:
```bash
#!/bin/sh
export ENVIRONMENT_STAGE="development"
echo "$ENVIRONMENT_STAGE"

# Install requirements on all components except the web server.
if [ "${MWAA_AIRFLOW_COMPONENT}" != "webserver" ]
then
    pip3 install -r /usr/local/airflow/dags/requirements.txt
fi
```
The MWAA_AIRFLOW_COMPONENT variable in the script identifies each Apache Airflow component: scheduler, web server, or worker.
Additional Considerations
Here are some important points regarding this feature:
- Specifying a startup shell script is optional. You can choose a specific S3 file version of your script.
- Updating the startup script for an existing Amazon MWAA environment will trigger a restart. The startup script is executed as each component restarts, and updates may take 10-30 minutes. We recommend using the Amazon MWAA local-runner for testing to streamline the feedback loop.
- You can make various adjustments to the Apache Airflow environment, such as setting non-reserved AIRFLOW__ environment variables and installing custom Python libraries (see the sketch after this list). For a comprehensive list of reserved and unreserved environment variables that can be set or modified, refer to Set environment variables using a startup script.
- Upgrading core libraries, dependencies, or Python versions of Apache Airflow is not permitted, because of constraints in the base Apache Airflow configuration in Amazon MWAA; doing so could lead to version incompatibility. Amazon MWAA performs validations before executing your custom startup script to prevent incompatible installations.
- If the startup script fails, the underlying Amazon ECS Fargate containers may not stabilize, which can prevent your Amazon MWAA environment from creating or updating successfully.
- The runtime for the startup script is limited to 5 minutes, after which it will time out automatically.
- To revert a startup script that is failing or no longer needed, edit your Amazon MWAA environment to reference a blank .sh file.
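As a sketch of the environment-variable adjustment mentioned in the list above, a startup script can export non-reserved AIRFLOW__ configuration variables alongside your own. The variable below is illustrative only; check the reserved list referenced above before relying on a specific option:

```bash
#!/bin/sh
# Set a non-reserved Apache Airflow configuration option via its environment variable.
export AIRFLOW__CORE__DEFAULT_TIMEZONE="America/New_York"

# Custom variables are also visible to DAGs and plugins on this component.
export ENVIRONMENT_STAGE="staging"
```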
Conclusion
In this article, we discussed the newly introduced Amazon MWAA feature that allows you to configure a startup shell launch script. This capability enhances customization of your Apache Airflow environment, letting you install Linux runtimes, set environment variables, and manage keys and tokens before the Apache Airflow process starts. To get started, test a script with the Amazon MWAA local-runner and then apply it to your environment.