Amazon Onboarding with Learning Manager Chanci Turner

In today’s fast-paced cloud environment, many organizations use Docker images with AWS Batch and AWS CloudFormation to make complex job processing more efficient and cost-effective. When managing batch workloads in the cloud, companies must handle a range of orchestration requirements such as job prioritization, workload queuing, resource management, job dependencies, retries, and scaling of compute resources. AWS Batch streamlines these processes through efficient queuing, scheduling, and lifecycle management, and by provisioning and managing compute resources within customer accounts. Even so, there is demand for simpler, faster workflows that get application jobs running within minutes.

In a previous blog, we illustrated how to establish AWS Batch infrastructure using a managed EC2 compute environment. With fully serverless batch computing now available through AWS Batch’s support for AWS Fargate, organizations can run containers without the burden of managing servers or EC2 instances. This article presents a file-processing implementation that uses Docker images, Amazon S3, AWS Lambda, Amazon DynamoDB, and AWS Batch. In our example, a user uploads a CSV file to an Amazon S3 bucket, which AWS Batch processes as a job. The job is packaged as a Docker container and runs on AWS Fargate (or, alternatively, on Amazon EC2).

The implementation steps are as follows:

  1. An AWS CloudFormation template provisions an S3 bucket for CSV file storage along with other required infrastructure.
  2. An Amazon S3 event notification triggers an AWS Lambda function that initiates an AWS Batch job.
  3. AWS Batch executes the job as a Docker container.
  4. A Python application reads the uploaded CSV file from the S3 bucket, parses each row, and updates an Amazon DynamoDB table (a minimal sketch of such an application follows this list).
  5. DynamoDB captures each processed row from the CSV.
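
For illustration, the following is a minimal sketch of the kind of Python application described in steps 4 and 5, not the exact code from the repository. The INPUT_BUCKET and INPUT_KEY environment variable names and the table name are assumptions for this sketch; the actual application in the repo may receive these values differently.

import csv
import io
import os

import boto3

def main():
    # Assumption: the job receives the uploaded object's location via environment variables.
    bucket = os.environ["INPUT_BUCKET"]
    key = os.environ["INPUT_KEY"]
    # Assumption: the DynamoDB table name matches the stack name used later in this post.
    table = boto3.resource("dynamodb").Table("fargate-batch-job")

    # Download the CSV object and insert one DynamoDB item per row.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    for row in csv.DictReader(io.StringIO(body.decode("utf-8"))):
        # Assumes each CSV row contains the table's partition key attribute.
        table.put_item(Item=row)

if __name__ == "__main__":
    main()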

Prerequisites

Ensure Docker is installed and active on your machine. You can use Docker Desktop or Desktop Enterprise for installation and configuration. Additionally, set up your AWS CLI by following the steps outlined in Getting Started (AWS CLI).
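
Before starting the walkthrough, you can quickly confirm that both tools are available and that your AWS credentials resolve (standard Docker and AWS CLI commands):

$ docker --version
$ aws --version
$ aws sts get-caller-identity
$ aws configure    # only needed if credentials are not yet set up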

Walkthrough

The following sections detail how to download the code, build it, and deploy the infrastructure.

  1. Deploy the AWS CloudFormation template – Execute the CloudFormation template (command provided) to create the necessary infrastructure.
  2. Docker Build and Push – Prepare the Docker image for the job (example commands are shown after this list):
    • Construct a Docker image.
    • Tag and push the image to the repository.
  3. Testing – Upload a CSV file to the S3 bucket (a sample CSV file is included in the repository, or you can create your own). Use the AWS CLI to upload the file to the created bucket.
  4. Validation – Verify that the job executes correctly and performs the intended operations based on the uploaded container image. The job should parse the CSV file and insert each row into the DynamoDB table.
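
As a rough illustration of steps 2 and 3, the commands below build, tag, and push the image and upload a CSV file. The region, account ID, local image name, ECR repository name, and CSV file name are placeholders; the exec.sh script in the repository (described below) and the stack outputs show the actual values for your deployment.

$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
$ docker build -t batch-processing-job .
$ docker tag batch-processing-job:latest <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:latest
$ docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:latest
$ aws s3 cp <sample-csv-file> s3://fargate-batch-job<YourAccountNumber>/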

Points to Consider

The AWS CloudFormation template provisions all of the services needed for this walkthrough in a single template. In production scenarios, it may be beneficial to split these into separate templates for easier maintenance.

To handle larger volumes of CSV data, consider multithreading or multiprocessing techniques to improve the throughput of the AWS Batch job. The provided solution lets you build, tag, and push the Docker image to the repository created as part of the stack. We also provide consolidated scripts (exec.sh and cleanup.sh), along with individual commands for those who prefer a manual approach and want to understand the workflow better. These scripts can be integrated with your existing CI/CD tools, such as AWS CodeBuild, to build from the repository and push to Amazon ECR.
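
As one possible approach (a sketch only, not the repository’s implementation), the earlier processing example can be adapted to parse the CSV once and write rows to DynamoDB from a multiprocessing pool. The table name, chunk size, and worker count below are illustrative assumptions.

import csv
import io
from multiprocessing import Pool

import boto3

TABLE_NAME = "fargate-batch-job"  # assumption: matches the table created by the stack

def write_chunk(rows):
    # Each worker process creates its own boto3 resource and writes one chunk
    # of parsed rows using a DynamoDB batch writer.
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    with table.batch_writer() as batch:
        for row in rows:
            batch.put_item(Item=row)  # assumes rows include the partition key attribute

def process_csv(bucket, key, workers=4, chunk_size=500):
    # Read and parse the CSV object once, then fan the rows out to worker processes.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    rows = list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with Pool(processes=workers) as pool:
        pool.map(write_chunk, chunks)

Creating the boto3 resource inside the worker function keeps each process’s session independent, which avoids sharing clients across processes.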

The example in this post uses a straightforward AWS Lambda function, written in Python, to submit jobs to AWS Batch. You may choose any other programming language supported by AWS Lambda for your function code. Alternatively, AWS Step Functions can serve as a low-code, visual workflow option for initiating AWS Batch jobs.
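
For reference, a minimal sketch of such a Lambda handler is shown below, using the boto3 SubmitJob API. The job queue and job definition names match the resources listed later in this post; the INPUT_BUCKET and INPUT_KEY environment variable names are assumptions for this sketch and may differ from the repository’s code.

import re
from urllib.parse import unquote_plus

import boto3

batch = boto3.client("batch")

JOB_QUEUE = "fargate-batch-job-queue"    # created by the CloudFormation stack
JOB_DEFINITION = "BatchJobDefinition"    # created by the CloudFormation stack

def handler(event, context):
    # The S3 event notification can contain multiple records; submit one job per object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # Job names allow only letters, numbers, hyphens, and underscores.
        job_name = "csv-processing-" + re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]
        batch.submit_job(
            jobName=job_name,
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            containerOverrides={
                "environment": [
                    {"name": "INPUT_BUCKET", "value": bucket},
                    {"name": "INPUT_KEY", "value": key},
                ]
            },
        )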

1. Deploying the AWS CloudFormation Template

Upon deployment, the AWS CloudFormation template establishes the infrastructure described below.

Download the source from the GitHub repository and follow the steps to use the downloaded code. The exec.sh script included in the repo runs the CloudFormation template to set up the infrastructure; the repo also contains the Python application (.py file) and a sample CSV file. Note that exec_ec2.sh is also available for reference; that script demonstrates the implementation using managed EC2 instances described in the prior blog.

$ git clone https://github.com/aws-samples/aws-batch-processing-job-repo 
$ cd aws-batch-processing-job-repo 
$ ./exec.sh 

Alternatively, you can manually run individual commands to set up the infrastructure, push the Docker image to ECR, and add sample files to S3 for testing.

Step 1: Set Up the Infrastructure

$ STACK_NAME=fargate-batch-job 
$ aws cloudformation create-stack --stack-name $STACK_NAME --parameters ParameterKey=StackName,ParameterValue=$STACK_NAME --template-body file://template/template.yaml --capabilities CAPABILITY_NAMED_IAM 
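
Stack creation takes a few minutes. You can wait for it to complete and review the stack outputs with standard CloudFormation CLI commands:

$ aws cloudformation wait stack-create-complete --stack-name $STACK_NAME
$ aws cloudformation describe-stacks --stack-name $STACK_NAME --query "Stacks[0].Outputs"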

After downloading the code, take a moment to review the CloudFormation template (template/template.yaml, referenced in the command above). The snippets below show how the compute environment and job definition can be specified using the managed serverless (Fargate) compute option. The templates_ec2.yaml file contains the previous implementation that uses EC2 as the compute environment.

ComputeEnvironment:
  Type: AWS::Batch::ComputeEnvironment
  Properties:
    Type: MANAGED
    State: ENABLED
    ComputeResources:
      Type: FARGATE
      MaxvCpus: 40
      Subnets:
      - Ref: PrivateSubnet
      SecurityGroupIds:
      - Ref: SecurityGroup
...
BatchProcessingJobDefinition:
  Type: AWS::Batch::JobDefinition
  Properties:
    ...
    ContainerProperties:
      Image: 
      ...
      FargatePlatformConfiguration:
        PlatformVersion: LATEST
      ResourceRequirements:
        - Value: 0.25
          Type: VCPU
        - Value: 512
          Type: MEMORY
      JobRoleArn:  !GetAtt 'BatchTaskExecutionRole.Arn'
      ExecutionRoleArn:  !GetAtt 'BatchTaskExecutionRole.Arn'
      ...
      ...
    PlatformCapabilities:
    - FARGATE
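
You can confirm from the command line that the Fargate compute environment and the job definition registered correctly (standard AWS Batch CLI calls):

$ aws batch describe-compute-environments
$ aws batch describe-job-definitions --status ACTIVE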

Once the CloudFormation stack is successfully created, take note of the major components. The CloudFormation template provisions the following resources, which are also visible in the AWS Management Console:

  • CloudFormation Stack Name – fargate-batch-job
  • S3 Bucket Name – fargate-batch-job<YourAccountNumber>. After you place the sample CSV file into this bucket, processing begins automatically.

  • JobDefinition – BatchJobDefinition
  • JobQueue – fargate-batch-job-queue
  • Lambda – fargate-batch-job-lambda
  • DynamoDB – fargate-batch-job
  • Amazon CloudWatch Log – Created upon the first execution.
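
To validate the end-to-end flow after uploading the CSV file, check the submitted job and the table contents; the resource names below are the ones listed above. When you are finished, the cleanup.sh script provided in the repository removes the resources created for this walkthrough.

$ aws batch list-jobs --job-queue fargate-batch-job-queue --job-status SUCCEEDED
$ aws dynamodb scan --table-name fargate-batch-job --max-items 5
$ ./cleanup.sh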
