Managing Your AWS Glue Studio Development Interface with the AWS Glue Job Mode API Property

Chanci Turner, Amazon IXD – VGT2 Learning Manager

As the significance of big data continues to rise, the effectiveness of data processing and analysis emerges as a critical factor in a company’s competitive edge. AWS Glue, a serverless data integration service that facilitates the integration of data from various sources at scale, meets these processing demands. Among its many features, the AWS Glue Jobs API is particularly remarkable.

The AWS Glue Jobs API offers a comprehensive interface for data engineers and developers to manage and execute ETL jobs programmatically. This API enables the automation, scheduling, and monitoring of data pipelines, which streamlines the execution of extensive data processing tasks.

To enhance the user experience with the AWS Glue Jobs API, we have introduced a new property that specifies the job mode: script, visual, or notebook. This article explains how the updated AWS Glue Jobs API works and showcases the new functionality it offers.

JobMode Property

The newly added JobMode property indicates the mode of an AWS Glue job (script, visual, or notebook), which improves your experience in the AWS Glue Studio user interface. AWS Glue users can pick the mode that best suits their preferences. Some ETL developers favor visual mode and build jobs in the AWS Glue Studio visual editor, while data scientists might lean towards notebook jobs in AWS Glue Studio notebooks. Data engineers and developers, on the other hand, often prefer scripting in the AWS Glue Studio script editor or their chosen integrated development environment (IDE). Once a job has been created in the selected mode, you can filter by mode to locate it on your saved AWS Glue jobs page. Furthermore, if you're migrating existing IPython notebook files to AWS Glue Studio notebook jobs, you can now set the job mode for multiple jobs using this new API property, as illustrated in this post.
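The same filtering can be approximated outside the console. The following is a minimal sketch, not an official utility: it assumes a Boto3 version recent enough to return the JobMode field from GetJobs, and it assumes that jobs with no JobMode should be counted as SCRIPT. The helper names group_jobs_by_mode and list_all_jobs are hypothetical.

```python
from collections import defaultdict


def group_jobs_by_mode(jobs):
    """Group job names by JobMode.

    `jobs` is a list of dicts shaped like the entries of a GetJobs response.
    Jobs with a missing or empty JobMode are counted as SCRIPT (assumption).
    """
    grouped = defaultdict(list)
    for job in jobs:
        mode = job.get("JobMode") or "SCRIPT"
        grouped[mode].append(job["Name"])
    return dict(grouped)


def list_all_jobs(glue_client):
    """Collect every job definition visible to the client, page by page."""
    jobs = []
    for page in glue_client.get_paginator("get_jobs").paginate():
        jobs.extend(page["Jobs"])
    return jobs
```

With credentials configured, group_jobs_by_mode(list_all_jobs(boto3.client("glue"))) would return a dictionary such as one keyed by "SCRIPT", "VISUAL", and "NOTEBOOK", each mapping to the job names in that mode.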

How the CreateJob API Works with the JobMode Property

You can use the CreateJob API to create AWS Glue jobs in script, visual, or notebook mode. Below is an example of how to create a visual job using the AWS SDK for Python (Boto3). Make sure to replace the placeholders, such as <your-bucket-name>, <account-id>, <region>, and <role-arn>, with your own values.

import json

import boto3

# Visual node definitions for the job (abbreviated here)
CODE_GEN_JSON_STR = '''
{
  "node-1": {
    "S3ParquetSource": {
      "Name": "Amazon S3",
      "Paths": [
        "s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Books/"
      ],
      ...
    }
  },
  ...
}
'''

glue_client = boto3.client('glue')
# strict=False allows control characters (such as newlines) inside JSON strings
codeGenJson = json.loads(CODE_GEN_JSON_STR, strict=False)

# Call the create_job method
try:
    glue_client.create_job(
        Name="glue-visual-job",
        Description="Glue Visual ETL job",
        Command={
            'Name': 'glueetl',
            'ScriptLocation': "s3://aws-glue-assets-<account-id>-<region>/scripts/glue-visual-job",
            'PythonVersion': "3",
        },
        WorkerType="G.1X",
        NumberOfWorkers=2,  # must be an integer; 2 is an illustrative value
        Role="<role-arn>",
        GlueVersion="4.0",
        CodeGenConfigurationNodes=codeGenJson,
        JobMode="VISUAL"
    )
    print("Successfully created Glue job")
except Exception as e:
    print(f"Error creating Glue job: {e}")

The CODE_GEN_JSON_STR represents the visual nodes for the AWS Glue Job. There are three nodes: node-1 uses S3 source, node-2 performs transformation, and node-3 utilizes the S3 target. The script creates an AWS Glue Boto3 client, loads the JSON, and calls the create_job method with JobMode set to VISUAL.

After you run the Python script, the new job is created and appears in the AWS Glue Studio visual editor.

The visual directed acyclic graph (DAG) contains three nodes: node-1 reads product review data for the Books category (product_category=Books) from the public S3 bucket, node-2 drops fields that downstream systems do not need, and node-3 writes the transformed data to your own S3 bucket.
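You can also check the result programmatically instead of opening the console. The sketch below reads the job back with GetJob and reports its mode and node count. The function name summarize_job is hypothetical, and treating a missing JobMode as SCRIPT is an assumption.

```python
def summarize_job(glue_client, job_name):
    """Return the job's mode and the number of visual nodes it defines."""
    job = glue_client.get_job(JobName=job_name)["Job"]
    return {
        "Name": job["Name"],
        # A missing JobMode is treated as SCRIPT here (assumption).
        "JobMode": job.get("JobMode") or "SCRIPT",
        # Script and notebook jobs may not carry any visual nodes.
        "NodeCount": len(job.get("CodeGenConfigurationNodes") or {}),
    }
```

For the job created above, summarize_job(boto3.client("glue"), "glue-visual-job") should report a JobMode of VISUAL and a NodeCount of 3, provided the job was created with all three nodes.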

How CloudFormation Works with the JobMode Property

You can also use AWS CloudFormation to create different types of AWS Glue jobs by specifying the JobMode property on the AWS::Glue::Job resource. The available job modes are:

  • SCRIPT
  • VISUAL
  • NOTEBOOK

For instance, you can create an AWS Glue notebook job using AWS CloudFormation by setting the JobMode property to NOTEBOOK.

First, create a Jupyter notebook file containing your logic and code, and save it with a descriptive name, such as my-glue-notebook.ipynb. If you download an existing notebook file instead, rename it to my-glue-notebook.ipynb. Next, upload the notebook file to the notebooks/ folder in the aws-glue-assets-<account-id>-<region> S3 bucket.
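The upload step can be scripted as well. This is a rough sketch that assumes the bucket already exists and follows the aws-glue-assets-<account-id>-<region> naming shown above. The helper names are hypothetical, and the S3 client is passed in so that the path logic can be checked without AWS access.

```python
from pathlib import Path


def notebook_upload_target(notebook_path, account_id, region):
    """Return the (bucket, key) pair for a notebook in the Glue assets bucket."""
    bucket = f"aws-glue-assets-{account_id}-{region}"
    key = f"notebooks/{Path(notebook_path).name}"
    return bucket, key


def upload_notebook(s3_client, notebook_path, account_id, region):
    """Upload the notebook file and return its S3 URI."""
    bucket, key = notebook_upload_target(notebook_path, account_id, region)
    s3_client.upload_file(str(notebook_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

A call such as upload_notebook(boto3.client("s3"), "my-glue-notebook.ipynb", "<account-id>", "<region>") would place the file under notebooks/ in that bucket.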

Then, create a new CloudFormation template to generate a new AWS Glue job, specifying the NotebookJobName parameter to match the name of the Notebook file. Below is a sample excerpt of the CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for creating an AWS Glue ETL job using a Jupyter Notebook

Parameters:
  NotebookJobName:
    Type: String
    Description: Name of the AWS Glue ETL Notebook job

Resources:
  GlueJobRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${AWS::StackName}-GlueJobRole
      ...
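To deploy a template like this one from Python, you could call CreateStack through Boto3. The sketch below only assembles the arguments. It assumes the NotebookJobName value is the notebook file name without its .ipynb extension, and it requests CAPABILITY_NAMED_IAM because the template sets an explicit RoleName. The function name build_create_stack_args is hypothetical.

```python
from pathlib import Path


def build_create_stack_args(stack_name, template_body, notebook_path):
    """Assemble keyword arguments for the CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {
                "ParameterKey": "NotebookJobName",
                # Job name derived from the notebook file name (assumption).
                "ParameterValue": Path(notebook_path).stem,
            }
        ],
        # Required because the template creates an IAM role with a custom name.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }
```

You would then pass the result to boto3.client("cloudformation").create_stack(**args), with template_body holding the full template text.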

Whether you are building a job in the visual editor or migrating notebook files, the new JobMode property can streamline your AWS Glue Studio development experience.
