In software development, APIs (application programming interfaces) enable communication between applications, and REST (Representational State Transfer) APIs are a common way to expose data for consumption. In OLTP (online transaction processing) workloads, APIs are invoked frequently and handle small data payloads. Conversely, OLAP (online analytical processing) APIs are invoked less often but return significantly larger payloads, ranging from 100 MB to several GB. This disparity presents unique challenges, such as the need for asynchronous processing, compute capacity management, and scalability.
In this blog post, we will explore how to build a REST API that allows for efficient data consumption from Amazon Redshift utilizing the Amazon Redshift Data API, AWS Lambda, and Amazon API Gateway. This API facilitates the asynchronous processing of user requests, notifies users, stores processed data in Amazon Simple Storage Service (S3), and ultimately provides a presigned URL for users or applications to download data securely via HTTPS. To ease the setup process, we will also share an AWS CloudFormation template available on GitHub.
Solution Overview
In our scenario, Acme Flowers operates an online platform (acmeflowers.com) where they gather customer reviews and manage a self-service inventory. Producers send flowers and materials to Acme when its supplies run low. Acme uses Amazon Redshift as its data warehouse, updating inventory data in near-real time so that stock availability stays accurate. The PRODUCT_INVENTORY table contains this information, and Acme wants to share it securely with partners for efficient inventory management. If Acme’s partners are also using Amazon Redshift, cross-account data sharing might be an option, but for those that are not, our solution provides a viable alternative.
The architecture of our solution is illustrated in the following diagram:
- The client application submits a request to the API Gateway and receives a request ID in response.
- API Gateway invokes the request receiver Lambda function.
- The request receiver function performs several actions:
- Records the status in a DynamoDB control table.
- Enqueues the request in Amazon Simple Queue Service (SQS).
- A separate Lambda function, the request processor, executes the following:
- Polls Amazon SQS for messages.
- Updates the status in the DynamoDB table.
- Executes a SQL query on Amazon Redshift.
- Amazon Redshift exports the query results to an S3 bucket.
- A polling Lambda function checks the results’ status in the DynamoDB table.
- The poller retrieves the results from the S3 bucket.
- Finally, the poller sends a presigned URL to the requester via Amazon Simple Email Service (SES), allowing them to download the file.
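The request receiver’s bookkeeping (steps 2–4 above) can be sketched as a small Lambda handler. The table name, queue URL environment variables, and item schema below are illustrative assumptions, not the exact ones used in the sample repository:

```python
import json
import os
import uuid
from datetime import datetime, timezone


def build_request_record(sql_query: str) -> dict:
    """Create the DynamoDB control-table item and the SQS message body
    for a new request (hypothetical schema)."""
    request_id = str(uuid.uuid4())
    item = {
        "requestId": request_id,
        "status": "SUBMITTED",
        "submittedAt": datetime.now(timezone.utc).isoformat(),
    }
    message = json.dumps({"requestId": request_id, "query": sql_query})
    return {"item": item, "message": message}


def lambda_handler(event, context):
    # boto3 is imported here so the pure helper above can be used
    # without the AWS SDK installed.
    import boto3

    record = build_request_record(event.get("query", ""))
    # Record the status in the control table, then enqueue the work item.
    boto3.resource("dynamodb").Table(os.environ["CONTROL_TABLE"]).put_item(
        Item=record["item"]
    )
    boto3.client("sqs").send_message(
        QueueUrl=os.environ["QUEUE_URL"], MessageBody=record["message"]
    )
    # The caller gets back only the request ID, which it can use to poll status.
    return {"statusCode": 200, "body": json.dumps({"requestId": record["item"]["requestId"]})}
```

Returning only a request ID keeps the API response small while the heavy query runs asynchronously behind the queue.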
To check the status of requests at different stages, the following steps are included:
- The client application sends the generated request ID to API Gateway.
- API Gateway calls the status-checking Lambda function.
- The function retrieves the status from the DynamoDB control table and returns it to the requester through API Gateway.
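The status-checking function is the simplest of the three Lambdas: it is a keyed lookup against the control table, shaped into an API Gateway proxy response. The field names below are hypothetical, matching the sketch of the receiver rather than the repository’s exact schema:

```python
import json
from typing import Optional


def make_status_response(item: Optional[dict]) -> dict:
    """Shape the API Gateway proxy response for a status lookup
    (hypothetical item schema)."""
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "request not found"})}
    return {
        "statusCode": 200,
        "body": json.dumps({"requestId": item["requestId"], "status": item["status"]}),
    }


def lambda_handler(event, context):
    import os

    import boto3  # deferred so make_status_response stays usable without the AWS SDK

    table = boto3.resource("dynamodb").Table(os.environ["CONTROL_TABLE"])
    # get_item returns a dict with an "Item" key only when the record exists.
    result = table.get_item(Key={"requestId": event["pathParameters"]["requestId"]})
    return make_status_response(result.get("Item"))
```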
Prerequisites
Before deploying the example application, ensure you have the following:
- An AWS account
- AWS SAM CLI
- Python 3.9
- Node.js 17.3
- An IAM role with appropriate permissions
- An Amazon Redshift cluster with a database and table
Complete the following prerequisite steps before deployment:
Execute the following DDL on the Amazon Redshift cluster to create the schema and table:
create schema rsdataapi;
create table rsdataapi.product_detail(
sku varchar(20),
product_id int,
product_name varchar(50),
product_description varchar(50)
);
insert into rsdataapi.product_detail values ('FLOWER12',12345,'Flowers - Rose','Flowers-Rose');
insert into rsdataapi.product_detail values ('FLOWER13',12346,'Flowers - Jasmine','Flowers-Jasmine');
insert into rsdataapi.product_detail values ('FLOWER14',12347,'Flowers - Other','Flowers-Other');
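With the table in place, the request processor can run its query through the Amazon Redshift Data API, which is asynchronous by design: `execute_statement` returns immediately with a statement ID, and the caller polls `describe_statement` for completion. The cluster identifier, database, and secret ARN below are placeholders you would supply from your own environment:

```python
import time


def is_terminal(status: str) -> bool:
    """The Data API reports these as terminal statement states."""
    return status in ("FINISHED", "FAILED", "ABORTED")


def run_query(cluster_id: str, database: str, secret_arn: str, sql: str) -> str:
    import boto3  # deferred so is_terminal stays importable without the AWS SDK

    client = boto3.client("redshift-data")
    # execute_statement does not wait for the query; it hands back an Id...
    stmt = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        SecretArn=secret_arn,  # credentials stored in AWS Secrets Manager
        Sql=sql,
    )
    # ...which we poll with describe_statement until a terminal state is reached.
    while True:
        desc = client.describe_statement(Id=stmt["Id"])
        if is_terminal(desc["Status"]):
            return desc["Status"]
        time.sleep(2)
```

Because the Data API holds no persistent connection, this pattern works well from short-lived Lambda invocations.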
Use AWS Secrets Manager to store the Amazon Redshift credentials.
Set up Amazon SES with a verified email address or distribution list for sending status updates.
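Once the exported file lands in S3, the poller can mint a presigned URL and email it through Amazon SES. The helper below composes the message; the field wording and the one-hour expiry are illustrative choices, not values from the sample code:

```python
def build_notification_email(request_id: str, presigned_url: str, expires_hours: int = 1) -> dict:
    """Compose the SES Message structure for the completion notification
    (hypothetical wording)."""
    return {
        "Subject": {"Data": f"Your data export {request_id} is ready"},
        "Body": {
            "Text": {
                "Data": (
                    f"Your requested dataset is ready. The link below expires in "
                    f"{expires_hours} hour(s):\n{presigned_url}"
                )
            }
        },
    }


def notify(bucket: str, key: str, request_id: str, sender: str, recipient: str) -> None:
    import boto3  # deferred so build_notification_email stays testable without the AWS SDK

    # generate_presigned_url signs locally with the caller's credentials;
    # the link grants time-limited HTTPS access to the object.
    url = boto3.client("s3").generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600
    )
    boto3.client("ses").send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message=build_notification_email(request_id, url),
    )
```

The presigned URL is what lets a partner without AWS credentials download the file securely over HTTPS.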
Deploying the Application
To deploy the application, follow these steps:
- Clone the repository and download the sample code to your AWS SAM environment:
git clone https://github.com/aws-samples/redshift-application-api
- Navigate to the project directory containing the template.yaml file:
cd aws-samples/redshift-application-api/assets
export PATH=$PATH:/usr/local/opt/python@3.8/bin
- Update the API .yaml file with your AWS account number and the deployment region:
sed -i '' "s/<input_region>/us-east-1/g" *API.yaml
sed -i '' "s/<input_accountid>/<provide your AWS account id without dashes>/g" *API.yaml
- Build the application using AWS SAM:
sam build
- Deploy the application to your account with AWS SAM, ensuring that any S3 bucket names you provide follow Amazon S3 naming conventions:
sam deploy -g
During deployment, you will need to provide several parameters for configuration, including the following:
- RSClusterID: Your existing Amazon Redshift cluster identifier.
- RSDataFetchQ: The SQL query to retrieve data from your Amazon Redshift tables.
- RSDataFileS3BucketName: The S3 bucket for uploading the dataset.
- RSDatabaseName: Your Amazon Redshift database name.
- RSS3CopyRoleArn: The IAM role that allows Amazon Redshift to copy files to and from Amazon S3.
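The RSS3CopyRoleArn parameter is what allows Amazon Redshift to write the query results into the bucket, typically via an UNLOAD statement. As a rough sketch of how such a statement could be assembled from the parameters above (the output format options here are illustrative, not the sample’s exact choices):

```python
def build_unload_statement(query: str, s3_bucket: str, prefix: str, iam_role_arn: str) -> str:
    """Compose a Redshift UNLOAD statement that exports query results to S3.
    Single quotes inside the query must be doubled within the UNLOAD literal."""
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO 's3://{s3_bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        # CSV with a header row, written as a single file for easy download.
        "FORMAT AS CSV HEADER PARALLEL OFF"
    )
```

Running the resulting statement through the Data API (as sketched earlier) is what produces the file that the poller later shares via a presigned URL.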