Introducing Amazon Cognito Streams
On January 20, we launched a feature that lets developers use their own credentials for full API access to the sync store, enabling them to read and write user profile data, along with a data browser in the Amazon Cognito console. Today, we are introducing an enhancement that gives customers even more control and visibility over their data stored in Cognito: Amazon Cognito streams. You can now configure an Amazon Kinesis stream to receive events as data is updated and synchronized. In this post, I will describe how the feature works and walk through an example application that uses events from the stream to analyze your application data in Amazon Redshift.
Configuring Streams
Setting up Amazon Cognito streams is a breeze. Simply navigate to the console, select your identity pool, and click on “Edit Identity Pool.”
On the edit screen, expand the “Cognito Streams” section to configure your settings. You will need to provide an IAM role and a Kinesis stream; however, the Cognito console can guide you through the creation of both resources.
Once you have successfully configured Amazon Cognito streams, all future updates to datasets in this identity pool will be transmitted to the stream.
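If you prefer to automate this step, the same settings can be applied through the Cognito Sync API's SetIdentityPoolConfiguration operation. Here is a minimal sketch using boto3; the identity pool ID, stream name, and role ARN are placeholders for the resources you created above:

# Enable Cognito streams on an identity pool via the Cognito Sync API.
# All identifiers below are placeholders.
import boto3

cognito_sync = boto3.client("cognito-sync")

cognito_sync.set_identity_pool_configuration(
    IdentityPoolId="us-east-1:00000000-0000-0000-0000-000000000000",
    CognitoStreams={
        "StreamName": "CognitoStreamsExample",  # your Kinesis stream
        "RoleArn": "arn:aws:iam::123456789012:role/CognitoStreamsRole",
        "StreamingStatus": "ENABLED",
    },
)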
Stream Contents
Every record sent to the stream represents a single synchronization. Here’s a sample record that might be sent to the stream:
{
  "identityPoolId": "Pool Id",
  "identityId": "Identity Id",
  "dataSetName": "Dataset Name",
  "operation": "(replace|remove)",
  "kinesisSyncRecords": [
    {
      "key": "Key",
      "value": "Value",
      "syncCount": 1,
      "lastModifiedDate": 1424801824343,
      "deviceLastModifiedDate": 1424801824343,
      "op": "(replace|remove)"
    },
    ...
  ],
  "lastModifiedDate": 1424801824343,
  "kinesisSyncRecordsURL": "S3Url",
  "payloadType": "(S3Url|Inline)",
  "syncCount": 1
}
For updates exceeding the Kinesis maximum payload size of 50 KB, the event will instead include a presigned Amazon S3 URL that points to the full contents of the update.
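When processing events from the stream, your consumer should branch on the payloadType field. Below is a minimal sketch, assuming data already holds the raw bytes of one Kinesis record retrieved by your consumer; the helper name is illustrative:

# Decode one Cognito streams event pulled from Kinesis.
# `data` is assumed to be the raw record payload (bytes).
import json
from urllib.request import urlopen

def decode_sync_records(data):
    event = json.loads(data)
    if event["payloadType"] == "S3Url":
        # Large updates: fetch the full contents from the presigned S3 URL.
        with urlopen(event["kinesisSyncRecordsURL"]) as resp:
            return json.loads(resp.read())
    # Small updates arrive inline.
    return event["kinesisSyncRecords"]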
Now that your data updates will be streaming, what about your existing data?
Bulk Publishing
After configuring Amazon Cognito streams, you can perform a bulk publish operation for the existing data in your identity pool. Once you trigger a bulk publish operation, either through the console or directly via the API, Cognito will begin publishing this data to the same stream that is receiving your updates.
You can only have one ongoing bulk publish operation at a time and one successful bulk publish request every 24 hours. Cognito does not guarantee the uniqueness of data sent to the stream during the bulk publish operation. Thus, you may receive the same update both as a regular update and part of a bulk publish, which you should consider when processing records from your stream.
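If you drive this through the API, a minimal boto3 sketch looks like the following; the pool ID is a placeholder, and the polling interval is arbitrary:

# Trigger a bulk publish and poll until it completes.
import time
import boto3

cognito_sync = boto3.client("cognito-sync")
pool_id = "us-east-1:00000000-0000-0000-0000-000000000000"  # placeholder

cognito_sync.bulk_publish(IdentityPoolId=pool_id)

while True:
    details = cognito_sync.get_bulk_publish_details(IdentityPoolId=pool_id)
    if details["BulkPublishStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(30)  # arbitrary polling interval

One way to absorb the potential duplicates is to make your consumer idempotent, for example by treating the combination of identityId, datasetName, key, and syncCount as a unique key.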
Example Streams Connector for Amazon Redshift
As part of today’s announcement, we are also providing an example application that consumes records from a Kinesis stream associated with a Cognito identity pool, then stores them in an Amazon Redshift cluster for querying. The source code is available in our awslabs GitHub repository, and we have also provided an AWS CloudFormation template that will create all necessary assets for this sample, including:
- Amazon Redshift cluster
- Amazon DynamoDB table used by the Kinesis client library
- Amazon S3 bucket for intermediate data staging
- IAM role for EC2
- Elastic Beanstalk application to run the code
The CloudFormation template can be launched in the US East (Virginia), EU (Ireland), and Asia Pacific (Tokyo) regions.
Once your stack has been created, the Outputs tab in the AWS CloudFormation console will contain a JDBC connection string you can use to connect directly to your Amazon Redshift cluster:
jdbc:postgresql://amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com:PORT/cognito?tcpKeepAlive=true
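Because Amazon Redshift speaks the PostgreSQL wire protocol, you can also connect from code with any PostgreSQL driver. Here is a minimal sketch with psycopg2, assuming the host, port, and credentials from your own connection string:

# Connect to the sample Redshift cluster; all values are placeholders
# taken from the JDBC connection string above.
import psycopg2

conn = psycopg2.connect(
    host="amazoncognitostreamssample-redshiftcluster-xxxxxxxx.xxxxxxxx.REGION.redshift.amazonaws.com",
    port=5439,  # substitute the PORT from your connection string
    dbname="cognito",
    user="MASTER_USER",         # placeholder
    password="MASTER_PASSWORD", # placeholder
)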
Schema
The example stores all event data in a table named cognito_raw_data with the following schema:
Column Name | Type
---|---
identityPoolId | varchar(1024)
identityId | varchar(1024)
datasetName | varchar(1024)
operation | varchar(64)
key | varchar(1024)
value | varchar(1024)
op | varchar(64)
syncCount | int
deviceLastModifiedDate | timestamp
lastModifiedDate | timestamp
Extracting Data
Since every key-value update will create a new row in the cognito_raw_data table, obtaining the current state of a dataset requires some additional effort. The following query will fetch the state of a specific dataset for a given user:
SELECT DISTINCT temp.*, value
FROM (SELECT DISTINCT identityid,
             datasetname,
             key,
             max(synccount) OVER (PARTITION BY identityid, datasetname, key) AS max_synccount
      FROM cognito_raw_data) AS temp
INNER JOIN cognito_raw_data raw_data
        ON (temp.identityid = raw_data.identityid
        AND temp.datasetname = raw_data.datasetname
        AND temp.key = raw_data.key
        AND temp.max_synccount = raw_data.synccount)
WHERE raw_data.identityid = 'IDENTITY_ID'
  AND raw_data.datasetname = 'DATASET_NAME'
  AND op <> 'remove'
ORDER BY datasetname, key
You may also want to establish daily extracts of the data to enhance the efficiency of your regular queries.
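For example, a nightly job could materialize the latest value of every key into a compact table so that routine queries avoid re-scanning cognito_raw_data. Below is a sketch reusing the psycopg2 connection from above; the table name cognito_current_state is illustrative:

# Materialize the latest state of every (identity, dataset, key) into a
# snapshot table; `conn` is the psycopg2 connection opened earlier.
SNAPSHOT_SQL = """
DROP TABLE IF EXISTS cognito_current_state;
CREATE TABLE cognito_current_state AS
SELECT identityid, datasetname, key, value, synccount
FROM (SELECT *,
             max(synccount) OVER (PARTITION BY identityid, datasetname, key) AS max_synccount
      FROM cognito_raw_data) AS latest
WHERE synccount = max_synccount
  AND op <> 'remove';
"""

with conn.cursor() as cur:
    cur.execute(SNAPSHOT_SQL)
conn.commit()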
Conclusions
As demonstrated, Amazon Cognito streams can provide a complete export of your data as well as a real-time view of how your data evolves over time. We are eager to hear how you plan to leverage this feature in your applications. Please feel free to share your thoughts in the comments. If you have questions or encounter any issues, please visit our forums; we’re here to help.