Learn About Amazon VGT2 Learning Manager Chanci Turner
In 2019, I introduced AWS Data Exchange, detailing how to discover, subscribe to, and utilize diverse data products. Today, users can select from an extensive catalog featuring over 3,600 data products categorized into ten distinct groups.
Previously, I demonstrated how to subscribe to data products and download datasets to an Amazon Simple Storage Service (Amazon S3) bucket. I also provided various options for subsequent processing, such as utilizing AWS Lambda functions, employing an AWS Glue crawler, or executing queries with Amazon Athena.
Now, we are thrilled to announce the launch of AWS Data Exchange for Amazon Redshift, which makes it easier to find, subscribe to, and use third-party data. As a subscriber, you can query a provider's data directly, with no extract, transform, and load (ETL) steps or other processing in between. This means the data you query is always up to date and can be used right away in your Amazon Redshift queries. AWS Data Exchange for Amazon Redshift manages all entitlements and payments for you, and all charges are billed to your AWS account.
For data providers, this development opens up a new avenue for licensing their data and making it accessible to customers.
While composing this post, I was intrigued to discover how closely the features of Redshift and Data Exchange fit together. Thanks to Redshift’s clear separation of storage and compute, along with its built-in data sharing capabilities, the data provider pays for storage while the subscriber pays for the compute used to query it. Providers can therefore focus on acquiring and delivering data, without having to scale their clusters as their subscriber base grows.
Let’s explore this feature from two perspectives: subscribing to a data product and publishing one.
Subscribing to a Data Product with AWS Data Exchange for Amazon Redshift
As a data subscriber, I can browse the AWS Data Exchange catalog to identify data products that align with my business needs, and subscribe to them. Data providers can also create private offers, extending access to me through the AWS Data Exchange Console. I can select “My product offers” to review the offers extended to me. After clicking on “Continue to subscribe,” I finalize my subscription by examining the offer and subscription terms, noting the datasets I’ll receive, and clicking “Subscribe.”
Once the subscription completes, I receive a notification and can proceed. In the Redshift Console, I click “Datashares,” select “Subscriptions,” and see the subscribed data set. I then associate it with one or more of my Redshift clusters by creating a database that points to the subscribed datashare. From there, I can use its tables, views, and stored procedures in my Redshift queries and applications.
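The "create a database that points to the subscribed datashare" step can also be expressed in SQL. The sketch below only builds the statement text; it assumes the Redshift `CREATE DATABASE ... FROM DATASHARE ... OF ACCOUNT ... NAMESPACE ...` form, and every name, account ID, and namespace shown is a placeholder rather than a value from this walkthrough.

```python
import re

# Conservative identifier check so the helper never builds SQL from odd input.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_GUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def create_database_from_datashare_sql(database, datashare, provider_account, provider_namespace):
    """Return the SQL that exposes a subscribed datashare as a local database.

    All four arguments are supplied by the caller; the provider account and
    namespace are assumed to be the ones shown for the subscription.
    """
    for name in (database, datashare):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not re.fullmatch(r"\d{12}", provider_account):
        raise ValueError("an AWS account ID has 12 digits")
    if not _GUID.fullmatch(provider_namespace):
        raise ValueError("the namespace is expected to be a GUID")
    return (
        f"CREATE DATABASE {database} FROM DATASHARE {datashare} "
        f"OF ACCOUNT '{provider_account}' NAMESPACE '{provider_namespace}';"
    )


def qualified_name(database, schema, table):
    """Three-part name used to query a shared object: database.schema.table."""
    for name in (database, schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"{database}.{schema}.{table}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(create_database_from_datashare_sql(
        "subscribed_db", "provider_share",
        "123456789012", "01234567-89ab-cdef-0123-456789abcdef",
    ))
    print(f"SELECT * FROM {qualified_name('subscribed_db', 'public', 'some_table')} LIMIT 10;")
```

Once the database exists, shared objects are addressed with the three-part name, so existing queries can join them with local tables.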
Publishing a Data Product with AWS Data Exchange for Amazon Redshift
As a data provider, I can include Redshift tables, views, schemas, and user-defined functions in my AWS Data Exchange product. To demonstrate, I will create a product that contains a single Redshift table. I use the Redshift Query Editor V2 to create a table that maps US area codes to their corresponding cities and states.
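The post does not show the table definition, so the following is only a guess at its shape. The column names and types are assumptions, and the sketch runs the DDL against an in-memory SQLite database as a local stand-in so it can be tried without a cluster; in Redshift the same `CREATE TABLE` would be run from the query editor.

```python
import sqlite3

# Assumed layout: one row per (area code, city) pair. Column names are
# illustrative; the original table definition is not shown in the post.
CREATE_AREA_CODES = """
CREATE TABLE area_codes (
    area_code CHAR(3)     NOT NULL,
    city      VARCHAR(64) NOT NULL,
    state     CHAR(2)     NOT NULL
)
"""

# A few well-known area codes, used only as sample rows.
SAMPLE_ROWS = [
    ("206", "Seattle", "WA"),
    ("212", "New York", "NY"),
    ("415", "San Francisco", "CA"),
]


def load_sample(conn):
    """Create the table and insert the sample rows."""
    conn.execute(CREATE_AREA_CODES)
    conn.executemany("INSERT INTO area_codes VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()


def lookup(conn, area_code):
    """Return (city, state) pairs for one area code, sorted by city."""
    cursor = conn.execute(
        "SELECT city, state FROM area_codes WHERE area_code = ? ORDER BY city",
        (area_code,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_sample(connection)
    print(lookup(connection, "206"))
```

No primary key is declared on `area_code` because a single area code can cover several cities.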
Next, I review the existing datashares associated with my Redshift cluster and click “Create datashare” to start a new one. I follow the standard procedure for creating a datashare: I select AWS Data Exchange datashare as the type, assign a name (area_code_reference), choose the database within the cluster, and make the datashare accessible to publicly accessible clusters.
After scrolling down and clicking “Add,” I select my schema (public), choose to include only tables and views in my datashare, and add the area_codes table. At this juncture, I can either click “Add” to conclude or “Add and repeat” to develop a more complex product with additional objects.
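For comparison, the console choices made so far map onto a handful of SQL statements. The sketch below generates them from the names used in this walkthrough (`area_code_reference`, `public`, `area_codes`). The `SET PUBLICACCESSIBLE ..., MANAGEDBY ADX` form reflects my reading of the Redshift `CREATE DATASHARE` reference and should be checked against the current documentation before use.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name):
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def datashare_statements(datashare, schema, tables, publicly_accessible=True):
    """Return the SQL statements for an AWS Data Exchange datashare.

    Assumed syntax: CREATE DATASHARE ... SET PUBLICACCESSIBLE ..., MANAGEDBY ADX
    followed by ALTER DATASHARE ... ADD SCHEMA / ADD TABLE.
    """
    _check(datashare)
    _check(schema)
    flag = "TRUE" if publicly_accessible else "FALSE"
    statements = [
        f"CREATE DATASHARE {datashare} SET PUBLICACCESSIBLE {flag}, MANAGEDBY ADX;",
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    # One ADD TABLE per object, mirroring "Add and repeat" in the console.
    for table in tables:
        statements.append(f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{_check(table)};")
    return statements


if __name__ == "__main__":
    for statement in datashare_statements("area_code_reference", "public", ["area_codes"]):
        print(statement)
```

Passing more table names to `tables` corresponds to using “Add and repeat” to build a larger product.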
Once I confirm that the datashare includes the table, I click “Create datashare” to finalize the process. Now, I’m prepared to publish my data! I visit the AWS Data Exchange Console, expand the navigation on the left, and select “Owned data sets.” I review the steps for creating a dataset and click “Create data set” to move forward.
I choose Amazon Redshift datashare as the data set type, name my data set (United States Area Codes), provide a description, and click “Create data set” to proceed. Next, I create a revision labeled v1, select my datashare, and click “Add datashare(s).” I then finalize the revision.
As you can see, creating a datashare, creating a data set, and publishing a product through the console is straightforward. If you manage multiple products or publish regular updates, you can automate these steps with the AWS Command Line Interface (AWS CLI) and the AWS Data Exchange APIs.
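As a rough idea of what that automation could look like, the sketch below repeats the data set and revision steps through the AWS Data Exchange API. It assumes a boto3 `dataexchange` client (or any object with the same methods) is passed in, and it uses the `create_data_set`, `create_revision`, `create_job`, `start_job`, `get_job`, and `update_revision` operations as I understand them. The function name, the polling settings, and the datashare ARN argument are illustrative.

```python
import time


def publish_datashare_revision(client, datashare_arn, name, description,
                               comment="v1", poll_seconds=5.0, max_polls=60):
    """Create a data set, add a datashare to a new revision, and finalize it.

    `client` is assumed to behave like boto3.client("dataexchange").
    Returns (data_set_id, revision_id).
    """
    data_set = client.create_data_set(
        AssetType="REDSHIFT_DATA_SHARE", Name=name, Description=description
    )
    revision = client.create_revision(DataSetId=data_set["Id"], Comment=comment)

    # Adding the datashare to the revision is an asynchronous import job.
    job = client.create_job(
        Type="IMPORT_ASSETS_FROM_REDSHIFT_DATA_SHARES",
        Details={
            "ImportAssetsFromRedshiftDataShares": {
                "AssetSources": [{"DataShareArn": datashare_arn}],
                "DataSetId": data_set["Id"],
                "RevisionId": revision["Id"],
            }
        },
    )
    client.start_job(JobId=job["Id"])

    # Wait for the import to finish before finalizing the revision.
    for _ in range(max_polls):
        state = client.get_job(JobId=job["Id"])["State"]
        if state == "COMPLETED":
            break
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"import job ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("import job did not complete in time")

    client.update_revision(
        DataSetId=data_set["Id"], RevisionId=revision["Id"], Finalized=True
    )
    return data_set["Id"], revision["Id"]
```

With boto3 installed and credentials configured, `client` would be `boto3.client("dataexchange")`. The sketch stops at the finalized revision, as the console walkthrough does.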
Initial Data Products
Several data providers are actively working to make their data products available via AWS Data Exchange for Amazon Redshift. Below are some of the initial offerings along with their descriptions:
- FactSet Supply Chain Relationships: This dataset reveals business relationship interconnections among global companies, showcasing networks of key customers, suppliers, competitors, and strategic partners sourced from annual filings, investor presentations, and press releases.
- Foursquare Places 2021: New York City Sample: This trial dataset provides access to Foursquare’s integrated Places (POI) database for New York City, enabling users to load Foursquare’s data into a Redshift table for further analysis. Foursquare data is privacy-compliant and trusted by top enterprises such as Uber, Samsung, and Apple.
- Mathematica Medicare Pilot Dataset: This dataset aggregates Medicare HCC counts and prevalence by state, county, and payer, filtered to the diabetic population from 2017 to 2019.
- COVID-19 Vaccination in Canada: This listing features sample datasets related to COVID-19 vaccination efforts in Canada.
- Revelio Labs Workforce Composition and Trends Data (Trial data): This dataset provides insights into workforce composition and trends for any company.
- Facteus – US Card Consumer Payment: This historical sample includes SKU-level transaction details from cash and card transactions across various Consumer-Packaged Goods sold at over 9,000 urban convenience stores and bodegas in the U.S.
- Decadata Argo Supply Chain Trial Data: This dataset provides supply chain information for CPG firms delivering products to U.S. grocery retailers.
— Chanci