Amazon Onboarding with Learning Manager Chanci Turner

In this article, we explore how Amazon Athena can be used to analyze datasets stored in Amazon Ion, a data serialization format. Athena is an interactive query service that lets you analyze data stored in Amazon Simple Storage Service (Amazon S3) using standard SQL. Because it is serverless, there is no infrastructure to manage, and you pay only for the queries you run.

Amazon Ion is a richly typed, self-describing data format with interchangeable binary and text representations. The text format is a superset of JSON, so every valid JSON document is also valid Ion; it is easy to read and write, which helps with rapid prototyping. The binary format is optimized for efficient storage and transmission. In both representations, Ion’s rich type system gives data unambiguous semantics, which helps it stay interpretable through years of software evolution.

With Ion support in Athena, users can query Ion data in place and write query results back in Ion format. Ion is widely used by Amazon’s internal teams and by AWS services such as Amazon Quantum Ledger Database (Amazon QLDB) and Amazon DynamoDB, which can export data as Ion. In this post, we highlight the distinguishing features of Ion and its use cases, and show how to query Ion with Athena using a transformed version of the San Francisco City Lots dataset.

Unique Features of Ion

  1. Type System: Ion extends JSON’s type system with additional data types that make data less ambiguous to interpret and process. This matters in sectors such as finance, where precision is crucial. The additional types include arbitrary-size integers, high-precision decimals, timestamps, and more.
  2. Dual Format: Ion provides a text-based representation that is familiar to users while also offering the performance efficiencies of a binary format. This duality allows for rapid data discovery and interpretation, and applications benefit from reduced storage and network costs.
  3. Efficiency Gains: The binary encoding of Ion reduces file sizes by storing repeated strings, such as field names, once in a symbol table and referring to them by ID. The savings can be significant; for instance, in one comparison compressed Ion logs were approximately 26% smaller than traditional JSON logs.
  4. Skip-Scanning: Unlike text formats such as JSON, Ion’s binary representation prefixes values with their length, so a reader can skip over elements a query does not need without parsing them, reducing processing costs and improving response times.

Querying Ion Datasets with Athena

Athena’s support for Ion-format datasets includes an Ion-specific SerDe (serializer/deserializer), which lets Athena read and write valid Ion data. You can run SELECT queries to extract insights from Ion datasets, and you can create new datasets in Ion format using CREATE TABLE AS SELECT (CTAS) or INSERT INTO queries.
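
As a rough illustration of the write path, the sketch below creates an Ion-format table with CTAS and then appends to it with INSERT INTO. It assumes the city_lots_ion1 table defined later in this article; the new table name, the S3 path, and the street values are hypothetical, and the format and ion_encoding properties should be checked against the current Athena CTAS documentation before use.

```sql
-- Sketch only: city_lots_ion_mission, the S3 path, and the street values
-- are hypothetical placeholders, not part of the original dataset description.
CREATE TABLE city_lots_ion_mission
WITH (
  format = 'ION',            -- write the results as Ion
  ion_encoding = 'BINARY',   -- or 'TEXT' for human-readable output
  external_location = 's3://amzn-s3-demo-bucket/city_lots_ion_mission/'
) AS
SELECT type, properties, geometry
FROM city_lots_ion1
WHERE properties.street = 'MISSION';

-- Append more rows to the same Ion table (run as a separate query).
INSERT INTO city_lots_ion_mission
SELECT type, properties, geometry
FROM city_lots_ion1
WHERE properties.street = 'MARKET';
```

Binary encoding is the more compact choice for storage, while text encoding is convenient when you want to inspect the output files directly.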

Because Ion text and binary are interchangeable, Athena can handle datasets that contain both encodings. And because Ion is a superset of JSON, a table defined with the Ion SerDe can also include JSON files. A side benefit is that, unlike with Athena’s line-oriented JSON SerDes, JSON records read this way do not have to fit on a single line.

Creating External Tables

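To query Ion-based datasets in Athena, you need to define AWS Glue tables with user-defined metadata. The following statement creates an external table for an Ion text dataset, with columns modeled on a row from the citylots dataset. The nested array of doubles used for coordinates is an assumption based on GeoJSON polygon coordinates, and the LOCATION path is a placeholder; adjust both to match your data:

CREATE EXTERNAL TABLE city_lots_ion1 (
  type STRING,
  properties struct<
    mapblklot:string,
    blklot:string,
    block_num:string,
    lot_num:string,
    from_st:string,
    to_st:string,
    street:string,
    st_type:string,
    odd_even:string>,
  geometry struct<
    type:string,
    coordinates:array<array<array<double>>>>
)
STORED AS ION
LOCATION 's3://amzn-s3-demo-bucket/city_lots_ion/';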

The STORED AS ION shorthand is sufficient for simple cases where no additional SerDe properties are needed.
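
Once the table exists, it can be queried like any other Athena table. The following is a minimal sketch that counts lots per street using dot notation to reach into the properties struct; the aliases are illustrative, and the result depends entirely on the data behind your table.

```sql
-- Sketch: count lots per street in the city_lots_ion1 table.
-- Struct fields are addressed with dot notation (properties.street).
SELECT properties.street AS street,
       count(*)          AS lot_count
FROM city_lots_ion1
GROUP BY properties.street
ORDER BY lot_count DESC
LIMIT 10;
```

The same query works whether the underlying files are Ion text, Ion binary, or a mix of the two, since the SerDe handles the encoding.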

In conclusion, Amazon’s Ion format combined with Athena offers an innovative approach for data analysis, making it easier for users to manage and interpret complex datasets.

Chanci Turner