Announcing Our New Blog Series: Service Spotlight on Financial Services Featuring AWS Glue DataBrew

Service Overview

Chanci Turner

In the past two decades, the demand for data analytics has surged due to the increasing diversity and volume of data available to organizations. Data scientists and analysts often find themselves using spreadsheets for data exploration or depending on data engineers and ETL developers to convert data into the necessary format. This reliance can lead to significant delays, with customers spending as much as 80% of their time preparing data instead of analyzing it.

AWS Glue DataBrew simplifies data preparation for data scientists and analysts, letting them clean and normalize data through a visual interface. This tool can reduce data preparation time by up to 80%. With Glue DataBrew, users can visualize, clean, and normalize data directly from their data lakes, data warehouses, and databases. It offers more than 250 built-in transformations that automate cleansing and normalization tasks, and users can save those steps and reapply them to new data as it arrives. Profiling the data helps assess its quality by surfacing patterns and anomalies, all without writing any code. As part of AWS Glue, DataBrew is serverless, so there is no infrastructure to manage. You pay only for what you use, with no upfront costs.

Financial institutions can utilize DataBrew to mask and redact sensitive information, evaluate data quality rules, and filter out anomalies, thus enhancing downstream accuracy.

Compliance with AWS Glue DataBrew

AWS Glue DataBrew is a managed service that undergoes regular security and compliance assessments by third-party auditors as part of various AWS compliance programs. Compliance reports can be accessed through AWS Artifact under a non-disclosure agreement (NDA). Under the AWS shared responsibility model, Glue DataBrew is in scope for the following compliance programs:

  • DoD CC SRG (IL2 – East/West)
  • FedRAMP (Moderate – East/West)
  • FINMA
  • HIPAA
  • HITRUST CSF
  • ISO/IEC 27001:2013, 27017:2015, 27018:2019, 27701:2019, ISO 9001:2015, and CSA STAR CCM v3.0.1
  • OSPAR
  • PCI
  • SOC 1, 2, 3
  • GSMA (US-East Ohio)
  • PiTuKri
  • MTCS (Regions: US-East, US-West, Singapore, Seoul)

The scope of your shared responsibility when using AWS Glue DataBrew depends on the sensitivity of your data, your organization’s compliance goals, and relevant laws and regulations. AWS provides numerous resources for compliance validation.

Data Protection with AWS Glue DataBrew

Data protection ensures that critical information remains safe from corruption, compromise, or loss. Encryption is a key practice to maintain the confidentiality and integrity of processed data, both during transit and at rest.

At-Rest Encryption

Projects and jobs in DataBrew can read and write encrypted data using AWS Key Management Service (AWS KMS). When creating jobs, enable encryption to ensure the job output files are protected. For example, you can specify encryption keys via the DataBrew console or API.
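As a rough illustration of the API route, the sketch below assembles the arguments for a recipe job whose output is encrypted with a KMS key, using the `create_recipe_job` operation of the boto3 `databrew` client. The job, dataset, recipe, bucket, and key names are all hypothetical placeholders, and the output prefix and format are assumptions chosen for the example.

```python
from typing import Any, Dict


def build_encrypted_job_request(
    job_name: str,
    dataset_name: str,
    recipe_name: str,
    role_arn: str,
    output_bucket: str,
    kms_key_arn: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a DataBrew recipe job with SSE-KMS output."""
    return {
        "Name": job_name,
        "DatasetName": dataset_name,
        "RoleArn": role_arn,
        # Hypothetical published recipe; the version string is a placeholder.
        "RecipeReference": {"Name": recipe_name, "RecipeVersion": "1.0"},
        "Outputs": [
            {
                "Location": {"Bucket": output_bucket, "Key": "prepared/"},
                "Format": "PARQUET",
            }
        ],
        # Encrypt job output with a customer-managed KMS key.
        "EncryptionMode": "SSE-KMS",
        "EncryptionKeyArn": kms_key_arn,
    }


def create_encrypted_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the request; needs boto3 plus valid AWS credentials and resources."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("databrew")
    return client.create_recipe_job(**request)
```

Keeping the request builder separate from the API call makes it easy to review or unit-test the encryption settings before anything is sent to AWS.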

In-Transit Encryption

Data in transit is secured with Secure Sockets Layer (SSL) encryption. DataBrew's support for JDBC data sources comes through AWS Glue: when connecting to a JDBC source, DataBrew uses the settings on your AWS Glue connection, including the option to require an SSL connection.
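For readers who define Glue connections programmatically, here is a minimal sketch of a JDBC connection definition with SSL enforcement switched on through the `JDBC_ENFORCE_SSL` connection property. The connection name, JDBC URL, and secret ID are hypothetical, and storing credentials in AWS Secrets Manager is an assumption made for the example.

```python
from typing import Any, Dict


def build_jdbc_connection_input(
    name: str, jdbc_url: str, secret_id: str
) -> Dict[str, Any]:
    """Build the ConnectionInput structure for an SSL-enforced JDBC connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            # Credentials are looked up from a Secrets Manager secret (assumed).
            "SECRET_ID": secret_id,
            # Glue expects a boolean string here, not a Python bool.
            "JDBC_ENFORCE_SSL": "true",
        },
    }


def create_connection(connection_input: Dict[str, Any]) -> Dict[str, Any]:
    """Register the connection; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_connection(ConnectionInput=connection_input)
```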

Identifying and Handling PII

Personally identifiable information (PII) must be safeguarded during analytics and machine learning processes. DataBrew offers mechanisms to mask PII during the data preparation phase: a set of transformations for redacting PII, plus PII detection and statistics in the Data Profile dashboard.

Data-Masking Techniques Include:

  • Substitution: Replacing PII with realistic values.
  • Shuffling: Mixing values from the same column across different rows.
  • Deterministic Encryption: Using algorithms that yield the same ciphertext for identical values.
  • Probabilistic Encryption: Generating different ciphertext with each application.
  • Nulling Out or Deletion: Replacing fields or entire columns with null values or removing them entirely.
  • Hashing: Applying hash functions to column values.
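To make a few of these techniques concrete, the sketch below applies substitution, shuffling, nulling out, and hashing to a column held as a plain Python list. It illustrates the concepts only and is not how DataBrew implements them; the function names, the pool of replacement values, and the secret key are all hypothetical. The hashing example uses a keyed hash (HMAC-SHA256), which is deterministic like the deterministic encryption described above but cannot be reversed to recover the original value.

```python
import hashlib
import hmac
import random
from typing import List, Optional, Sequence


def substitute(values: Sequence[str], pool: Sequence[str], seed: int) -> List[str]:
    """Substitution: replace each value with a realistic stand-in from a pool."""
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in values]


def shuffle_column(values: Sequence[str], seed: int) -> List[str]:
    """Shuffling: keep the same values but move them to different rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def null_out(values: Sequence[str]) -> List[Optional[str]]:
    """Nulling out: replace every value in the column with None."""
    return [None for _ in values]


def keyed_hash(value: str, key: bytes) -> str:
    """Hashing: identical inputs map to identical digests under the same key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    names = ["Ana Diaz", "Bo Chen", "Cy Okafor"]
    key = b"example-only-key"  # hypothetical; keep real keys in a secrets store
    print(substitute(names, ["Pat Doe", "Sam Roe"], seed=1))
    print(shuffle_column(names, seed=1))
    print(null_out(names))
    print([keyed_hash(n, key) for n in names])
```

Deterministic techniques such as the keyed hash preserve the ability to join or group on the masked column, while shuffling and substitution preserve the column's realistic appearance at the cost of row-level accuracy.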

Isolation of Compute Environments

As a managed service, AWS Glue DataBrew keeps the underlying compute resources on the AWS side of the shared responsibility model, so you do not manage or secure them yourself. The service is protected by the AWS global network security procedures and runs in secure data centers. Each job or project environment is isolated from others.

