Organizations today are increasingly tapping into unstructured data from diverse sources such as social media interactions, stock market feeds, and clickstream data to better understand customer needs. This data is vital for product customization and enhancing customer experiences through data lake solutions.
A Lake House architecture serves as a central hub (data lake) that aggregates unstructured, structured, and real-time data, which is then utilized by various tools, including analytics engines, data warehouses, machine learning (ML) models, and visualization platforms.
Tata Consultancy Services (TCS), an AWS Premier Consulting Partner, has introduced the EZ Lake Access (EZLA) solution, which streamlines access management for the Data Lake House by encoding most enterprise access controls into a rule engine. This innovation leads to improved efficiency and easier adoption of the Data Lake House.
In this discussion, we explore the Lake House ecosystem, its complexities, and its common challenges. We then give an overview of the TCS EZLA solution, its architecture, and its functionality, and present a case study showing its benefits within a large life sciences enterprise.
Enterprise Lake House Ecosystem
Many organizations looking to implement Lake House solutions prefer Amazon Web Services (AWS) due to its extensive range of customizable offerings. Amazon Simple Storage Service (Amazon S3) stands out for its durability, availability, performance, security, and nearly limitless scalability at a low cost, making it an ideal choice for data lakes, which are crucial for AWS Lake House applications.
In a Lake House approach, various stakeholders access data through an Access Management Layer for purposes such as insight generation, security assurance, and ML model development. These stakeholders form the upper layer of the Lake House, extracting valuable insights while Amazon S3 serves as the central repository.
The rising demand for extracting knowledge from the data lake has made services such as Amazon SageMaker, Amazon Redshift, AWS Glue, and Amazon Athena essential components of the Lake House ecosystem.
Lake House Access Management Complexities
Given that a Lake House serves as a central repository for data collected from various sources, organizations require a detailed set of rules to manage access effectively. The creation of these rules presents several challenges:
- Compliance and data sensitivity issues.
- The necessity to log approval processes and track changes for audit compliance.
- Complex role and permission assignments for business users across varied datasets.
- Diverse user personas, data subject areas (data classification), and employee types (e.g., full-time employees vs. contractors), each of which multiplies the number of rules.
Persona-Based Access Scenario:
Security teams strive to govern access with strict enforcement. Most enterprises either categorize data by subject area or create separate S3 buckets. The rules established often reflect the security team’s requirements to ensure each role has the appropriate access level to perform its tasks. Some rules are also dictated by business needs.
Consider the following example showcasing the complexity of access management:
Subject Area | Persona A (Dept 1) | Persona B (Dept 2) | Persona C (Dept 3) |
---|---|---|---|
Claims Data | Read (landing) | Read/Write (summarized) | Read/Write (Amazon SageMaker Notebook) |
Finance Data | Deny | Deny | Read (Amazon Athena + Amazon QuickSight) |
Sales Data | Read (Amazon Redshift) | Read (Amazon QuickSight) | Deny |
In this scenario, a user identified as “Persona A” from “Department 1” should not have access to financial data, but should have unrestricted access to claims data, including Amazon Athena and Amazon QuickSight. Conversely, “Persona C” from “Department 3” should be denied access to sales data, yet retain full access to claims data via their personal Amazon SageMaker Notebook instance. Such rules tend to be heuristic and challenging to implement consistently.
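To make the matrix above concrete, here is a minimal sketch of how such rules could be encoded as a lookup with a default-deny fallback. It is not how EZLA stores its rules (the source does not describe that); the names `Access`, `RULES`, `lookup`, and `can_write`, and the lowercase keys, are assumptions made for illustration.

```python
from enum import Enum


class Access(Enum):
    DENY = "deny"
    READ = "read"
    READ_WRITE = "read_write"


# (subject area, persona) -> (access level, scope), copied from the table.
# Keys and scope labels are hypothetical identifiers, not EZLA names.
RULES = {
    ("claims", "persona_a"): (Access.READ, "landing"),
    ("claims", "persona_b"): (Access.READ_WRITE, "summarized"),
    ("claims", "persona_c"): (Access.READ_WRITE, "sagemaker_notebook"),
    ("finance", "persona_c"): (Access.READ, "athena+quicksight"),
    ("sales", "persona_a"): (Access.READ, "redshift"),
    ("sales", "persona_b"): (Access.READ, "quicksight"),
}


def lookup(subject_area, persona):
    """Return (access, scope); any pair not listed is denied."""
    key = (subject_area.lower(), persona.lower())
    return RULES.get(key, (Access.DENY, None))


def can_write(subject_area, persona):
    """True only when the rule grants read/write access."""
    access, _scope = lookup(subject_area, persona)
    return access is Access.READ_WRITE


print(lookup("finance", "persona_a"))
print(can_write("claims", "persona_b"))
```

Treating every unlisted pair as a denial keeps the "Deny" cells out of the rule set, so a forgotten entry fails closed rather than open.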
Lake House Access Management Common Challenges
The intricacies of persona-based access management lead to several key challenges:
- Establishing and maintaining data subject areas using Amazon S3 buckets and prefixes (subfolders) up to ‘n’ levels, along with governing business rules for user personas, such as data scientists and report readers.
- Launching and managing AWS services like AWS Glue and Amazon Athena, and integrating them with user personas and business logic for user and service role creation.
- Forming and managing departments and linking them with various roles and user personas.
- Crafting and managing user roles and necessary service roles for Amazon SageMaker Notebooks, Amazon EMR clusters, and more.
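As an illustration of the first and last challenges above, the sketch below generates an IAM policy document that scopes a role to one S3 prefix. The policy structure follows standard IAM JSON policy syntax, but the function name `build_s3_policy`, the bucket name `example-lake-bucket`, and the prefix layout are assumptions for this example, not details of any real deployment.

```python
import json


def build_s3_policy(bucket, prefix, writable=False):
    """Build an IAM policy dict limited to one prefix of one bucket."""
    prefix = prefix.strip("/")
    object_actions = ["s3:GetObject"]
    if writable:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed
                # with a condition on the s3:prefix key.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object-level actions are narrowed by the resource ARN.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix for a read/write persona.
policy = build_s3_policy("example-lake-bucket", "claims/summarized", writable=True)
print(json.dumps(policy, indent=2))
```

Multiplying one such document by every persona, department, and subject area shows why hand-maintaining these policies becomes difficult as prefixes nest deeper.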
TCS EZLA Solution Overview
The TCS EZLA solution operates within the Access Management Layer of the Lake House framework. It offers an automated method for both business and IT personnel to manage access and associated resource provisioning.
The EZLA solution comprises five primary configurable components designed with flexibility, security, and compliance in mind. At its core is a business rule engine that codifies requirements and provisions infrastructure.
- Metadata Collector: This batch process periodically invokes AWS APIs to gather relevant AWS resource names and their properties, and stores them in a persistent metadata store as reference data. The store is dynamically updated to reflect changes in AWS resources.
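A single collection pass of the kind described above might look like the following sketch. EZLA's actual implementation is not described in the source; `collect_bucket_metadata`, the in-memory `store` dictionary, and the `FakeS3` stand-in are all hypothetical. The only AWS call assumed is the S3 client's `list_buckets()`, whose response contains a `Buckets` list of entries with `Name` and `CreationDate`. In a real run, a `boto3.client("s3")` object could be passed in place of the fake.

```python
from datetime import datetime, timezone


def collect_bucket_metadata(s3_client, store):
    """Refresh `store` so it mirrors the buckets that currently exist."""
    response = s3_client.list_buckets()  # reads a single response page only
    now = datetime.now(timezone.utc).isoformat()
    seen = set()
    for bucket in response.get("Buckets", []):
        name = bucket["Name"]
        seen.add(name)
        store[name] = {
            "created": str(bucket.get("CreationDate")),
            "last_seen": now,
        }
    # Remove entries for buckets that no longer exist.
    for name in set(store) - seen:
        del store[name]
    return store


class FakeS3:
    """Stand-in for an S3 client so the sketch runs without AWS access."""

    def __init__(self, names):
        self.names = names

    def list_buckets(self):
        return {"Buckets": [{"Name": n, "CreationDate": None} for n in self.names]}


store = {}
collect_bucket_metadata(FakeS3(["claims", "sales"]), store)
collect_bucket_metadata(FakeS3(["claims"]), store)  # "sales" was deleted
print(sorted(store))
```

Passing the client in as a parameter keeps the collector testable; a scheduler would call it on whatever interval the batch process uses and would write to durable storage rather than a dictionary.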