Organizations that prioritize security often embrace a Zero Trust security framework. This model emphasizes that access to sensitive data should not rely solely on network location. Instead, it mandates that both users and systems must authenticate their identities and demonstrate trustworthiness, thereby enforcing strict identity-based authorization protocols before granting access to applications, data, and other systems.
Many businesses opt for third-party identity providers (IdPs) like Active Directory Federation Services (AD FS) to manage credentials and validate identities. Users can use their AD FS credentials to authenticate across various systems, including the AWS Management Console (for further insights, refer to the guide on Enabling SAML 2.0 federated users to access the AWS Management Console).
In the realm of analytics, some organizations extend Zero Trust principles to data residing in data lakes, which includes the various business intelligence (BI) tools employed to access that data. A typical data lake setup involves storing data in Amazon Simple Storage Service (Amazon S3) and querying it using Amazon Athena.
AWS Lake Formation enables users to define and enforce access policies at the database, table, and column levels when running Athena queries on data stored in Amazon S3. Lake Formation supports identity providers such as Active Directory and Security Assertion Markup Language (SAML) providers like Okta and Auth0. It also integrates with Amazon QuickSight, an AWS BI service that allows users to create and publish interactive BI dashboards while supporting Active Directory authentication. However, if you prefer other BI tools such as Tableau, you might want to use your Active Directory credentials to access data governed by Lake Formation.
In this article, we will demonstrate how to leverage AD FS credentials with Tableau to establish a Zero Trust architecture and securely query data in Amazon S3 and Lake Formation.
Overview of the Solution
In this framework, user credentials are managed by Active Directory rather than AWS Identity and Access Management (IAM). Although Tableau includes a connector for linking to Athena, it typically requires an AWS access key ID and a secret access key for programmatic access. While creating an IAM user with programmatic access for Tableau is one option, some organizations prefer a federated access method via Active Directory instead of an IAM user.
This article will guide you through using the Athena ODBC driver alongside AD FS credentials to query sample data in a newly established data lake. We’ll simulate the environment by enabling federation to AWS using AD FS 3.0 and SAML 2.0. Then, we will provide instructions on setting up a data lake using Lake Formation and configuring an ODBC driver for Tableau to securely query your data using AD FS credentials.
Prerequisites
To successfully follow this walkthrough, you should have:
- A fundamental understanding of IAM roles and concepts.
- Basic knowledge of Lake Formation and Athena.
- A copy of Tableau, either via a 14-day trial or a fully licensed version.
- Familiarity with Active Directory concepts and joining a computer to an Active Directory domain.
- Knowledge of configuring ODBC components on a Windows machine.
Setting Up Your Environment
To replicate the production environment, we created a standard VPC within Amazon Virtual Private Cloud (Amazon VPC), featuring one private subnet and one public subnet. You can achieve this using the VPC wizard. Our Amazon Elastic Compute Cloud (Amazon EC2) instance hosting the Tableau client resides in a private subnet and is accessible through an EC2 bastion host. For simplicity, we connect to Amazon S3, AWS Glue, and Athena via the NAT gateway and internet gateway established by the VPC wizard. Alternatively, you can opt for AWS PrivateLink endpoints, ensuring that traffic remains within the AWS network.
The diagram below illustrates the architecture of our environment.
Once your VPC is established with both private and public subnets, you can proceed to create the additional components, such as Active Directory and Lake Formation. Let’s start with Active Directory.
Federation to AWS Using AD FS 3.0 and SAML 2.0
AD FS 3.0, part of Windows Server, supports SAML 2.0 and integrates with IAM. This integration allows Active Directory users to federate to AWS using their corporate credentials, like username and password. Before proceeding, ensure that AD FS is properly configured and operational.
To set up AD FS, follow the guidance in the documentation on establishing trust between AD FS and AWS, and connecting to Amazon Athena using the ODBC driver. The initial section details how to configure AD FS and establish trust with Active Directory. The post concludes with instructions for setting up an ODBC driver for Athena, which you can skip. You will create a group named ArunADFSTest, which relates to a role in your AWS account that you will use later.
Once you have confirmed your ability to log in with your IdP, you can configure your Windows environment’s ODBC driver to connect to Athena.
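As a rough illustration of what the ODBC configuration amounts to, the following sketch assembles a DSN-less connection string for AD FS authentication. The key names (AuthenticationType, IdP_Host, IdP_Port, preferred_role, AwsRegion, S3OutputLocation) follow the Simba Athena ODBC driver documentation as we understand it and should be checked against the driver version you install. The function name and every value in the usage note (host, account ID, bucket) are hypothetical placeholders, not values from this walkthrough.

```python
def build_athena_adfs_connection_string(
    idp_host,
    uid,
    pwd,
    preferred_role,
    region,
    s3_output,
    idp_port=443,
    driver="Simba Athena ODBC Driver",
):
    """Assemble an ODBC connection string for Athena with AD FS sign-in.

    Key names are assumed from the Simba Athena ODBC driver documentation;
    verify them against your installed driver before relying on this.
    """
    fields = [
        ("Driver", "{" + driver + "}"),
        ("AwsRegion", region),
        ("S3OutputLocation", s3_output),
        ("AuthenticationType", "ADFS"),
        ("UID", uid),
        ("PWD", pwd),
        ("IdP_Host", idp_host),
        ("IdP_Port", str(idp_port)),
        ("preferred_role", preferred_role),
    ]
    for key, value in fields:
        # A semicolon inside a value would break the key=value;... layout.
        if key != "Driver" and ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return ";".join(f"{key}={value}" for key, value in fields) + ";"
```

A call might look like build_athena_adfs_connection_string(idp_host="adfs.example.com", uid="EXAMPLE\\analyst", pwd=password, preferred_role="arn:aws:iam::111122223333:role/ArunADFSTest", region="us-east-1", s3_output="s3://example-athena-results/"). In practice you would usually enter the same fields in the Windows ODBC Data Source Administrator rather than embed a password in a string.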
Setting Up a Data Lake with Lake Formation
Lake Formation is a fully managed service that simplifies the process of building, securing, and managing data lakes. It offers its own permissions model that enhances the IAM permissions model, enabling fine-grained access control to data in data lakes through a straightforward grant/revoke mechanism. We will use this permissions model to provide access to the AD FS role we created earlier.
Upon accessing the Lake Formation console for the first time, a welcome box will prompt you to select the initial administrative user and roles. Choose “Add myself” and click “Get Started.” We will utilize the sample database provided by Lake Formation, but feel free to use your own dataset. For guidance on loading your dataset, see the article on Getting Started with Lake Formation. After configuring Lake Formation, we must grant read access to the AD FS role (ArunADFSTest) established in the previous step.
In the navigation pane, select Databases, choose the database sampledb, and from the Actions menu, click Grant. We will grant the ArunADFSTest role access to sampledb. For Principals, select IAM users and roles, then choose the role ArunADFSTest.
Select Named data catalog resources. For Databases, choose sampledb, and for Tables, select All tables. Set the table permissions to Select and Describe, and for Data permissions, choose All data access. Click Grant. Our AD FS user assumes the role ArunADFSTest, which has been granted access to sampledb by Lake Formation. However, the ArunADFSTest role requires permissions for Lake Formation, Athena, AWS Glue, and Amazon S3. Following the principle of least privilege, AWS defines policies for specific Lake Formation personas. Our user fits the Data Analyst persona, which necessitates sufficient permissions to execute queries.
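The console grant described above can also be expressed as an API request. The sketch below only builds the request body, so it runs without AWS credentials. The field names mirror the Lake Formation GrantPermissions API as we understand it; the helper name and the account ID in the usage note are hypothetical.

```python
def build_grant_request(role_arn, database="sampledb"):
    """Build a GrantPermissions request: Select and Describe on all tables.

    TableWildcard is intended to correspond to choosing "All tables"
    in the console grant dialog.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "Table": {
                "DatabaseName": database,
                "TableWildcard": {},
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }
```

Assuming boto3 is installed and you are signed in as a Lake Formation administrator, the request could be submitted with boto3.client("lakeformation").grant_permissions(**build_grant_request("arn:aws:iam::111122223333:role/ArunADFSTest")).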
Add the AmazonAthenaFullAccess managed policy (for instructions, see Adding and removing IAM identity permissions) along with the following inline policy to the ArunADFSTest role:
{
"Version": "2012-10-17",
"Statement": [
// Add your specific policy statements here
]
}
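The policy statements are elided above, so the following is not the authors' policy. It is a sketch of what an inline policy for the Data Analyst persona might contain, based on our reading of the Lake Formation personas reference in the AWS documentation; the action list is an assumption and should be compared against the current documentation before use.

```python
import json

# Assumed action list for the Data Analyst persona; verify against the
# current Lake Formation personas and IAM permissions reference.
DATA_ANALYST_ACTIONS = [
    "lakeformation:GetDataAccess",
    "glue:GetTable",
    "glue:GetTables",
    "glue:SearchTables",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetPartitions",
]


def build_data_analyst_policy(actions=DATA_ANALYST_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM inline policy editor.
    print(json.dumps(build_data_analyst_policy(), indent=4))
```

The lakeformation:GetDataAccess action is the piece that lets Athena obtain temporary access to the underlying S3 data on the role's behalf; the glue actions cover reading catalog metadata.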