Harnessing Serverless Architecture to Create an Enterprise Data Repository for Customer Insights and Analytics | AWS Partner Network (APN) Blog

Chanci Turner, Amazon IXD – VGT2 Learning Manager

In today’s data-driven world, organizations face the challenge of managing various data types and the rapid expansion of data volume. To effectively gain essential insights that enhance customer experiences, establishing a centralized data platform is crucial. Traditionally, businesses have relied on multiple disconnected data systems, making it difficult to scale and analyze diverse data types. This fragmentation often leads to significant capital investments in hardware and software, along with ongoing operational costs for maintenance and technical support.

Moving data between storage systems typically relies on an extract, transform, load (ETL) process, which is foundational to any modern analytics platform. Amazon Web Services (AWS) offers a comprehensive suite of services that enable organizations to deploy enterprise-grade applications in the cloud. With AWS serverless architecture, businesses can orchestrate complex ETL workflows that simplify loading data for analytics.

This article delves into a strategic partnership between Tech Innovations, an AWS Advanced Consulting Partner, and a client to develop and implement an enterprise data repository on AWS, leveraging serverless architecture for ETL workflows. Tech Innovations is recognized as an AWS Migration Competency Partner and is part of the AWS Managed Service Provider (MSP) and AWS Well-Architected Partner Programs.

Solution Overview

For over 70 years, Academic Publishing has been transforming the educational landscape through innovative learning solutions. By connecting research with practical learning applications, they create impactful content and products designed for student success.

Academic Publishing manages numerous learning platforms with a vast user base across various business segments. The organization identified a pressing need to analyze its data to derive insights concerning its user demographics, sales points, e-learning platforms, web analytics, CRM systems, billing, integrations, and more. These insights are vital for informed decision-making within the business.

Currently, a master data management solution known as the Enterprise Data Repository (EDR) serves as a single source of truth for aggregated data. The analytics teams utilize the EDR to generate actionable insights and enhance customer experiences. The EDR integrates a significant volume of data through orchestrated ETL jobs, utilizing a variety of serverless technologies including AWS Step Functions and AWS Lambda. Key components of the solution encompass data ingestion, curation, transformation business logic, orchestration, scheduling, data archival, indexing, and report generation.

Customer Requirements

A collaborative effort between Academic Publishing and Tech Innovations aimed to develop a robust data and analytics platform using an enterprise data repository and serverless technologies for ETL processing. The customer’s requirements included the capability to:

  • Ingest raw data from a variety of sources such as Google Analytics, web usage trackers, POS data, CRM systems, and external data providers.
  • Curate raw data according to business needs, as the accumulation of unnecessary data increases costs over time.
  • Segregate and transform the curated data into high-quality information.
  • Store large volumes of data with optimal retrieval performance.
  • Uphold stringent data security standards.
  • Report data insights to diverse levels of analytics business user groups.
  • Implement a LowOps platform with an expedited product development strategy.
  • Manage approximately 70 million rows across various datasets and tables, comprising around 250 attributes from 45 different sources.
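The curation requirement above can be illustrated with a minimal sketch. It assumes curation means keeping an allow-list of attributes and dropping rows that cannot be tied to a user. The attribute names, the `curate` helper, and the sample records are all hypothetical. The source does not describe the actual curation rules.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical allow-list of attributes the business wants to retain.
CURATED_ATTRIBUTES: Set[str] = {"user_id", "platform", "event_type", "event_time"}


def curate(records: Iterable[Dict], keep: Set[str] = CURATED_ATTRIBUTES) -> List[Dict]:
    """Keep only allow-listed attributes and drop records without a user_id."""
    curated = []
    for record in records:
        if not record.get("user_id"):
            continue  # skip rows that cannot be tied to a user
        curated.append({k: v for k, v in record.items() if k in keep})
    return curated


# Illustrative raw records; "debug_blob" stands in for unneeded data.
raw = [
    {"user_id": "u1", "platform": "web", "event_type": "login", "debug_blob": "..."},
    {"user_id": "", "platform": "web", "event_type": "view"},
]
curated_rows = curate(raw)
```

Filtering at this stage means attributes that nobody queries never reach long-term storage, which is the cost concern the requirement raises.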

Tech Innovations designed an AWS solution that addressed Academic Publishing’s challenges in four major areas: modernizing the data repository for intelligent analytics and real-time insights; enhancing data management practices; ensuring data governance; and providing reporting capabilities using AWS native services.

Solution Architecture

The solution was designed using scalable and secure ETL workflows powered by AWS serverless technology. It employs Amazon Simple Storage Service (Amazon S3) and Amazon Redshift as the data storage layer, utilizing continuous integration and automated pipelines to facilitate a seamless development lifecycle and LowOps maintenance. The platform was developed with security measures, including strict network boundaries and threat detection monitoring.
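One common way to move staged files from Amazon S3 into Amazon Redshift is the Redshift `COPY` command. The source does not say which loading mechanism was used, so the sketch below is only an assumption. It builds a `COPY` statement for Parquet files; the table name, bucket, IAM role ARN, and the choice of Parquet are all hypothetical.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads Parquet files from S3.

    The arguments are assumed to come from trusted configuration, not user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical table, bucket, and role used purely for illustration.
statement = build_copy_statement(
    table="edr.web_usage",
    s3_uri="s3://example-edr-curated/web_usage/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-load",
)
print(statement)
```

An ETL task would send this statement to the cluster once the curated files for a dataset have landed in S3.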

Core components of the solution include:

  • Data Ingestion and Transformation Layer: This layer uses AWS Step Functions, AWS Lambda, and Amazon Elastic Container Service (Amazon ECS) for data ingestion and processing. AWS Step Functions orchestrates a serverless workflow that is triggered by Amazon CloudWatch Events and works in tandem with Lambda to manage multiple ETL jobs.
  • Enterprise Data Repository Layer: Modern enterprise data management platforms must efficiently capture data from diverse sources at scale. Data is centralized in manageable repositories, eliminating traditional siloed architectures. Amazon S3 serves as the primary storage for ETL operations, while Amazon Redshift functions as the data warehouse solution.
  • Visualization, Reporting, and Business Intelligence Layer: Power BI is employed for data analysis and visualization, enabling the creation of reports and dashboards that highlight data value. Configuration details are elaborated in the solution walkthrough section below.
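To make the orchestration in the ingestion layer more concrete, the following sketch builds a small AWS Step Functions state machine definition in Amazon States Language, with one Lambda task per ETL stage. It is not the workflow described in the article: the three stage names, the Lambda ARNs, and the retry settings are hypothetical, and the real workflow presumably has more states.

```python
import json

# Hypothetical Lambda function ARNs, one per ETL stage.
INGEST_ARN = "arn:aws:lambda:us-east-1:123456789012:function:example-ingest"
CURATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:example-curate"
LOAD_ARN = "arn:aws:lambda:us-east-1:123456789012:function:example-load"


def build_etl_definition() -> dict:
    """Return an Amazon States Language definition chaining three Lambda tasks."""
    return {
        "Comment": "Illustrative ETL workflow: ingest, curate, load",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {
                "Type": "Task",
                "Resource": INGEST_ARN,
                # Retry failed ingestion attempts twice with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Curate",
            },
            "Curate": {"Type": "Task", "Resource": CURATE_ARN, "Next": "Load"},
            "Load": {"Type": "Task", "Resource": LOAD_ARN, "End": True},
        },
    }


definition = build_etl_definition()
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON could be supplied as the definition when creating a state machine, and a scheduled CloudWatch Events rule could then start an execution for each ETL run.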

Sagar Thompson, Vice President of Technology at Academic Publishing, commented, “We built this architecture to acquire data from multiple sources, cleanse it, add value, and stage it for self-service BI for stakeholders and data analysts to derive insights. The engineering team effectively leveraged various AWS services while minimizing costs to shift from faith-based decisions to data-driven decisions.”


Walkthrough

Tech Innovations’ solution compiles, processes, and analyzes data to empower informed decision-making.
