In our previous post, we outlined a comprehensive strategy for migrating a sizable data warehouse from IBM Netezza to Amazon Redshift without incurring downtime. This article focuses on how a prominent European enterprise customer executed their migration strategy, which spanned multiple environments, utilizing the AWS Schema Conversion Tool (AWS SCT) to expedite both schema and data migration. We will also guide you through the validation process to ensure that the schema and data content were transferred accurately, adhering to Amazon Redshift best practices.
Solution Overview
Creating a tailored migration plan that matches your organization’s unique processes and non-functional requirements is crucial. Below, we present a real-world case study from a large European enterprise customer. It outlines the different environments involved in the migration and details the tasks, tools, and scripts used throughout the process:
- Assess Migration Tasks
  - Understand the migration scope
  - Document objects to be migrated in a migration runbook
- Set Up the Migration Environment
  - Install AWS SCT
  - Configure AWS SCT for Netezza source environments
- Migrate to the Development Environment
  - Create users, groups, and schema
  - Convert schema
  - Migrate data
  - Validate data
  - Transform ETL, UDF, and procedures
- Migrate to Other Pre-Production Environments
  - Create users, groups, and schema
  - Convert schema
  - Migrate data
  - Validate data
  - Transform ETL, UDF, and procedures
- Migrate to the Production Environment
  - Create users, groups, and schema
  - Convert schema
  - Migrate data
  - Validate data
  - Transform ETL, UDF, and procedures
  - Conduct business validation (including optional dual-running)
  - Cut over
Assessing Migration Tasks
To effectively plan and monitor migration tasks, it’s advisable to create a tracker that lists all Netezza databases, tables, and views involved in the process. This tracker will serve as a migration runbook, updated throughout the migration to log progress from Netezza to Amazon Redshift. For each table, record the number of rows and size in GB. Some Netezza systems may include two data warehouses—one for ETL loading and another for end-user reporting—so clarity on the scope is essential.
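To seed the runbook, the table inventory can be pulled from Netezza's system catalog and written out as a CSV tracker. The sketch below is illustrative: the system view and column names in the query string are assumptions (verify them against your Netezza release), and the helper functions are hypothetical names, not part of AWS SCT.

```python
import csv

# Assumed Netezza catalog query -- view and column names vary by release;
# verify _v_table and the storage-statistics views on your system.
RUNBOOK_QUERY = """
SELECT t.tablename, t.reltuples AS row_count, s.used_bytes
FROM _v_table t
JOIN _v_table_storage_stat s ON s.objid = t.objid;
"""

def runbook_rows(tables):
    """tables: iterable of (database, table, row_count, size_bytes)."""
    for db, tbl, rows, size_bytes in tables:
        yield {
            "database": db,
            "table": tbl,
            "rows": rows,
            "size_gb": round(size_bytes / 1024 ** 3, 2),
            "status": "not started",  # updated as each table lands in Redshift
        }

def write_runbook(tables, fh):
    """Write the migration tracker as CSV, one row per table to migrate."""
    writer = csv.DictWriter(
        fh, fieldnames=["database", "table", "rows", "size_gb", "status"]
    )
    writer.writeheader()
    writer.writerows(runbook_rows(tables))
```

The `status` column is what turns the inventory into a runbook: update it as each table is extracted, loaded, and validated.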
Setting Up the Migration Environment
The migration strategy employs AWS SCT to facilitate schema object conversion and data migration from the Netezza database to the Amazon Redshift cluster. It’s vital to ensure the following:
- Install AWS SCT on an Amazon EC2 instance within your AWS account to streamline migration operations and provide a user-friendly console for management.
- Position AWS SCT data extraction agents as close to the Netezza data warehouse as possible, ideally on-premises within the same subnet.
When transferring data from your on-premises data center to AWS, you have the option of using a direct connection or offline storage. For example, AWS Snowball serves as a petabyte-scale offline solution for large data transfers when direct bandwidth is insufficient. Alternatively, AWS Direct Connect simplifies the establishment of a dedicated network connection from your premises to AWS, enhancing bandwidth throughput and overall network consistency. This flexibility is crucial, especially if extract jobs need to be rerun.
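A quick back-of-the-envelope calculation helps choose between the two options. The helper below is a rough sketch; the function name and the 80% sustained-utilization default are assumptions, not AWS guidance.

```python
def transfer_days(data_tb, bandwidth_gbps, utilization=0.8):
    """Estimate the days needed to move `data_tb` (decimal terabytes) over a
    link of `bandwidth_gbps`, sustained at the given utilization fraction."""
    data_bits = data_tb * 1e12 * 8                       # TB -> bits
    seconds = data_bits / (bandwidth_gbps * 1e9 * utilization)
    return seconds / 86400.0
```

At 1 Gbps, 6 TB moves in under a day; at 100 Mbps, the same transfer takes roughly a week. The latter is the regime where an offline option such as Snowball, or a dedicated Direct Connect link, becomes attractive, particularly if extract jobs may need to be rerun.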
Configuring AWS SCT for the Netezza Source Environment
AWS SCT is installed on an EC2 instance running Microsoft Windows 10 with administrator privileges, allowing users to easily manage project creation, modify profiles, and monitor migration progress. A general-purpose EC2 instance with four virtual CPUs, 16 GB of memory, and 100 GB of storage is adequate for this task.
To maximize data transfer efficiency, configure multiple AWS SCT data extraction agents based on the volume of data being transferred and the number of available Netezza connections. For optimal performance, it’s recommended to have one data extraction agent for each TB of compressed Netezza data being migrated in parallel.
During the migration process, work closely with the DBA team to make as many concurrent Netezza connections as possible available to the data extraction agents, while reserving enough connections for any workloads that must keep running in parallel.
In this specific case, we deployed seven extraction agents for a project phase that involved extracting 6 TB of Netezza data. The DBA team configured 21 Netezza concurrent connections, allowing each agent to manage three parallel data extraction processes.
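The sizing guidance above can be folded into a small planning helper. This is a hypothetical sketch, not an AWS SCT feature, and note that the case study rounded up to seven agents for 6 TB, so treat the one-agent-per-TB rule as a starting point rather than a formula.

```python
import math

def plan_extraction(compressed_tb, available_connections, tb_per_agent=1.0):
    """Return (agents, connections_per_agent) for a migration phase:
    roughly one extraction agent per TB of compressed Netezza data,
    with the DBA-allocated connections divided evenly across agents."""
    agents = max(1, math.ceil(compressed_tb / tb_per_agent))
    per_agent = available_connections // agents  # leftover connections stay free
    return agents, per_agent
```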
To effectively tune each data extraction agent for optimal throughput, modifications to specific configuration files are necessary. For instance, adjust the connection pool size and thread pool size parameters to enhance performance.
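For illustration, such a tuning change might look like the fragment below. The property names and values are assumptions modeled on typical AWS SCT extraction-agent settings files; confirm the exact keys and the settings file location in the documentation for your agent version.

```properties
# Illustrative only -- verify property names for your AWS SCT agent version.
extractor.source.connection.pool.size=3
extractor.extracting.thread.pool.size=3
```

In the scenario above, each of the seven agents would be limited to three source connections, keeping the total at the 21 connections allocated by the DBA team.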