Amazon Redshift is a fully managed service for data lakes, data analytics, and data warehousing, serving startups, medium-sized enterprises, and large corporations alike. Thousands of businesses worldwide use it to modernize their data analytics platforms. Greenplum, in contrast, is an open-source, massively parallel database optimized for analytics that is primarily run on-premises. Built on the PostgreSQL database engine, it has long been a reliable choice for many organizations.
For numerous customers, migrating from Greenplum to Amazon Redshift presents an appealing solution, as it alleviates the burden of managing on-premises infrastructure. The key benefits include:
- An opportunity to modernize the data lake and warehouse environments
- Access to additional AWS services, such as Amazon Simple Storage Service (Amazon S3), Amazon CloudWatch, Amazon EMR, and Amazon SageMaker
Although both platforms utilize the PostgreSQL engine, transitioning to Amazon Redshift necessitates careful planning and manual intervention. This article will explore essential functions and considerations for code conversion from Greenplum to Amazon Redshift, particularly focusing on the migration of procedures, functions, and views.
Overview of the Solution
AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) can facilitate the migration of most database objects from Greenplum to Amazon Redshift. However, code conversion teams often face errors and warnings when creating views, procedures, and functions in Redshift. In such cases, manual code conversion becomes necessary.
This guide emphasizes how to manage the following during the migration process:
- Arrays
- Dates and timestamps
- Regular expressions (regex)
Note that this discussion is based on Greenplum 4.3, which is built on the PostgreSQL 8.2 engine, and Amazon Redshift, which is based on PostgreSQL 8.0.2.
Working with Array Functions
AWS SCT does not automatically convert array functions during migration, so they must be adapted manually. Commonly used array functions that require attention include:
- ARRAY_UPPER
- JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH
- UNNEST()
- STRING_AGG()
- ANY ARRAY()
ARRAY_UPPER()
The ARRAY_UPPER function returns the upper bound (the index of the last element) of an array dimension, which makes it possible to retrieve the nth element of an array in PostgreSQL or Greenplum.
Here’s how the Greenplum code appears:
With temp1 as (
SELECT 'John' as FirstName, 'Smith' as LastName,
array['"111-222-3333"','"101-201-3001"','"XXX-YYY-ZZZZ"','NULL'] as PhoneNumbers
UNION ALL
SELECT 'Bob' as FirstName, 'Haris' as LastName,
array['222-333-4444','201-301-4001','AAA-BBB-CCCC'] as PhoneNumbers
UNION ALL
SELECT 'Mary' as FirstName, 'Jane' as LastName,
array['333-444-5555','301-401-3001','DDD-EEE-FFFF'] as PhoneNumbers
)
SELECT FirstName, PhoneNumbers[ARRAY_UPPER(PhoneNumbers,1)] as LastElementFromArray
FROM temp1
In Amazon Redshift, there is no direct function to extract an element from an array. Instead, you can utilize two JSON functions for similar operations:
- JSON_EXTRACT_ARRAY_ELEMENT_TEXT() – Fetches a JSON array element from the outermost JSON array
- JSON_ARRAY_LENGTH() – Returns the count of elements in the outer JSON array
The equivalent code would look like:
With temp1 as (
SELECT 'John' as FirstName, 'Smith' as LastName,
array['"111-222-3333"','"101-201-3001"','"XXX-YYY-ZZZZ"'] as PhoneNumbers
UNION ALL
SELECT 'Bob' as FirstName, 'Haris' as LastName,
array['"222-333-4444"','"201-301-4001"','"AAA-BBB-CCCC"'] as PhoneNumbers
UNION ALL
SELECT 'Mary' as FirstName, 'Jane' as LastName,
array['"333-444-5555"','"301-401-3001"','"DDD-EEE-FFFF"'] as PhoneNumbers
)
SELECT
FirstName,
('['+array_to_string(phoneNumbers,',')+']') as JSONConvertedField,
JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
'['+array_to_string(phoneNumbers,',')+']',
JSON_ARRAY_LENGTH('['+array_to_string(phoneNumbers,',')+']')-1
) as LastElementFromArray
FROM temp1
UNNEST()
PostgreSQL’s UNNEST function expands an array into individual rows. This is particularly useful for improving performance during the insertion, update, or deletion of numerous records. While UNNEST() is not supported in Amazon Redshift, alternatives such as split_part, json_extract_path_text, json_array_length, and json_extract_array_element_text can be used.
In Greenplum, the UNNEST function may be employed as follows:
SELECT 'A', unnest(array[1,2])
Resulting output:
A 1
A 2
For Amazon Redshift, the equivalent operation can be accomplished with:
WITH temp1 as (
SELECT 'John' as FirstName, 'Smith' as LastName,
'111-222-3333' as Mobilephone, '101-201-3001' as HomePhone
UNION ALL
SELECT 'Bob' as FirstName, 'Haris' as LastName,
'222-333-4444' as Mobilephone, '201-301-4001' as HomePhone
UNION ALL
SELECT 'Mary' as FirstName, 'Jane' as LastName,
'333-444-5555' as Mobilephone, '301-401-3001' as HomePhone
),
ns as (
-- number-series CTE: row_number() over a system table (pg_tables here) simply generates sequential integers 1, 2, 3, ...
SELECT row_number() OVER (ORDER BY 1) as n FROM pg_tables
)
SELECT
FirstName,
LastName,
split_part('Mobile,Home',',',ns.n::int) as PhoneType,
split_part(MobilePhone || '&&' || HomePhone, '&&', ns.n::int) as PhoneNumber
FROM temp1, ns
WHERE ns.n <= regexp_count('Mobile,Home',',')+1
ORDER BY 1, 2, 3
When the array values are already available as a JSON array string, the JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH functions can be combined with the same number-series technique to expand the array into rows:
WITH ns as (
SELECT row_number() OVER (ORDER BY 1) as n FROM pg_tables
)
SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT('["arrayelement1","arrayelement2"]', ns.n - 1)
FROM ns
WHERE ns.n <= JSON_ARRAY_LENGTH('["arrayelement1","arrayelement2"]')
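When a single value needs to be extracted from a JSON object rather than a JSON array, the JSON_EXTRACT_PATH_TEXT function (listed earlier as another alternative) can be used. The following is a minimal sketch with a made-up phone-book document; the keys and values are purely illustrative:
SELECT JSON_EXTRACT_PATH_TEXT(
'{"phones": {"mobile": "111-222-3333", "home": "101-201-3001"}}',
'phones', 'home') as HomePhone
This returns 101-201-3001.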
STRING_AGG()
STRING_AGG() is an aggregate function that concatenates a series of strings with a specified separator, without appending a separator to the end of the string. The syntax is as follows:
STRING_AGG(expression, separator [order_by_clause])
The corresponding Greenplum code is:
WITH temp1 as (
SELECT 'Finance'::text as Dept, 'John'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Finance'::text as Dept, 'John'::text as FirstName, 'Doe'::text as LastName
UNION ALL
SELECT 'Finance'::text as Dept, 'Mary'::text as FirstName, 'Jane'::text as LastName
UNION ALL
SELECT 'Marketing'::text as Dept, 'Bob'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Marketing'::text as Dept, 'Steve'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Account'::text as Dept, 'Phil'::text as FirstName, 'Adams'::text as LastName
UNION ALL
SELECT 'Account'::text as Dept, 'Jim'::text as FirstName, 'Smith'::text as LastName
)
SELECT Dept, STRING_AGG(FirstName || ' ' || LastName, ' ; ') as Employees
FROM temp1
GROUP BY Dept
ORDER BY 1
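Amazon Redshift does not support STRING_AGG(); its LISTAGG() aggregate function provides equivalent behavior. The following is a minimal sketch of the converted query, reusing the same sample data as above; the ' ; ' separator and the WITHIN GROUP ordering are illustrative choices rather than part of the original query:
WITH temp1 as (
-- same sample data as the Greenplum example above; in practice this would come from a Redshift table
SELECT 'Finance'::text as Dept, 'John'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Finance'::text as Dept, 'John'::text as FirstName, 'Doe'::text as LastName
UNION ALL
SELECT 'Finance'::text as Dept, 'Mary'::text as FirstName, 'Jane'::text as LastName
UNION ALL
SELECT 'Marketing'::text as Dept, 'Bob'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Marketing'::text as Dept, 'Steve'::text as FirstName, 'Smith'::text as LastName
UNION ALL
SELECT 'Account'::text as Dept, 'Phil'::text as FirstName, 'Adams'::text as LastName
UNION ALL
SELECT 'Account'::text as Dept, 'Jim'::text as FirstName, 'Smith'::text as LastName
)
SELECT
Dept,
LISTAGG(FirstName || ' ' || LastName, ' ; ') WITHIN GROUP (ORDER BY FirstName) as Employees
FROM temp1
GROUP BY Dept
ORDER BY 1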