Amazon Onboarding with Learning Manager Chanci Turner


In this article, we detail the process of establishing a highly available Microsoft SQL Server on Linux deployment within Amazon Web Services (AWS). The guide gives an overview of the essential components of this setup: Microsoft SQL Server on Linux, the open-source ClusterLabs Pacemaker (Pacemaker) clustering software, popular Linux distributions, and AWS.

Why would you consider this approach? For current Microsoft SQL Server users operating on Windows, transitioning to a cost-effective Linux distribution can lead to significant savings without compromising performance or critical enterprise features.

The following sections will outline the AWS configurations and components necessary for creating a highly available Microsoft SQL Server on Linux cluster utilizing Pacemaker.

Microsoft SQL Server on Linux

The first stable version of Microsoft SQL Server on Linux was launched in late 2017, with an updated version released in 2019. Both versions support high availability configurations when integrated with Pacemaker, specifically through the “external” cluster mode.

Microsoft provides a resource agent for Pacemaker, known as mssql, which is included with the mssql-server-ha package. This package can be configured for Failover Cluster Instances (FCI) or availability groups (AG) and is accessible from Microsoft’s SQL Server on Linux repositories.

Pacemaker

Founded in 2004, Pacemaker is primarily a collaboration between Red Hat and SUSE. It comprises several open-source initiatives that can create highly available and flexible Linux solutions. Over the years, Pacemaker has been enhanced by various projects and Linux distributions to deliver robust functionality that addresses numerous application or hardware failure scenarios, regardless of scale. Pacemaker manages cluster operations utilizing its resource manager and software agents. If you’re familiar with Microsoft Windows clustering, think of Pacemaker as the counterpart to Microsoft Failover Cluster Manager.

To ensure Microsoft SQL Server on Linux functions effectively in a high-availability context, a specific configuration of Pacemaker agents is required.

Linux Distributions and Versions

Pacemaker is available in most mainstream Linux distributions and can typically be installed with the distribution’s package manager, such as APT or YUM. Different Linux repositories may offer different versions of Pacemaker and of its agent packages. Choosing a Linux distribution is not the focus of this article, but it’s essential to verify that the Pacemaker agent packages include the necessary agents and that Microsoft supports the chosen Linux distribution for SQL Server. For details on supported distributions, refer to Microsoft’s SQL Server on Linux documentation.

Note: Outdated versions of Linux distributions might lack some of the agents needed to run the cluster on AWS.
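One way to carry out the check described above is to look for the agent scripts on a node after the packages are installed. The following is a minimal sketch, assuming the conventional OCF layout in which resource agents are executable files under /usr/lib/ocf/resource.d/<provider>/. That path is an assumption and may differ on your distribution, and the helper name missing_agents is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of every listed OCF resource agent that is NOT installed.
# $1 = OCF resource.d directory (assumed default: /usr/lib/ocf/resource.d)
# remaining args = agents in provider/name form, e.g. heartbeat/aws-vpc-move-ip
missing_agents() {
  local root="$1"; shift
  local agent
  for agent in "$@"; do
    # An installed agent is an executable file at <root>/<provider>/<name>.
    [ -x "$root/$agent" ] || echo "$agent"
  done
}

# Usage: check the two AWS resource agents discussed in this article.
missing_agents /usr/lib/ocf/resource.d \
  heartbeat/aws-vpc-move-ip heartbeat/aws-vpc-route53
```

An empty result suggests both agents are present. On a node where pcs is installed, pcs resource describe ocf:heartbeat:aws-vpc-move-ip should also print the agent’s parameters when the agent is available.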

Pacemaker Agents

Pacemaker agents come in various types depending on their roles. Two main categories exist:

  • Resource Agent: This allows Pacemaker to manage cluster resources and contains the logic for handling failures of monitored resources.
  • Fencing Agent: This manages unresponsive cluster nodes by shutting them down, disconnecting them from the network, or disabling access. Fencing ensures clients can connect only to the active cluster node, preventing split-brain scenarios. In some documents, fencing is referred to as STONITH (Shoot The Other Node In The Head), its original name.

Note: For AWS, specific environment agents are necessary to facilitate Pacemaker’s operations and workflows. These agents bridge Pacemaker and AWS services, like Amazon Virtual Private Cloud (VPC) route tables, Amazon Elastic Compute Cloud (EC2) instances, Amazon Route 53, and Elastic IP addresses. For instance, if a cluster node becomes unresponsive, Pacemaker may terminate the corresponding Amazon EC2 instance as part of its fencing process.
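As an illustration of the fencing side, the sketch below assembles a pcs stonith create command for the fence_aws agent, which ships with the fence-agents packages on several distributions. The article does not name the fencing agent at this point, so fence_aws, the resource name, the Region, the node names, and the instance IDs are all assumptions made for illustration. The script only prints the command, so the values can be reviewed before the printed line is run on a cluster node.

```shell
#!/usr/bin/env bash
# Build the pcmk_host_map value ("node:instance-id;node:instance-id") from
# arguments given as node=instance-id pairs.
build_host_map() {
  local map="" pair
  for pair in "$@"; do
    # ${pair%%=*} is the node name, ${pair#*=} is the EC2 instance ID.
    map+="${map:+;}${pair%%=*}:${pair#*=}"
  done
  echo "$map"
}

# Print (do not execute) a fence_aws STONITH definition.
# $1 = resource name, $2 = AWS Region, remaining args = node=instance-id pairs
print_fence_cmd() {
  local name="$1" region="$2"; shift 2
  echo "pcs stonith create $name fence_aws region=$region pcmk_host_map=\"$(build_host_map "$@")\""
}

# Hypothetical node names and instance IDs.
print_fence_cmd clusterfence us-east-1 \
  node1=i-0123456789abcdef0 node2=i-0fedcba9876543210
```

The host map tells Pacemaker which EC2 instance corresponds to which cluster node. The agent also needs AWS credentials, or an instance role, that permit it to stop and start those instances.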

Next, we’ll examine the required Pacemaker agents, their functions, high-level operations, and sample configuration commands.

aws-vpc-move-ip

The aws-vpc-move-ip resource agent enables Multi-Availability Zone (Multi-AZ) deployments, in which cluster nodes are placed in different Availability Zones. This setup uses the AWS Multi-AZ architecture so that the nodes are geographically separated within a Region.

The aws-vpc-move-ip agent adjusts network configurations and Amazon VPC route tables using the AWS Command Line Interface (CLI) and Python code. It also assigns a secondary IP address, known as an Overlay IP address, to the active node’s network interface card (NIC). If the active node fails, or when a resource is moved manually, this agent creates the Overlay IP address on the secondary node’s NIC, fences the Amazon EC2 instance hosting the active cluster node, and updates the route table to direct traffic to the Overlay IP address now on the secondary node.

For Microsoft SQL Always On availability groups, replicas can be deployed across two Microsoft SQL Server on Linux instances in separate Availability Zones and managed by Microsoft SQL Server, with aws-vpc-move-ip replacing the native SQL Server Network Listener functionality.

In a planned or unplanned failover, Pacemaker notifies Microsoft SQL Server on Linux via the mssql resource agent to promote the secondary replica to primary. The aws-vpc-move-ip agent then creates the Overlay IP address on the secondary node and updates the route table to redirect traffic to the newly promoted primary replica. The following diagrams illustrate normal operations and failure scenarios.

Here’s an example configuration command for starting the aws-vpc-move-ip resource agent on the primary node:

pcs resource create <MSSQL_LISTENER_NAME> ocf:heartbeat:aws-vpc-move-ip \
ip=<MSSQL_LISTENER_IPADDR> interface=<NETWORK_INTERFACE_NAME> \
routing_table=<VPC_ROUTING_TABLE_ID> op monitor timeout="30s" interval="60s"

In this command:

  • <MSSQL_LISTENER_NAME> is the name you choose for the Pacemaker resource that serves as the availability group listener.
  • <MSSQL_LISTENER_IPADDR> is the Overlay IP address, which must lie outside the VPC’s CIDR range and therefore outside every subnet defined in the VPC.
  • <NETWORK_INTERFACE_NAME> is the NIC device name specific to the Linux distribution.
  • <VPC_ROUTING_TABLE_ID> denotes the routing table for redirecting network traffic.

Here is a practical example with sample values. The interface shown, ens5, is the default NIC device name on Ubuntu.

pcs resource create ra_aws_vpc_move_ip ocf:heartbeat:aws-vpc-move-ip \
ip=10.1.1.11 interface="ens5" routing_table=rtb-0e85b84caaae1c5a8 \
op monitor timeout="30s" interval="60s"
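Because the Overlay IP address must not fall inside the address range used by the VPC, it can be worth checking a candidate address before creating the resource. The following is a small sketch of such a check in plain bash. The VPC CIDR 10.0.0.0/16 is a hypothetical value, and the helper names ip_to_int and ip_in_cidr are illustrative only.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit status 0) when the IP in $1 lies inside the CIDR block in $2.
ip_in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix="${2#*/}"
  # Network mask with the top $prefix bits set; a /0 prefix matches everything.
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Hypothetical VPC CIDR; 10.1.1.11 is the address used in the example above.
if ip_in_cidr 10.1.1.11 10.0.0.0/16; then
  echo "10.1.1.11 is inside the VPC range: choose another Overlay IP"
else
  echo "10.1.1.11 is outside the VPC range: usable as an Overlay IP"
fi
```

Substitute the actual CIDR block of your VPC for the second argument. If the VPC has more than one CIDR block associated with it, repeat the check for each.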

aws-vpc-route53

The aws-vpc-route53 resource agent allows manipulation of DNS records in Amazon Route 53’s private hosted DNS zones, utilizing AWS CLI and Python. This agent identifies the active node’s IP address and creates a DNS A record for it, enabling DNS resolution for the active node. Should the active node switch to a secondary node, the agent updates the DNS A record accordingly. The aws-vpc-route53 agent works in tandem with aws-vpc-move-ip to maintain high availability.
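By analogy with the aws-vpc-move-ip example earlier, a resource for this agent might be created as sketched below. The parameter names hostedzoneid, fullname, and profile follow the agent’s metadata as shipped in the resource-agents project; confirm them with pcs resource describe ocf:heartbeat:aws-vpc-route53 on your distribution. The resource name, hosted zone ID, host name, and AWS CLI profile shown are hypothetical. The function only prints the command so that the values can be reviewed before it is run on a cluster node.

```shell
#!/usr/bin/env bash
# Print (do not execute) a pcs command that creates an aws-vpc-route53 resource.
# $1 = resource name, $2 = Route 53 private hosted zone ID,
# $3 = fully qualified DNS name for the A record, $4 = AWS CLI profile name
print_route53_cmd() {
  echo "pcs resource create $1 ocf:heartbeat:aws-vpc-route53" \
       "hostedzoneid=$2 fullname=$3 profile=$4" \
       'op monitor timeout="30s" interval="60s"'
}

# Hypothetical values throughout.
print_route53_cmd ra_aws_vpc_route53 Z0123456789EXAMPLE \
  sqlag.example.internal. cluster
```

The monitor settings simply mirror the aws-vpc-move-ip example and are not a tuned recommendation.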


Chanci Turner