Amazon Onboarding with Learning Manager Chanci Turner


Monitoring the health of private VPC endpoints in hybrid DNS environments on AWS resembles the historical use of canaries in coal mines. The birds served as an early warning system, alerting miners to dangerous carbon monoxide levels in time for them to act. Similarly, CloudWatch Synthetics canaries help identify potential customer experience and security issues before they affect users.

CloudWatch Synthetics allows you to configure Node.js or Python scripts that simulate user interactions with your REST APIs, URLs, and website content on a regular schedule. This continuous monitoring helps assess endpoint availability and latency, ensuring the expected user experience is maintained. You can utilize pre-built canary blueprints or tailor custom scripts to meet your specific needs.
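
To make the idea of a scripted check concrete, here is a minimal sketch of a single endpoint probe in Python. It uses only the standard library rather than the Synthetics runtime libraries, and the names check_endpoint and handler, as well as the event["url"] field, are illustrative assumptions and not part of any AWS blueprint.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0, opener=urllib.request.urlopen):
    """Request `url` once and report status code, latency, and pass/fail."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4XX/5XX responses; the code is still a valid result.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: no HTTP status at all.
        status = None
    latency_ms = (time.monotonic() - started) * 1000.0
    healthy = status is not None and 200 <= status < 400
    return {"url": url, "status": status, "latency_ms": latency_ms, "healthy": healthy}


def handler(event, context):
    """Hypothetical entry point; a scheduled script would call it on each run."""
    result = check_endpoint(event["url"])
    if not result["healthy"]:
        # Raising an exception is how this sketch signals a failed run.
        raise RuntimeError(f"Endpoint check failed: {result}")
    return result
```

The opener parameter exists only so the probe can be exercised without network access; a real canary would more likely start from one of the pre-built blueprints.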

To illustrate how CloudWatch Synthetics applies in practice, consider a real-world scenario involving a client who supports an internal title search solution. This solution enables analysts to verify ownership and claims on real estate assets before any transaction. The architecture uses Amazon API Gateway, and cross-Region disaster recovery (DR) has to be driven by the health of private API Gateway endpoints within a hybrid DNS setup. In this setup, the REST APIs are accessible only from within the client's Amazon Virtual Private Cloud (VPC) through VPC interface endpoints.

Solution Overview

In this solution, the health of the private Amazon API Gateway endpoints determines whether the application is considered operational, and 4XX and 5XX status codes serve as indicators of potential issues. The following sections outline the steps for creating and configuring CloudWatch Synthetics canaries to monitor VPC endpoint health in a hybrid DNS environment.

Customer Use Case

Transitioning from a monolithic architecture to a microservices framework, our featured client opted for a fully serverless design, utilizing Amazon API Gateway backed by AWS Lambda. While this architecture offers scalability and availability, it necessitates a comprehensive disaster recovery strategy. During the implementation of their serverless infrastructure, we identified four key metrics to monitor for optimal API performance.

First, monitoring 4XX status codes helps identify client-side errors, such as incorrect or missing authentication headers. By setting acceptable limits within CloudWatch Synthetics canary scripts, we can trigger alerts when the number of errors exceeds defined thresholds.

Next, 5XX status codes signal server-side errors, including timeouts or application bugs. Similar to client-side errors, we establish a threshold for acceptable server-side errors, with sustained excesses prompting investigation.
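
The threshold idea for 4XX and 5XX responses can be sketched as a small pure function. The limits of 5% and 1% below are placeholder assumptions chosen for illustration, not values from the client engagement, and the function names are hypothetical.

```python
def error_rates(status_codes):
    """Return the share of 4XX and 5XX responses in a list of HTTP status codes."""
    total = len(status_codes)
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    client_errors = sum(1 for code in status_codes if 400 <= code < 500)
    server_errors = sum(1 for code in status_codes if 500 <= code < 600)
    return {"4xx": client_errors / total, "5xx": server_errors / total}


def breached_thresholds(status_codes, max_4xx=0.05, max_5xx=0.01):
    """List which error classes exceed their (assumed) acceptable rate."""
    rates = error_rates(status_codes)
    limits = {"4xx": max_4xx, "5xx": max_5xx}
    return [name for name in ("4xx", "5xx") if rates[name] > limits[name]]


if __name__ == "__main__":
    observed = [200] * 90 + [403] * 8 + [500] * 2
    print(breached_thresholds(observed))
```

Keeping client-side and server-side limits separate mirrors the distinction drawn above: a burst of 4XX responses often points at callers, while 5XX responses point at the service itself.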

The third important metric is request count, which includes both successful and error responses. This metric is invaluable for tracking API Gateway costs and identifying potential bugs or permission issues when request counts are unexpectedly low.

Finally, API Gateway request latency measures the time from receiving a request to delivering a response, ensuring compliance with service-level agreements (SLAs). CloudWatch Synthetics can help determine whether latency issues stem from application code or infrastructure challenges.
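
One common way to compare latency against an SLA is to look at a high percentile of recent samples instead of the average. The sketch below uses the nearest-rank method; the 95th percentile and the function names are assumptions for illustration, since the source does not state how the SLA was defined.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms, sla_ms, pct=95):
    """True when the chosen percentile of latency stays within the SLA."""
    return percentile(latencies_ms, pct) <= sla_ms


if __name__ == "__main__":
    recent = [120, 135, 110, 980, 125, 140, 118, 122, 131, 127]
    print(percentile(recent, 95), meets_sla(recent, sla_ms=500))
```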

When any of these metrics falls outside established limits, we can reroute traffic to a secondary API Gateway endpoint in another Region while notifying our administrators of the issue. This closed-loop automation minimizes user impact and gives the team time to fix the application.
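
The rerouting decision described above can be sketched as a function that checks all four metrics and picks an endpoint. Everything named here (Limits, Snapshot, choose_endpoint, the default limits, and the endpoint labels in the usage example) is hypothetical; the actual rerouting mechanism is not specified in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Placeholder limits; real values would come from the team's own SLAs.
    max_4xx_rate: float = 0.05
    max_5xx_rate: float = 0.01
    min_requests: int = 1
    max_latency_ms: float = 1000.0


@dataclass(frozen=True)
class Snapshot:
    rate_4xx: float
    rate_5xx: float
    request_count: int
    latency_ms: float


def violations(snapshot, limits):
    """Return the names of the metrics that fall outside their limits."""
    problems = []
    if snapshot.rate_4xx > limits.max_4xx_rate:
        problems.append("4xx rate")
    if snapshot.rate_5xx > limits.max_5xx_rate:
        problems.append("5xx rate")
    if snapshot.request_count < limits.min_requests:
        problems.append("request count")
    if snapshot.latency_ms > limits.max_latency_ms:
        problems.append("latency")
    return problems


def choose_endpoint(snapshot, limits, primary, secondary):
    """Pick the secondary endpoint when any metric is out of bounds.

    Returns (endpoint, problems); a non-empty `problems` list is what an
    administrator notification would report.
    """
    problems = violations(snapshot, limits)
    return (secondary if problems else primary), problems


if __name__ == "__main__":
    snap = Snapshot(rate_4xx=0.01, rate_5xx=0.03, request_count=500, latency_ms=120.0)
    print(choose_endpoint(snap, Limits(), "primary-region-api", "secondary-region-api"))
```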

Solution Implementation

Our approach consists of three main components:

  1. Monitoring VPC Interface Endpoint Health with CloudWatch Synthetics Canaries.
  2. Enabling Hybrid DNS Between On-Premises and AWS.
  3. Testing Canary Run Metrics Within the Hybrid DNS Environment.

Part A: Monitoring VPC Interface Endpoint Health with CloudWatch Synthetics Canaries

  1. Create a Private API Gateway Endpoint.
  2. Set up a VPC if one is not already established, noting the VPC ID, private subnet IDs, and security group IDs for later use.
  3. If the VPC has internet access, create a NAT gateway and proceed to step 4. If it does not, create an S3 VPC endpoint for storing canary run data and a CloudWatch VPC endpoint for collecting canary metrics, then enable DNS resolution for the VPC.
  4. Launch the CloudWatch Synthetics Canary CloudFormation Stack.
  5. Access the canaries list to monitor run metrics, including operational status and logs.
  6. If errors occur, consult the CloudWatch User Guide for troubleshooting.
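
The branching in step 3 can be written down as a small helper that lists which VPC endpoints a canary needs. The service names follow the standard com.amazonaws.<region>.<service> pattern for S3 and CloudWatch (whose service name is "monitoring"); the function name and the shape of the returned records are assumptions for illustration, and nothing here creates any resources.

```python
def required_vpc_endpoints(region, has_internet_access):
    """Return the VPC endpoint services a canary needs, per step 3.

    With internet access (through a NAT gateway) no endpoints are required.
    Without it, the canary needs private paths to S3 and CloudWatch.
    """
    if has_internet_access:
        return []
    return [
        {
            "service_name": f"com.amazonaws.{region}.s3",
            "purpose": "store canary run data",
        },
        {
            "service_name": f"com.amazonaws.{region}.monitoring",
            "purpose": "collect canary metrics",
        },
    ]


if __name__ == "__main__":
    for endpoint in required_vpc_endpoints("us-east-1", has_internet_access=False):
        print(endpoint["service_name"], "-", endpoint["purpose"])
```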

Part B: Enabling Hybrid DNS Between On-Premises and AWS

  1. If an on-premises DNS service is unavailable, create an AWS Managed Microsoft AD. If using an existing on-premises DNS server, take note of the DNS server addresses.
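
Once hybrid DNS is in place, one quick sanity check is whether the private API hostname resolves to private addresses from the network you are testing on. The sketch below is an assumption about how such a check might look, not a step from the original procedure; the hostname in the usage example is a made-up placeholder in the usual execute-api format.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Resolve a hostname to its IPv4 addresses with the system resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def resolves_privately(hostname, resolver=resolve_ipv4):
    """True when the name resolves and every address is in a private range."""
    addresses = resolver(hostname)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


if __name__ == "__main__":
    # Placeholder hostname; substitute the real private API endpoint.
    name = "abc123.execute-api.us-east-1.amazonaws.com"
    try:
        print(name, resolves_privately(name))
    except socket.gaierror as err:
        print("resolution failed:", err)
```

A result of False from an on-premises host suggests that queries are not being forwarded to the VPC resolver, so the name is resolving publicly or not at all.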

By implementing these steps, organizations can effectively monitor their VPC endpoint health and keep their infrastructure robust and resilient.
