Enhancing Alarm Management with Amazon CloudWatch Metrics Insights Alarms

Chanci Turner Amazon IXD – VGT2 learning manager

on 02 AUG 2023

in Best Practices, Customer Solutions, Intermediate (200), Management & Governance, Management Tools, Technical How-to

Are you finding it challenging to monitor and set alarms for a fleet of dynamically changing resources? Do you have numerous unnecessary alarms that clutter your dashboard and incur costs? Are you seeking an efficient way to create alarms that automatically adapt to resources that frequently change?

This article will guide you through a recommended and economical approach using Amazon CloudWatch to minimize the risk of maintaining alarms on deprecated AWS resources while ensuring that new AWS resources are effectively monitored. This method reduces the likelihood of alarms tracking obsolete or low-value metrics that might otherwise lead to unnecessary expenses and clutter your CloudWatch interface. Alarms created using Metrics Insights queries incur lower operational overhead and costs for aggregate alarms, thanks to their straightforward, singular definitions. They automatically adjust to AWS resources as they are added or removed, which significantly mitigates the risk of lingering alarms.

In a prior blog post, we discussed an automated solution for identifying and deleting low-value alarms. In this entry, we will explore how to establish dynamic alarms that reliably monitor fast-evolving environments and notify you upon detecting anomalies.

Amazon CloudWatch Metrics Insights alarms empower customers to monitor entire fleets of dynamically changing resources with just one alarm through standard SQL queries. CloudWatch Metrics Insights provides swift and flexible SQL-based queries. By merging CloudWatch alarms with Metrics Insights queries, you can create dynamic alarms that ensure continuous monitoring of rapidly changing environments, alerting you when anomalies arise.

Typical Customer Scenarios

We will examine two common scenarios in which alarms must adapt quickly to resource changes, making manual maintenance cumbersome. Both examples illustrate how alarming on Metrics Insights queries can help.

Scenario 1: Monitoring DynamoDB Throttling

Let’s review a common situation where you want to track read throttling events across all DynamoDB tables in your account. Throttling occurs when a DynamoDB table receives more read requests than its provisioned capacity allows, which can leave applications unresponsive or block new transactions.

Typically, this monitoring is implemented by aggregating individual ‘ReadThrottleEvents’ metrics using a metric math expression, with an alarm set on the result. However, if a new DynamoDB table is created, the math expression does not update automatically, leaving a blind spot for new tables and risking undetected errors. This scenario requires the user to manually update the math expression to include metrics from newly added tables, and similar updates are needed when tables are removed. Furthermore, if you need to aggregate more metrics than a single expression allows, you may end up creating multiple alarms instead of just one.
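To make the maintenance burden concrete, here is a minimal Python sketch of the static metric math definition described above. It only builds the `Metrics` list in the shape that CloudWatch's PutMetricAlarm API accepts; the table names, the 60-second period, and the function name are illustrative assumptions, not values from this post.

```python
def build_static_throttle_metrics(table_names):
    """Build a Metrics list that sums ReadThrottleEvents over a fixed set of tables."""
    queries = []
    for index, table in enumerate(table_names):
        # One entry per table: this is the part that must be edited by hand
        # whenever a table is added or removed.
        queries.append({
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DynamoDB",
                    "MetricName": "ReadThrottleEvents",
                    "Dimensions": [{"Name": "TableName", "Value": table}],
                },
                "Period": 60,  # assumed evaluation granularity in seconds
                "Stat": "Sum",
            },
            "ReturnData": False,
        })
    # The math expression aggregates every metric listed above; the alarm
    # evaluates only this entry.
    queries.append({
        "Id": "total",
        "Expression": "SUM(METRICS())",
        "Label": "Total read throttle events",
        "ReturnData": True,
    })
    return queries


# Hypothetical table names: adding "invoices" changes the alarm definition.
before = build_static_throttle_metrics(["orders", "customers"])
after = build_static_throttle_metrics(["orders", "customers", "invoices"])
```

Every new table changes the definition, so the alarm has to be regenerated and redeployed to avoid a blind spot.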

With Metrics Insights alarms, you can configure alarms using Metrics Insights queries that monitor multiple resources without worrying about additions or deletions. In the example above, when a new DynamoDB table is created, the Metrics Insights alarm adjusts to the change without any manual intervention.
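As a rough illustration, a Metrics Insights query for this scenario could look like the one below. The exact query used by the solution is not shown in this post, so the query text, the `q1` identifier, and the 60-second period are assumptions; the dictionary follows the shape of a single entry in the `Metrics` list of a CloudWatch alarm.

```python
# Assumed query: sums ReadThrottleEvents over every metric in AWS/DynamoDB
# that is published with the TableName dimension. No table is named
# explicitly, so new tables are picked up at evaluation time.
DDB_THROTTLE_QUERY = (
    'SELECT SUM(ReadThrottleEvents) FROM SCHEMA("AWS/DynamoDB", TableName)'
)


def metrics_insights_entry(query, period_seconds=60):
    """Wrap a Metrics Insights query as a single alarm metric entry."""
    return {
        "Id": "q1",
        "Expression": query,
        "Period": period_seconds,
        "ReturnData": True,
    }


entry = metrics_insights_entry(DDB_THROTTLE_QUERY)
```

The definition contains one entry regardless of how many tables exist, which is what removes the need for manual updates.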

Scenario 2: Responding to 5XX Errors in ECS Clusters

Consider another scenario where you wish to be alerted when any ECS cluster in your account generates an HTTP 5XX response code.

Normally, this involves creating a metric math expression that sums the individual ‘HTTPCode_Target_5XX_Count’ metrics reported for each ECS cluster, and then setting an alarm on the aggregated result. The issue is that if a new ECS cluster is launched, the math expression does not update itself, which creates a blind spot for the new cluster. The user would again need to manually update the math expression to include the metrics reported for the new cluster, and similar updates are necessary when clusters are removed.

Using Metrics Insights alarms, you can define alarms on Metrics Insights queries that monitor multiple resources without needing to track changes. In this case, when a new ECS cluster is added, the Metrics Insights alarm automatically adjusts and alerts you if the threshold is breached, without requiring any manual input.
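The query for this scenario can be assembled the same way for any metric. The helper below is a hypothetical sketch, not part of the solution's template. It assumes the metric is read from the `AWS/ApplicationELB` namespace with the `LoadBalancer` dimension, which is where Application Load Balancers in front of ECS services publish `HTTPCode_Target_5XX_Count`; adjust both if your metrics are published elsewhere.

```python
def build_query(function, metric, namespace, dimensions):
    """Assemble a Metrics Insights query string for one metric."""
    schema_args = f'"{namespace}"'
    if dimensions:
        # SCHEMA matches metrics published with exactly these dimension keys.
        schema_args += ", " + ", ".join(dimensions)
    return f"SELECT {function}({metric}) FROM SCHEMA({schema_args})"


# Namespace and dimension are assumptions; see the note above.
ecs_5xx_query = build_query(
    "SUM", "HTTPCode_Target_5XX_Count", "AWS/ApplicationELB", ["LoadBalancer"]
)
print(ecs_5xx_query)
```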

Solution Overview

This solution involves creating Metrics Insights alarms for the aforementioned scenarios. It provisions a Metrics Insights alarm, ‘DDBReadThrottleAlarm,’ to monitor the ‘ReadThrottleEvents’ metric, and similarly establishes ‘ECSTarget5XXAlarm’ to keep track of the ‘HTTPCode_Target_5XX_Count’ metric. You can set the threshold values for alerts while deploying the AWS CloudFormation template. Additionally, the solution provisions an SNS topic to send notifications in case of an alarm, and you can specify the email address during the launch process. This solution can also be extended to cover other AWS services or metrics relevant to your needs.
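For readers who prefer to see the moving parts outside CloudFormation, the sketch below builds equivalent alarm definitions as plain Python dictionaries in the shape expected by the CloudWatch PutMetricAlarm API. The alarm names come from this post; the queries, thresholds, comparison operator, missing-data handling, and the SNS topic ARN are illustrative assumptions. The `deploy` function needs the third-party `boto3` package and AWS credentials, and is defined but not called here.

```python
def build_alarm(name, query, threshold, topic_arn):
    """Return PutMetricAlarm parameters for a Metrics Insights alarm."""
    return {
        "AlarmName": name,
        "AlarmActions": [topic_arn],  # notify the SNS topic on ALARM
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(threshold),
        "EvaluationPeriods": 1,
        "TreatMissingData": "notBreaching",  # assumption: no data means healthy
        "Metrics": [{
            "Id": "q1",
            "Expression": query,
            "Period": 60,
            "ReturnData": True,
        }],
    }


def deploy(alarms):
    """Create or update the alarms in the current AWS account and region."""
    import boto3  # third-party; imported lazily so the sketch runs without it

    cloudwatch = boto3.client("cloudwatch")
    for alarm in alarms:
        cloudwatch.put_metric_alarm(**alarm)


# Placeholder ARN and thresholds, for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:AlarmNotificationTopic"
alarms = [
    build_alarm(
        "DDBReadThrottleAlarm",
        'SELECT SUM(ReadThrottleEvents) FROM SCHEMA("AWS/DynamoDB", TableName)',
        threshold=0,
        topic_arn=TOPIC_ARN,
    ),
    build_alarm(
        "ECSTarget5XXAlarm",
        # Namespace and dimension are assumptions (ALB in front of ECS).
        'SELECT SUM(HTTPCode_Target_5XX_Count) '
        'FROM SCHEMA("AWS/ApplicationELB", LoadBalancer)',
        threshold=0,
        topic_arn=TOPIC_ARN,
    ),
]
```

Calling `deploy(alarms)` would create both alarms; extending the solution to another service or metric amounts to adding one more `build_alarm` call with a different query.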

Deploying the Solution

This solution and its associated resources are available for deployment into your AWS account via an AWS CloudFormation template.

Prerequisites

For this walkthrough, ensure you have:

An AWS account
Existing Amazon DynamoDB tables and Amazon ECS clusters

What the CloudFormation Template Deploys

The CloudFormation template will deploy the following resources to your AWS account:

Amazon CloudWatch Metrics Insights alarms
- DDBReadThrottleAlarm – Monitors the ReadThrottleEvents metric and alerts when a throttled event occurs in any DynamoDB table in this account.
- ECSTarget5XXAlarm – Monitors the HTTPCode_Target_5XX_Count metric and alerts when any ECS cluster generates an HTTP 5XX response code.

This CloudFormation template can be customized to utilize any metric you prefer.

Amazon SNS Topic

AlarmNotificationTopic – Sends email notifications when alarms are triggered.

How to Deploy the CloudFormation Template

Download the YAML file.
Navigate to the CloudFormation console in your AWS account.
Choose “Create stack.”
Choose “Template is ready,” then “Upload a template file,” and select the downloaded YAML file.
Click “Next.”
Enter a name for the stack (maximum length 30 characters).
For the parameter ‘EmailToNotifyForAlarms,’ input the email address for alarm notifications, and for the parameter ‘DDBReadThrottleThresh,
