Boost Predictive Maintenance with Generative AI Agents on AWS

Chanci Turner Amazon IXD – VGT2 learning manager

In modern industrial operations, predictive maintenance remains a significant challenge, particularly in the Energy, Utilities, and Manufacturing sectors. Conventional anomaly detection methods are often labor-intensive and slow, so potential equipment failures can go unaddressed for extended periods, leading to substantial unplanned downtime and financial losses. According to McKinsey, mid-sized refineries can incur annual losses ranging from $20 to $50 million due to reliability concerns.

Typical strategies to tackle this issue involve training time-series anomaly detection models or defining rules for each combination of equipment type and sensor. However, these approaches often fall short in several key areas:

Maintaining the accuracy of prediction models over time
Rapidly verifying anomalies
Quickly conducting root cause analysis
Efficiently routing maintenance requests

A groundbreaking advancement in generative AI is the emergence of agentic workflows, wherein systems autonomously perform tasks, adapt to changes, and make real-time decisions with minimal human intervention. This approach directly addresses the limitations of traditional anomaly detection methods, enabling proactive management of issues before they escalate, thereby reducing downtime and enhancing operational efficiency.

This article presents an agentic solution for predictive maintenance using generative AI agents. We will concentrate on a single generative AI agent responsible for anomaly detection and alerting in a predictive maintenance context. In this scenario, various electric motors in the field continuously send sensor data. Each motor is assigned a unique ID registered in a company asset database, complemented by vendor PDF manuals for each motor make and model. An agent evaluates the sensor readings and alerts the on-call personnel if any anomalies are detected.

Sensor data is received in JSON format, including a motor ID field. Our agent utilizes this ID to identify the motor’s make and model, subsequently employing the Retrieval Augmented Generation (RAG) technique to query the motor’s operating manual from a knowledge base and compare the sensor readings with the manual’s specifications. If any readings exceed the acceptable range, the system identifies the on-call personnel for that time and notifies them via email and text.
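As a rough illustration of the input side of this flow, the sketch below parses one sensor message and pulls out the motor ID. Only the `motorId` field name comes from the article; the other field names, the sample values, and the `parse_sensor_payload` helper are assumptions made for the example.

```python
import json

# Hypothetical payload: only `motorId` is named in the article; every other
# field name and value here is an illustrative assumption.
SAMPLE_PAYLOAD = """
{
  "motorId": "MTR-0042",
  "timestamp": "2024-06-01T12:00:00Z",
  "readings": {
    "temperature_c": 96.5,
    "vibration_mm_s": 4.1,
    "current_a": 12.3
  }
}
"""


def parse_sensor_payload(raw: str) -> tuple[str, dict[str, float]]:
    """Return (motor_id, readings) from a JSON sensor message."""
    message = json.loads(raw)
    motor_id = message.get("motorId")
    if not motor_id:
        raise ValueError("sensor message is missing 'motorId'")
    # Coerce every reading to float so later comparisons are numeric.
    readings = {
        name: float(value)
        for name, value in message.get("readings", {}).items()
    }
    return motor_id, readings


motor_id, readings = parse_sensor_payload(SAMPLE_PAYLOAD)
print(motor_id, readings)
```

In the solution described here the agent itself interprets the JSON, so a helper like this would mainly be useful for validating messages before they are sent to the agent.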

Solution Architecture

We leverage Amazon Bedrock Agents to construct our agent. Amazon Bedrock Agents enable the automation of multi-step tasks by seamlessly integrating with company systems, APIs, and data sources. Additionally, we utilize Amazon Bedrock Knowledge Bases, a fully managed capability, to facilitate the entire RAG workflow from ingestion to retrieval and prompt augmentation.

The architecture diagram below illustrates the solution using a Bedrock agent for anomaly detection in equipment telemetry data:

[Insert Diagram Here]

The equipment manuals, stored in PDF format within an Amazon S3 bucket, are indexed by Amazon Bedrock Knowledge Bases to manage unstructured data for RAG. This service synchronizes data from the S3 bucket, parses and segments documents into manageable chunks, generates vector embeddings for these chunks with the chosen embedding model in Amazon Bedrock, and stores the embeddings in a vector store, here an Amazon OpenSearch Serverless vector index.
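Amazon Bedrock Knowledge Bases performs chunking as a managed step, so no chunking code is needed in this solution. Purely to make the idea concrete, here is a minimal sketch of fixed-size chunking with overlap. It counts words rather than tokens, and the `chunk_words` name and default sizes are assumptions, not the service's actual implementation.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


manual_text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(manual_text, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a specification that straddles a chunk boundary, such as a temperature limit split across two sentences, still appears intact in at least one chunk.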

The inference flow begins with applications invoking the agent published on Amazon Bedrock, providing JSON input data that includes aggregated sensor readings from industrial equipment. The Amazon Bedrock Agent uses foundation model (FM) reasoning, APIs, and input data to break down requests and execute tasks per the agent’s instructions.
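A minimal sketch of the invocation step follows, assuming the application uses the `invoke_agent` operation of the `bedrock-agent-runtime` client and sends the readings as a JSON string in `inputText`. The helper name and the choice of one fresh session per message are assumptions for illustration; the client is passed in so the function can be exercised without AWS credentials.

```python
import json
import uuid


def invoke_agent_with_readings(client, agent_id: str, agent_alias_id: str,
                               payload: dict, enable_trace: bool = False) -> str:
    """Send one sensor payload to the agent and return its text answer."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),  # assumption: one session per message
        inputText=json.dumps(payload),
        enableTrace=enable_trace,
    )
    parts = []
    # The answer arrives as an event stream; text is carried in `chunk` events.
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("bedrock-agent-runtime")`, and the agent ID and alias ID would come from the deployed agent.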

The agent decides to perform a lookup action from the defined action groups to retrieve data about the specific equipment from the company asset databases. The agent also invokes retrieval from the knowledge base to gather vendor-supplied operating guidance for the particular equipment type. Using all available context, the Amazon Bedrock Agent determines if an anomaly exists in the sensor data. If an anomaly is detected, the agent generates a summary and triggers a notification action from the defined action groups to alert field operation teams.
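The lookup action could be backed by an AWS Lambda function along the lines of the sketch below. It assumes the action group is defined with function details (rather than an OpenAPI schema) and follows the event and response shape used for that case. The in-memory `ASSET_DB`, its contents, and the `motorId` parameter name on the action are hypothetical stand-ins for the company asset database.

```python
import json

# Hypothetical stand-in for the company asset database.
ASSET_DB = {
    "MTR-0042": {
        "make": "ExampleCo",
        "model": "EM-100",
        "serviceGroup": "field-ops-east",
    },
}


def lambda_handler(event, context):
    """Look up a motor by ID and return the result to the agent."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    motor_id = params.get("motorId", "")
    asset = ASSET_DB.get(motor_id)
    if asset:
        body = json.dumps(asset)
    else:
        body = f"No asset found for motorId '{motor_id}'"
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": body}},
            },
        },
    }
```

Returning a plain text message for an unknown ID, instead of raising, lets the agent reason about the missing asset and report it rather than failing the whole request.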

The solution’s performance is monitored via Amazon CloudWatch and secured using AWS Identity and Access Management (IAM) roles and policies.

You can try this solution practically by following the detailed instructions in this AWS workshop.

Exploring Agentic Workflow

In this section, we will examine how our agentic workflow employs Chain-of-Thought (CoT) reasoning to effectively identify anomalies and notify the relevant service group.

CoT prompting enhances the reasoning capabilities of FMs by breaking down complex tasks into smaller, more manageable steps, mimicking human reasoning. Unlike traditional prompting, where a language model directly provides answers based on a prompt, CoT directs the model to articulate its step-by-step thought process, known as a reasoning chain, before arriving at a conclusion. This method increases the transparency and interpretability of the model’s reasoning.

Here are the detailed agent instructions for our solution:

Extract Data:
- Parse incoming sensor data (JSON or text) to identify the motor ID (motorId) and relevant measurements.
- Use the motor ID to retrieve the corresponding motor specifications and service group information from the knowledge base.
Analyze Data:
- Compare the sensor readings against the motor’s specified operating parameters.
- Identify any values that fall outside the acceptable ranges defined in the equipment manual. Ensure that the units in the provided sensor readings align with those in the equipment manual.
Evaluate and Summarize:
- Determine if any detected deviations constitute an anomaly.
- If an anomaly is found, concisely summarize:
  1. The nature of the anomaly
  2. Which parameters are affected
  3. Potential root causes
  4. Severity of the issue
Notify:
- If an anomaly is detected, use the provided notification tool to alert the appropriate field service group.
- Include in the notification:
  1. Motor ID and model
  2. Anomaly summary
  3. Relevant sensor data
  4. Recommended next steps (if applicable)

Always prioritize accuracy in data interpretation and clear, concise communication in your responses and notifications.
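In this solution the FM carries out the Analyze and Notify steps by reasoning over the retrieved manual text. To show what those instructions amount to, here is a deterministic sketch of the same checks, including the unit comparison. The `Limit` and `Reading` types, the parameter names, and all numeric values are assumptions made for the example and do not come from any real manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float
    unit: str


@dataclass(frozen=True)
class Reading:
    value: float
    unit: str


def find_deviations(readings: dict[str, Reading],
                    limits: dict[str, Limit]) -> list[str]:
    """Return one finding per reading that cannot be confirmed as in range."""
    findings = []
    for name, reading in readings.items():
        limit = limits.get(name)
        if limit is None:
            continue  # the manual gives no range for this parameter
        if reading.unit != limit.unit:
            # Never compare values expressed in different units.
            findings.append(
                f"{name}: unit mismatch ({reading.unit} vs {limit.unit})")
        elif not limit.low <= reading.value <= limit.high:
            findings.append(
                f"{name}: {reading.value} {reading.unit} is outside "
                f"{limit.low} to {limit.high} {limit.unit}")
    return findings


def build_notification(motor_id: str, model: str, findings: list[str]) -> str:
    """Compose a short alert containing the motor identity and the findings."""
    lines = [f"Motor {motor_id} ({model}): {len(findings)} deviation(s) detected"]
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)


limits = {"temperature": Limit(0.0, 90.0, "C"), "vibration": Limit(0.0, 7.1, "mm/s")}
readings = {"temperature": Reading(96.5, "C"), "vibration": Reading(4.1, "mm/s")}
findings = find_deviations(readings, limits)
if findings:
    print(build_notification("MTR-0042", "EM-100", findings))
```

A rule-based check like this is what the agentic approach replaces: the agent derives the limits from the manual at run time instead of relying on hand-maintained thresholds per motor model.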

Now that we have outlined the agent’s instructions, let’s walk through how CoT reasoning is utilized to complete these steps.

[Insert Figure Here]

Before we delve into this walkthrough, let’s clarify three essential concepts:

Rationale: This encompasses the reasoning, based on the input, guiding the agent in executing an action group or retrieving information from a knowledge base.
Action: Contains details regarding the action group.
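When the agent is invoked with tracing enabled, these two concepts can be read from the trace events in the response stream. The sketch below assumes the orchestration trace layout returned by `invoke_agent` with `enableTrace=True`, where a rationale appears under `rationale` and an action under `invocationInput`. The `summarize_trace` helper and its output format are assumptions for illustration.

```python
def summarize_trace(events) -> list[dict]:
    """Collect rationale and action steps from agent response events."""
    steps = []
    for event in events:
        orchestration = (
            event.get("trace", {}).get("trace", {}).get("orchestrationTrace", {})
        )
        if "rationale" in orchestration:
            steps.append({
                "type": "rationale",
                "text": orchestration["rationale"].get("text", ""),
            })
        if "invocationInput" in orchestration:
            invocation = orchestration["invocationInput"]
            if "actionGroupInvocationInput" in invocation:
                target = invocation["actionGroupInvocationInput"].get("actionGroupName")
            elif "knowledgeBaseLookupInput" in invocation:
                target = invocation["knowledgeBaseLookupInput"].get("knowledgeBaseId")
            else:
                target = None  # some other invocation type
            steps.append({"type": "action", "target": target})
    return steps
```

Passing `response["completion"]` from an `invoke_agent` call to this function would yield the ordered sequence of reasoning steps and tool or knowledge base calls the agent made.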
