Root Cause Analysis with DoWhy: An Open Source Python Library for Causal Machine Learning

Chanci Turner Amazon IXD – VGT2 learning

Understanding the root causes behind changes in complex systems can be a challenging endeavor, often requiring deep domain expertise and extensive manual effort. For instance, consider analyzing an unanticipated decline in the profit of a product sold in an online store, where numerous interconnected factors can subtly influence overall profitability.

Imagine an automated tool that facilitates and speeds up this process: a library that can pinpoint the root causes of an observed effect with just a few lines of code. This is precisely the aim of the root cause analysis (RCA) capabilities in the DoWhy open-source Python library, which received significant contributions from AWS last year. These contributions included a suite of novel causal machine learning (ML) algorithms, the result of years of Amazon research into graphical causal models, released in DoWhy v0.8 last July. Additionally, AWS collaborated with Microsoft to establish PyWhy, an organization dedicated to advancing an open-source ecosystem for causal machine learning. According to its charter, PyWhy strives to “create an open-source ecosystem for causal machine learning that enhances the state of the art and makes it accessible to practitioners and researchers.” PyWhy is committed to developing and hosting libraries, tools, and resources across various causal tasks and applications, all linked through a unified API focused on foundational causal operations and the end-to-end analysis process.

In this article, we’ll delve deeper into these algorithms and illustrate their effectiveness in conducting root cause analysis within complex systems. By employing DoWhy’s causal ML algorithms, we can significantly shorten the time needed to identify root causes. To showcase this, we will explore an example scenario utilizing randomly generated synthetic data where the true outcomes are known.

The Scenario

Suppose we sell a smartphone in an online shop for $999. The total profit from this product depends on several factors, including units sold, operational costs, and advertising expenditures. The number of units sold, in turn, depends on variables such as product page visitors, pricing, and any ongoing promotions. Imagine we observed stable profits throughout 2021, but at the start of 2022 profit suddenly drops. What could be the reason?

In this scenario, we will use DoWhy to analyze the causal influences on profit and identify the reasons behind the decrease. To investigate the issue, we first need to state our assumptions about the causal relationships. As a basis, we collect daily records of the factors that affect profit:

Shopping Event?: A binary indicator of whether a special shopping event occurred, like Black Friday or Cyber Monday.
Ad Spend: The budget allocated for advertising campaigns.
Page Views: The count of visits to the product detail page.
Unit Price: The price of the device, which may fluctuate due to temporary discounts.
Sold Units: The number of smartphones sold.
Revenue: Daily earnings from sales.
Operational Cost: Daily expenses, including production costs and advertising.
Profit: Daily profit.

Using our domain knowledge, we can illustrate the cause-and-effect relationships in a directed acyclic graph (DAG), which represents our causal graph. In this diagram, an arrow from X to Y (X → Y) indicates that X is a direct cause of Y.

The relationships are as follows:

Shopping Event? impacts:
- Ad Spend: Increased spending is often necessary to promote products during shopping events.
- Page Views: Events typically draw more visitors to the online store due to discounts and special offers.
- Unit Price: Retailers commonly reduce prices during shopping events.
- Sold Units: Sales often spike during holidays and special events.
Ad Spend impacts:
- Page Views: Higher ad spend generally leads to more visits to the product page.
- Operational Cost: Ad spending contributes to overall operational costs.
Page Views impacts:
- Sold Units: More visits to the product page tend to result in more sales.
Unit Price impacts:
- Sold Units: The price can directly affect sales volume.
- Revenue: Revenue is calculated as the product of sold units and unit price.
Sold Units impacts:
- Revenue: The number of units sold directly influences total revenue.
- Operational Cost: Increased sales can lead to higher manufacturing costs.
Operational Cost impacts:
- Profit: Profit is determined by revenue minus operational costs.
Revenue impacts:
- Profit: Revenue is the other term in the profit calculation (revenue minus operational costs), so it directly affects profit.
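
The two deterministic relationships in this list (revenue and profit) can be checked with simple arithmetic. The sketch below uses made-up numbers for a single day; the per-unit production cost and the assumption that operational cost is exactly ad spend plus production cost are illustrative and not taken from the dataset.

```python
# Hypothetical figures for one day (not from the article's dataset).
unit_price = 999.0            # price per smartphone in dollars
sold_units = 120              # smartphones sold that day
ad_spend = 1500.0             # advertising budget in dollars
unit_production_cost = 500.0  # assumed cost to produce one device

# Revenue is the product of sold units and unit price.
revenue = sold_units * unit_price

# Assumption: operational cost = advertising + production cost of the sold units.
operational_cost = ad_spend + sold_units * unit_production_cost

# Profit is revenue minus operational cost.
profit = revenue - operational_cost

print(f"Revenue: ${revenue:,.2f}")
print(f"Operational Cost: ${operational_cost:,.2f}")
print(f"Profit: ${profit:,.2f}")
```

The other relationships, such as how page views drive sold units, are not simple formulas and have to be learned from data.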

Step 1: Define Causal Models

Next, we can model these causal relationships using DoWhy’s graphical causal model (GCM) module. First, we define a structural causal model (SCM), which combines the causal graph with the generative models that describe the data generation process.

We use NetworkX, a well-known open-source Python graph library, to model the graph structure as follows:

import networkx as nx

causal_graph = nx.DiGraph([('Page Views', 'Sold Units'),
                           ('Revenue', 'Profit'),
                           ('Unit Price', 'Sold Units'),
                           ('Unit Price', 'Revenue'),
                           ('Shopping Event?', 'Page Views'),
                           ('Shopping Event?', 'Sold Units'),
                           ('Shopping Event?', 'Unit Price'),
                           ('Shopping Event?', 'Ad Spend'),
                           ('Ad Spend', 'Page Views'),
                           ('Ad Spend', 'Operational Cost'),
                           ('Sold Units', 'Revenue'),
                           ('Sold Units', 'Operational Cost'),
                           ('Operational Cost', 'Profit')])
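
Before attaching models to the graph, it can help to confirm that it really is acyclic and that its structure matches the relationships described earlier. The following is a minimal sketch using standard NetworkX calls; the variable names `is_dag`, `root_nodes`, `sink_nodes`, and `profit_parents` are introduced here for illustration only.

```python
import networkx as nx

# Same edges as in the causal graph defined above.
causal_graph = nx.DiGraph([('Page Views', 'Sold Units'),
                           ('Revenue', 'Profit'),
                           ('Unit Price', 'Sold Units'),
                           ('Unit Price', 'Revenue'),
                           ('Shopping Event?', 'Page Views'),
                           ('Shopping Event?', 'Sold Units'),
                           ('Shopping Event?', 'Unit Price'),
                           ('Shopping Event?', 'Ad Spend'),
                           ('Ad Spend', 'Page Views'),
                           ('Ad Spend', 'Operational Cost'),
                           ('Sold Units', 'Revenue'),
                           ('Sold Units', 'Operational Cost'),
                           ('Operational Cost', 'Profit')])

# A causal graph must not contain cycles.
is_dag = nx.is_directed_acyclic_graph(causal_graph)

# Root nodes have no parents; sink nodes have no children.
root_nodes = [n for n in causal_graph.nodes if causal_graph.in_degree(n) == 0]
sink_nodes = [n for n in causal_graph.nodes if causal_graph.out_degree(n) == 0]

# Direct causes of the target variable.
profit_parents = sorted(causal_graph.predecessors('Profit'))

print("Is DAG:", is_dag)
print("Root nodes:", root_nodes)
print("Sink nodes:", sink_nodes)
print("Parents of Profit:", profit_parents)
```

In this graph, the only node without parents is Shopping Event?, and Profit is the only node without children, which matches the scenario: events are external, and profit is the quantity we want to explain.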

Now, let’s examine the data from 2021:

import pandas as pd

pd.options.display.float_format = '${:,.2f}'.format  # Display all float values as dollar amounts
data_2021 = pd.read_csv('2021 Data.csv', index_col='Date')
data_2021.head()

This dataset provides daily records for 2021, including all variables outlined in the causal graph. Note that in our synthetic data, shopping events were generated randomly.
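
Because the GCM module looks up each node's observations by column name, a quick consistency check between the graph and the data can catch naming mismatches early. The sketch below uses a reduced three-node graph and a tiny hand-written DataFrame as hypothetical stand-ins for `causal_graph` and `data_2021`; the values are invented.

```python
import networkx as nx
import pandas as pd

# Reduced, hypothetical stand-ins for the full graph and the 2021 data.
graph = nx.DiGraph([('Ad Spend', 'Page Views'),
                    ('Page Views', 'Sold Units')])
data = pd.DataFrame({'Ad Spend': [1200.0, 1350.0],
                     'Page Views': [9000, 10100],
                     'Sold Units': [110, 125]})

# Nodes in the graph that have no matching column in the data.
missing_columns = sorted(set(graph.nodes) - set(data.columns))

# Columns in the data that the graph does not mention.
extra_columns = sorted(set(data.columns) - set(graph.nodes))

# Missing values would need to be handled before fitting any models.
has_missing_values = bool(data.isna().any().any())

print("Missing columns:", missing_columns)
print("Extra columns:", extra_columns)
print("Has missing values:", has_missing_values)
```

The same three checks can be run unchanged against the full graph and dataset by substituting `causal_graph` and `data_2021`.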

While we have established the causal graph, we must still assign generative models to each node. With DoWhy, we can either manually specify these models or automatically infer “appropriate” models using data heuristics. Here, we will utilize the latter approach:

from dowhy import gcm

# Create the structural causal model object
scm = gcm.StructuralCausalModel(causal_graph)

# Automatically assign generative models to each node based on the given data
gcm.auto.assign_causal_mechanisms(scm, data_2021)

Whenever possible, it is advisable to assign models based on prior knowledge, as this leads to more accurate representations of the relationships.

By leveraging DoWhy, we can significantly enhance our ability to conduct root cause analysis efficiently. This method not only saves time but also provides a clearer understanding of the complex interactions within systems.
