Detecting Fraudulent Calls with Amazon QuickSight ML Insights

Chanci Turner

The financial repercussions of fraud across various sectors are significant. A Financial Times article highlights that the telecommunications industry loses around $17 billion annually to fraud. Fraudsters continuously adapt to new technologies and develop new techniques, which complicates detection. Many companies employ rules-based fraud detection systems; however, once fraudsters become aware of the rules, they quickly find ways to bypass them. Moreover, rules-based systems often struggle with large datasets, which makes it hard to identify fraud quickly and leads to revenue losses.

Overview

Numerous AWS services can facilitate anomaly detection to mitigate fraud risks, but let’s focus on three key services:

  • Amazon Kinesis Data Analytics
  • Amazon SageMaker
  • Amazon QuickSight ML Insights

When it comes to detecting fraud, two primary challenges arise:

  1. Scale – The vast volume of data that must be analyzed. For instance, each phone call generates a call detail record (CDR) event containing data points such as the calling and receiving numbers and the call duration. Given the number of calls made every day, the amount of data that operators must process is enormous.
  2. Machine Learning Expertise – The necessary skills to address business challenges with machine learning. Cultivating these skills or hiring qualified data scientists with relevant domain knowledge can be quite a hurdle.

Introducing Amazon QuickSight ML Insights

Amazon QuickSight is a fast, cloud-based business intelligence (BI) service that empowers users throughout an organization to derive insights from their data via engaging, interactive dashboards. With a pay-per-session pricing model and the ability to embed dashboards into applications, BI has become more accessible and cost-effective.

As customer data volumes increase, leveraging this data for actionable insights becomes increasingly difficult. This is where machine learning steps in. Amazon leads the way in applying machine learning to automate and enhance various facets of business analytics across industries like supply chain, marketing, retail, and finance.

ML Insights integrates established Amazon technologies into Amazon QuickSight, providing users with ML-driven insights that extend beyond basic visualizations, including:

  • ML-powered anomaly detection to uncover hidden insights by analyzing billions of data points continuously.
  • ML-driven forecasting and what-if analysis to project essential business metrics easily.
  • Auto-narratives to assist users in conveying the story of their dashboards in clear, understandable language.

In this article, I will illustrate how a telecom provider with limited machine learning capabilities can utilize Amazon QuickSight’s ML functions to identify fraudulent calls effectively.

Prerequisites

To implement this solution, the following resources are necessary:

  • Amazon S3 for staging a CDR sample in CSV format.
  • AWS Glue executing an ETL job in PySpark.
  • AWS Glue crawlers to identify the schema of tables and update the AWS Glue Data Catalog.
  • Amazon Athena to query the dataset in Amazon QuickSight.
  • Amazon QuickSight for creating visualizations and performing anomaly detection using ML Insights.

The Dataset

For this discussion, I will utilize a synthetic dataset provided by Ribbon Communications. This data was generated by call test systems and does not include customer or sensitive information.

Inspecting the Data

A typical CDR contains many values, not all of which are relevant for fraud detection.

Revenue Share Fraud

Revenue share fraud is one of the prevalent schemes threatening the telecom industry. It involves fraudulent or stolen numbers repeatedly calling a premium rate B-number, with the cash generated shared with the fraudster.

To detect national and international revenue share fraud using Amazon QuickSight ML, consider the typical characteristics of a revenue share fraud call. Patterns often include multiple A-numbers calling the same B-number or a range of B-numbers sharing the same prefix. The call duration usually exceeds the average and can last up to two hours, which is the maximum duration permitted by international switches. Calls often originate from one cell or a cluster of cells.
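The first two characteristics above (many A-numbers calling one B-number, and longer-than-average calls) can be illustrated with a small sketch. This is only a hand-written heuristic for building intuition, not how Amazon QuickSight ML Insights works internally; the tuple layout, the thresholds `min_callers` and `duration_factor`, and the sample numbers are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def suspicious_b_numbers(cdrs, min_callers=3, duration_factor=2.0):
    """Flag B-numbers called by many distinct A-numbers with long calls.

    cdrs: iterable of (a_number, b_number, duration_seconds) tuples.
    The thresholds are illustrative assumptions, not recommended values.
    """
    cdrs = list(cdrs)
    if not cdrs:
        return {}
    overall_avg = mean(d for _, _, d in cdrs)

    callers = defaultdict(set)     # B number -> distinct A numbers
    durations = defaultdict(list)  # B number -> call durations
    for a, b, d in cdrs:
        callers[b].add(a)
        durations[b].append(d)

    flagged = {}
    for b, a_numbers in callers.items():
        avg = mean(durations[b])
        # Both conditions must hold: many callers AND unusually long calls.
        if len(a_numbers) >= min_callers and avg >= duration_factor * overall_avg:
            flagged[b] = {"distinct_callers": len(a_numbers), "avg_duration": avg}
    return flagged


# Synthetic records: (A number, B number, duration in seconds).
SAMPLE = [(f"1555010{i}", "882990001", 7000) for i in range(3)]
SAMPLE += [(f"1555020{i}", "15550199", 60) for i in range(9)]

flagged = suspicious_b_numbers(SAMPLE)
print(flagged)
```

In the sample, the second B-number has more distinct callers than the first, but its calls are short, so only the first is flagged. A fixed rule like this is exactly what fraudsters learn to evade, which is why the article turns to ML-based anomaly detection instead.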

A single SIM may initiate brief test calls to various B-numbers as a precursor to the actual fraudulent activity, typically during low-detection risk times, such as weekends or holidays. Conference calling may also be used to facilitate several simultaneous calls from one A-number.
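The test-call pattern can be sketched in the same spirit: look for an A-number that places very short calls to several different B-numbers on a weekend. The record layout and the `max_duration` and `min_targets` thresholds are assumptions for illustration, and holidays are not handled here because they would need a calendar that the source does not provide.

```python
from collections import defaultdict
from datetime import datetime


def probable_test_callers(cdrs, max_duration=10, min_targets=3):
    """Find A-numbers making brief weekend calls to several B-numbers.

    cdrs: iterable of (a_number, b_number, start_iso, duration_seconds).
    Thresholds are illustrative assumptions only.
    """
    targets = defaultdict(set)  # A number -> distinct B numbers probed
    for a, b, start_iso, duration in cdrs:
        # weekday() returns 5 for Saturday and 6 for Sunday.
        is_weekend = datetime.fromisoformat(start_iso).weekday() >= 5
        if is_weekend and duration <= max_duration:
            targets[a].add(b)
    return {a: sorted(bs) for a, bs in targets.items() if len(bs) >= min_targets}


# Synthetic records; 2019-01-05 is a Saturday and 2019-01-07 is a Monday.
SAMPLE = [
    ("15550100", "882990001", "2019-01-05T02:10:00", 5),
    ("15550100", "882990002", "2019-01-05T02:11:00", 4),
    ("15550100", "882990003", "2019-01-05T02:12:00", 6),
    ("15550101", "882990001", "2019-01-07T10:00:00", 5),
    ("15550101", "882990002", "2019-01-07T10:01:00", 5),
    ("15550101", "882990003", "2019-01-07T10:02:00", 5),
]

test_callers = probable_test_callers(SAMPLE)
print(test_callers)
```

Both A-numbers in the sample make the same three short calls, but only the one calling on Saturday is returned.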

SIMs exploited for this type of fraud are frequently sold or activated in bulk from the same distributor or group of distributors. They may be funded using fraudulent online or IVR payments. Both PAYG credit and bundles might be utilized. The following fields are most pertinent for fraud detection:

  • Call duration
  • Calling number (A number)
  • Called number (B number)
  • Start time of the call
  • Accounting ID
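As a rough sketch of what keeping only these fields could look like, the snippet below projects a CSV sample down to the five columns listed above. The column names (`call_duration`, `calling_number`, and so on) and the sample rows are hypothetical; a real CDR export names its fields differently, and in this solution the transformation would run as an AWS Glue job rather than as local Python.

```python
import csv
import io

# Hypothetical column names; real CDR exports name these fields differently.
RELEVANT_FIELDS = [
    "call_duration",
    "calling_number",  # A number
    "called_number",   # B number
    "start_time",
    "accounting_id",
]


def project_cdr_fields(csv_text, fields=RELEVANT_FIELDS):
    """Return one dict per CDR row, keeping only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [f for f in fields if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CDR sample is missing columns: {missing}")
    return [{f: row[f] for f in fields} for row in reader]


# Made-up sample with one extra column that is dropped by the projection.
SAMPLE = """accounting_id,start_time,call_duration,calling_number,called_number,codec
a-001,2019-01-05T02:10:00,7140,15550100,882990001,G711
a-002,2019-01-05T02:11:30,35,15550101,15550199,G729
"""

rows = project_cdr_fields(SAMPLE)
print(rows[0])
```

Values stay as strings here; converting the duration to a number and the start time to a timestamp would be a natural next step before loading the data for analysis.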

You can refer to this to identify the necessary fields in a CDR.

In the AWS Glue console, set up a crawler named CDR_CRAWLER and direct it to s3://telco-dest-bucket/blog, where the Parquet CDR data is stored. Create a new IAM role for the AWS Glue crawler, granting it the necessary permissions to access Amazon S3 for sources, targets, scripts, and temporary directories used in AWS Glue. Treat the AWS managed policies only as a starting point and tailor them to your business requirements.
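The same crawler could also be defined programmatically. The sketch below uses the boto3 Glue client's `create_crawler` call with the crawler name and S3 path from the step above; the database name `cdr_db` and the role ARN passed in are hypothetical placeholders, and the IAM role is assumed to exist already.

```python
def crawler_request(role_arn, database_name="cdr_db"):
    """Build the create_crawler parameters.

    database_name is a hypothetical placeholder; use your own catalog database.
    """
    return {
        "Name": "CDR_CRAWLER",
        "Role": role_arn,
        "DatabaseName": database_name,
        "Targets": {"S3Targets": [{"Path": "s3://telco-dest-bucket/blog"}]},
    }


def create_cdr_crawler(role_arn, glue_client=None):
    """Create the crawler, using a real Glue client unless one is supplied."""
    if glue_client is None:
        # Imported lazily so the request builder can be used without boto3.
        import boto3
        glue_client = boto3.client("glue")
    request = crawler_request(role_arn)
    glue_client.create_crawler(**request)
    return request
```

Calling `create_cdr_crawler("arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole")` with valid AWS credentials would register the crawler; the ARN shown is a made-up example.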

Once you are satisfied with the crawler settings, run the crawler. It scans the data in the S3 location, which may take a minute or more. When it finishes, check the Data Catalog to confirm that the crawler has created the new database.
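If you prefer to script this step, a minimal sketch is to start the crawler and poll its state until it returns to READY. It relies on the boto3 Glue client's `start_crawler` and `get_crawler` calls; the polling interval and timeout are arbitrary assumptions, and the client is passed in so the function can be exercised without AWS access.

```python
import time


def run_crawler_and_wait(glue_client, name="CDR_CRAWLER",
                         poll_seconds=15, timeout_seconds=600):
    """Start the crawler and wait until it is READY again.

    Returns True if the crawler finished within the timeout, else False.
    poll_seconds and timeout_seconds are illustrative defaults.
    """
    glue_client.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # A crawler reports RUNNING, then STOPPING, then READY when done.
        state = glue_client.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(poll_seconds)
    return False
```

With real credentials this would be called as `run_crawler_and_wait(boto3.client("glue"))`. A READY state only means the run has ended, so it is still worth confirming in the Data Catalog that the database was created.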

