In customer service interactions, it’s common to encounter personally identifiable information (PII) such as names, phone numbers, and dates of birth. As businesses integrate machine learning (ML) and analytics into their systems, leveraging this information can enhance customer experiences. However, the presence of PII often limits the usability of this data. This article will explore a method for automatically redacting PII from customer service conversation transcripts.
Consider the following example dialogue between a customer and a call center representative.
Representative: Hi, thank you for reaching out today. Who am I speaking with?
Customer: Hello, my name is Sarah Johnson.
Representative: Hi Sarah, how can I assist you?
Customer: I haven’t received my tax statement yet and wanted to check its status.
Representative: Of course, I can help with that. Can you please confirm the last four digits of your Social Security number?
Customer: Yes, it’s 2222.
Representative: Okay. I see that it was sent out yesterday and is expected to arrive early next week. Would you like me to set up automated alerts for any delays?
Customer: Yes, please.
Representative: The number we have on file for you is 555-123-4567. Is that still correct?
Customer: Yes, it is.
Representative: Great! I’ve activated the notifications. Is there anything else I can help you with, Sarah?
Customer: No, that’s all. Thank you.
Representative: Thank you, Sarah. Have a great day.
In this brief exchange, several elements are classified as PII, such as the caller’s name, the last four digits of their Social Security number, and the phone number. Now, let’s explore how we can effectively redact this PII from the transcript.
Solution Overview
We will establish an AWS Step Functions state machine that coordinates an Amazon Comprehend PII redaction job. Amazon Comprehend is a natural-language processing (NLP) service that employs machine learning to identify valuable insights and connections within text, including the capacity to detect and redact PII.
You upload the transcripts to an Amazon S3 bucket in the format used by Contact Lens for Amazon Connect, and you designate an output S3 bucket to store the redaction results and intermediate data. The workflow splits the input into batches: for instance, if you have 10,000 conversations to process, it divides them into 10 batches of 1,000 conversations each. Each batch is saved under a unique prefix, which then serves as the input source for an Amazon Comprehend job. The Step Functions Map state runs these redaction jobs in parallel by calling the StartPiiEntitiesDetectionJob API, so multiple jobs execute simultaneously rather than one after another. Because the job runs as a Step Functions state machine, it can be started manually or automatically as part of a daily routine.
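To make the batching step concrete, here is a minimal Python sketch of the idea, not the sample solution's actual code. The function names and the batch-NNNN/ prefix scheme are hypothetical. The request fields follow the Amazon Comprehend API for starting a PII detection job, and the choice of ONE_DOC_PER_FILE and the "ALL" entity type are assumptions made for illustration.

```python
from typing import Dict, List


def make_batches(keys: List[str], batch_size: int = 1000) -> Dict[str, List[str]]:
    """Group object keys into fixed-size batches, each under its own prefix."""
    batches = {}
    for start in range(0, len(keys), batch_size):
        # Hypothetical prefix naming; the deployed workflow may name batches differently.
        prefix = f"batch-{start // batch_size:04d}/"
        batches[prefix] = keys[start:start + batch_size]
    return batches


def build_job_request(bucket: str, prefix: str, output_uri: str, role_arn: str) -> dict:
    """Build the keyword arguments for one Comprehend PII redaction job."""
    return {
        "InputDataConfig": {
            "S3Uri": f"s3://{bucket}/{prefix}",
            "InputFormat": "ONE_DOC_PER_FILE",  # assumption: one transcript per object
        },
        "OutputDataConfig": {"S3Uri": output_uri},
        "Mode": "ONLY_REDACTION",
        "RedactionConfig": {
            "PiiEntityTypes": ["ALL"],
            # Replace each entity with its type, e.g. [NAME] or [PHONE].
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
        },
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
    }
```

Each dictionary returned by build_job_request could be passed to the boto3 Comprehend client as start_pii_entities_detection_job(**request), one call per batch prefix.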
For further insight into how Comprehend identifies and redacts PII data, you can check out this related blog post.
Deploy the Sample Solution
First, log in to the AWS Management Console in your AWS account.
You’ll need an S3 bucket with sample transcript data to redact and another bucket for output. If you don’t have sample transcript data, follow these steps:
- Go to the Amazon S3 console.
- Select “Create bucket.”
- Enter a bucket name, such as text-redaction-data-.
- Accept the defaults and click “Create bucket.”
- Open the newly created bucket and select “Create folder.”
- Enter a folder name, for instance, “sample-data,” and click “Create folder.”
- Click on your new folder to access it.
- Download the SampleData.zip file.
- Extract the .zip file on your local device and drag the folder into the S3 bucket you created.
- Click “Upload.”
Now, click the link to deploy the sample solution to US East (N. Virginia):
This will initiate the creation of a new AWS CloudFormation stack.
Enter the Stack name (e.g., pii-redaction-workflow), the name of your S3 input bucket containing the transcript data, and the name of your S3 output bucket. Click “Next” and add any optional tags for your stack. Click “Next” again to review the stack details. Select the checkbox to acknowledge that AWS Identity and Access Management (IAM) resources will be created, and then click “Create stack.”
The CloudFormation stack will set up an IAM role that can list and read objects from the input S3 bucket and write objects to the output bucket. You can further customize the role to meet your specific requirements. It will also create a Step Functions state machine and several AWS Lambda functions necessary for the state machine.
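As a rough illustration of the permissions described above, the following Python sketch builds a policy document that allows listing and reading the input bucket and writing to the output bucket. It is an assumption about what such a policy could look like, not the policy the CloudFormation template actually creates, and the function name is hypothetical.

```python
def build_s3_policy(input_bucket: str, output_bucket: str) -> dict:
    """Return an IAM policy document scoped to the two buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{input_bucket}"],
            },
            {
                # Reading applies to the objects inside the input bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {
                # Writing applies to the objects inside the output bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }
```

Keeping the list, read, and write statements separate makes it easier to tighten one of them later, for example by restricting reads to a single prefix.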
After a few minutes, your stack will be ready, and you can explore the Step Functions state machine created through the CloudFormation template.
Run a Redaction Job
To execute a job, navigate to Step Functions in the AWS console, select the state machine, and click “Start execution.”
Next, provide the input parameters to run the job. For the job input, specify the name of your input S3 bucket as the S3InputDataBucket value, the folder name as the S3InputDataPrefix value, the name of your output S3 bucket as the S3OutputDataBucket value, and the folder for storing results as the S3OutputDataPrefix value, then click “Start execution.”
{
"S3InputDataBucket": "<Name-of-input-bucket>",
"S3InputDataPrefix": "<Prefix-of-input-data>",
"S3OutputDataBucket": "<Name-of-output-bucket>",
"S3OutputDataPrefix": "<Prefix-of-output>"
}
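The same input can also be submitted programmatically, which is useful when the job should run on a schedule. The sketch below assumes boto3 is installed and AWS credentials are configured. The helper names are hypothetical, and the state machine ARN has to be taken from your own deployed stack.

```python
import json


def build_execution_input(input_bucket: str, input_prefix: str,
                          output_bucket: str, output_prefix: str) -> str:
    """Serialize the four job parameters into the JSON the state machine expects."""
    return json.dumps({
        "S3InputDataBucket": input_bucket,
        "S3InputDataPrefix": input_prefix,
        "S3OutputDataBucket": output_bucket,
        "S3OutputDataPrefix": output_prefix,
    })


def start_redaction(state_machine_arn: str, execution_input: str) -> str:
    """Start one execution and return its ARN."""
    # Imported lazily so build_execution_input works without the AWS SDK.
    import boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=execution_input,
    )
    return response["executionArn"]
```

A call might look like start_redaction(arn, build_execution_input("my-input-bucket", "sample-data/", "my-output-bucket", "redacted/")), where the bucket names and prefixes are placeholders.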
As the job runs, you can monitor its status in the Step Functions graph view. It will take a few minutes to complete. Once finished, you will see the output for each job in the Execution input and output section of the console. You can retrieve the job output using the output URI. If the execution ran multiple jobs, you can copy all job results to a destination bucket for further analysis with a command such as the following:
aws s3 cp s3://<name of output bucket>/<S3 Output data prefix value>/<job run id>-output/ s3://<destination bucket>/<destination prefix>/ --recursive --exclude "*/*" --include "*.out"
Let’s review the redacted version of the earlier conversation.
Representative: Hi, thank you for reaching out today. Who am I speaking with?
Customer: Hello, my name is [NAME].
Representative: Hi [NAME], how can I assist you?
Customer: I haven’t received my tax statement yet and wanted to check its status.
Representative: Of course, I can help with that. Can you please confirm the last four digits of your Social Security number?
Customer: Yes, it’s [SSN].
Representative: Okay. I see that it was sent out yesterday and is expected to arrive early next week. Would you like me to set up automated alerts for any delays?
Customer: Yes, please.
Representative: The number we have on file for you is [PHONE]. Is that still correct?
Customer: Yes, it is.
Representative: Great! I’ve activated the notifications. Is there anything else I can help you with, [NAME]?
Customer: No, that’s all. Thank you.
Representative: Thank you, [NAME]. Have a great day.
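The bracketed placeholders in this transcript come from replacing each detected span with its entity type. Amazon Comprehend performs that replacement itself in redaction mode. The short Python sketch below only illustrates the idea, assuming entities are described by the Type, BeginOffset, and EndOffset fields that Comprehend uses when it reports PII entities. The redact function is hypothetical.

```python
def redact(text: str, entities: list) -> str:
    """Replace each detected span with its entity type in square brackets."""
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        placeholder = f"[{entity['Type']}]"
        text = text[:entity["BeginOffset"]] + placeholder + text[entity["EndOffset"]:]
    return text


line = "Customer: Hello, my name is Sarah Johnson."
start = line.index("Sarah Johnson")
entities = [{"Type": "NAME", "BeginOffset": start, "EndOffset": start + len("Sarah Johnson")}]
print(redact(line, entities))  # Customer: Hello, my name is [NAME].
```

Processing the entities from last to first avoids having to recompute offsets after each replacement, since the placeholder is usually a different length than the text it replaces.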
Cleanup
To avoid incurring unnecessary charges, clean up the resources you created: delete the CloudFormation stack, and empty and delete the S3 buckets if you no longer need them.