In this blog post, we explore how Genworth developed a serverless machine learning (ML) pipeline on AWS using Amazon SageMaker and AWS Glue. Genworth Mortgage Insurance Australia Limited, a leading provider of lenders mortgage insurance in Australia, has more than 50 years of experience and a rich history of data on the relationship between mortgage repayment behavior and insurance claims. Genworth wanted to use this historical data to train Predictive Analytics for Loss Mitigation (PALM) ML models that analyze recent repayment patterns for insurance policies and prioritize those policies by their likelihood of a claim and the amount insured.
To run batch inference on the ML models efficiently while minimizing operational overhead, Genworth and AWS chose Amazon SageMaker batch transform jobs together with serverless components for data ingestion, transformation, ML inference, and output processing. The Advanced Analytics team at Genworth participated in an AWS Data Lab program, collaborating closely with AWS engineers and solutions architects. In the pre-lab phase, they designed a solution architecture that met Genworth's specific security requirements, as expected in the financial services industry. Once the architecture was approved, the teams assessed training needs, and AWS Solutions Architects ran a series of hands-on workshops to equip Genworth's builders with the necessary skills. In an intensive four-day build phase, the Advanced Analytics team built a fully automated serverless ML pipeline that met their functional requirements.
The implemented solution comprises three primary components: data ingestion and preparation, ML batch inference using three custom-developed models, and data post-processing and publishing for end-user access.
Component 1: Data Ingestion and Preparation
Genworth’s source data is updated weekly in a staging table within their on-premises Oracle database. The ML pipeline commences with an AWS Glue job that connects to the Oracle database via a secured AWS Direct Connect connection. This job ingests raw data and stores it in an encrypted Amazon Simple Storage Service (Amazon S3) bucket. A subsequent Python shell job, also utilizing AWS Glue, cleans, selects, and transforms features for later ML inference steps, with outputs stored in another encrypted S3 bucket designated for curated datasets.
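The cleaning and feature-selection step can be sketched as a small transformation function of the kind a Glue Python shell job might run. This is an illustrative sketch only: the column names (`policy_id`, `months_in_arrears`, `missed_payments`, `amount_insured`) and the rules are hypothetical, not Genworth's actual schema or logic.

```python
# Hypothetical sketch of the feature-preparation logic in a Glue Python
# shell job; field names and rules are illustrative, not Genworth's.

def prepare_features(raw_rows):
    """Clean, select, and transform raw policy rows for later ML inference."""
    curated = []
    for row in raw_rows:
        # Drop rows missing the fields the downstream models depend on
        if row.get("policy_id") is None or row.get("months_in_arrears") is None:
            continue
        curated.append({
            "policy_id": row["policy_id"],
            # Cast the arrears field to a simple integer feature
            "months_in_arrears": int(row["months_in_arrears"]),
            # Derive a binary flag from the repayment history
            "ever_delinquent": int(row.get("missed_payments", 0) > 0),
            "amount_insured": float(row.get("amount_insured", 0.0)),
        })
    return curated

rows = [
    {"policy_id": "P1", "months_in_arrears": "3",
     "missed_payments": 2, "amount_insured": "250000"},
    {"policy_id": None, "months_in_arrears": "1"},  # dropped: no policy_id
]
curated = prepare_features(rows)
print(curated)
```

In the actual pipeline, the input would be read from the encrypted raw-data S3 bucket and the output written to the curated-dataset bucket rather than held in memory.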
Component 2: ML Batch Inference
The Advanced Analytics team, already experienced with ML on premises, wanted to reuse pretrained model artifacts to create a fully automated ML inference pipeline on AWS. They also wanted to establish a reliable architectural pattern for future ML experiments, enabling them to test ideas quickly in a controlled environment. The three existing ML artifacts that make up the PALM model were implemented as a hierarchical TensorFlow neural network built with Keras. These models predict the likelihood of an insurance policy resulting in a claim, the probability of that claim being paid, and the expected magnitude of the claim. Each model requires its own data standardization, which is handled by an individual AWS Glue Python shell job. The ML models run in parallel as SageMaker batch transform jobs, which manage compute resources, deploy the models, and move data in and out efficiently.
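The per-model standardization step can be sketched as a simple z-score transform. This is a minimal illustration of the kind of standardization a Glue Python shell job might apply before batch inference; the actual per-model transforms are not described in detail here.

```python
# Minimal sketch of z-score standardization, as one of the per-model
# Glue Python shell jobs might apply it; purely illustrative.
import statistics

def standardize(values):
    """Scale a numeric feature to zero mean and unit variance."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant feature carries no signal; map it all to zero
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]

scaled = standardize([10.0, 20.0, 30.0])
print(scaled)
```

In practice each of the three models would apply its own fitted means and standard deviations (saved at training time) rather than recomputing them from the inference batch.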
Component 3: Data Post-Processing and Publishing
Before the prediction results can be utilized, they undergo a series of post-processing steps using AWS Glue Python shell jobs. These steps include aggregating and scoring the results, applying business rules, generating files, and validating the resulting data before it is published back to the on-premises Oracle database. Notifications are sent to users via Amazon Simple Notification Service (Amazon SNS) and Amazon CloudWatch Events when new data is available or if any issues arise.
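The aggregation and scoring step can be sketched as combining the three model outputs into a single priority score per policy. The weighting below (expected loss = claim likelihood × payout probability × expected claim size) and the field names are illustrative assumptions, not Genworth's actual business rules.

```python
def prioritize(predictions):
    """Combine the three PALM model outputs into one priority score.

    `predictions` maps policy_id -> (p_claim, p_paid, expected_size).
    The scoring rule below is an illustrative assumption, not
    Genworth's actual business logic.
    """
    scored = []
    for policy_id, (p_claim, p_paid, expected_size) in predictions.items():
        # Expected loss: likelihood of a claim, times the probability
        # it is paid, times the expected claim size
        score = p_claim * p_paid * expected_size
        scored.append({"policy_id": policy_id, "score": round(score, 2)})
    # Highest expected loss first, so it is reviewed first
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored

results = prioritize({
    "P1": (0.10, 0.80, 100000.0),
    "P2": (0.05, 0.90, 400000.0),
})
print(results)
```

In the actual pipeline, the scored output would then be validated and written back to the on-premises Oracle database, with SNS notifications on completion or failure.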
The entire ML pipeline is orchestrated using AWS Step Functions, which allows Genworth to focus on implementing business logic while providing the flexibility needed for future experiments and additional ML applications.
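An orchestration like the one described above can be expressed in Amazon States Language. The following is a simplified sketch of such a state machine, built as a Python dictionary for readability; the state names and Glue job names are placeholders, not Genworth's actual resources.

```python
import json

# Simplified Amazon States Language definition for a pipeline of this
# shape; state names, job names, and structure are illustrative only.
state_machine = {
    "Comment": "Serverless ML inference pipeline (illustrative sketch)",
    "StartAt": "IngestData",
    "States": {
        "IngestData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "ingest-from-oracle"},
            "Next": "PrepareFeatures",
        },
        "PrepareFeatures": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "prepare-features"},
            "Next": "RunModelsInParallel",
        },
        "RunModelsInParallel": {
            # The three PALM models run as parallel branches, each a
            # SageMaker batch transform job
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": f"Transform{name}",
                    "States": {
                        f"Transform{name}": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                            "End": True,
                        }
                    },
                }
                for name in ("ClaimLikelihood", "ClaimPaid", "ClaimSize")
            ],
            "Next": "PostProcessAndPublish",
        },
        "PostProcessAndPublish": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "postprocess-and-publish"},
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```

Using the `.sync` service integration patterns (for Glue job runs and SageMaker transform jobs) lets Step Functions wait for each step to finish before moving on, so the business logic stays in the state machine rather than in polling code.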
Business Benefit and Future Directions
By establishing a modern ML platform, Genworth has automated an end-to-end ML inference process that ingests data from their Oracle database, performs ML operations, and supports data-driven decision-making. The new pipeline has simplified the high-value but previously manual work of the Loss Mitigation team. The Data Lab engagement underscored the value of giving teams access to modern ML and analytics tools, and showed how quickly an idea can be piloted and, if successful, moved to production.