In this article, we explore prompt engineering strategies for generating precise analyses of tabular data in the financial sector. By supplying large language models (LLMs) with contextual sample data that includes features and labels, we achieve results akin to those obtained through fine-tuning, but without the associated complexities. Our approach uses Generative Tabular Learning (GTL), as outlined in the whitepaper "From Supervised to Generative: A Novel Paradigm for Tabular Deep Learning with Large Language Models". We illustrate the benefits of GTL through fully managed JupyterLab notebooks in Amazon SageMaker, which let users interact with Meta Llama models hosted on Amazon SageMaker or Amazon Bedrock. Additional reference notebooks on aws-samples demonstrate how to use Meta’s Llama models within Amazon Bedrock.
Prerequisites
The following list outlines the prerequisites for this demonstration, which can be carried out through either the AWS Management Console or the latest version of the AWS Command Line Interface (AWS CLI):
- Access to LLMs such as Meta’s Llama models available on Amazon SageMaker or Amazon Bedrock.
- Configuration of an Amazon SageMaker Domain with JupyterLab notebooks and the required Python libraries to interact with the LLMs.
- Sample financial industry datasets, formatted as structured data (in our case, exchange-traded funds data from Kaggle), accessible for querying via a SQL engine like Amazon Athena.
- Familiarity with generative AI prompt engineering techniques to provide the LLMs with relevant context and sample data.
- Capability to assess and compare LLM-generated outputs for accuracy and relevance concerning the analysis task.
- Understanding of financial industry data, along with the ability to stage and query this data in a structured tabular format suitable for LLM consumption.
- Knowledge of the specific industry domain related to the data to identify appropriate features and labels for the sample data prompts.
Financial Industry Data
In finance, data may appear in tabular form within PDF files or as structured data in databases. For this demonstration, we utilized a structured dataset comprising financial information about exchange-traded funds (ETFs) sourced from Kaggle.
Users can pose business or industry-related inquiries concerning ETFs. Note that since we employed an SQL query engine to extract data for this demonstration, the prompts and generated outputs reference SQL.
Example Business Question
question = "Please provide a list of about 100 ETFs or ETNs names with exposure to US markets"
To generate a prompt for the LLM to produce an SQL query, we format the SQL system prompt as follows:
SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format(
    question=question,
    table_schema=table_schema_etf
)
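The template text and the table schema are not shown in this article. As a rough sketch, assuming a hypothetical template and a hypothetical four-column schema, the formatting step amounts to placeholder substitution. Plain `str.format` stands in for `PromptTemplate` here so that the example has no dependencies; the `{question}` and `{table_schema}` placeholders are filled the same way.

```python
# Hypothetical template and schema, for illustration only. The article's
# actual tmp_sql_sys_prompt and table_schema_etf are not shown.
tmp_sql_sys_prompt = """You are a SQL expert. Using the table schema below, write one SQL query that answers the question.

Table schema:
{table_schema}

Question: {question}

Return only the SQL query."""

table_schema_etf = """CREATE EXTERNAL TABLE etf_funds (
    isin string,
    fund_name string,
    region string,
    currency string
)"""

question = "Please provide a list of about 100 ETFs or ETNs names with exposure to US markets"

# str.format is a dependency-free stand-in for PromptTemplate.format.
SQL_SYS_PROMPT = tmp_sql_sys_prompt.format(
    question=question,
    table_schema=table_schema_etf,
)
print(SQL_SYS_PROMPT)
```

Keeping the schema in the prompt gives the model the column names it needs to write a valid query, which is why the schema is passed alongside the question.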
We then retrieve the data:
results = get_llm_sql_analysis(
    question=question,
    sql_sys_prompt=SQL_SYS_PROMPT,
    qna_sys_prompt=QNA_SYS_PROMPT
)
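The body of `get_llm_sql_analysis` is not shown in this article. One plausible shape, offered as a sketch under assumptions rather than the authors' implementation, is a three-step pipeline: ask the LLM for a SQL query, run the query, then ask the LLM to analyze the rows. The `invoke_llm` and `run_sql` parameters are hypothetical additions that let the sketch run without AWS access; in practice they would wrap the model endpoint and the SQL engine.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def get_llm_sql_analysis(
    question: str,
    sql_sys_prompt: str,
    qna_sys_prompt: str,
    invoke_llm: Callable[[str], str],     # hypothetical: wraps the LLM call
    run_sql: Callable[[str], List[Row]],  # hypothetical: wraps the SQL engine
) -> Dict[str, object]:
    """Generate SQL, run it, then ask the LLM to analyze the rows."""
    # Step 1: the LLM turns the formatted system prompt into a SQL query.
    sql_query = invoke_llm(sql_sys_prompt).strip()
    # Step 2: the query engine returns the matching rows.
    rows = run_sql(sql_query)
    # Step 3: the rows and the question go back to the LLM for analysis.
    qna_prompt = qna_sys_prompt.format(query_results=rows, question=question)
    analysis = invoke_llm(qna_prompt)
    return {"sql": sql_query, "rows": rows, "analysis": analysis}


# Stand-ins so the sketch runs locally; they are not real services.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Write SQL"):
        return "SELECT fund_name FROM etf_funds WHERE region = 'US' LIMIT 100"
    return f"Analysis based on a prompt of {len(prompt)} characters."


def fake_sql(query: str) -> List[Row]:
    return [{"fund_name": "Example US Bond ETF"}]


results = get_llm_sql_analysis(
    question="Which ETFs have exposure to US markets?",
    sql_sys_prompt="Write SQL for: Which ETFs have exposure to US markets?",
    qna_sys_prompt=(
        "Given the following SQL query results: {query_results}\n"
        "And the original question: {question}"
    ),
    invoke_llm=fake_llm,
    run_sql=fake_sql,
)
print(results["sql"])
print(results["analysis"])
```

Passing the LLM and SQL calls in as functions keeps the orchestration logic separate from the service clients, so the same pipeline can be exercised with stubs.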
After acquiring the data, it is forwarded to the LLM hosted on Amazon Bedrock for analysis, providing a natural language response to the user’s inquiry.
In this example, answering the question does not require deep analysis of the ETF dataset. The LLM responds effectively, drawing on its understanding of the data and of industry-specific language:
LLM SQL Analysis:
Upon reviewing the SQL query results, I can conclude that the list of ETFs/ETNs does not primarily focus on US markets. Instead, it comprises a comprehensive catalog of bond ETFs/ETNs with a global reach, encompassing various regions, currencies, and bond categories.
Key observations include:
- Global Coverage: The list encompasses ETFs/ETNs tracking bond markets across Europe (e.g., Eurozone, UK), the US, and globally diversified indices.
- Bond Types: The dataset features a variety of bond types, including corporate, government, high-yield, and green bonds.
- Currency Exposure: ETFs/ETNs are denominated in multiple currencies, including EUR, USD, and GBP, with some providing hedged exposure against currency risks.
- ESG and SRI Focus: A notable section of the list features ETFs/ETNs emphasizing Environmental, Social, and Governance (ESG) or Socially Responsible Investing (SRI), highlighting a commitment to sustainable investing.
To address the original inquiry, a subset of ETFs/ETNs with US market exposure can be identified:
US-Focused ETFs/ETNs:
- xxxx USD Corporate Bond 0-3yr ESG UCITS ETF USD (Dist)
- xxxx USD Corporate Bond ESG 0-3yr UCITS ETF EUR Hedged (Acc)
- xxxx ESG USD High Yield (DR) UCITS ETF – Dist
- xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Acc)
- xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Dist)
- xxxx Index US Corporate SRI UCITS ETF DR (C)
- xxxx Index US Corporate SRI UCITS ETF DR Hedged EUR (D)
- xxxx USD Corporate Bond ESG UCITS ETF (Acc)
- xxxx USD Corporate Bond ESG UCITS ETF (Dist)
- xxxx ESG USD High Yield Corporate Bond UCITS ETF 1C
- xxxx ETF (LU) xxxx xxxx US Liquid Corporates Sustainable UCITS ETF (USD) A-dis
- xxxx USD Corporate Green Bond UCITS ETF 2C Acc USD
Please note that this subset is not exhaustive, and there may be additional ETFs/ETNs in the original listing with some exposure to US markets. Investors should diligently assess the investment objectives, risks, and characteristics of each ETF/ETN prior to making investment decisions.
Complex Queries
What happens when the user’s questions become more sophisticated? Questions that call for in-depth analysis with industry-specific context require information beyond the model’s pre-trained knowledge.
Solution Overview
Before going further, consider one question: can enriching the context supplied in the prompt, alongside the user’s natural language question, improve the output without the complexities of fine-tuning? Fine-tuning involves setting up MLOps processes and collecting and preparing accurate labeled datasets, among other tasks.
We propose a framework leveraging GTL with the Meta Llama model on Amazon Bedrock. This framework is not intended to supplant fine-tuning but serves as an intermediate solution. The diagram illustrating the GTL framework for LLMs highlights its components:
GTL employs a few-shot prompting technique, encompassing:
- An LLM personality to guide data analysis (prompting the model to utilize its pre-existing industry-specific knowledge),
- Data features and descriptions,
- Data labels and descriptions,
- A small sample dataset featuring these attributes,
- A sample analysis as a reference.
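The components listed above can be assembled into a single few-shot prompt. The following is a minimal sketch, assuming a hypothetical `build_gtl_prompt` helper and made-up ETF features, labels, and sample rows; none of these names or values come from the article or the GTL whitepaper.

```python
from typing import Dict, List


def build_gtl_prompt(
    personality: str,
    features: Dict[str, str],
    labels: Dict[str, str],
    samples: List[Dict[str, object]],
    sample_analysis: str,
    question: str,
) -> str:
    """Assemble the five GTL components plus the user question into one prompt."""
    feature_lines = [f"- {name}: {desc}" for name, desc in features.items()]
    label_lines = [f"- {name}: {desc}" for name, desc in labels.items()]
    # Render the sample rows as a small pipe-delimited table.
    columns = list(features) + list(labels)
    header = " | ".join(columns)
    rows = [" | ".join(str(row[c]) for c in columns) for row in samples]
    sections = [
        personality,
        "Features:\n" + "\n".join(feature_lines),
        "Labels:\n" + "\n".join(label_lines),
        "Sample data:\n" + "\n".join([header] + rows),
        "Sample analysis:\n" + sample_analysis,
        "Question:\n" + question,
    ]
    return "\n\n".join(sections)


# All values below are invented for illustration.
prompt = build_gtl_prompt(
    personality="You are a financial analyst who specializes in ETFs.",
    features={
        "fund_name": "Name of the fund",
        "expense_ratio": "Annual fee, in percent",
    },
    labels={"cost_tier": "low, medium, or high, based on the expense ratio"},
    samples=[
        {"fund_name": "Fund A", "expense_ratio": 0.05, "cost_tier": "low"},
        {"fund_name": "Fund B", "expense_ratio": 0.95, "cost_tier": "high"},
    ],
    sample_analysis="Fund A charges 0.05%, which places it in the low cost tier.",
    question="Which cost tier does a fund with a 0.40% expense ratio fall into?",
)
print(prompt)
```

Describing each feature and label next to the sample rows is what lets the model relate the few labeled examples to the new question without any weight updates.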
Example GTL Prompt:
instructions = [
    {
        "role": "user",
        # The remainder of the prompt is truncated in the source.
        "content": """Given the following SQL query results: {query_results}
And the original question: {question}
""",
    }
]