Leveraging Amazon Translate for Language Support in Amazon Kendra

Chanci Turner Amazon IXD – VGT2 learning manager

Amazon Kendra is a powerful and user-friendly intelligent search service driven by machine learning (ML). While Amazon Kendra primarily supports English, this article outlines methods to extend language support for non-English users. We will illustrate these techniques through a question-answer chatbot (Q&A bot) use case, enabling users to ask questions in any language supported by Amazon Translate. Amazon Kendra will search through various documents and provide results in the language of the query. The integration of Amazon Comprehend and Amazon Translate is crucial for offering non-English language functionalities.

Our Q&A bot utilizes Amazon Simple Storage Service (Amazon S3) to store documents before they are ingested into Amazon Kendra. We employ Amazon Comprehend to identify the dominant language of the query, ensuring accurate translation for both the query and the response. Amazon Translate is used to convert the query and response between English and the user’s language, while Amazon Lex facilitates the conversational interface and interactions.

All queries, except those in English, are translated from the user’s native language into English prior to submission to Amazon Kendra. The responses generated by Amazon Kendra are also translated back into the user’s language. We have predefined translations for responses in Spanish and perform real-time translation for all other languages. The metadata attributes associated with each ingested document guide us to these predefined Spanish translations.

To illustrate these techniques, we present three use cases, assuming all languages requiring translation are supported by Amazon Translate. First, for Spanish-speaking users, each document (small documents are used in this Q&A bot scenario) is translated by Amazon Translate into Spanish and then reviewed by a human. This pre-translation serves as a description for Amazon Kendra’s document ranking model.

Secondly, on-the-fly translation occurs for responses generated by the reading comprehension model for all languages except English. We will discuss the implementation of on-the-fly translation for Amazon Kendra’s various models later in this article. Thirdly, for English-speaking users, no translation is needed, allowing both the query and Amazon Kendra’s responses to be transmitted without alteration.
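The three use cases can be sketched as a single selection function. This is a minimal illustration rather than the Q&A bot's actual code: the function name localize_answer and the plain-dict shape of attributes are assumptions, while spanish_text is the custom attribute shown in the metadata example later in this article. The translate_client argument is expected to behave like the Amazon Translate client from boto3.

```python
def localize_answer(english_text, language, attributes, translate_client):
    """Return the answer in the user's language.

    english_text: answer text returned by Amazon Kendra (English).
    language: language code detected for the user's query, e.g. "es".
    attributes: dict of document attribute name -> string value
                (hypothetical shape, assumed for this sketch).
    translate_client: object exposing translate_text() like boto3's
                      Amazon Translate client.
    """
    # Use case 3: English users get the answer unchanged.
    if language == "en":
        return english_text

    # Use case 1: Spanish users get the human-reviewed pre-translation,
    # if the document carries one.
    if language == "es" and attributes.get("spanish_text"):
        return attributes["spanish_text"]

    # Use case 2: every other language is translated on the fly.
    result = translate_client.translate_text(
        Text=english_text,
        SourceLanguageCode="en",
        TargetLanguageCode=language,
    )
    return result["TranslatedText"]
```

In a Lambda function, translate_client would typically be created with boto3.client("translate"). Falling back to on-the-fly translation when a Spanish document has no spanish_text attribute is a design choice of this sketch, not something the article prescribes.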

The following dialogue showcases the three use cases, starting with English and followed by Spanish, French, and Italian.

Translation Considerations and Prerequisites

To prepare your documents, follow these steps:

Utilize Amazon Translate to produce a Spanish version of the document and its title.
Manually review the translation for any desired adjustments.
Create a metadata file that includes the Spanish translation of the document.
Ingest the English document and the accompanying metadata file into Amazon Kendra.

The following is an example of the metadata file structure:

{
    "Attributes": {
        "_created_at": "2020-10-28T16:48:26.059730Z",
        "_source_uri": "https://aws.amazon.com/kendra/faqs/",
        "spanish_text": "R: Amazon Kendra es un servicio de búsqueda empresarial muy preciso y fácil de usar que funciona con Machine Learning.",
        "spanish_title": "P: ¿Qué es Amazon Kendra?"
    },
    "Title": "Q: What is Amazon Kendra?",
    "ContentType": "PLAIN_TEXT"
}

In this example, we have predefined attributes like _created_at and _source_uri, along with custom attributes such as spanish_text and spanish_title. For queries in Spanish, these attributes are used to formulate the response to the user. The title of the document itself can serve as a potential user query, allowing control over the translations.
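A metadata file like the one above could be drafted programmatically before the human review step. The following is a sketch under stated assumptions: build_metadata and write_metadata are hypothetical helper names, and translate_client is expected to behave like the boto3 Amazon Translate client. The file name follows the Amazon Kendra S3 connector convention of appending .metadata.json to the full document file name.

```python
import json


def build_metadata(title, body, source_uri, created_at, translate_client):
    """Draft the metadata for one English document (hypothetical helper)."""

    def to_spanish(text):
        # Machine translation only; a human should review the result.
        response = translate_client.translate_text(
            Text=text,
            SourceLanguageCode="en",
            TargetLanguageCode="es",
        )
        return response["TranslatedText"]

    return {
        "Attributes": {
            "_created_at": created_at,
            "_source_uri": source_uri,
            "spanish_text": to_spanish(body),
            "spanish_title": to_spanish(title),
        },
        "Title": title,
        "ContentType": "PLAIN_TEXT",
    }


def write_metadata(metadata, document_path):
    """Write the metadata next to the document and return the new path."""
    path = document_path + ".metadata.json"
    with open(path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented characters readable for reviewers.
        json.dump(metadata, handle, ensure_ascii=False, indent=4)
    return path
```

Writing the draft to disk before ingestion leaves room for the manual review described in the preparation steps: a reviewer can edit spanish_text and spanish_title directly in the JSON file.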

For documents in other languages, Amazon Translate must be used for translation into English before ingestion into Amazon Kendra. Although we’ve not experimented with translation in diverse scenarios where document types and answers can vary significantly, we believe the techniques described here can be adapted for broader applications.

Amazon Kendra Processing Overview

With our documents prepared, we can build a chatbot using Amazon Lex. The chatbot identifies the language through Amazon Comprehend, translates the user’s query into English, submits the query to the Amazon Kendra index, and then translates the results back into the original language. This method can be applied to any language that Amazon Translate supports.
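The flow just described (detect, translate to English, query, translate back) can be sketched end to end as follows. This is an illustration under assumptions, not the article's Lambda function: answer_query and its parameters are hypothetical names, the three client arguments are expected to behave like the boto3 clients for Amazon Comprehend, Amazon Translate, and Amazon Kendra, and only the first result's excerpt is returned for brevity.

```python
def answer_query(query, index_id, comprehend, translate, kendra,
                 min_confidence=0.5):
    """Return (language, answer) for a user query; answer may be None."""
    # 1. Detect the dominant language; fall back to English when unsure.
    languages = comprehend.detect_dominant_language(Text=query)["Languages"]
    language = "en"
    if languages:
        best = max(languages, key=lambda item: item["Score"])
        if best["Score"] > min_confidence:
            language = best["LanguageCode"]

    # 2. Translate the query into English if needed.
    english_query = query
    if language != "en":
        english_query = translate.translate_text(
            Text=query,
            SourceLanguageCode=language,
            TargetLanguageCode="en",
        )["TranslatedText"]

    # 3. Submit the English query to the Amazon Kendra index.
    response = kendra.query(IndexId=index_id, QueryText=english_query)
    items = response.get("ResultItems", [])
    if not items:
        return language, None
    answer = items[0]["DocumentExcerpt"]["Text"]

    # 4. Translate the answer back into the user's language.
    if language != "en":
        answer = translate.translate_text(
            Text=answer,
            SourceLanguageCode="en",
            TargetLanguageCode=language,
        )["TranslatedText"]
    return language, answer
```

Passing the clients in as arguments keeps the function easy to exercise with stand-ins; in a deployed Lambda function they would typically come from boto3.client("comprehend"), boto3.client("translate"), and boto3.client("kendra").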

We leverage the built-in Amazon S3 connector for document ingestion and the Amazon Kendra FAQ ingestion process for entering question-answer pairs into Amazon Kendra. The ingested documents are in English, and we have manually created Spanish descriptions attached as metadata attributes. Ideally, all documents should be in English.

If your documents contain overview sections, consider using Amazon Translate to generate the metadata description attribute. For documents in other languages, you must convert them to English using Amazon Translate before ingestion into Amazon Kendra. The architecture of our solution is illustrated in the following diagram.

Setting Up Your Resources

This section outlines the steps required to implement this solution. Refer to the appendix for detailed instructions. The AWS Lambda function is critical for understanding where and how to implement translations. More specifics on the translation process will be detailed in the next section.

Download the documents and metadata files, decompress the archive, and store them in an S3 bucket to serve as the source for your Amazon Kendra S3 connector.
Set up Amazon Kendra:
- Create an Amazon Kendra index. For guidance, see Getting started with the Amazon Kendra SharePoint connector.
- Establish an Amazon Kendra S3 data source.
- Add attributes.
- Ingest the sample data source from Amazon S3 into Amazon Kendra.
Configure the fulfillment Lambda function.
Set up the chatbot.
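The "Add attributes" step can also be done through the API instead of the console. The sketch below assumes the custom attribute names spanish_text and spanish_title from the metadata example; the helper names are hypothetical, and kendra_client is expected to behave like the boto3 Amazon Kendra client. Marking the attributes as displayable is what allows them to be returned with query results; the other search settings here are illustrative choices.

```python
def custom_attribute(name):
    """Describe one custom string attribute for the index (hypothetical helper)."""
    return {
        "Name": name,
        "Type": "STRING_VALUE",
        "Search": {
            "Displayable": True,   # return the value in query responses
            "Searchable": False,   # do not match English queries against it
            "Facetable": False,
        },
    }


def add_spanish_attributes(kendra_client, index_id):
    """Register the Spanish attributes on an existing index."""
    updates = [
        custom_attribute("spanish_text"),
        custom_attribute("spanish_title"),
    ]
    kendra_client.update_index(
        Id=index_id,
        DocumentMetadataConfigurationUpdates=updates,
    )
    return updates
```

The attributes need to exist on the index before the S3 data source is synchronized, so that the values in each metadata file are stored with the document.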

Understanding Translation in the Fulfillment Lambda Function

The Lambda function is divided into three main components to process and respond to user queries: language detection, query submission, and returning the translated result.

Language Detection

In the first component, Amazon Comprehend is used to detect the dominant language. The user input is read from the inputTranscript key of the event that Amazon Lex submits to the function. If Amazon Comprehend is not sufficiently confident in the detected language (a score of 0.50 or lower), the function defaults to English. Here’s a code snippet:

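import boto3

# Amazon Comprehend client used for language detection
comprehend = boto3.client('comprehend')

# Text entered by the user, as passed in by Amazon Lex
query = event['inputTranscript']
response = comprehend.detect_dominant_language(Text=query)
confidence = response["Languages"][0]['Score']
if confidence > 0.50:
    language = response["Languages"][0]['LanguageCode']
else:
    # Default to English if confidence is low
    language = "en"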

Submitting a Query

Because Amazon Kendra primarily supports English, a query in any other language is first translated into English with Amazon Translate and then submitted to the Amazon Kendra index. English queries are submitted unchanged.
