A Review of Purpose-Built Accelerators for Financial Services


Data holds vital information that can be used to predict future behavior, from customer buying patterns to securities returns. Companies strive for a competitive edge by effectively utilizing their data, tailoring it to their specific business context, and generating actionable insights. The financial services industry (FSI) is no exception, as one of the most prominent producers and consumers of data. Each industry has unique characteristics, and FSI is shaped significantly by factors such as regulatory requirements and zero-sum competitive dynamics. This post is aimed primarily at FSI business leaders, including chief data officers, chief analytics officers, chief investment officers, and heads of quantitative analysis, research, and risk. These professionals make strategic decisions about infrastructure investments, product roadmaps, and competitive strategy. The intent of this article is to bring clarity to a rapidly evolving field and to help readers identify competitive differentiators and shape business strategy.

Accelerated computing is a broad term that often refers to specialized hardware known as purpose-built accelerators (PBAs). In the financial services sector, nearly every type of operation, from quantitative research and fraud detection to real-time trading, can benefit from reduced processing times. Faster calculations let users achieve greater accuracy, enhance customer experiences, or gain informational advantages over competitors. These operations span various domains, including basic data processing, analytics, and machine learning (ML). Certain tasks, particularly those involving cutting-edge advancements in artificial intelligence (AI), simply cannot be executed effectively without hardware acceleration. ML is the workload most frequently associated with PBAs, so we start this discussion with an illustrative diagram. The ML paradigm consists of learning (training) followed by inference. Learning typically involves offline processing of large volumes of historical data, whereas inference involves online processing of smaller volumes of streaming data. Learning identifies historical patterns; inference maps current values onto those patterns. PBAs, such as graphics processing units (GPUs), play a crucial role in both phases. The following diagram depicts a large cluster of GPUs used for learning, followed by a smaller number used for inference. The distinct computational requirements of the two stages have led some hardware providers to develop separate solutions for each phase, while others offer unified solutions.

As illustrated in the preceding diagram, the ML paradigm follows the sequence of learning (training) and inference, and PBAs such as GPUs can be used in both steps. In this example, features extracted from raw historical data are processed through a neural network (NN). Because of the size of the models and data, learning is distributed across multiple PBAs, an approach known as parallelism. Labeled data is used to fit the model structure and weights, after which new, unseen streaming data is applied to the trained model to generate an inference (prediction).
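To make this train-then-infer pattern concrete, the following is a minimal sketch using PyTorch. The feature dimension, labels, network shape, and training loop are hypothetical stand-ins for features extracted from historical data, and the multi-PBA distribution described above is omitted for brevity; the sketch simply places the work on a GPU when one is available.

```python
# A minimal sketch of the learning-then-inference pattern, assuming PyTorch
# and a toy feedforward network; all data here is randomly generated and
# stands in for features extracted from historical records.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # use a PBA if present

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Learning: offline passes over a batch of labeled historical features.
features = torch.randn(1024, 16, device=device)           # historical features
labels = torch.randint(0, 2, (1024, 1), device=device).float()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()                                        # fit the weights
    optimizer.step()

# Inference: online scoring of a small batch of new, unseen streaming data.
model.eval()
with torch.no_grad():
    new_batch = torch.randn(4, 16, device=device)
    predictions = torch.sigmoid(model(new_batch))          # probabilities
print(predictions.squeeze().tolist())
```

Note how the two phases differ in shape: learning iterates over a large batch many times, while inference is a single cheap forward pass over a handful of records, which is why some providers build separate hardware for each.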

This article begins by exploring the fundamentals of hardware-accelerated computing, followed by an examination of the core technologies in this domain. We then discuss the significance of accelerated computing for data processing. Following this, we review four critical FSI use cases for accelerated computing, identifying key challenges and proposing potential solutions. The post concludes with a summary of three essential takeaways and suggestions for actionable next steps.

Background on Accelerated Computing

CPUs are optimized for processing small amounts of sequential data, while PBAs excel at handling large quantities of parallel data. PBAs can execute certain functions, such as floating-point calculations, more efficiently than software running on CPUs, yielding benefits like reduced latency, increased throughput, and lower energy consumption. PBAs fall into three types: easily reprogrammable chips such as GPUs, and two categories of fixed or semi-fixed function acceleration, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). Fixed or semi-fixed function acceleration is viable when the data processing logic does not need to change. FPGAs can be reprogrammed, though not easily; ASICs are custom-designed for a specific application and cannot be reprogrammed at all. Generally, the less user-friendly the acceleration method, the faster it tends to be: from fastest to slowest, the order is typically programming the hardware directly, programming against PBA APIs, coding in unmanaged languages such as C++, and coding in managed languages such as Python. Analysis by Zeta-Alpha of publications featuring accelerated compute workloads reveals that 91.5% utilize GPU PBAs, 4% other PBAs, 4% FPGAs, and 0.5% ASICs. This discussion focuses on the easily reprogrammable PBAs.
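To make the CPU-versus-PBA contrast concrete, the following is a minimal sketch that times the same single-precision matrix multiplication on both device types. It assumes a CUDA-capable GPU and the CuPy library; the matrix size and the choice of libraries are illustrative assumptions, not a benchmark methodology or a vendor recommendation.

```python
# A minimal sketch comparing one floating-point workload on CPU and GPU.
# Assumes a CUDA-capable GPU and CuPy (pip install cupy); illustrative only.
import time

import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

# CPU: NumPy dispatches the matrix multiply to the host BLAS library.
start = time.perf_counter()
c_cpu = a_cpu @ b_cpu
cpu_seconds = time.perf_counter() - start

# GPU: copy the same arrays to device memory and multiply them there.
a_gpu = cp.asarray(a_cpu)
b_gpu = cp.asarray(b_cpu)
_ = a_gpu @ b_gpu                       # warm-up: exclude one-time init cost
cp.cuda.Stream.null.synchronize()

start = time.perf_counter()
c_gpu = a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()       # wait for the asynchronous kernel
gpu_seconds = time.perf_counter() - start

print(f"CPU: {cpu_seconds:.3f}s  GPU: {gpu_seconds:.3f}s")
```

The explicit synchronization matters because GPU kernels launch asynchronously; without it, the timer would stop before the device finished the work.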

The recent history of PBAs begins in 1999, when NVIDIA launched its first product marketed specifically as a GPU, designed to accelerate graphics and image processing. By 2007, GPUs had evolved into more general-purpose computing devices, finding applications across scientific computing and various industries. By 2018, alternative PBAs were emerging, and by 2020 PBAs were widely used for parallel problems such as training neural networks. Other examples of available PBAs include AWS Inferentia, AWS Trainium, Google TPU, and Graphcore IPU. During this period, industry analysts noted NVIDIA's shift from its traditional gaming and graphics focus toward scientific computing and data analytics.

The convergence of hardware advancements and ML brings us to the present. The 2012 work of Hinton and his collaborators is often referred to as the "Cambrian Explosion" of ML. Although neural networks have existed since the 1960s, they struggled to deliver useful results until that work highlighted three critical changes: adding more layers to neural networks improves their performance, the availability of labeled training data had surged, and GPUs could process that data fast enough. These factors sparked a period of rapid advancement in ML, and neural networks were rebranded as deep learning. The publication of the groundbreaking 2017 paper "Attention Is All You Need" introduced a new deep learning architecture based on the transformer. Training transformer models on vast datasets drawn from the internet required substantial quantities of PBAs. The release in November 2022 of ChatGPT, a large language model built on the transformer architecture, is widely credited with igniting the current generative AI boom.

Review of the Technology

In this section, we examine the key components of the technology.

Parallel Computing

Parallel computing means executing multiple computations simultaneously, and it can be classified by the granularity of parallelism the hardware supports: grids of connected instances, multiple processors within an instance, multiple cores within a processor, PBAs, or combinations of these approaches. Parallel computing harnesses these multiple processing elements to address complex problems: by dividing a problem into independent segments, each processing element can work concurrently, improving efficiency and reducing overall processing time.
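As a concrete illustration of dividing a problem into independent segments, the following is a minimal sketch of data parallelism across CPU cores, using a toy Monte Carlo option-pricing estimate. The payoff, parameters, and chunk sizes are hypothetical and chosen only to show the split-and-combine pattern; the same structure applies at coarser granularities, such as grids of connected instances.

```python
# A minimal sketch of parallelism across CPU cores, assuming a toy Monte Carlo
# estimate of a European call option price; all inputs are illustrative.
import math
import random
from multiprocessing import Pool

def simulate_chunk(args):
    """Average discounted call payoff over one independent chunk of paths."""
    n_paths, seed = args
    rng = random.Random(seed)                 # independent stream per chunk
    s0, k, r, sigma, t = 100.0, 105.0, 0.03, 0.2, 1.0  # hypothetical inputs
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        total += math.exp(-r * t) * max(s_t - k, 0.0)
    return total / n_paths

if __name__ == "__main__":
    chunks = [(250_000, seed) for seed in range(8)]   # 8 independent segments
    with Pool() as pool:                              # one worker per CPU core
        estimates = pool.map(simulate_chunk, chunks)  # run segments concurrently
    print(f"Estimated option price: {sum(estimates) / len(estimates):.4f}")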


In summary, understanding the role of purpose-built accelerators in financial services is essential for staying competitive in the data-driven landscape. By leveraging the advancements in accelerated computing, financial institutions can enhance their operational efficiency, improve customer experiences, and maintain a strategic edge.
