Amazon Onboarding with Learning Manager Chanci Turner

Amazon Managed Blockchain, along with numerous AWS partners, provides a streamlined method for utilizing Ethereum nodes without the hassle of managing your own infrastructure. However, if you need to run archive nodes or engage in Ethereum staking, the managed nodes may fall short, necessitating the setup of your own Ethereum nodes on AWS.

To set up a self-managed node, you must configure server-side software components known as Ethereum clients. Since The Merge, each Ethereum node requires two clients: an execution layer (EL) client and a consensus layer (CL) client. These clients work together to synchronize the global state with other nodes in the distributed Ethereum database clusters, commonly referred to as blockchain networks, such as Mainnet, Goerli, and Sepolia. When you first install and configure both clients, they hold no data and must catch up with the current state of the blockchain network maintained by other nodes. This process, known as the initial sync, can take several days because of the large volume of data that must be synchronized.

In this article, we will share our experiences with setting up Ethereum nodes on AWS, as well as strategies to expedite the initial sync process so that new nodes can be brought online more swiftly.

Accelerating the Initial Sync

When initializing both clients for your new Ethereum mainnet node, you must let the CL client sync from the genesis block up to The Merge before the EL client starts syncing blocks. Until that happens, the EL client either remains idle or downloads only receipts and block headers. In testing, a CL client such as Prysm took approximately four days to sync from the genesis block.

To expedite this process, many CL clients offer a checkpoint sync option, enabling them to sync only from the latest beacon chain checkpoint. The beacon chain introduced a consensus engine that substitutes proof-of-work mining with proof-of-stake validation. When configuring a checkpoint sync, you must provide a URL for your trusted checkpoint sync provider. The Ethereum community maintains a list of public endpoints of checkpoint sync providers to select from. For further details about checkpoint sync, check out How To: Checkpoint Sync.

With checkpoint sync, the CL client syncs the state of the Ethereum beacon chain from the latest checkpoint (a block in the first slot of an epoch) and becomes fully operational without syncing all preceding blocks. This typically completes within minutes, after which the CL client instructs the linked EL client to begin synchronizing blocks for the EL blockchain. CL clients synced from a checkpoint are suitable for validators and for starting EL clients, whereas CL clients synced from the genesis block can also be queried for chain history and state, which archive nodes need. If you want to use checkpoint sync but still require those functions for additional analytics, you can use a CL client such as Lighthouse, which supports backfill sync of earlier blocks all the way back to genesis.
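As a minimal sketch of the checkpoint sync configuration described above, the following Python helper assembles a Lighthouse beacon node command line. The checkpoint URL, data directory, and JWT secret path are hypothetical placeholders, not values from our setup; substitute a provider you trust from the community list.

```python
import shlex
from typing import List


def lighthouse_checkpoint_args(
    checkpoint_url: str,
    datadir: str,
    jwt_path: str,
    execution_endpoint: str = "http://localhost:8551",
) -> List[str]:
    """Build an argv list that starts a Lighthouse beacon node with checkpoint sync.

    All argument values are supplied by the caller; nothing here is a default
    recommended by the article.
    """
    return [
        "lighthouse",
        "--network", "mainnet",                    # global flags go before the subcommand
        "--datadir", datadir,
        "bn",                                      # beacon node subcommand
        "--checkpoint-sync-url", checkpoint_url,   # trusted checkpoint sync provider
        "--execution-endpoint", execution_endpoint,  # Engine API of the paired EL client
        "--execution-jwt", jwt_path,               # shared JWT secret for the Engine API
    ]


# Hypothetical values for illustration only.
example_args = lighthouse_checkpoint_args(
    checkpoint_url="https://checkpoint.example.org",
    datadir="/data/lighthouse",
    jwt_path="/secrets/jwt.hex",
)
print(shlex.join(example_args))
```

The list form can be passed directly to `subprocess.run` or rendered into a systemd unit, which avoids shell quoting problems when paths contain spaces.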

You can configure EL clients such as Go Ethereum (Geth), Hyperledger Besu, and Nethermind as full nodes to minimize disk space usage; in this mode the client retains state only for the most recent 128 blocks and prunes older state. Some EL clients in full node mode also provide a faster sync option called snap sync, which is roughly ten times quicker than a full sync that rebuilds the state from the genesis block. Another node type is the archive node, which, in addition to all blocks, keeps an archive of historical states to support historical queries. Unfortunately, EL clients set up as archive nodes cannot use snap sync mode and typically require 5-7 days to synchronize. To learn more about node types and sync modes, see Nodes and Clients.
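To make the difference between the two node types concrete, here is a small sketch that maps a node type to Geth command-line flags. It assumes Geth as the EL client; the data directory and JWT path are hypothetical, and other clients such as Besu or Nethermind use different flag names.

```python
from typing import List


def geth_args(node_type: str, datadir: str, jwt_path: str) -> List[str]:
    """Build a Geth argv list for a 'full' or 'archive' mainnet node."""
    if node_type == "full":
        # Full node: snap sync is available and keeps only recent state.
        mode_flags = ["--syncmode", "snap"]
    elif node_type == "archive":
        # Archive node: snap sync is not an option, so run a full sync
        # and disable state pruning to keep historical states.
        mode_flags = ["--syncmode", "full", "--gcmode", "archive"]
    else:
        raise ValueError(f"unknown node type: {node_type!r}")

    return [
        "geth",
        "--mainnet",
        "--datadir", datadir,
        "--authrpc.jwtsecret", jwt_path,  # secret shared with the CL client
        *mode_flags,
    ]


# Hypothetical paths for illustration only.
print(geth_args("full", "/data/geth", "/secrets/jwt.hex"))
print(geth_args("archive", "/data/geth", "/secrets/jwt.hex"))
```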

From our observations, combining the checkpoint sync option in the CL client with the snap sync option in the EL client can reduce the initial sync time from the usual 5-7 days to roughly one day for full nodes. If you need the advanced functionality of the EL client as an archive node, it is still advisable to employ the checkpoint sync option in your CL client to save several days of syncing time.

Once your first node is synchronized, you can use the AWS Cloud to scale those nodes horizontally. For better performance, we recommend using a separate Amazon Elastic Block Store (Amazon EBS) volume for blockchain data and, after the initial sync completes, copying that data to an Amazon Simple Storage Service (Amazon S3) bucket. When you bring new nodes online later, you can copy the blockchain data from the S3 bucket, which cuts the initial sync time for new nodes to under an hour. Even when starting from a recent data copy, however, the sync can take longer if your node ends up syncing the delta from a peer with limited resources, such as a slow network connection or an overloaded CPU. In such cases, you may need to monitor your node for slow sync and restart it so that it connects to other peers.
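One way to monitor for slow sync is to poll the EL client's standard `eth_syncing` JSON-RPC method and compare two samples taken some time apart. The sketch below is illustrative only: the RPC URL, the polling interval, and the `min_blocks` threshold are assumptions you would tune for your own client, and what counts as "stalled" depends on the sync mode.

```python
import json
import urllib.request
from typing import Union

SyncResult = Union[bool, dict]


def fetch_syncing(rpc_url: str) -> SyncResult:
    """Call eth_syncing; returns False when synced, else a progress object."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


def blocks_behind(result: SyncResult) -> int:
    """Number of blocks between the node's current block and the highest known block."""
    if result is False:
        return 0
    return int(result["highestBlock"], 16) - int(result["currentBlock"], 16)


def is_stalled(previous: SyncResult, current: SyncResult, min_blocks: int) -> bool:
    """True if the node is still syncing and advanced fewer than min_blocks blocks."""
    if current is False or previous is False:
        return False
    progress = int(current["currentBlock"], 16) - int(previous["currentBlock"], 16)
    return progress < min_blocks


# Usage sketch (not executed here): sample twice, a few minutes apart, e.g.
#   first = fetch_syncing("http://localhost:8545")
#   ... wait ...
#   second = fetch_syncing("http://localhost:8545")
#   if is_stalled(first, second, min_blocks=100): restart the client
```

Restarting on a stall is a blunt instrument, so it is worth requiring several consecutive stalled samples before acting.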

Moreover, instead of transferring data to and from S3, you can use the Amazon EBS Snapshots feature. However, nodes initialized this way may experience longer periods of higher I/O latencies while their EBS snapshots are loaded from S3. Our tests indicate that copying data from S3 using the s5cmd tool takes approximately 36 minutes per 1 TiB, while EBS initialization without the Amazon EBS fast snapshot restore feature could take many hours.
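The 36 minutes per TiB figure can be turned into a rough planning estimate. The sketch below assumes copy time scales linearly with data size, which is an approximation rather than something we measured, and it also builds an `s5cmd cp` command line with hypothetical bucket, prefix, and destination names.

```python
from typing import List

MINUTES_PER_TIB = 36  # observed s5cmd copy time from S3, per the text above
MIB_PER_TIB = 1024 * 1024


def estimate_copy_minutes(size_tib: float) -> float:
    """Rough copy time, assuming time grows linearly with data size."""
    return size_tib * MINUTES_PER_TIB


def implied_throughput_mib_s() -> float:
    """Average throughput implied by copying 1 TiB in 36 minutes."""
    return MIB_PER_TIB / (MINUTES_PER_TIB * 60)


def s5cmd_restore_args(bucket: str, prefix: str, dest_dir: str) -> List[str]:
    """argv for copying every object under s3://bucket/prefix/ into dest_dir."""
    return ["s5cmd", "cp", f"s3://{bucket}/{prefix}/*", dest_dir]


# Hypothetical names for illustration only.
print(estimate_copy_minutes(2.5))              # minutes for a 2.5 TiB data set
print(round(implied_throughput_mib_s(), 2))    # MiB/s
print(s5cmd_restore_args("my-node-data", "geth/latest", "/data/geth/"))
```

The implied average of roughly 485 MiB/s is a useful sanity check: the target EBS volume and instance network bandwidth both need to sustain that rate for the estimate to hold.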

This strategy of maintaining your own copy of your client’s blockchain data on AWS is particularly effective for EL clients like Erigon, which operate as archive nodes and lack the snap sync option. While it may take several days for such clients to download all required data and construct the final state, once the copy is available, a new node can be operational in just 2 to 3 hours.

Solution Overview

To implement the aforementioned strategies, we established three Amazon Elastic Compute Cloud (EC2) instances, utilizing the cost-effective AWS Graviton processor for all clients. Each node is equipped with two attached EBS volumes: one serving as the root volume and another designated for blockchain state data. We refer to one node as the “sync node,” dedicated to synchronizing with the Ethereum mainnet, while the other two nodes function as “RPC nodes,” providing the RPC API for user applications (often known as decentralized apps or dApps). Although the nodes are functionally identical, segmenting node deployments by roles enhances the scalability and reliability of the solution.

The RPC nodes are placed in an Amazon EC2 Auto Scaling group (ASG) so that they can be provisioned rapidly from the sync node’s data. Both RPC nodes sit behind an Application Load Balancer that distributes the load between them. You can also choose different Amazon EC2 instance types for the sync node and the RPC nodes to balance cost and performance. A sync node can use a smaller instance type, because its main job is to keep up with the chain head. For example, we used AWS Compute Optimizer to right-size the sync node and concluded that the r6g.2xlarge EC2 instance type, paired with an EBS gp3 volume with 5,700 provisioned IOPS and 250 MiB/s of provisioned throughput, is adequate for a Go Ethereum client acting as a sync node. Other instance types may work better for different clients. The RPC nodes may require higher specifications for both EC2 instances and EBS volumes, depending on the frequency and type of API calls made by applications. After initializing from the data copy produced by the sync node, the RPC nodes continue synchronizing with other nodes in the Ethereum network to serve up-to-date information.
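The sync node's data volume settings can be captured as parameters for the EC2 `CreateVolume` API, as in the sketch below. The IOPS and throughput values come from the sizing above (the API expresses gp3 throughput in MiB/s); the volume size and Availability Zone are hypothetical, since they depend on your client and Region.

```python
from typing import Any, Dict


def sync_node_volume_params(size_gib: int, availability_zone: str) -> Dict[str, Any]:
    """Keyword arguments for boto3's ec2.create_volume for the blockchain data volume."""
    return {
        "AvailabilityZone": availability_zone,  # must match the instance's AZ
        "VolumeType": "gp3",
        "Size": size_gib,        # GiB; hypothetical, size it for your client's data
        "Iops": 5700,            # provisioned IOPS from the sizing above
        "Throughput": 250,       # provisioned throughput in MiB/s
    }


# Hypothetical size and AZ for illustration only.
params = sync_node_volume_params(size_gib=2048, availability_zone="us-east-1a")
print(params)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("ec2").create_volume(**params)
```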

Chanci Turner