The Case for Parallel Processing Chains

Mohamed Fouda
Published in Alliance
12 min read · Sep 6, 2022

Looking at the evolution of blockchain technology, we can recognize a strong trend of new L1s that focus on parallel execution. The idea is not new; it is already used in Solana’s Sealevel execution environment. However, the last bull market, which saw impressive activity in DeFi and NFTs, showed that there is a pressing need for improvement. Some of the prominent projects adopting the parallel execution philosophy are Aptos, Sui, Linera, and Fuel.

This article discusses the similarities and differences between these projects and the challenges that face them.

If you are a founder building in this domain, please reach out or join the Alliance community. We would love to help.

The Problem

Smart contract platforms enable the creation of a wide spectrum of decentralized applications. To execute these applications, a shared computing engine is needed. Each node in the network runs this computing engine and executes the applications and the users’ interactions with the application. As nodes get the same result from the execution, they achieve consensus and progress the chain.

The Ethereum Virtual Machine (EVM) is the dominant smart contract (SC) execution engine, with about 20 different implementations. Since its invention, the EVM has established a critical mass of adoption by developers. In addition to Ethereum and Ethereum’s L2s, several other chains, including Polygon, BNB Smart Chain, and Avalanche C-Chain, have adopted the EVM as the execution engine and focused on changing the consensus mechanism to improve network throughput.

A major limiting characteristic of the EVM is the sequential execution of transactions. The EVM essentially executes a single transaction at a time and places all other transactions on hold until that transaction finishes executing and the blockchain state is updated. Even when two transactions are independent, e.g., a payment from Alice to Bob and another from Carol to Dave, the EVM cannot execute them in parallel. While this execution model allows for interesting use cases such as flash loans, it is neither efficient nor scalable.

This sequential execution of transactions is one of the main bottlenecks to the throughput of the network. First, it lengthens the execution time of the transactions in a block, which limits how short the block time can be. Second, it limits the number of transactions that can be added to a block, because nodes need enough time to execute the transactions and confirm the block. Ethereum has an average throughput of ~17 tx/sec. This low throughput means that during periods of high activity, e.g., NFT minting events, the network miners/validators cannot process all the transactions, and a fee bidding war ensues to ensure priority execution, pushing transaction fees higher. Average Ethereum fees at some points exceeded 0.2 ETH (~$800), pricing out many users. Another problem of sequential execution is the inefficiency of the network nodes. Sequential instruction execution does not benefit from multiple processor cores, which results in low hardware utilization. This hinders scalability and causes unnecessary energy consumption.

Parallel execution to the rescue?

The fundamental limitation in the EVM structure has set the stage for a new realm of L1s that focus on parallel execution (PE). Parallelism allows the division of transaction processing between multiple processor cores, improving hardware utilization which enables better scalability. In high-throughput chains, increasing hardware resources directly correlates to the number of transactions that can be executed. During high chain activity, the validator nodes can commission more cores to process the additional transaction load. The dynamic scaling for computing resources allows the network to achieve increased throughput at periods of high demand, significantly improving the user experience.

The other advantage of this approach is improved transaction confirmation latency. Dynamic scaling of node resources makes it possible to confirm transactions with low latency for all possible network loads. Transactions do not need to wait for tens or hundreds of blocks, nor do they incur excessive fees to prioritize confirmation. The improved confirmation times improve transaction finality and open the door to low latency blockchains. Guaranteed low latency of executing transactions enables several use cases that were not possible before.

Changing the chain execution model to allow PE is not a new idea, and several projects have already explored it. One approach is to replace the accounting model used by the EVM, switching from an Accounts model to an Unspent Transaction Output (UTXO) model. The UTXO execution model is used in Bitcoin, and it allows parallel processing of transactions, which makes it ideal for payments. As UTXOs are limited in functionality, extensions are needed to enable the complex interactions required by smart contracts. For example, Cardano uses an extended UTXO model for this purpose, and Findora uses a hybrid UTXO model that implements both accounting models and allows users to change the asset type between the two.

The other approach for PE does not change the accounts model and focuses instead on improving how the state of the chain is architected and modified. An example of this approach is Solana’s Sealevel framework. This article focuses on the latter approach.

How does parallel execution work?

Parallel execution works by identifying independent transactions and executing them simultaneously. Two transactions are dependent if the execution of one will affect the execution of the other. For instance, AMM transactions in the same pool are dependent and must be executed in sequence.

Although the parallel processing concept is simple, the devil is in the details. The main challenge is to efficiently identify “independent” transactions. Classifying transactions as independent requires understanding how each transaction changes the blockchain memory, i.e., the chain state. Transactions interacting with the same smart contract, e.g., an AMM pool, would change the same contract state and therefore cannot be executed simultaneously. With the current degree of composability between applications, identifying dependencies is a challenging task. Imagine an AMM transaction that swaps Uni to USDC, where the AMM router finds that the most efficient route for executing it is Uni -> ETH -> DAI -> AAVE -> USDC. None of the pools involved in this transaction can process any other transaction until the swap is completely executed and the state of every involved pool is updated.

Identifying independent transactions

This section compares the approaches used by different parallel-execution engines. The discussion focuses on approaches that control access to the state (memory). A blockchain state can be thought of as random-access memory (RAM). Each chain account, or smart contract, owns a range of memory locations that it can modify. Dependent transactions are those that try to change the same memory locations in the same block. Different chains use different memory architectures and different mechanisms to recognize dependent transactions.
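
To make the memory analogy concrete, here is a minimal Python sketch of conflict detection based on read and write sets. It is an illustration under simplifying assumptions, not the scheduler of any chain discussed here: the `Tx` type, the location names, and the greedy `schedule` function are all hypothetical. Two transactions are treated as dependent when one writes a location the other reads or writes, and each transaction is placed in the earliest batch that comes after every conflicting earlier transaction, so block order is preserved between dependent transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical transaction with the state locations it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a: Tx, b: Tx) -> bool:
    # Dependent if either one writes a location the other reads or writes.
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def schedule(txs):
    """Split an ordered list of transactions into batches.

    Transactions inside one batch are mutually independent and could run in
    parallel; batches themselves run one after another.
    """
    batches = []
    for tx in txs:
        # Index of the last batch holding a transaction that conflicts with tx.
        last_conflict = -1
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                last_conflict = i
        target = last_conflict + 1
        if target == len(batches):
            batches.append([])
        batches[target].append(tx)
    return batches


block = [
    Tx("pay_alice_bob", writes=frozenset({"alice", "bob"})),
    Tx("pay_carol_dave", writes=frozenset({"carol", "dave"})),
    Tx("swap_1", writes=frozenset({"pool_uni_eth", "erin"})),
    Tx("swap_2", writes=frozenset({"pool_uni_eth", "frank"})),
]

plan = [[tx.name for tx in batch] for batch in schedule(block)]
print(plan)
# [['pay_alice_bob', 'pay_carol_dave', 'swap_1'], ['swap_2']]
```

The two payments and the first swap touch disjoint locations and share a batch, while the second swap writes to the same pool as the first and has to wait for the next batch.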

Several chains in this category build on the technology developed by Facebook’s defunct blockchain project Diem. The Diem team created the smart contract language Move specifically to improve SC execution. Aptos, Sui, and Linera are three high-profile projects that belong to this group. Outside this group, Fuel is another well-known project focused on PE that uses its own SC language.

Aptos

Aptos builds on Diem’s Move language and MoveVM to create a high-throughput chain that implements parallel execution. Aptos’s approach is to detect dependencies while remaining transparent to the user/developer, i.e., without requiring transactions to explicitly declare which parts of the state (memory locations) they use. Aptos uses a variant of Software Transactional Memory (STM) called Block-STM. In Block-STM, the transactions are pre-ordered inside the block and are divided, during execution, between processor threads for optimistic execution. In optimistic execution, the transactions are executed assuming there are no dependencies, and the memory locations modified by each transaction are recorded. After execution, all transaction results are validated. During validation, if a transaction is found to have accessed memory locations that were modified by preceding transactions, it is invalidated: its result is discarded and the transaction is re-executed. The process repeats until all transactions in the block are executed. Block-STM speeds up execution when multiple processor cores are used, and the speedup depends on the degree of inter-dependence between transactions. Results from the Aptos team show that using 32 cores leads to an 8x improvement for high inter-dependence and a 16x improvement for low inter-dependence. If all transactions in a block are dependent, Block-STM incurs a minor performance penalty compared to sequential execution. Aptos claims that this approach can lead to a throughput of 160,000 TPS.
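
The execute, validate, and re-execute loop can be illustrated with a small single-threaded Python simulation. This is a heavily simplified sketch and not the Block-STM implementation: the real engine runs on multiple threads and tracks multiple versions of each memory location, whereas this version runs the remaining transactions one after another against the same snapshot and, as a conservative shortcut, re-executes everything from the first invalidated transaction onward. The names `transfer` and `execute_block` are hypothetical.

```python
def transfer(src, dst, amount):
    """Hypothetical transaction: move `amount` from src to dst if funds allow."""
    def run(read):
        src_balance = read(src)
        dst_balance = read(dst)
        if src_balance < amount:
            return {}  # insufficient funds: the transaction writes nothing
        return {src: src_balance - amount, dst: dst_balance + amount}
    return run


def execute_block(state, txs):
    """Optimistically execute an ordered block; return (state, rounds, executions)."""
    state = dict(state)
    next_idx = 0      # index of the first transaction not yet committed
    rounds = 0
    executions = 0
    while next_idx < len(txs):
        rounds += 1
        # Optimistic phase: every remaining transaction runs against the same
        # committed snapshot, as if all of them ran in parallel.
        results = []
        for tx in txs[next_idx:]:
            read_set = set()

            def read(key, _reads=read_set):
                _reads.add(key)
                return state.get(key, 0)

            results.append((read_set, tx(read)))
            executions += 1
        # Validation phase: commit in block order until a transaction turns out
        # to have read a location written by an earlier one in this round.
        written = set()
        for read_set, writes in results:
            if read_set & written:
                break  # stale read: this and later transactions are re-executed
            state.update(writes)
            written |= writes.keys()
            next_idx += 1
    return state, rounds, executions


genesis = {"alice": 100, "bob": 0, "carol": 50, "dave": 0}
block = [
    transfer("alice", "bob", 10),
    transfer("carol", "dave", 5),
    transfer("bob", "dave", 10),  # depends on both earlier transfers
]
final_state, rounds, executions = execute_block(genesis, block)
print(final_state, rounds, executions)
# {'alice': 90, 'bob': 0, 'carol': 45, 'dave': 15} 2 4
```

In this example the first two transfers commit in the first round. The third one read Bob's balance before Alice's payment landed, so it is invalidated and re-executed in a second round, which costs one extra execution but yields the same final state as sequential execution.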

Sui

Another PE approach is to require transactions to explicitly declare the parts of the chain state they modify. This approach is currently used by Solana and Sui. Solana calls the memory units accounts, and a transaction has to state which accounts it modifies. Sui uses a similar approach.

Sui also builds on Diem’s technology by using the MoveVM. However, Sui uses a different version of the Move language. Sui Move changes the storage model and asset permissions of Diem’s core Move. This is a major difference from Aptos, which uses Diem’s core Move. Sui Move defines a state storage model that allows for easier identification of independent transactions. In Sui, state storage is defined in terms of Objects. Objects typically represent assets and can be shared, which means that multiple users can modify the object. Each object has a unique ID within the Sui execution environment and has internal pointers to the owners’ addresses. With these concepts, it is easy to identify dependencies by checking whether transactions use the same object.

By shifting the work of declaring dependencies to the developers, the implementation of the execution engine becomes simpler, which means it can theoretically achieve better performance and scalability. However, this comes at the cost of a less-than-optimal developer experience.
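
When every transaction declares the objects it touches up front, grouping dependent transactions becomes a simple graph problem. The Python sketch below shows one way to do it with a union-find structure: transactions that share any declared object ID, directly or through a chain of other transactions, land in the same sequential lane, and different lanes could run in parallel. This is an illustration only and not Sui's or Solana's scheduler. `Tx`, `build_lanes`, and the object IDs are hypothetical, and the sketch treats every declared object as written and assumes each transaction declares at least one object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical transaction that declares its object IDs up front."""
    name: str
    object_ids: tuple


def build_lanes(txs):
    """Group transactions into lanes; lanes are independent of each other."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Objects declared by the same transaction belong to the same group.
    for tx in txs:
        first = tx.object_ids[0]
        find(first)
        for object_id in tx.object_ids[1:]:
            union(first, object_id)

    # Transactions keep their block order inside each lane.
    lanes = {}
    for tx in txs:
        lanes.setdefault(find(tx.object_ids[0]), []).append(tx.name)
    return list(lanes.values())


block = [
    Tx("mint_a", ("coin_alice", "nft_collection")),
    Tx("mint_b", ("coin_bob", "nft_collection")),
    Tx("pay", ("coin_carol",)),
    Tx("swap", ("coin_dave", "pool_1")),
]
lanes = build_lanes(block)
print(lanes)
# [['mint_a', 'mint_b'], ['pay'], ['swap']]
```

The two mints share the collection object and must run in order, while the payment and the swap each get their own lane.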

Sui has not launched yet, and the project only recently launched its testnet. The founders of Sui claim that parallel execution, combined with the Narwhal & Tusk consensus mechanism, can lead to a throughput exceeding 100,000 tx/sec. This throughput, if achieved, would be a huge step up from Solana’s current throughput of ~2,400 tx/sec and would exceed the throughput of Visa and Mastercard.

Linera

Linera is the newest addition to the parallel processing pack and recently announced its first funding round, led by a16z. There are not many details about the project’s implementation. However, according to the funding announcement post, it is based on the FastPay protocol, which was also developed at Facebook. FastPay is based on a technology called Byzantine Consistent Broadcast. This technology focuses on accelerating independent payments, such as those happening in point-of-sale networks. It allows a group of validators to ensure the integrity of the payments as long as more than two-thirds of them are honest. FastPay is a variation of the real-time gross settlement (RTGS) systems that are used in the networks among banks and financial institutions.
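
The two-thirds honesty condition translates into a simple quorum rule. The Python sketch below assumes the usual Byzantine fault tolerance arithmetic, in which a committee of n validators tolerates at most f = floor((n - 1) / 3) faulty members and a payment is accepted once n - f distinct validators have signed it. The function names are hypothetical and this is not code from FastPay or Linera.

```python
def max_faulty(n: int) -> int:
    """Largest number of faulty validators tolerated by a committee of size n."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Signatures needed so that any two quorums share at least one honest validator."""
    return n - max_faulty(n)


def is_certified(signers: set, committee: set) -> bool:
    """A payment is certified once a quorum of committee members has signed it."""
    valid_signers = signers & committee  # ignore signatures from non-members
    return len(valid_signers) >= quorum_size(len(committee))


committee = {"v1", "v2", "v3", "v4"}
print(quorum_size(len(committee)))                  # 3
print(is_certified({"v1", "v2", "v3"}, committee))  # True
print(is_certified({"v1", "v2", "x9"}, committee))  # False
```

Because independent payments only need such a certificate and not a global ordering, payments from different senders can be processed in parallel.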

Building on FastPay, Linera is planning to build a blockchain that focuses on fast settlement and low latency via the execution of the payment transactions in parallel. It is important to note that Sui also uses the Byzantine Consistent Broadcast approach for simple payments. For other transactions, Sui’s own consensus mechanism, Narwhal and Tusk, is used to efficiently handle the more complex and dependent transactions such as DeFi transactions.

Fuel

Fuel focuses on being the execution layer in a modular blockchain stack. That means that Fuel does not implement consensus or store the blockchain data on the Fuel chain. To form a functional blockchain, Fuel relies on other chains, such as Ethereum or Celestia, for consensus and data availability. This article provides a good review of the modular blockchain concept.

Fuel uses UTXOs to create strict access lists, i.e., lists that control access to the same piece of the state. This model builds on the concept of Canonical Transaction Ordering. In this scheme, the ordering of transactions in the block significantly simplifies the detection of dependencies between transactions. To implement this architecture, Fuel has built a new virtual machine called FuelVM and a new language called Sway. FuelVM is a compatible and simplified implementation of the EVM, which can be effective in onboarding developers to the Fuel ecosystem. Further, as Fuel focuses on the modular blockchain stack, Fuel SC executions can settle on the Ethereum mainnet. This approach aligns with the vision of Ethereum after the Merge as a rollup-centric settlement and data availability layer. In this architecture, Fuel can enable high-throughput executions that are batched and settled on Ethereum.

As a proof-of-concept, the Fuel team has created a Uniswap-style AMM called SwaySwap that is running on a testnet to demonstrate the improved performance of the FuelVM in comparison with EVM.

Challenges to the parallel execution approach

The parallel execution approach seems logical and straightforward. However, there are a few challenges that need to be discussed. The first is estimating the actual percentage of transactions that can be accelerated using this parallel execution. The second challenge is decentralization of the network, i.e., if validators can easily scale the computing power to boost throughput, how can full nodes that often use commodity hardware keep up to ensure the correctness of the chain?

Percentage of Parallelizable transactions

It is challenging to accurately estimate the percentage of transactions that can be executed in parallel on any chain. Moreover, this percentage can change significantly between blocks depending on the type of network activity. For instance, an NFT mint event can result in a burst of activity with a high percentage of dependent transactions. That said, we can use some assumptions to get a rough estimate of the average percentage of parallelizable transactions. For instance, we can assume that the majority of ETH and ERC20 transfers are independent, i.e., sent from and received by different addresses. We can therefore assume that only ~25% of simple ETH and ERC20 transfers are dependent, e.g., deposits to SCs and transfers that aggregate funds from exchange hot wallets into cold wallets. On the other hand, all AMM transactions in the same pool are dependent. Given that AMMs are often dominated by a small number of pools and that AMM trades are highly composable and interact with multiple pools, we can safely assume that at least 50% of AMM transactions are dependent.

By performing some analysis on the transaction categories in Ethereum, we find that out of Ethereum’s ~1.2M daily transactions, 20–30% are ETH transfers, 10–20% are stablecoin transfers, 10–15% are DEX transfers, 4–6% are NFT trades, 8–10% are ERC20 approvals, and 12–15% are other ERC20 transfers. Using these numbers and assumptions, we can estimate that PE can accelerate roughly 70–80% of the transactions in a SC platform. This means that the longest path of execution, i.e., the sequential execution of dependent transactions, can account for 20–30% of all transactions. In other words, if the same gas limit is used, PE could achieve a 3x–5x increase in throughput, since the total execution time is bounded by this sequential portion. Some experiments on building a parallel-execution EVM have produced similar estimates, with a 3–5x improvement in throughput being consistently achievable. In practice, high-throughput chains use a much higher gas limit per block and shorter block times to achieve at least a 100x throughput improvement over Ethereum. The increased throughput requires powerful validator nodes to process these blocks. This requirement leads to the second criticism, which is network centralization.
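
The 3x–5x figure follows from simple arithmetic: if a fraction s of the transactions must run one after another, the block cannot finish faster than that sequential portion, so the speedup is at most 1/s. The short Python sketch below computes this bound and, as an added assumption that is not part of the estimate above, also applies Amdahl's law for a finite number of cores. The function name `speedup` is hypothetical.

```python
def speedup(sequential_fraction: float, cores=None) -> float:
    """Upper bound on speedup when a fraction of the work must run sequentially.

    With cores=None the parallel part is assumed to take no time (the 1/s bound).
    Otherwise Amdahl's law is applied: 1 / (s + (1 - s) / cores).
    """
    if cores is None:
        return 1.0 / sequential_fraction
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / cores)


for s in (0.2, 0.3):
    print(f"s={s:.0%}: bound {speedup(s):.2f}x, with 32 cores {speedup(s, 32):.2f}x")
# s=20%: bound 5.00x, with 32 cores 4.44x
# s=30%: bound 3.33x, with 32 cores 3.11x
```

A sequential share of 20–30% therefore caps the gain at roughly 3.3x to 5x, no matter how many cores a validator adds.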

Network Centralization

Another common criticism of parallel processing is that it significantly pushes the network toward centralization. High-throughput networks can process tens of thousands of transactions per second. Validator nodes are incentivized by fees and network rewards to handle this load and to invest in dedicated servers or scalable cloud architectures. The same cannot be said about the companies or individuals who use the chain and need to run full nodes to interact with it. These entities cannot afford the powerful servers needed to process such a massive transaction load. This will push chain users to depend on specialized RPC node providers, e.g., Infura, which leads to more centralization.

Without the option to use consumer-grade hardware to operate full nodes, high-throughput chains can turn into closed systems where a small group of entities has absolute power over the network. In this scenario, these entities can coordinate to censor transactions, entities, or even applications, e.g., Tornado Cash, which can turn these chains into permissioned systems that are no different from Web2.

Currently, the requirements of operating a full node on Sui testnet are lower than those of the Aptos testnet nodes. However, we expect these requirements to change significantly when the mainnets launch and applications start to emerge on the chain. Decentralization advocates have been proposing solutions to address these expected issues. The solutions include using light nodes that verify the correctness of blocks by using zk validity proofs or fraud proofs. The Fuel team is active in this regard and aligns with the ethos of the Ethereum community on the importance of decentralization. Aptos and Sui teams are not clear on prioritizing the implementation of these approaches or alternative approaches to promote decentralization. The Linera team has briefly discussed these concerns in their introduction post but the protocol implementation is yet to confirm this commitment.

Conclusion

Parallel execution engines are promising solutions for improving the throughput of smart contract platforms. Combined with innovation in consensus mechanisms, parallel execution of transactions can lead to chains with throughputs approaching or exceeding 100k TPS. Such performance, which rivals Visa and Mastercard, can enable several use cases that are challenging today, such as fully on-chain games and decentralized micropayments. These impressive throughput improvements do not come without challenges, chiefly how to ensure decentralization. At Alliance, we are looking forward to supporting founders working on solving these problems and founders building innovative applications that benefit from these advancements.

Many thanks to the founders of the Aries team and Robert Chen for the comments and feedback on this article!

For feedback on this article, DM me on Twitter or leave a comment below!

Mohamed Fouda
Alliance

Crypto researcher and Investor. Contributor @AllianceDao, Venture partner @Volt Capital, PhD @Northwestern