Paradigm: Detailed explanation of Ethereum's historical growth problems and solutions

Original article by Storm Slivkoff and Georgios Konstantopoulos

Original translation: Luffy, Foresight News

History growth is currently the biggest bottleneck for scaling Ethereum. Surprisingly, history growth has become a bigger problem than state growth. Within a few years, history data will exceed the storage capacity of many Ethereum nodes.

The good news is:

  • History growth is a much easier problem to solve than state growth.

  • A solution is already under active development.

  • Solving history growth will also alleviate the state growth problem.

In this post, we continue our investigation of Ethereum scaling from Part 1, shifting our focus from state growth to history growth. Using a refined dataset, our goals are to 1) technically understand Ethereum’s scaling bottlenecks, and 2) help inform the discussion around the optimal Ethereum gas limit.

What is historical growth?

History is the collection of all blocks and transactions executed by Ethereum throughout its life cycle. It is all data from the genesis block to the current block. History growth is the accumulation of new blocks and new transactions over time.

Figure 1 shows the relationship between history growth and various protocol metrics and Ethereum node hardware constraints. History growth is limited by a different set of hardware constraints than state growth. History growth puts pressure on network IO because new blocks and transactions must be transmitted throughout the network. History growth also puts pressure on node storage space because each Ethereum node stores a complete copy of the history. If history growth is fast enough to exceed these hardware constraints, the node will no longer be able to reach a stable consensus with its peers. For an overview of state growth and other scaling bottlenecks, see Part 1 of this series.

Figure 1: Ethereum scaling bottleneck

Until recently, most of the network throughput of each node was used to transfer history (such as new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a large portion of node network activity. However, blobs are not considered part of history because 1) they are only stored by nodes for about 2 weeks and then discarded, and 2) they are not needed to replay the chain from Ethereum genesis. Due to (1), blobs do not significantly increase the storage burden of each Ethereum node. We will discuss blobs later in this article.

In this article, we will focus on history growth and discuss the relationship between history and state. Since state growth and history growth have some overlapping hardware constraints, they are related problems and solving one can help solve the other.

How fast has the historical growth been?

Figure 2 shows the historical growth rate since Ethereum’s genesis. Each vertical line represents one month of growth. The y-axis represents the number of gigabytes of historical growth for that month. Transactions are categorized by their “destination address” and are sized using RLP (https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/) bytes. Contracts that cannot be easily identified are classified as “unknown”. The “other” category includes a range of small categories such as infrastructure and games.
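
To make "sized using RLP bytes" concrete, here is a minimal sketch of an RLP encoder that only handles byte strings and nested lists. It illustrates the encoding rules described at the link above and is not the tooling used to build the dataset; the helper names `rlp_encode`, `_length_prefix`, and `rlp_size` are chosen for this example.

```python
def _length_prefix(length, offset):
    """Build the RLP prefix for a payload of the given length."""
    if length < 56:
        return bytes([offset + length])
    # Long payloads store the length itself as big-endian bytes.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """Encode a bytes object or a (nested) list of bytes objects."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP items must be bytes or lists")


def rlp_size(item):
    """Number of bytes the item occupies once RLP-encoded."""
    return len(rlp_encode(item))


# A two-element list costs one list prefix plus one prefix per string.
print(rlp_size([b"cat", b"dog"]))  # 9
```

Summing a size like this over every transaction in a month would give one bar of the chart, under the assumption that each transaction is represented as a list of byte-string fields.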

Figure 2: Ethereum historical growth rate over time

A few key takeaways from the above chart:

  • History grows 6 to 8 times faster than state: History growth recently peaked at 36.0 GiB/month and is currently at 19.3 GiB/month. State growth peaked at about 6.0 GiB/month and is currently at 2.5 GiB/month. A comparison of history and state in terms of growth and cumulative size is described later in this article.

  • Prior to Dencun, the historical growth rate had been accelerating: while state had been growing at a roughly linear rate for many years (see Part 1), history growth was superlinear. Given that a linearly increasing growth rate leads to quadratic growth in overall size, a superlinear growth rate leads to more than quadratic growth in overall size. This acceleration stopped abruptly after Dencun. This is the first time Ethereum has experienced a significant drop in the historical growth rate.

  • Most of the recent historical growth comes from Rollups: each L2 publishes a copy of its transactions back to the mainnet. This generates a large amount of history and has caused Rollups to be the most important contributor to historical growth over the past year. However, Dencun allows L2s to publish their transaction data using blobs instead of history, so Rollups no longer generate the majority of Ethereum history. We will cover Rollups in more detail later in this article.
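
The arithmetic behind the first two takeaways can be checked with a short sketch. The growth rates are the figures quoted above; the linearly rising rate in the second half is a hypothetical series used only to show why a linear rate produces a quadratic total.

```python
# Monthly growth rates quoted above, in GiB/month.
history_peak, history_now = 36.0, 19.3
state_peak, state_now = 6.0, 2.5

print(round(history_peak / state_peak, 1))  # 6.0
print(round(history_now / state_now, 1))    # 7.7


def cumulative_size(monthly_rates):
    """Running total of stored data given a list of monthly growth rates."""
    total, sizes = 0.0, []
    for rate in monthly_rates:
        total += rate
        sizes.append(total)
    return sizes


# Hypothetical rate that rises by 1 GiB/month each month (linear in time).
linear_rates = [float(m) for m in range(1, 13)]
sizes = cumulative_size(linear_rates)

# After m months the total is m * (m + 1) / 2, which is quadratic in m.
print(sizes[-1])  # 78.0
```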

Who is the biggest contributor to Ethereum's historical growth?

The amount of history generated by different contract categories reveals how Ethereum usage patterns have evolved over time. Figure 3 shows the relative contribution of various contract categories. This is the same data as Figure 2, normalized.

Figure 3: Contribution of different contract types to historical growth

The data reveals four distinct periods of Ethereum usage patterns:

  • Early (purple): Ethereum’s first few years saw little on-chain activity. Most of these early contracts are difficult to identify now and are marked as “unknown” in the chart.

  • ERC-20 Era (Green): The ERC-20 standard was finalized in late 2015, but did not gain significant momentum until 2017 and 2018. ERC-20 contracts were the largest source of historical growth in 2019.

  • DEX/DeFi Era (Brown): DEX and DeFi contracts appeared on-chain as early as 2016 and began to gain traction in 2017. But it was not until the DeFi Summer of 2020 that they became the largest category in terms of historical growth. DeFi and DEX contracts accounted for more than 50% of historical growth in 2021 and parts of 2022.

  • Rollup Era (Gray): L2 Rollups started executing more transactions than mainnet in early 2023. In the months before Dencun, they generated about 2/3 of Ethereum history.

Each era represents a more complex usage pattern for Ethereum than the one before it. Complexity can be seen as a form of Ethereum scaling over time, which cannot be measured by simple metrics like transactions per second.

In the most recent month of data (April 2024), Rollups no longer generate the majority of history. It is unclear whether future history will come mainly from DEX and DeFi, or whether some new usage pattern will emerge.

What about blobs?

The Dencun hard fork introduced blobs, significantly changing the historical growth dynamics by allowing Rollups to publish data using cheap blobs instead of history. Figure 4 zooms in on the historical growth rates before and after the Dencun upgrade. This chart is similar to Figure 2, except that each vertical line represents one day instead of one month.

Figure 4: Dencun’s impact on historical growth

We can draw several key conclusions from this chart:

  • Since Dencun, the historical growth from rollups has dropped by about 2/3: most rollups have switched from calldata to blobs, which has greatly reduced the amount of history they generate. However, as of April 2024, some rollups have not yet switched from calldata to blobs.

  • Total historical growth has dropped by about 1/3 since Dencun: Dencun only reduced historical growth for rollups. Historical growth for other contract categories increased slightly. Even after Dencun, historical growth is still 8 times greater than state growth (see next section for details).

While blobs have reduced the historical growth rate, they are still a new feature of Ethereum and it is unclear what level the historical growth rate will stabilize at with blobs in place.

How much historical growth is acceptable?

Increasing the gas limit will increase the historical growth rate. Therefore, proposals to increase the gas limit (such as Pump the Gas) must consider the relationship between historical growth and the hardware bottlenecks of each node.

To determine an acceptable historical growth rate, we first need to understand how long current node hardware can keep up in terms of networking and storage. Networking hardware can probably sustain the status quo indefinitely, as the historical growth rate is unlikely to return to its pre-Dencun peak unless the gas limit is increased. However, the storage burden of history will continue to increase over time. Under the current storage strategy, it is inevitable that each node's disk will eventually fill up with history.
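
One way to see why the gas limit bounds history growth is to compute the largest amount of calldata a single block could carry. The sketch below assumes the calldata prices in force around the time of this data (16 gas per non-zero byte and 4 gas per zero byte, from EIP-2028), a 21,000 gas base cost for one transaction, and a 30M gas limit; the 40M figure is a hypothetical increase used for comparison.

```python
NONZERO_BYTE_GAS = 16   # assumed calldata cost per non-zero byte (EIP-2028)
ZERO_BYTE_GAS = 4       # assumed calldata cost per zero byte
TX_BASE_GAS = 21_000    # assumed base cost of a single transaction


def max_calldata_bytes(gas_limit, gas_per_byte):
    """Upper bound on calldata bytes if one transaction fills the block."""
    return (gas_limit - TX_BASE_GAS) // gas_per_byte


print(max_calldata_bytes(30_000_000, NONZERO_BYTE_GAS))  # 1873687
print(max_calldata_bytes(30_000_000, ZERO_BYTE_GAS))     # 7494750
print(max_calldata_bytes(40_000_000, NONZERO_BYTE_GAS))  # 2498687
```

The bound scales almost linearly with the gas limit, which is why any gas limit proposal has to be weighed against networking and storage headroom. Real blocks are far smaller than this worst case.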

Figure 5 shows the storage burden of Ethereum nodes over time and projects its growth over the next 3 years. The projection assumes the April 2024 growth rate. The actual growth rate may increase or decrease as usage patterns or gas limits change in the future.

Figure 5: Size of history, state, and full node storage burden

We can draw several key conclusions from this figure:

  • History takes up about 3 times as much storage space as state. This difference grows over time, as history grows about 8 times as fast as state.

  • 1.8 TiB is the critical threshold at which many nodes will be forced to upgrade their storage. 2 TB is a common disk size, and it provides only about 1.8 TiB of space. Note that TB (10^12 bytes) is a different unit from TiB (1024^4 bytes). For many node operators the real threshold is even lower, because since the Merge validators must run a consensus client alongside the execution client.

  • The critical threshold will be reached in 2-3 years. Increasing the gas limit by any amount will bring this date forward accordingly. Reaching this threshold will impose a significant maintenance burden on node operators and require the purchase of additional hardware (such as $300 NVMe drives).

Unlike state data, history data is append-only and is accessed much less frequently. Therefore, in theory, history data can be stored separately from state data on cheaper storage media. Some clients, such as Geth, already support this.
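
The threshold and the 2-3 year estimate can be reproduced with a rough back-of-the-envelope sketch. It assumes the 1.2 TiB full node burden and the April 2024 growth rates quoted in this article, holds those rates constant, and ignores consensus client storage, so it is an approximation rather than the forecast model behind Figure 5.

```python
TIB = 1024 ** 4

disk_tib = 2 * 10**12 / TIB         # a "2 TB" drive expressed in TiB
current_tib = 1.2                   # full node storage burden quoted in this article
growth_gib_per_month = 19.3 + 2.5   # history + state growth, April 2024


def months_until_full(capacity_tib, used_tib, growth_gib_month):
    """Months until the disk is full at a constant growth rate."""
    remaining_gib = (capacity_tib - used_tib) * 1024
    return remaining_gib / growth_gib_month


months = months_until_full(disk_tib, current_tib, growth_gib_per_month)
print(round(disk_tib, 2))     # 1.82
print(round(months / 12, 1))  # 2.4
```

Any increase in the growth rate, for example from a higher gas limit, shortens this horizon proportionally.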

In addition to storage capacity, network IO is another major limitation to historical growth. Unlike storage capacity, network IO limitations will not cause problems for nodes in the short term, but these limitations will become important as gas limits are increased in the future.

To understand how much historical growth the network capacity of a typical Ethereum node can support, one must know the relationship between historical growth and various network health metrics, such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block submission latency. Analysis of these metrics is beyond the scope of this article, but more information can be found in previous investigations of consensus layer health. In addition, the Ethereum Foundation's Xatu project has been building public datasets to accelerate such analysis.

How to solve the historical growth problem?

History growth is a much easier problem to solve than state growth. It can be almost completely solved by the proposed EIP-4444. This EIP changes each node from storing the entire Ethereum history to storing only one year of history. After EIP-4444 is implemented, history storage will no longer be a long-term bottleneck for Ethereum's scaling or a long-term constraint on gas limit increases. EIP-4444 is necessary for the long-term sustainability of the network; without it, history will keep growing fast enough that node hardware must be upgraded regularly.

Figure 6 shows the impact of EIP-4444 on the storage burden of each node over the next 3 years. This is the same as Figure 5, but with the addition of a lighter line representing the storage burden after EIP-4444 is implemented.

Figure 6: The impact of EIP-4444 on Ethereum node storage burden

Some key conclusions can be drawn from this figure:

  • EIP-4444 will halve the current storage burden. The storage burden will drop from 1.2 TiB to 633 GiB.

  • EIP-4444 will stabilize the history storage burden. Assuming a constant history growth rate, history data will be discarded at the rate it is generated.

  • After EIP-4444, it will take many years for node storage burden to reach today's levels. This is because state growth will be the only factor increasing storage burden, and state growth is slower than historical growth.

After the implementation of EIP-4444, history growth will still impose some storage burden, because each node will store one year of history. However, even if Ethereum reaches global scale, this burden remains manageable. Once the history preservation method is proven to be reliable, the one-year expiration time of EIP-4444 may be shortened to a few months, weeks or even shorter.
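
The stabilizing effect of a fixed retention window can be illustrated with a small simulation. This sketch assumes a constant growth rate equal to the April 2024 figure and starts from an empty store, so it shows only the shape of the curve (growth, then a plateau), not the actual values in Figure 6. The function name `simulate_rolling_history` is chosen for this example.

```python
from collections import deque


def simulate_rolling_history(monthly_growth_gib, retention_months, months):
    """History size per month when only the newest months are retained."""
    window = deque(maxlen=retention_months)
    sizes = []
    for _ in range(months):
        window.append(monthly_growth_gib)  # the oldest month drops off automatically
        sizes.append(sum(window))
    return sizes


sizes = simulate_rolling_history(19.3, retention_months=12, months=36)
print(round(sizes[0], 1))    # 19.3
print(round(sizes[11], 1))   # 231.6
print(round(sizes[-1], 1))   # 231.6
```

Under these assumptions the retained history levels off at twelve months of growth. Shortening the retention window lowers the plateau proportionally, and a higher growth rate raises it.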

How to preserve Ethereum history?

EIP-4444 raises the question: if history is not kept by the Ethereum nodes themselves, then how should it be kept? History plays a central role in Ethereum's verification, accounting, and analysis, so preserving history is critical. Fortunately, history preservation is a simple problem that only requires 1/n honest data providers. This is in stark contrast to the state consensus problem, which requires 1/3 to 2/3 of the participants to be honest. Node operators can verify the authenticity of a historical data set by 1) replaying all transactions since the genesis block and 2) checking that these transactions reproduce the same state root as the current tip of the chain.
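
The 1/n honesty property can be illustrated with a toy model. Everything in this sketch is a simplification: balances stand in for Ethereum state, a SHA-256 hash of the sorted balances stands in for the state root (real Ethereum uses a Merkle Patricia Trie, and block headers also commit to the transactions themselves), and the provider names are hypothetical.

```python
import hashlib


def toy_state_root(balances):
    """Stand-in for a state root: hash of the sorted account balances."""
    encoded = repr(sorted(balances.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(genesis, transfers):
    """Apply every transfer in order, starting from the genesis balances."""
    balances = dict(genesis)
    for sender, recipient, amount in transfers:
        if balances.get(sender, 0) < amount:
            raise ValueError("invalid transfer in history")
        balances[sender] -= amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances


def first_valid_history(genesis, trusted_root, providers):
    """Return the first provider whose history reproduces the trusted root."""
    for name, transfers in providers.items():
        try:
            if toy_state_root(replay(genesis, transfers)) == trusted_root:
                return name
        except ValueError:
            continue  # a history that fails to replay is simply skipped
    return None


genesis = {"alice": 10, "bob": 0}
honest = [("alice", "bob", 3), ("bob", "alice", 1)]
tampered = [("alice", "bob", 7)]
trusted_root = toy_state_root(replay(genesis, honest))

providers = {"tampered_mirror": tampered, "honest_mirror": honest}
print(first_valid_history(genesis, trusted_root, providers))  # honest_mirror
```

A single honest provider is enough here because a dishonest one cannot make a different history replay to the trusted root; the worst it can do is waste the verifier's time.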

There are many ways to save history.

  • Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package parts of the history and share them as public torrent files. For example, a node might create a new history torrent file every 100,000 blocks. Node clients like Erigon already perform this process in a somewhat non-standardized way. In order to standardize this process, all node clients must use the same data format, the same parameters, and the same P2P network. Nodes will be able to choose whether to participate in this network based on their storage and bandwidth capabilities. Torrents have the advantage of using a time-tested ("Lindy") open standard that is already supported by a large number of data tools.

  • Portal Network: Portal Network is a new network designed specifically for hosting Ethereum data. It is a Torrent-like approach while also providing some additional features to make data verification easier. The advantage of Portal Network is that these additional layers of verification provide utility for light clients to efficiently verify and query shared data sets.

  • Cloud hosting: Cloud storage services such as AWS S3 or Cloudflare R2 provide a cheap and high-performance option for preserving history. However, this approach carries more legal and business operational risks, as there is no guarantee that these cloud services will always be willing and able to host crypto data.

The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate specific implementation details so that they can be integrated directly into every node client. In particular, performing a full sync from the genesis block (rather than a snapshot sync) will require retrieving history from a history provider rather than an Ethereum node. These changes do not technically require a hard fork, so they can be implemented earlier than Ethereum's next hard fork, Pectra.
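
Whichever method is chosen, clients need to agree on how history is split into shareable pieces. As a minimal sketch of the "every 100,000 blocks" example above, the function below lists the block ranges that are complete and ready to package. The function name `archive_ranges` and the fixed chunk size are assumptions for illustration, not any client's actual format.

```python
def archive_ranges(head_block, chunk_size=100_000):
    """Return (first, last) block ranges for every completed chunk.

    Blocks are numbered from 0 (genesis), so a chunk is complete once
    head_block reaches the last block number in that chunk.
    """
    completed = (head_block + 1) // chunk_size
    return [(i * chunk_size, (i + 1) * chunk_size - 1) for i in range(completed)]


print(archive_ranges(250_000))
# [(0, 99999), (100000, 199999)]
```

Because the ranges depend only on the head block number and the chunk size, every client that uses the same parameters would produce identical chunk boundaries, which is the property standardization needs.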

All of these history preservation methods can also be used by L2s to preserve the blob data they publish to the mainnet. Compared to history preservation, blob preservation is 1) more difficult because the total amount of data is much larger; 2) less important because blobs are not necessary for replaying mainnet history. However, blob preservation is still necessary for each L2 to replay its own history. Therefore, some form of blob preservation is important to the entire Ethereum ecosystem. In addition, if L2s develop a strong blob storage infrastructure, they may also be able to easily store L1 historical data.

It would be helpful to directly compare datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burden of different Ethereum node types. State data is accounts and contracts, history data is blocks and transactions, and archive data is an optional set of data indexes. The byte counts in this table are based on a recent reth snapshot, but the numbers for other node clients should be roughly comparable.

Figure 7: Storage burden of different Ethereum node types

In other words:

  • Archive nodes store state data and historical data as well as archive data. Archive nodes are used when someone wants to easily query historical chain state.

  • Full nodes only store historical and state data. Most nodes today are full nodes. The storage burden of a full node is about half that of an archive node.

  • After EIP-4444, full nodes only store state data and historical data for the last year. This reduces the node's storage burden from 1.2 TiB to 633 GiB and brings the storage space for historical data to a steady-state value.

  • Stateless nodes, also known as “light nodes”, do not store any of these data sets and are able to verify immediately at the tip of the chain. This type of node will become possible once Verkle tries or other state commitment schemes are added to Ethereum.

Finally, there are a few additional EIPs that limit the historical growth rate rather than just adapting to the current growth rate. This helps stay within the network IO constraints in the short term and within the storage constraints in the long term. While EIP-4444 is still necessary for the long-term sustainability of the network, these other EIPs will help Ethereum scale more efficiently in the future:

  • EIP-7623: Reprice call data, making certain transactions with too much call data more expensive. Making these usage patterns more expensive will force some of them to convert from call data to blobs. This will reduce the historical growth rate.

  • EIP-4488: Impose a limit on the total amount of call data that can be included in each block. This will impose stricter limits on how fast the history can grow.

These EIPs are easier to implement than EIP-4444, so they may serve as a short-term stopgap measure before EIP-4444 goes into production.
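
To see how a per-block calldata cap translates into a ceiling on history growth, the following sketch converts a cap in bytes per block into GiB per month, and back. It assumes 12-second slots with no missed slots and a 30-day month. The 1 MiB cap is an illustrative value chosen for this example and should not be read as the parameter of any specific EIP.

```python
SECONDS_PER_SLOT = 12  # assumed slot time; missed slots would lower the totals
GIB = 1024 ** 3


def blocks_per_period(days=30):
    """Number of blocks in the period if every slot produces a block."""
    return days * 24 * 60 * 60 // SECONDS_PER_SLOT


def max_monthly_calldata_gib(cap_bytes_per_block, days=30):
    """Ceiling on calldata growth if every block hits the cap."""
    return cap_bytes_per_block * blocks_per_period(days) / GIB


def cap_for_target(target_gib_per_month, days=30):
    """Per-block cap (bytes) that would hold growth to the target."""
    return int(target_gib_per_month * GIB / blocks_per_period(days))


print(blocks_per_period())                  # 216000
print(max_monthly_calldata_gib(1_048_576))  # 210.9375
print(cap_for_target(36.0))                 # 178956
```

The ceiling is a worst case in which every block is full of calldata, so it sits far above observed growth. The inverse calculation shows the cap that would be needed to guarantee growth stays under a chosen rate, here the 36.0 GiB/month peak quoted earlier.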

Conclusion

The purpose of this article is to use data to understand 1) how historical growth works and 2) how to solve the problem. Much of the data in this article is difficult to obtain through traditional means, so we hope that making it public can provide some new insights into the historical growth problem.

History growth as a bottleneck for Ethereum's scaling has not received enough attention. Even without increasing the gas limit, Ethereum's current practice of preserving history will force many nodes to upgrade their hardware in a few years. Fortunately, this is not a difficult problem to solve. There is already a clear solution in EIP-4444. We believe that the implementation of this EIP should be accelerated to leave room for future gas limit increases.
