Proto-Danksharding: Speeding Up Blobs Verification
Learn how we sped up batched blob KZG proof verification by 161.11%
May 27, 2023
rust
ethereum
eip-4844
kzg-proofs
bls12-381
CONTENTS
Introduction
The Ethereum Foundation proposed
Next week, I will be defending my thesis titled “Parallelization of the KZG10 scheme”. In my thesis, I parallelized the KZG commitment scheme and BLS12-381 elliptic curve operations, along with a subset of the EIP-4844 proposal that uses these KZG commitments. My changes were incorporated into the
How c-kzg-4844 does things
C-kzg-4844 leaves the implementation of parallelism to higher-level programming languages that use its bindings. This approach is not only simpler but also safer. The focus of c-kzg-4844 is on single-core performance, which is great for a low-latency environment.
How go-kzg-4844 does things
Go-kzg-4844 offers the function VerifyBlobKZGProofBatch
, which is designed for single-core execution similar to c-kzg-4844. However, they also provide a parallelized version of this function called VerifyBlobKZGProofBatchPar
. This parallelized version uses go-routines to process each proof in parallel. Although not perfect, this parallel implementation is considerably faster than the sequential one.
How we do things in rust-kzg
The general idea behind our approach is as follows: if the number of blobs exceeds the number of physical CPU cores, we divide the blobs into subgroups of equal size. Each CPU core then independently runs the batched algorithm. For example, consider the illustration below. If there are 64 blobs and 4 CPU cores, we create 4 groups, each containing 16 blobs. Each group is assigned to its dedicated CPU core, which handles the execution of the blob verification process. By utilizing this approach, we effectively distribute the workload across multiple CPU cores, optimizing performance and ensuring efficient verification of the blobs.
However, one could argue that the performance of batched blob KZG proof verification depends on how Ethereum protocol execution clients choose to utilize this approach. If clients choose to verify blobs as soon as they receive them, they would likely opt for an approach that performs single blob verification faster. However, if they decide to wait and accumulate a fixed amount of blobs before performing the verification, this approach will yield much better performance.
Code example
In the code snippet, there is more to the implementation, but let’s focus on illustrating the main concept of this approach:
1#[cfg(feature = "parallel")]
2{
3 let num_blobs = blobs.len();
4 let num_cores = num_cpus::get_physical();
5
6 return if num_blobs > num_cores {
7 // Process blobs in parallel subgroups
8 let blobs_per_group = num_blobs / num_cores;
9
10 blobs
11 .par_chunks(blobs_per_group)
12 .enumerate()
13 .all(|(i, blob_group)| {
14 let num_blobs_in_group = blob_group.len();
15 let commitment_group = &commitments_g1
16 [blobs_per_group * i..blobs_per_group * i + num_blobs_in_group];
17 let proof_group =
18 &proofs_g1[blobs_per_group * i..blobs_per_group * i + num_blobs_in_group];
19 let (evaluation_challenges_fr, ys_fr) =
20 compute_challenges_and_evaluate_polynomial(
21 blob_group,
22 commitment_group,
23 ts,
24 );
25
26 verify_kzg_proof_batch(
27 commitment_group,
28 &evaluation_challenges_fr,
29 &ys_fr,
30 proof_group,
31 ts,
32 )
33 })
34 } else {
35 // Each group contains either one or zero blobs, so iterate
36 // over the single blob verification function in parallel
37 (blobs, commitments_g1, proofs_g1)
38 .into_par_iter()
39 .all(|(blob, commitment, proof)| {
40 verify_blob_kzg_proof(blob, commitment, proof, ts)
41 })
42 };
43}
When num_blobs > num_cores
, the code divides the blobs into parallel subgroups. The number of blobs per group is calculated based on the division. The code then iterates over each subgroup, performing various operations such as retrieving the corresponding commitment and proof groups. It also computes evaluation challenges and evaluates a polynomial using the provided data. Finally, it verifies a batch of KZG proofs using the obtained information.
In the else statement, when the number of blobs is not greater than the number of cores, the code handles each blob individually or in groups with only one blob. It uses parallel iteration to execute the blob verification function concurrently, similar to how go-kzg-4844 handles parallelism using go-routines.
Results
Rust and Go bindings, using the rust-kzg with blst backend, verified 64 blobs on 16 cores in 29.82 ms and 30.164 ms, respectively. In comparison, the native rust-kzg accomplished this task in 18.397 ms, while the parallelized implementation of go-kzg-4844 took 48.037 ms. It’s important to note that we only perform full error checking through the exported C API when we convert bytes to our internal types. Therefore, the performance of the native rust-kzg code is probably better because we omit those checks here, assuming we receive correct data from the byte conversion functions. With this in mind, the
Summary
- We potentially outperform go-kzg-4844 by approximately 59.25% within Go in batched blob KZG proof verification