<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>belijzajac.dev</title><link>https://belijzajac.dev/</link><description>Recent content on belijzajac.dev</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>blog@belijzajac.dev (belijzajac)</managingEditor><webMaster>blog@belijzajac.dev (belijzajac)</webMaster><copyright>Copyright © 2025 | CC BY-NC-SA 4.0</copyright><lastBuildDate>Sat, 27 May 2023 00:00:00 +0000</lastBuildDate><atom:link href="https://belijzajac.dev/index.xml" rel="self" type="application/rss+xml"/><item><title>Proto-Danksharding: Speeding Up Blobs Verification</title><link>https://belijzajac.dev/proto-danksharding-speeding-up-blobs-verification/</link><pubDate>Sat, 27 May 2023 00:00:00 +0000</pubDate><author>blog@belijzajac.dev (belijzajac)</author><guid>https://belijzajac.dev/proto-danksharding-speeding-up-blobs-verification/</guid><description>&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/protodanksharding.jpg" alt="protodanksharding"&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The Ethereum Foundation proposed &lt;highlight&gt;&lt;a href="https://eips.ethereum.org/EIPS/eip-4844"&gt;EIP-4844&lt;/a&gt;&lt;/highlight&gt; on February 25, 2022, with the objective of reducing gas fees. It introduces a new transaction type that carries &amp;ldquo;blobs&amp;rdquo; of data, which are stored only temporarily and committed to using the KZG commitment scheme. In addition, the Ethereum Foundation developed &lt;highlight&gt;&lt;a href="https://github.com/ethereum/c-kzg-4844"&gt;c-kzg-4844&lt;/a&gt;&lt;/highlight&gt;, a minimal implementation of the polynomial commitments API written in C. This project does not use parallelization and exposes a C API for bindings in other programming languages. Another project, &lt;highlight&gt;&lt;a href="https://github.com/crate-crypto/go-kzg-4844"&gt;go-kzg-4844&lt;/a&gt;&lt;/highlight&gt;, does use parallelism, has already been integrated into Ethereum code, and is rumored to be the fastest implementation of EIP-4844 so far.&lt;/p&gt;
&lt;p&gt;Next week, I will be defending my thesis titled &amp;ldquo;Parallelization of the KZG10 scheme&amp;rdquo;. In my thesis, I parallelized the KZG commitment scheme and BLS12-381 elliptic curve operations, along with a subset of the EIP-4844 proposal that uses these KZG commitments. My changes were incorporated into the &lt;highlight&gt;&lt;a href="https://github.com/grandinetech/rust-kzg"&gt;rust-kzg project&lt;/a&gt;&lt;/highlight&gt;, where we exported C functions from Rust so that the parallelized functions of the rust-kzg backends can stand in for those of c-kzg-4844. Fortunately, the Go binding included in the c-kzg-4844 project presented us with a unique opportunity: we used this binding to benchmark the highly parallelized blst backend of rust-kzg against go-kzg-4844 and compare their speed.&lt;/p&gt;
&lt;h2 id="how-c-kzg-4844-does-things"&gt;How c-kzg-4844 does things&lt;/h2&gt;
&lt;p&gt;C-kzg-4844 leaves the implementation of parallelism to higher-level programming languages that use its bindings. This approach is not only simpler but also safer. The focus of c-kzg-4844 is on single-core performance, which is great for a low-latency environment.&lt;/p&gt;
&lt;h2 id="how-go-kzg-4844-does-things"&gt;How go-kzg-4844 does things&lt;/h2&gt;
&lt;p&gt;Go-kzg-4844 offers the function &lt;code&gt;VerifyBlobKZGProofBatch&lt;/code&gt;, which is designed for single-core execution, similar to c-kzg-4844. However, it also provides a parallelized version of this function called &lt;code&gt;VerifyBlobKZGProofBatchPar&lt;/code&gt;, which uses goroutines to process each proof in parallel. Although not perfect, this parallel implementation is considerably faster than the sequential one.&lt;/p&gt;
&lt;h2 id="how-we-do-things-in-rust-kzg"&gt;How we do things in rust-kzg&lt;/h2&gt;
&lt;p&gt;The general idea behind our approach is as follows: if the number of blobs exceeds the number of physical CPU cores, we divide the blobs into subgroups of equal size, and each CPU core independently runs the batched verification algorithm on its own subgroup. For example, consider the illustration below: with 64 blobs and 4 CPU cores, we create 4 groups of 16 blobs each, and each group is assigned to a dedicated CPU core that verifies its blobs. This spreads the workload evenly across the available cores.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/batched-blob-verification-approach.png" alt="batched-blob-verification-process"&gt;&lt;/p&gt;
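&lt;p&gt;The grouping arithmetic can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the helper names &lt;code&gt;blobs_per_group&lt;/code&gt; and &lt;code&gt;group_count&lt;/code&gt; are hypothetical, and the standard library&amp;rsquo;s sequential &lt;code&gt;chunks&lt;/code&gt; stands in for a parallel chunking call, since both split a slice the same way.&lt;/p&gt;

```rust
// Hypothetical helper: group size used when blobs outnumber cores.
fn blobs_per_group(num_blobs: usize, num_cores: usize) -> usize {
    assert!(num_cores > 0, "need at least one core");
    assert!(num_blobs > num_cores, "grouping applies only when blobs outnumber cores");
    // Integer division: any remainder is dropped here.
    num_blobs / num_cores
}

// Hypothetical helper: how many groups that chunk size actually produces.
fn group_count(num_blobs: usize, num_cores: usize) -> usize {
    // Stand-in data; a real blob is far larger than one byte.
    let blobs = vec![0u8; num_blobs];
    // Sequential `chunks` is used for illustration only; the last chunk
    // may be shorter than the others when the division is not exact.
    blobs.chunks(blobs_per_group(num_blobs, num_cores)).count()
}
```

&lt;p&gt;With 64 blobs and 4 cores, this gives 4 groups of 16, matching the illustration. With 65 blobs and 4 cores, the group size is still 16, so the leftover blob forms a fifth, smaller group.&lt;/p&gt;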
&lt;p&gt;That said, how much batched blob KZG proof verification helps depends on how Ethereum protocol execution clients choose to use it. If clients verify blobs as soon as they receive them, they would likely prefer an approach that makes single blob verification faster. If they instead wait and accumulate a fixed number of blobs before verifying them, this approach will yield much better performance.&lt;/p&gt;
&lt;h2 id="code-example"&gt;Code example&lt;/h2&gt;
&lt;p&gt;The actual implementation contains more than what is shown here, but the snippet below illustrates the main concept of this approach:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;#[cfg(feature = &lt;/span&gt;&lt;span style="color:#b8bb26"&gt;&amp;#34;parallel&amp;#34;&lt;/span&gt;&lt;span style="color:#8ec07c"&gt;)]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; num_blobs &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; blobs.len();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; num_cores &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; num_cpus::get_physical();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; &lt;span style="color:#fe8019"&gt;if&lt;/span&gt; num_blobs &lt;span style="color:#fe8019"&gt;&amp;gt;&lt;/span&gt; num_cores {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt; &lt;span style="color:#928374;font-style:italic"&gt;// Process blobs in parallel subgroups
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; blobs_per_group &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; num_blobs &lt;span style="color:#fe8019"&gt;/&lt;/span&gt; num_cores;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt; blobs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt; .par_chunks(blobs_per_group)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt; .enumerate()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt; .all(&lt;span style="color:#fe8019"&gt;|&lt;/span&gt;(i, blob_group)&lt;span style="color:#fe8019"&gt;|&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; num_blobs_in_group &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; blob_group.len();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; commitment_group &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;commitments_g1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt; [blobs_per_group &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; i&lt;span style="color:#fe8019"&gt;..&lt;/span&gt;blobs_per_group &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; num_blobs_in_group];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;17&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; proof_group &lt;span style="color:#fe8019"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;18&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;proofs_g1[blobs_per_group &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; i&lt;span style="color:#fe8019"&gt;..&lt;/span&gt;blobs_per_group &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; num_blobs_in_group];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;19&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; (evaluation_challenges_fr, ys_fr) &lt;span style="color:#fe8019"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;20&lt;/span&gt;&lt;span&gt; compute_challenges_and_evaluate_polynomial(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;21&lt;/span&gt;&lt;span&gt; blob_group,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;22&lt;/span&gt;&lt;span&gt; commitment_group,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;23&lt;/span&gt;&lt;span&gt; ts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;24&lt;/span&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;25&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;26&lt;/span&gt;&lt;span&gt; verify_kzg_proof_batch(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;27&lt;/span&gt;&lt;span&gt; commitment_group,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;28&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;evaluation_challenges_fr,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;29&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;ys_fr,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;30&lt;/span&gt;&lt;span&gt; proof_group,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;31&lt;/span&gt;&lt;span&gt; ts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;32&lt;/span&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;33&lt;/span&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;34&lt;/span&gt;&lt;span&gt; } &lt;span style="color:#fe8019"&gt;else&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;35&lt;/span&gt;&lt;span&gt; &lt;span style="color:#928374;font-style:italic"&gt;// Each group contains either one or zero blobs, so iterate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;36&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;&lt;/span&gt; &lt;span style="color:#928374;font-style:italic"&gt;// over the single blob verification function in parallel
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;37&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;&lt;/span&gt; (blobs, commitments_g1, proofs_g1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;38&lt;/span&gt;&lt;span&gt; .into_par_iter()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;39&lt;/span&gt;&lt;span&gt; .all(&lt;span style="color:#fe8019"&gt;|&lt;/span&gt;(blob, commitment, proof)&lt;span style="color:#fe8019"&gt;|&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;40&lt;/span&gt;&lt;span&gt; verify_blob_kzg_proof(blob, commitment, proof, ts)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;41&lt;/span&gt;&lt;span&gt; })
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;42&lt;/span&gt;&lt;span&gt; };
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;43&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When &lt;code&gt;num_blobs &amp;gt; num_cores&lt;/code&gt;, the code splits the blobs into subgroups that are processed in parallel. The group size is the integer quotient &lt;code&gt;num_blobs / num_cores&lt;/code&gt;, so when the division leaves a remainder, &lt;code&gt;par_chunks&lt;/code&gt; yields one extra, smaller group. For each subgroup, the code takes the matching slices of commitments and proofs, computes the evaluation challenges, evaluates the blob polynomials at those challenges, and finally verifies the whole subgroup with a single batched KZG proof check.&lt;/p&gt;
&lt;p&gt;In the else branch, when the number of blobs does not exceed the number of cores, each group would contain at most one blob, so the code verifies every blob individually. It uses parallel iteration to run the single blob verification function concurrently, similar to how go-kzg-4844 handles parallelism using goroutines.&lt;/p&gt;
&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/batched-blob-verification-results.png" alt="batched-blob-verification-results"&gt;&lt;/p&gt;
&lt;p&gt;Using rust-kzg with the blst backend, the Rust and Go bindings verified 64 blobs on 16 cores in 29.82 ms and 30.164 ms, respectively. In comparison, native rust-kzg completed the same task in 18.397 ms, while the parallelized implementation of go-kzg-4844 took 48.037 ms. Note that we perform full error checking only in the exported C API, when bytes are converted to our internal types. The native rust-kzg code is therefore probably faster because it skips those checks, assuming that the byte conversion functions have already produced valid data. With this in mind, &lt;highlight&gt;rust-kzg with the blst backend outperformed go-kzg-4844 by approximately 161.11% in terms of speed, while its Go binding was approximately 59.25% faster&lt;/highlight&gt;.&lt;/p&gt;
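&lt;p&gt;For reference, the percentages above follow from the ratio of the measured times. A minimal sketch of that arithmetic, where the function name &lt;code&gt;speedup_percent&lt;/code&gt; is hypothetical and only the timings come from the benchmark:&lt;/p&gt;

```rust
// Hypothetical helper: how much faster `candidate_ms` is than
// `baseline_ms`, in percent, computed as (baseline / candidate - 1) * 100.
fn speedup_percent(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (baseline_ms / candidate_ms - 1.0) * 100.0
}
```

&lt;p&gt;Calling &lt;code&gt;speedup_percent(48.037, 18.397)&lt;/code&gt; gives roughly 161.11 for native rust-kzg, and &lt;code&gt;speedup_percent(48.037, 30.164)&lt;/code&gt; gives roughly 59.25 for the Go binding.&lt;/p&gt;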
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;When called from Go, rust-kzg with the blst backend potentially outperforms go-kzg-4844 by approximately 59.25% in batched blob KZG proof verification&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>WisniaLang: Compiler Project</title><link>https://belijzajac.dev/wisnialang-compiler-project/</link><pubDate>Mon, 17 Oct 2022 00:00:00 +0000</pubDate><author>blog@belijzajac.dev (belijzajac)</author><guid>https://belijzajac.dev/wisnialang-compiler-project/</guid><description>&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/dragon-maid-compiler-book.jpg" alt="dragon-book"&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;For the past 3 years, I have been working on &lt;highlight&gt;&lt;a href="https://github.com/belijzajac/WisniaLang"&gt;WisniaLang&lt;/a&gt;&lt;/highlight&gt;, a compiler for my own programming language that compiles to native machine code and packs it into an executable by itself. Unlike many others, I rolled my own compiler backend from scratch, one that does fast but naive code generation. While it&amp;rsquo;s admittedly a more old-fashioned approach to compiler engineering, it&amp;rsquo;s the path I chose to take when developing my compiler.&lt;/p&gt;
&lt;h2 id="architecture"&gt;Architecture&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/wisnialang-architecture.png" alt="architecture"&gt;&lt;/p&gt;
&lt;p&gt;My compiler&amp;rsquo;s architecture is divided into several main phases that work together to translate source code into an executable. These phases include lexical analysis, which breaks the source code down into smaller pieces called tokens; syntactic analysis, which builds a representation of the structure of the source code called an abstract syntax tree (AST); semantic analysis, which checks the AST for semantic errors while traversing the tree; intermediate representation (IR) generation, which lowers the code into a form close to the target architecture; code generation, which allocates registers and generates machine code from that IR; and, lastly, packing the resulting machine code into an executable program in ELF format.&lt;/p&gt;
&lt;h2 id="programming-languages-and-llvm"&gt;Programming languages and LLVM&lt;/h2&gt;
&lt;p&gt;Before going further, let me get straight to the point:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Writing compilers is easy&lt;/li&gt;
&lt;li&gt;Optimizing the machine code is hard&lt;/li&gt;
&lt;li&gt;Supporting arbitrary architectures / operating systems is hard&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/llvm-approach.png" alt="llvm-approach"&gt;&lt;/p&gt;
&lt;p&gt;This is where LLVM comes in handy. LLVM uses an intermediate representation language, which is kind of similar to assembly, but with a few higher-level constructs. LLVM is good at optimizing this IR, as well as compiling it for different architectures and binary formats. So as a language author using LLVM, I&amp;rsquo;m really writing a transpiler from my language to LLVM IR, and letting LLVM do the hard work.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;You talk about LLVM so much, why&amp;rsquo;s that? Let me begin with this illustration:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/llvm-family.png" alt="llvm-family"&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not sure if it&amp;rsquo;s a positive thing, but the LLVM project has achieved such widespread adoption that it&amp;rsquo;s almost reached a monopoly status, much like the Chromium project, for instance. Apart from Google Chrome, numerous other browsers are built upon the Chromium codebase. From Electron web apps to Arc, Microsoft Edge, Opera, Vivaldi, Brave, and beyond, the list just goes on. Firefox and Safari are perhaps the only web browsers that stand out from this copy-paste crowd.&lt;/p&gt;
&lt;p&gt;I just wanted to point out that while 99.9% of compiler developers opt for LLVM, the remaining few explore alternative compiler backends like &lt;highlight&gt;&lt;a href="https://c9x.me/compile/"&gt;QBE&lt;/a&gt;&lt;/highlight&gt;, develop interpreters (like Python), or create virtual machines (such as the JVM for Java and Kotlin). Some even write transpilers that convert high-level languages into something low-level like C, which is then compiled with gcc. If you recall the dragon compiler book appearing at the top of this page, these and similar compiler books are gradually losing relevance because they don&amp;rsquo;t teach how to use LLVM, the industry&amp;rsquo;s compiler standard.&lt;/p&gt;
&lt;h2 id="benchmark-no-1-fibonacci-sequence"&gt;Benchmark No. 1: Fibonacci sequence&lt;/h2&gt;
&lt;p&gt;To benchmark different compilers, I chose an iterative (non-recursive) Fibonacci computation and computed the 46th Fibonacci number with each compiler under test. This number was chosen because it is the largest Fibonacci number that still fits in a signed 32-bit integer. Compile-time and runtime benchmarks were performed using the &lt;highlight&gt;&lt;a href="https://github.com/sharkdp/hyperfine"&gt;hyperfine&lt;/a&gt;&lt;/highlight&gt; command-line benchmarking tool, which closely resembles Rust&amp;rsquo;s &lt;highlight&gt;&lt;a href="https://github.com/bheisler/criterion.rs"&gt;Criterion&lt;/a&gt;&lt;/highlight&gt; benchmarking library. Binary size benchmarks were carried out using standard Linux tools: &lt;code&gt;strip&lt;/code&gt; to remove debug symbols from binaries and &lt;code&gt;wc&lt;/code&gt; to display the byte count of each binary file.&lt;/p&gt;
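&lt;p&gt;The choice of 46 can be checked with a short Rust sketch. It assumes a signed 32-bit integer as the size limit, and the function names are hypothetical; the arithmetic runs in 64 bits so the check itself cannot overflow for small indices.&lt;/p&gt;

```rust
// Iterative Fibonacci in 64-bit arithmetic (valid up to n = 92,
// beyond which an i64 would overflow).
fn fibonacci(n: u32) -> i64 {
    if n == 0 {
        return 0;
    }
    let (mut prev, mut current) = (0i64, 1i64);
    for _ in 2..=n {
        let next = prev + current;
        prev = current;
        current = next;
    }
    current
}

// Hypothetical helper: does the n-th Fibonacci number fit in an i32?
fn fits_in_i32(n: u32) -> bool {
    i64::from(i32::MAX) >= fibonacci(n)
}
```

&lt;p&gt;Here &lt;code&gt;fibonacci(46)&lt;/code&gt; is 1836311903, which is below the signed 32-bit maximum of 2147483647, while &lt;code&gt;fibonacci(47)&lt;/code&gt; is 2971215073 and no longer fits.&lt;/p&gt;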
&lt;h3 id="wisnialang-benchmark"&gt;WisniaLang benchmark&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;fn&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;fibonacci&lt;/span&gt;(n: int) -&amp;gt; int {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;if&lt;/span&gt; (n &lt;span style="color:#fe8019"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; n;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt; int prev &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt; int current &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;for&lt;/span&gt; (int i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;; i &lt;span style="color:#fe8019"&gt;&amp;lt;=&lt;/span&gt; n; i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt; int next &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; prev &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt; prev &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt; current &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; next;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;fn&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;main&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt; print(fibonacci(&lt;span style="color:#d3869b"&gt;46&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;17&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./wisnia fibonacci.wsn&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./wisnia fibonacci.wsn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 1.6 ms ± 0.3 ms &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 0.8 ms, System: 0.5 ms&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 1.3 ms … 8.7 ms &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Runtime&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./a.out&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 109.6 µs ± 36.8 µs &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 58.2 µs, System: 4.7 µs&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 84.0 µs … 736.3 µs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;421&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="c-gcc-benchmark"&gt;C++ (gcc) benchmark&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-cpp" data-lang="cpp"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;#include&lt;/span&gt; &lt;span style="color:#8ec07c;font-style:italic"&gt;&amp;lt;iostream&amp;gt;&lt;/span&gt;&lt;span style="color:#8ec07c"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;constexpr&lt;/span&gt; &lt;span style="color:#fe8019"&gt;auto&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;fibonacci&lt;/span&gt;(u_int32_t n) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;if&lt;/span&gt; (n &lt;span style="color:#fe8019"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;        &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; n;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;    u_int32_t prev &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;, current &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;for&lt;/span&gt; (size_t i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;; i &lt;span style="color:#fe8019"&gt;&amp;lt;=&lt;/span&gt; n; i&lt;span style="color:#fe8019"&gt;++&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;        u_int32_t next &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; prev &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt;        prev &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;        current &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; next;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;main&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;17&lt;/span&gt;&lt;span&gt;    std&lt;span style="color:#fe8019"&gt;::&lt;/span&gt;printf(&lt;span style="color:#b8bb26"&gt;&amp;#34;%u&amp;#34;&lt;/span&gt;, fibonacci(&lt;span style="color:#d3869b"&gt;46&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;18&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;g++ -std=c++23 -O3 fibonacci.cpp&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: g++ -std&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;c++23 -O3 fibonacci.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 456.4 ms ± 4.5 ms &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 415.8 ms, System: 35.2 ms&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 448.9 ms … 472.1 ms &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Runtime&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./a.out&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 347.1 µs ± 62.8 µs &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 206.4 µs, System: 67.2 µs&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 271.9 µs … 926.4 µs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;14472&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="c-clang-benchmark"&gt;C++ (clang) benchmark&lt;/h3&gt;
&lt;p&gt;The program is the same as before; only the compiler differs.&lt;/p&gt;
&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;clang++ -std=c++2b -O3 fibonacci.cpp&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: clang++ -std&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;c++2b -O3 fibonacci.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 538.2 ms ± 16.9 ms &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 481.7 ms, System: 45.7 ms&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 524.3 ms … 657.9 ms &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Runtime&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./a.out&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 351.4 µs ± 67.7 µs &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 203.2 µs, System: 72.2 µs&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 267.1 µs … 984.8 µs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;14504&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="rust-benchmark"&gt;Rust benchmark&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;fn&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;fibonacci&lt;/span&gt;(n: &lt;span style="color:#fabd2f"&gt;u32&lt;/span&gt;) -&amp;gt; &lt;span style="color:#fabd2f"&gt;u32&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;if&lt;/span&gt; n &lt;span style="color:#fe8019"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt;        &lt;span style="color:#fe8019"&gt;return&lt;/span&gt; n;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; (&lt;span style="color:#fe8019"&gt;mut&lt;/span&gt; prev, &lt;span style="color:#fe8019"&gt;mut&lt;/span&gt; current) &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; (&lt;span style="color:#d3869b"&gt;0&lt;/span&gt;, &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fe8019"&gt;for&lt;/span&gt; _ &lt;span style="color:#fe8019"&gt;in&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;&lt;span style="color:#fe8019"&gt;..=&lt;/span&gt;n {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;        &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; next &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; prev &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;        prev &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; current;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;        current &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; next;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;    current
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;fn&lt;/span&gt; &lt;span style="color:#fabd2f"&gt;main&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt;    &lt;span style="color:#fabd2f"&gt;println!&lt;/span&gt;(&lt;span style="color:#b8bb26"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#b8bb26"&gt;{}&lt;/span&gt;&lt;span style="color:#b8bb26"&gt;&amp;#34;&lt;/span&gt;, fibonacci(&lt;span style="color:#d3869b"&gt;46&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;rustc -C opt-level=3 fibonacci.rs&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: rustc -C opt-level&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;&lt;span style="color:#d3869b"&gt;3&lt;/span&gt; fibonacci.rs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 173.4 ms ± 3.0 ms &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 130.5 ms, System: 51.2 ms&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 168.6 ms … 183.8 ms &lt;span style="color:#d3869b"&gt;100&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Runtime&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;10&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./fibonacci&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./fibonacci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 490.4 µs ± 82.8 µs &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 264.9 µs, System: 129.3 µs&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 375.1 µs … 1092.6 µs &lt;span style="color:#d3869b"&gt;1000&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip fibonacci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c fibonacci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;321920&lt;/span&gt; fibonacci
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="benchmark-no-2-29988-lines-of-code"&gt;Benchmark No. 2: 29'988 lines of code&lt;/h2&gt;
&lt;p&gt;I wrote a &lt;highlight&gt;&lt;a href="https://belijzajac.dev/post-data/main.py"&gt;Python script&lt;/a&gt;&lt;/highlight&gt; that generates program code for WisniaLang, C++, and Rust. It repeats the same pattern more than 2000 times, producing functions such as &lt;code&gt;calculate_1&lt;/code&gt;, &lt;code&gt;calculate_2&lt;/code&gt;, and &lt;code&gt;calculate_1999&lt;/code&gt;. This is what &lt;code&gt;calculate_1997&lt;/code&gt; looks like in C++:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-cpp" data-lang="cpp"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;void&lt;/span&gt; calculate_1997() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;while&lt;/span&gt; (b &lt;span style="color:#fe8019"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1997&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt; a &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt; b &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; e &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; d;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; f &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; e;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt; i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; f &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; e &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;17&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; main() {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;18&lt;/span&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;19&lt;/span&gt;&lt;span&gt; calculate_1997();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;20&lt;/span&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;21&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can run this script with &lt;code&gt;python main.py --wisnia --cpp --rust 2000&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id="wisnialang-benchmark-1"&gt;WisniaLang benchmark&lt;/h3&gt;
&lt;p&gt;The program can be found at &lt;highlight&gt;&lt;a href="https://belijzajac.dev/post-data/calculate.wsn"&gt;post-data/calculate.wsn&lt;/a&gt;&lt;/highlight&gt;.&lt;/p&gt;
&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;./wisnia calculate.wsn&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: ./wisnia calculate.wsn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.367 s ± 0.054 s &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 2.328 s, System: 0.036 s&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.282 s … 2.466 s &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;336025&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="c-gcc-benchmark-1"&gt;C++ (gcc) benchmark&lt;/h3&gt;
&lt;p&gt;The program can be found at &lt;highlight&gt;&lt;a href="https://belijzajac.dev/post-data/calculate.cpp"&gt;post-data/calculate.cpp&lt;/a&gt;&lt;/highlight&gt;.&lt;/p&gt;
&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;g++ -std=c++23 -O3 calculate.cpp&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: g++ -std&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;c++23 -O3 calculate.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.177 s ± 0.009 s &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 2.110 s, System: 0.064 s&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.156 s … 2.193 s &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;96304&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="c-clang-benchmark-1"&gt;C++ (clang) benchmark&lt;/h3&gt;
&lt;p&gt;The same program as before, compiled with a different compiler.&lt;/p&gt;
&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;clang++ -std=c++2b -O3 calculate.cpp&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: clang++ -std&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;c++2b -O3 calculate.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.179 s ± 0.025 s &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 2.125 s, System: 0.048 s&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.156 s … 2.252 s &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c a.out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;96336&lt;/span&gt; a.out
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="rust-benchmark-1"&gt;Rust benchmark&lt;/h3&gt;
&lt;p&gt;The program can be found at &lt;highlight&gt;&lt;a href="https://belijzajac.dev/post-data/calculate.rs"&gt;post-data/calculate.rs&lt;/a&gt;&lt;/highlight&gt;.&lt;/p&gt;
&lt;h4&gt;Compile time&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;hyperfine --runs &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; --warmup &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; --shell&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;none &lt;span style="color:#b8bb26"&gt;&amp;#39;rustc -C opt-level=3 calculate.rs&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmark 1: rustc -C opt-level&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;&lt;span style="color:#d3869b"&gt;3&lt;/span&gt; calculate.rs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; Time &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;mean ± σ&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.353 s ± 0.027 s &lt;span style="color:#fe8019"&gt;[&lt;/span&gt;User: 2.268 s, System: 0.095 s&lt;span style="color:#fe8019"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; Range &lt;span style="color:#fe8019"&gt;(&lt;/span&gt;min … max&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;: 2.324 s … 2.436 s &lt;span style="color:#d3869b"&gt;20&lt;/span&gt; runs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Binary size&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;strip calculate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;wc -c calculate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#d3869b"&gt;317824&lt;/span&gt; calculate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="results"&gt;Results&lt;/h2&gt;
&lt;p&gt;Combining mean compile time, runtime, and binary sizes from benchmark results, we obtain the following graphs.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/benchmark-1.png" alt="benchmark-results-1"&gt;&lt;/p&gt;
&lt;p&gt;The runtime of the WisniaLang binary ranged from &lt;code&gt;84.0 µs&lt;/code&gt; to &lt;code&gt;736.3 µs&lt;/code&gt; over 1000 program runs. A spread this wide makes the results inconclusive, which is what you get when benchmarking a 17-line program that executes 3 lines of code 45 times. However, it does demonstrate how quickly small programs can be compiled. In the future, I plan to report on a recursive Fibonacci benchmark.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/benchmark-2.png" alt="benchmark-results-2"&gt;&lt;/p&gt;
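&lt;p&gt;To put the Benchmark No. 2 figures reported above side by side, here is a small Python sketch that expresses each mean compile time and binary size relative to the best value in its column. The numbers are copied from the listings above; the names &lt;code&gt;RESULTS&lt;/code&gt; and &lt;code&gt;relative_to_best&lt;/code&gt; are made up for this illustration.&lt;/p&gt;

```python
# Mean compile time (seconds) and binary size (bytes), copied from the
# Benchmark No. 2 listings in this post.
RESULTS = {
    "WisniaLang": (2.367, 336025),
    "C++ (gcc)": (2.177, 96304),
    "C++ (clang)": (2.179, 96336),
    "Rust": (2.353, 317824),
}


def relative_to_best(results):
    """Map each name to (time ratio, size ratio) against the column minimum."""
    best_time = min(t for t, _ in results.values())
    best_size = min(s for _, s in results.values())
    return {
        name: (t / best_time, s / best_size)
        for name, (t, s) in results.items()
    }


if __name__ == "__main__":
    for name, (time_ratio, size_ratio) in relative_to_best(RESULTS).items():
        print(f"{name:12} compile time x{time_ratio:.2f}  binary size x{size_ratio:.2f}")
```

&lt;p&gt;With these inputs, all four compile times land within roughly 9% of each other, while the WisniaLang and Rust binaries come out at more than three times the size of the stripped C++ ones.&lt;/p&gt;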
&lt;p&gt;WisniaLang generates code as fast as established compilers, but this may be because it performs little static analysis and few optimization steps. The lack of optimization has also resulted in a fairly large binary. In contrast, the C++ compilers optimize out redundant code, simplifying the while loop to use at most three variables. This is the while loop in question:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-cpp" data-lang="cpp"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;while&lt;/span&gt; (b &lt;span style="color:#fe8019"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1997&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt; a &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; b &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;5&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;6&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; e &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; d;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;7&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;int&lt;/span&gt; f &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; a &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; b &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; e;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;8&lt;/span&gt;&lt;span&gt; i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; f &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; e &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; d &lt;span style="color:#fe8019"&gt;-&lt;/span&gt; c &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; &lt;span style="color:#d3869b"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;9&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To be specific, the C++ compilers likely optimized the code to use only three variables &amp;ndash; &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, and &lt;code&gt;i&lt;/code&gt; &amp;ndash; by substituting the values of &lt;code&gt;c&lt;/code&gt;, &lt;code&gt;d&lt;/code&gt;, &lt;code&gt;e&lt;/code&gt;, and &lt;code&gt;f&lt;/code&gt; directly, thereby removing the redundancy. I plan to address this in future releases of WisniaLang.&lt;/p&gt;
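&lt;p&gt;The substitution can be checked by hand: &lt;code&gt;c = a + b&lt;/code&gt;, so &lt;code&gt;d = 2c&lt;/code&gt;, &lt;code&gt;e = 4c&lt;/code&gt;, and &lt;code&gt;f = 8c&lt;/code&gt;, which leaves &lt;code&gt;i = a + b + 1&lt;/code&gt;. The following Python sketch transcribes the loop both ways and compares the results. It only verifies the algebra and says nothing about what gcc or clang actually emit; since the function returns nothing, an optimizer is free to go further than this. The function names are made up for the illustration.&lt;/p&gt;

```python
LIMIT = 1997  # loop bound used by calculate_1997 in this post


def original_loop(limit):
    """Direct transcription of the benchmark loop, temporaries included."""
    i = a = b = 0
    while limit > b:  # keep looping while b is still below the limit
        a = a + b + i
        b = a - b - i
        c = a + b
        d = a + b + c
        e = a + b + c + d
        f = a + b + c + d + e
        i = f - e - d - c + 1
    return a, b, i


def simplified_loop(limit):
    """The same loop with c, d, e and f substituted away by hand."""
    i = a = b = 0
    while limit > b:
        a = a + b + i
        b = a - b - i
        # d = 2c, e = 4c, f = 8c, hence f - e - d - c = c = a + b
        i = a + b + 1
    return a, b, i


if __name__ == "__main__":
    # Prints True when both versions end in the same state.
    print(original_loop(LIMIT) == simplified_loop(LIMIT))
```

&lt;p&gt;Python integers do not overflow, but the values in this loop stay small enough that the comparison carries over to a 32-bit &lt;code&gt;int&lt;/code&gt;.&lt;/p&gt;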
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If compilation speed and binary size are important, dropping the LLVM toolchain can have a positive impact&lt;/li&gt;
&lt;li&gt;However, doing so means missing out on LLVM optimizations as well as support for arbitrary OSes and architectures&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Outperforming Rayon with OpenMP</title><link>https://belijzajac.dev/outperforming-rayon-with-openmp/</link><pubDate>Tue, 16 Nov 2021 00:00:00 +0000</pubDate><author>blog@belijzajac.dev (belijzajac)</author><guid>https://belijzajac.dev/outperforming-rayon-with-openmp/</guid><description>&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/rip-craberino.jpg" alt="rip-craberino"&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;For the Blockchain Technologies course, students were paired into groups and assigned to produce the fastest Rust library implementing the KZG10 cryptographic scheme. Two teams used the &lt;highlight&gt;&lt;a href="https://github.com/supranational/blst"&gt;blst&lt;/a&gt;&lt;/highlight&gt; backend, which is implemented in assembly and has direct bindings for Rust and C. The first team, &lt;highlight&gt;&lt;a href="https://github.com/grandinetech/rust-kzg/tree/main/blst"&gt;blst-from-scratch&lt;/a&gt;&lt;/highlight&gt;, used the Rust bindings provided by the blst library to produce an interface closer to &lt;highlight&gt;&lt;a href="https://github.com/benjaminion/c-kzg"&gt;c-kzg&lt;/a&gt;&lt;/highlight&gt;. The second team, which I was part of, worked on the &lt;highlight&gt;&lt;a href="https://github.com/grandinetech/rust-kzg/tree/main/ckzg"&gt;ckzg&lt;/a&gt;&lt;/highlight&gt; library in C. We were responsible for producing an implementation that could be integrated into Rust through C bindings, which my team also provided.&lt;/p&gt;
&lt;h2 id="choosing-the-right-tool-for-the-job"&gt;Choosing the right tool for the job&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s a no-brainer for Rust programmers to choose &lt;code&gt;Rayon&lt;/code&gt; when it comes to writing parallel code, as there aren&amp;rsquo;t many other viable and easy-to-use options available. While Rust does offer alternatives like &lt;code&gt;std::thread&lt;/code&gt;, which provides access to native OS threads, the manual creation and management of threads can be cumbersome.&lt;/p&gt;
&lt;p&gt;When I was working on my C code, I had to decide on the best approach to parallelize it. My options included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pthread&lt;/code&gt;: A POSIX standard for thread creation and management.&lt;/li&gt;
&lt;li&gt;A popular third-party threadpool library from GitHub.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OpenMP&lt;/code&gt;: Parallel programming library for C and C++ without manual thread management.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I chose OpenMP because, during experimentation, it yielded the best results and was relatively straightforward to use. The challenge was integrating it with Rust in a way that works across multiple platforms, starting with Linux and possibly macOS. Eventually, I came up with the following Bash script to automate the entire process of building and packaging shared libraries. Fortunately, OpenMP can be linked into the Rust build in one of two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;exporting the &lt;code&gt;RUSTFLAGS&lt;/code&gt; environment variable pointing to the correct &lt;code&gt;libomp&lt;/code&gt; LLVM runtime&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;# Linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;apt install libomp-dev
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;export&lt;/span&gt; LIBOMP_PATH&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;&lt;span style="color:#fe8019"&gt;$(&lt;/span&gt;find /usr/lib/llvm* -name libiomp5.so | head -n 1&lt;span style="color:#fe8019"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;# MacOS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;brew install libomp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;ln -s /usr/local/opt/libomp/lib/libomp.dylib /usr/local/lib
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;ln -s /usr/local/opt/libomp/include/omp.h /usr/local/include
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;export&lt;/span&gt; LIBOMP_PATH&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;/usr/local/lib/libomp.dylib
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;&lt;span style="color:#928374;font-style:italic"&gt;# And finally&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fabd2f"&gt;export&lt;/span&gt; RUSTFLAGS&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;&lt;span style="color:#b8bb26"&gt;&amp;#34;-C link-arg=&lt;/span&gt;$LIBOMP_PATH&lt;span style="color:#b8bb26"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
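&lt;li&gt;or creating a &lt;code&gt;.cargo/config.toml&lt;/code&gt; file inside the project directory and setting the flag there, with &lt;code&gt;LIBOMP_PATH&lt;/code&gt; replaced by the actual path to the library, since Cargo does not expand environment variables in this file&lt;/li&gt;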
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;[build]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;rustflags = [
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt; &amp;#34;-C&amp;#34;, &amp;#34;link-arg=LIBOMP_PATH&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Well, that was simple.&lt;/p&gt;
&lt;h2 id="searching-for-bottlenecks"&gt;Searching for bottlenecks&lt;/h2&gt;
&lt;p&gt;In order to optimize a program&amp;rsquo;s performance, CPU profiling tools like &lt;code&gt;Perf&lt;/code&gt; play a crucial role by providing detailed insights into where computational resources are being used. One powerful visualization tool generated by these profilers is the flamegraph, which offers a clear representation of a program&amp;rsquo;s CPU usage over time.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/flame-graphu.svg" alt="flamegraph-of-fft-g1"&gt;&lt;/p&gt;
&lt;p&gt;The flamegraph displayed above illustrates the CPU time distribution of the c-kzg library&amp;rsquo;s &lt;code&gt;fft_g1&lt;/code&gt; benchmark. Upon analysis, it became evident that a significant portion of the execution time was spent in assembly code, highlighting potential areas for optimization. Further investigation on &lt;highlight&gt;&lt;a href="https://github.com/protolambda/go-kzg"&gt;go-kzg&lt;/a&gt;&lt;/highlight&gt; revealed that the &lt;code&gt;fft_g1&lt;/code&gt; benchmark was indeed a performance bottleneck and stood out as a prime candidate for parallelization. By parallelizing this specific operation, we can improve the overall performance of the library.&lt;/p&gt;
&lt;h2 id="parallelizing-fft_g1"&gt;Parallelizing fft_g1&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;fft_g1&lt;/code&gt; function calls the &lt;code&gt;fft_g1_fast&lt;/code&gt; function, which applies the &lt;em&gt;divide-and-conquer&lt;/em&gt; principle to divide a large problem into smaller subproblems, recursively solving each of them. The general procedure here is to distribute the work (the recursive &lt;code&gt;fft_g1_fast&lt;/code&gt; calls) among worker threads.&lt;/p&gt;
&lt;p&gt;The blst-from-scratch team implemented it as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-rust" data-lang="rust"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;let&lt;/span&gt; (lo, hi) &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; ret.split_at_mut(half);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;rayon::join(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;||&lt;/span&gt; fft_g1_fast(lo, data, stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, roots, roots_stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;||&lt;/span&gt; fft_g1_fast(hi, &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;data[stride&lt;span style="color:#fe8019"&gt;..&lt;/span&gt;], stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, roots, roots_stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;&lt;span style="color:#fe8019"&gt;for&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;in&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;&lt;span style="color:#fe8019"&gt;..&lt;/span&gt;half {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fe8019"&gt;let&lt;/span&gt; y_times_root &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; ret[i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; half].mul(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;roots[i &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; roots_stride]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt; ret[i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; half] &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; ret[i].sub(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;y_times_root);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt; ret[i] &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; ret[i].add_or_dbl(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;y_times_root);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As a side note, &lt;code&gt;rayon::join&lt;/code&gt; takes two closures and potentially runs them in parallel on Rayon&amp;rsquo;s work-stealing thread pool, rather than spawning a new thread for each closure.&lt;/p&gt;
&lt;p&gt;The C equivalent, on the other hand, was as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;#pragma omp parallel sections
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;&lt;/span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt; &lt;span style="color:#8ec07c"&gt;#pragma omp section
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;fft_g1_fast&lt;/span&gt;(out, in, stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, roots, roots_stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, half);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt; &lt;span style="color:#8ec07c"&gt;#pragma omp section
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;fft_g1_fast&lt;/span&gt;(out &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; half, in &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; stride, stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, roots, roots_stride &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; &lt;span style="color:#d3869b"&gt;2&lt;/span&gt;, half);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;#pragma omp parallel
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;#pragma omp for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;&lt;span style="color:#8ec07c"&gt;&lt;/span&gt;&lt;span style="color:#fe8019"&gt;for&lt;/span&gt; (&lt;span style="color:#fabd2f"&gt;uint64_t&lt;/span&gt; i &lt;span style="color:#fe8019"&gt;=&lt;/span&gt; &lt;span style="color:#d3869b"&gt;0&lt;/span&gt;; i &lt;span style="color:#fe8019"&gt;&amp;lt;&lt;/span&gt; half; i&lt;span style="color:#fe8019"&gt;++&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;15&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;g1_t&lt;/span&gt; y_times_root;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;16&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;g1_mul&lt;/span&gt;(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;y_times_root, &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;out[i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; half], &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;roots[i &lt;span style="color:#fe8019"&gt;*&lt;/span&gt; roots_stride]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;17&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;g1_sub&lt;/span&gt;(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;out[i &lt;span style="color:#fe8019"&gt;+&lt;/span&gt; half], &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;out[i], &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;y_times_root);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;18&lt;/span&gt;&lt;span&gt; &lt;span style="color:#fabd2f"&gt;g1_add_or_dbl&lt;/span&gt;(&lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;out[i], &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;out[i], &lt;span style="color:#fe8019"&gt;&amp;amp;&lt;/span&gt;y_times_root);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;19&lt;/span&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In addition to parallel sections, I also used OpenMP&amp;rsquo;s parallel for-loop, because I noticed it yielded &lt;strong&gt;5% better performance&lt;/strong&gt; on my personal machine. Since the &lt;code&gt;ubuntu-latest&lt;/code&gt; runner in GitHub Actions CI had only two available cores, the halves of the problem were shared between two threads, each of which ran the for-loop to do arithmetic operations on polynomial &lt;code&gt;G1&lt;/code&gt; points.&lt;/p&gt;
&lt;p&gt;In the above code snippets, &lt;code&gt;fft_g1&lt;/code&gt; calls &lt;code&gt;fft_g1_fast&lt;/code&gt;, which up to scale 16 should call itself recursively at most &lt;code&gt;1 &amp;lt;&amp;lt; 15&lt;/code&gt; times, with each such call distributed among the 2 threads. Since we&amp;rsquo;re computing &lt;code&gt;fft_g1&lt;/code&gt; up to scale 8, there should be &lt;code&gt;(1 &amp;lt;&amp;lt; 7) + 1&lt;/code&gt;, or &lt;code&gt;129&lt;/code&gt;, &lt;code&gt;fft_g1_fast&lt;/code&gt; tasks (not to be confused with OpenMP&amp;rsquo;s &lt;code&gt;task&lt;/code&gt; pragma directive!) that will be run in parallel!&lt;/p&gt;
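&lt;p&gt;To make the recursion shape easier to experiment with, here is a minimal sketch of the same pattern on toy data. It is not the library&amp;rsquo;s code: integers modulo 17 stand in for &lt;code&gt;G1&lt;/code&gt; points (2 is a primitive 8th root of unity modulo 17), and the names &lt;code&gt;toy_fft_fast&lt;/code&gt;, &lt;code&gt;toy_fft_matches_naive&lt;/code&gt;, &lt;code&gt;toy_roots&lt;/code&gt;, &lt;code&gt;TOY_P&lt;/code&gt; and &lt;code&gt;TOY_N&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-c" data-lang="c"&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Toy stand-in for G1 arithmetic: integers modulo 17, where 2 is a
 * primitive 8th root of unity. All names here are hypothetical. */
#define TOY_P 17u
#define TOY_N 8u

/* Powers of 2 modulo 17: 2^0 .. 2^7. */
static const uint64_t toy_roots[TOY_N] = {1, 2, 4, 8, 16, 15, 13, 9};

/* Same recursion shape as fft_g1_fast, with modular integers for points. */
static void toy_fft_fast(uint64_t *out, const uint64_t *in, uint64_t stride,
                         const uint64_t *roots, uint64_t roots_stride,
                         uint64_t n) {
    assert(n != 0); /* n is expected to be a power of two */
    uint64_t half = n / 2;
    if (half == 0) {
        *out = *in;
        return;
    }
    /* The two halves write to disjoint parts of out, so they can run
     * as independent sections. */
#pragma omp parallel sections
    {
#pragma omp section
        toy_fft_fast(out, in, stride * 2, roots, roots_stride * 2, half);
#pragma omp section
        toy_fft_fast(out + half, in + stride, stride * 2, roots,
                     roots_stride * 2, half);
    }
    /* Each iteration touches only out[i] and out[i + half]. */
#pragma omp parallel for
    for (uint64_t i = 0; i &amp;lt; half; i++) {
        uint64_t y_times_root = out[i + half] * roots[i * roots_stride] % TOY_P;
        out[i + half] = (out[i] + TOY_P - y_times_root) % TOY_P;
        out[i] = (out[i] + y_times_root) % TOY_P;
    }
}

/* Returns 1 if the recursive transform agrees with a direct O(n^2) sum. */
int toy_fft_matches_naive(void) {
    const uint64_t in[TOY_N] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint64_t out[TOY_N];
    toy_fft_fast(out, in, 1, toy_roots, 1, TOY_N);
    for (uint64_t k = 0; k &amp;lt; TOY_N; k++) {
        uint64_t sum = 0;
        for (uint64_t j = 0; j &amp;lt; TOY_N; j++) {
            sum = (sum + in[j] * toy_roots[(j * k) % TOY_N]) % TOY_P;
        }
        if (sum != out[k]) {
            return 0;
        }
    }
    return 1;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Whether this is compiled with &lt;code&gt;-fopenmp&lt;/code&gt; or without it (in which case the pragmas are simply ignored), &lt;code&gt;toy_fft_matches_naive()&lt;/code&gt; should return &lt;code&gt;1&lt;/code&gt;, because the pragmas only change how the work is scheduled, not the result.&lt;/p&gt;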
&lt;h2 id="local-c-kzg-benchmark"&gt;Local c-kzg benchmark&lt;/h2&gt;
&lt;p&gt;Running on my personal computer with i5-7300HQ (4 threads overclocked at 3.50GHz), all mitigations turned off, and a custom Liquorix kernel, I was able to achieve the following results:&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Original c-kzg library&lt;/th&gt;&lt;th&gt;Parallelized c-kzg library&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;$ ./fft_g1_bench
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;*** Benchmarking FFT_g1, &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; second per test.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt;fft_g1/scale_4 &lt;span style="color:#d3869b"&gt;1729769&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;fft_g1/scale_5 &lt;span style="color:#d3869b"&gt;4935085&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;fft_g1/scale_6 &lt;span style="color:#d3869b"&gt;12897731&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;fft_g1/scale_7 &lt;span style="color:#d3869b"&gt;32022026&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;fft_g1/scale_8 &lt;span style="color:#d3869b"&gt;76552852&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;fft_g1/scale_9 &lt;span style="color:#d3869b"&gt;184970057&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;fft_g1/scale_10 &lt;span style="color:#d3869b"&gt;418273808&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt;fft_g1/scale_11 &lt;span style="color:#d3869b"&gt;919499032&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;fft_g1/scale_12 &lt;span style="color:#d3869b"&gt;2025633037&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;fft_g1/scale_13 &lt;span style="color:#d3869b"&gt;4479830518&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;fft_g1/scale_14 &lt;span style="color:#d3869b"&gt;9754557496&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;fft_g1/scale_15 &lt;span style="color:#d3869b"&gt;21125613058&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 1&lt;/span&gt;&lt;span&gt;$ OMP_NUM_THREADS&lt;span style="color:#fe8019"&gt;=&lt;/span&gt;&lt;span style="color:#d3869b"&gt;4&lt;/span&gt; ./fft_g1_bench
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 2&lt;/span&gt;&lt;span&gt;*** Benchmarking FFT_g1, &lt;span style="color:#d3869b"&gt;1&lt;/span&gt; second per test.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 3&lt;/span&gt;&lt;span&gt;fft_g1/scale_4 &lt;span style="color:#d3869b"&gt;839454&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 4&lt;/span&gt;&lt;span&gt;fft_g1/scale_5 &lt;span style="color:#d3869b"&gt;2378457&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 5&lt;/span&gt;&lt;span&gt;fft_g1/scale_6 &lt;span style="color:#d3869b"&gt;6404191&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 6&lt;/span&gt;&lt;span&gt;fft_g1/scale_7 &lt;span style="color:#d3869b"&gt;16325966&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 7&lt;/span&gt;&lt;span&gt;fft_g1/scale_8 &lt;span style="color:#d3869b"&gt;38141754&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 8&lt;/span&gt;&lt;span&gt;fft_g1/scale_9 &lt;span style="color:#d3869b"&gt;90948810&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt; 9&lt;/span&gt;&lt;span&gt;fft_g1/scale_10 &lt;span style="color:#d3869b"&gt;204757690&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;10&lt;/span&gt;&lt;span&gt;fft_g1/scale_11 &lt;span style="color:#d3869b"&gt;457509973&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;11&lt;/span&gt;&lt;span&gt;fft_g1/scale_12 &lt;span style="color:#d3869b"&gt;1006089135&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;12&lt;/span&gt;&lt;span&gt;fft_g1/scale_13 &lt;span style="color:#d3869b"&gt;2240095284&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;13&lt;/span&gt;&lt;span&gt;fft_g1/scale_14 &lt;span style="color:#d3869b"&gt;4879448286&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;14&lt;/span&gt;&lt;span&gt;fft_g1/scale_15 &lt;span style="color:#d3869b"&gt;10650876381&lt;/span&gt; ns/op
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;p&gt;That&amp;rsquo;s &lt;strong&gt;twice as fast&lt;/strong&gt; with as little effort as putting in a few pragmas!&lt;/p&gt;
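&lt;p&gt;As a quick sanity check on that figure, dividing the sequential time by the parallel time from the table above gives &lt;code&gt;1729769 / 839454 ≈ 2.06&lt;/code&gt; at scale 4, &lt;code&gt;76552852 / 38141754 ≈ 2.01&lt;/code&gt; at scale 8, and &lt;code&gt;21125613058 / 10650876381 ≈ 1.98&lt;/code&gt; at scale 15. The speedup stays close to 2x across the whole range, which with &lt;code&gt;OMP_NUM_THREADS=4&lt;/code&gt; works out to roughly half of the ideal 4x.&lt;/p&gt;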
&lt;h2 id="github-actions-ci-benchmarks"&gt;GitHub Actions CI benchmarks&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;fft_g1&lt;/code&gt; benchmark was limited to scale 7 because benchmarking it up to scale 16 would push the job&amp;rsquo;s overall run time past the 6 hour (360 minute) limit that GitHub Actions imposes on jobs. Criterion runs each iteration a couple of hundred times to produce more accurate results, and exceeding the limit used to automatically cancel other running CI jobs.&lt;/p&gt;
&lt;h3 id="benchmarking-blst-from-scratch"&gt;Benchmarking blst-from-scratch&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/from-scratch-github-actions.png" alt="from-scratch-github-actions"&gt;&lt;/p&gt;
&lt;p&gt;From the above screenshot we can see that the parallelized version of the library finished &lt;code&gt;1m 28s&lt;/code&gt; sooner than its sequential version, and below are the results of the sequential &lt;code&gt;fft_g1&lt;/code&gt; algorithm:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Warming up for 3.0000 s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Collecting 100 samples in estimated 6.6364 s (200 iterations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Analyzing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;5&lt;/span&gt;&lt;span&gt;bench_fft_g1 scale: &amp;#39;7&amp;#39; time: [33.423 ms 33.785 ms 34.150 ms]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;of which the average run time for scale 7 was cut down by &lt;code&gt;38.926%&lt;/code&gt; by its parallel counterpart:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Warming up for 3.0000 s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Collecting 100 samples in estimated 6.3282 s (300 iterations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Analyzing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;5&lt;/span&gt;&lt;span&gt;bench_fft_g1 scale: &amp;#39;7&amp;#39; time: [20.432 ms 20.634 ms 20.843 ms]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;6&lt;/span&gt;&lt;span&gt; change: [-39.822% -38.926% -38.001%] (p = 0.00 &amp;lt; 0.05)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;7&lt;/span&gt;&lt;span&gt; Performance has improved.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="benchmarking-ckzg"&gt;Benchmarking ckzg&lt;/h3&gt;
&lt;p&gt;&lt;img src="https://belijzajac.dev/post-images/ckzg-github-actions.png" alt="ckzg-github-actions"&gt;&lt;/p&gt;
&lt;p&gt;The sequential version of the ckzg library ran &lt;code&gt;2m 7s&lt;/code&gt; faster than the sequential version of blst-from-scratch because it had other benchmarks that performed faster, though the parallelized version ran &lt;code&gt;1m 2s&lt;/code&gt; faster than its sequential version. Below are the results of the sequential &lt;code&gt;fft_g1&lt;/code&gt; algorithm:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Warming up for 3.0000 s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Collecting 100 samples in estimated 6.8313 s (200 iterations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Analyzing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;5&lt;/span&gt;&lt;span&gt;bench_fft_g1 scale: &amp;#39;7&amp;#39; time: [32.194 ms 32.471 ms 32.760 ms]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The parallel version of the &lt;code&gt;fft_g1&lt;/code&gt; algorithm, however, gained much more from parallelization than it did in blst-from-scratch, cutting the run time by roughly 47%, even though the sequential versions from both teams performed about equally:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#ebdbb2;background-color:#282828;-moz-tab-size:4;-o-tab-size:4;tab-size:4;display:grid;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;1&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;2&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Warming up for 3.0000 s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;3&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Collecting 100 samples in estimated 5.0701 s (300 iterations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;4&lt;/span&gt;&lt;span&gt;Benchmarking bench_fft_g1 scale: &amp;#39;7&amp;#39;: Analyzing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;5&lt;/span&gt;&lt;span&gt;bench_fft_g1 scale: &amp;#39;7&amp;#39; time: [16.854 ms 17.107 ms 17.439 ms]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex; background-color:#3d3d3d"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;6&lt;/span&gt;&lt;span&gt; change: [-48.216% -47.318% -46.306%] (p = 0.00 &amp;lt; 0.05)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span style="white-space:pre;-webkit-user-select:none;user-select:none;margin-right:0.4em;padding:0 0.4em 0 0.4em;color:#756d59"&gt;7&lt;/span&gt;&lt;span&gt; Performance has improved.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;OpenMP lets you quickly prototype parallelization once a CPU profiling tool like Perf has shown which parts of the code are worth parallelizing&lt;/li&gt;
&lt;li&gt;Criterion is actually a really nice benchmarking tool to measure performance, especially when integrated into CI&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>