Written by Ryan (@chokobole33)
Special thanks to Ashjeong and KryptoSergeant for feedback and review.
TL;DR
- The integration of ICICLE introduced GPU acceleration for MSM and FFT, making Circom proof generation 4x faster.
- Parsing ZKey used to take around half of the total proof generation time, but is now 3x faster.
- The Mixed Matrix Commitment Scheme used in Plonky3 has been introduced.
- The Binary Field used in Binius has been introduced.
Overview
ZK proof generation is currently extremely expensive and slow. The Kroma team operates an Optimistic Rollup, which spares users the high operating costs of running a ZK Rollup, and uses ZK fault proofs to secure the network. To reduce the time and cost of ZK proof generation and to make ZK more widely usable, we have been developing our own ZK library, Tachyon. Tachyon is a modular ZK backend that supports various frontends and can generate Halo2 and Circom proofs. Last June, we shared the news of Tachyon’s second official release, which you can read more about here: Tachyon in Mainnet. In the rest of this article, we cover the new features and improvements introduced in Tachyon v0.3.0.
What’s Optimized in v0.3.0?
Circom proof generation becomes 4x faster using GPU
MSM and FFT are known to be the most expensive computations in ZK proof generation, as shown in Ingonyama’s article “Hardware Review: GPUs , FPGAs and Zero Knowledge Proofs.” Briefly, MSM (multi-scalar multiplication) computes the inner product of a vector of elliptic curve points and a vector of elements of the curve’s scalar field, while FFT converts polynomials from coefficient form to evaluation form. Over a finite field, the FFT is called the NTT (number-theoretic transform).
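To make "coefficient form to evaluation form" concrete, here is a minimal sketch over a toy prime field. It is a naive O(n^2) evaluation, not Tachyon's or ICICLE's implementation; a real NTT produces the same values in O(n log n). The function name `NaiveNtt`, the modulus 17, and the root of unity 4 are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Evaluates the polynomial with coefficients |coeffs| (lowest degree first)
// at omega^0, omega^1, ..., omega^(n-1) modulo |modulus|.
// Assumes |omega| is a primitive n-th root of unity mod |modulus| and that
// |modulus| is small enough for the products below to fit in 64 bits.
std::vector<uint64_t> NaiveNtt(const std::vector<uint64_t>& coeffs,
                               uint64_t omega, uint64_t modulus) {
  std::vector<uint64_t> evals(coeffs.size());
  uint64_t x = 1;  // Current evaluation point omega^i.
  for (size_t i = 0; i < coeffs.size(); ++i) {
    // Horner's rule: ((c3 * x + c2) * x + c1) * x + c0.
    uint64_t acc = 0;
    for (size_t j = coeffs.size(); j-- > 0;) {
      acc = (acc * x + coeffs[j]) % modulus;
    }
    evals[i] = acc;
    x = (x * omega) % modulus;
  }
  return evals;
}
```

For example, `NaiveNtt({1, 2, 3, 4}, 4, 17)` evaluates 1 + 2x + 3x^2 + 4x^3 at the points 1, 4, 16, and 13 in the field of 17 elements.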
Tachyon’s CPU-based MSM and FFT implementations are already faster than those of other libraries, and the GPU side pushes proof generation speed much further. Previously, GPU MSM implementations showed significant speed variance when scalars were not uniformly distributed, which limited their use in production environments; this has been resolved with Ingonyama’s ICICLE, a library of “blazing fast cryptographic primitives…on GPU” (What is ICICLE? | Ingonyama Developer Documentation). ICICLE has been integrated into Tachyon, resulting in an 8–10x speedup for MSM and a 3–5x speedup for NTT compared to the CPU implementations, for a total 4x speedup in Circom proof generation. The graphs below show the difference in speed between Tachyon’s CPU implementation and its GPU implementation.
As a modular backend, Tachyon can apply this upgrade across various backends, allowing anyone to easily generate proofs with the GPU Circom prover and reproduce the benchmark results of a 4x speed improvement in Circom proof generation.
Future releases will include the following:
- GPU algorithms will be applied not only to BN254 but to all implemented fields and curves.
- MSM/NTT GPU will be applied to Halo2 proof generation.
- Issues where ICICLE returns invalid results instead of errors when GPU RAM is insufficient will be resolved.
- If GPU RAM is insufficient, MSM will calculate as much as possible on the GPU and use the CPU for the rest of the process, or, alternatively, call the GPU algorithm repeatedly.
- If there is sufficient GPU RAM or multiple GPU devices are available, multiple MSM or NTT computations will be performed simultaneously in a batch.
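The plan in the list above to call the GPU algorithm repeatedly when GPU RAM is insufficient works because MSM is a plain sum: the result over the full input equals the sum of the results over its chunks. The sketch below illustrates only that property. It is not Tachyon or ICICLE code: `ToyMsm` and `ChunkedMsm` are hypothetical names, and integers modulo a prime under addition stand in for elliptic curve points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy group: integers mod kModulus under addition (NOT an elliptic curve).
constexpr uint64_t kModulus = 1000000007ULL;

// Stand-in for one device MSM call over the index range [begin, end):
// computes sum_i scalars[i] * points[i] in the toy group.
uint64_t ToyMsm(const std::vector<uint64_t>& scalars,
                const std::vector<uint64_t>& points, size_t begin,
                size_t end) {
  uint64_t acc = 0;
  for (size_t i = begin; i < end; ++i) {
    acc = (acc + (scalars[i] % kModulus) * (points[i] % kModulus)) % kModulus;
  }
  return acc;
}

// Splits the input into chunks of at most |max_chunk| elements (for example,
// what fits in device memory) and adds up the partial results.
// Assumes |max_chunk| > 0 and that both vectors have the same size.
uint64_t ChunkedMsm(const std::vector<uint64_t>& scalars,
                    const std::vector<uint64_t>& points, size_t max_chunk) {
  uint64_t acc = 0;
  for (size_t begin = 0; begin < scalars.size(); begin += max_chunk) {
    size_t end = std::min(begin + max_chunk, scalars.size());
    acc = (acc + ToyMsm(scalars, points, begin, end)) % kModulus;
  }
  return acc;
}
```

The same structure would also allow a CPU fallback for some chunks, since the partial results are combined with ordinary group addition regardless of where each one was computed.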
Parsing ZKey becomes 3x faster
Earlier this month, the Mopro team raised an issue noting that Tachyon was faster than Rapidsnark for circuits with fewer constraints, but much slower for circuits with many constraints, such as Keccak256.
This discrepancy was due to a difference in how inputs are parsed.
Note that generating a Circom proof takes a ZKey, a witness, and public inputs as inputs.
Analysis showed that while Rapidsnark reads the files into memory and uses pointers into that memory, Tachyon copied the data into its own structures while parsing. This was the bottleneck for high-degree, heavily constrained circuits, where parsing accounted for about half of the overall proving time.
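The difference between the two parsing strategies can be sketched as follows. This is an illustration only, with hypothetical names (`ParseWithCopy`, `U64View`) and plain 64-bit integers standing in for field elements and curve points; it is not the actual Tachyon or Rapidsnark code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copying parser: allocates a new vector and copies every element out of the
// file buffer. The cost grows with the size of the key.
std::vector<uint64_t> ParseWithCopy(const uint8_t* data, size_t count) {
  std::vector<uint64_t> out(count);
  std::memcpy(out.data(), data, count * sizeof(uint64_t));
  return out;
}

// Zero-copy view: remembers where the elements live inside the file buffer.
// Construction is O(1); the buffer must outlive the view.
class U64View {
 public:
  U64View(const uint8_t* data, size_t count) : data_(data), count_(count) {}

  size_t size() const { return count_; }
  const uint8_t* data() const { return data_; }

  // Reads element |i| on demand; memcpy avoids unaligned access.
  uint64_t operator[](size_t i) const {
    uint64_t value;
    std::memcpy(&value, data_ + i * sizeof(uint64_t), sizeof(value));
    return value;
  }

 private:
  const uint8_t* data_;
  size_t count_;
};
```

With the view, loading a large section of the file costs the same as loading a small one, which is why the gap only showed up for circuits with many constraints.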
template <typename Curve>
class ProvingKey {
 public:
  using G1Point = typename Curve::G1Curve::AffinePoint;
  using G2Point = typename Curve::G2Curve::AffinePoint;
  using F = typename G1Point::ScalarField;
  template <typename Circuit>
  bool Load(ToxicWaste<Curve>& toxic_waste, const Circuit& circuit) {
    // create key from |toxic_waste|.
    return true;
  }
 private:
  // Every member owns its memory, so loading from a file means copying.
  VerifyingKey<Curve> verifying_key_;
  G1Point beta_g1_;
  G1Point delta_g1_;
  std::vector<G1Point> a_g1_query_;
  std::vector<G1Point> b_g1_query_;
  std::vector<G2Point> b_g2_query_;
  std::vector<G1Point> h_g1_query_;
  std::vector<G1Point> l_g1_query_;
};
The copies occurred because the ProvingKey class attempted to own memory with a std::vector<T>, based on the assumption that the Tachyon Circom prover could directly generate the key to create proofs. In contrast, Rapidsnark assumed that keys would always be read from ZKey files, avoiding this issue.
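template <typename T>
class MaybeOwned {
 public:
  // Owning: stores a copy of |value| and points at it.
  MaybeOwned(const T& value) : value_(value), ptr_(&value_), owned_(true) {}
  // Borrowing: points at memory owned elsewhere, e.g. a parsed file buffer.
  MaybeOwned(T* ptr) : ptr_(ptr) {}
  MaybeOwned& operator=(const T& value) {
    value_ = value;
    ptr_ = &value_;
    owned_ = true;
    return *this;
  }
  MaybeOwned& operator=(T* ptr) {
    // Release |value_| if needed.
    ptr_ = ptr;
    owned_ = false;
    return *this;
  }
  T& operator*() { return *ptr_; }
  const T& operator*() const { return *ptr_; }
  // Copy and move operations are omitted from this excerpt; when owning,
  // they must re-point |ptr_| at the new object's |value_|.
 private:
  T value_;
  T* ptr_ = nullptr;
  bool owned_ = false;
};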
To resolve this, we introduced a class like MaybeOwned<T>, which significantly reduced the cost of ZKey parsing, making it 3x faster. MaybeOwned<T> is versatile and its use is not limited to ProvingKey. In the next release, we plan to use file mapping so that large data can be backed by the disk instead of being allocated in memory, with MaybeOwned<T> referencing the mapped data to reduce costs. MaybeOwned<T> will also allow us to use the batch algorithms in the ICICLE API, which apply when polynomials are laid out in contiguous memory, further increasing speed.
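As a rough illustration of the file mapping idea, the sketch below maps a file read-only with the POSIX `mmap` call, so a borrowing wrapper could point straight into the mapped bytes. It assumes a POSIX system, and `MappedFile` is a hypothetical helper written for this example, not a Tachyon class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only POSIX file mapping.
class MappedFile {
 public:
  explicit MappedFile(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
      void* addr = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                        MAP_PRIVATE, fd, 0);
      if (addr != MAP_FAILED) {
        data_ = static_cast<const uint8_t*>(addr);
        size_ = static_cast<size_t>(st.st_size);
      }
    }
    close(fd);  // The mapping stays valid after the descriptor is closed.
  }
  ~MappedFile() {
    if (data_ != nullptr) munmap(const_cast<uint8_t*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool ok() const { return data_ != nullptr; }
  const uint8_t* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  const uint8_t* data_ = nullptr;
  size_t size_ = 0;
};
```

Pages of the mapping are loaded by the operating system on first access, so a borrowed pointer into `data()` avoids both the up-front read and the copy. The mapping must outlive every pointer borrowed from it.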
What’s New in v0.3.0?
Mixed Matrix Commitment Scheme
Kroma currently uses a fault proof based on the Scroll zkEVM but plans to transition to a fault proof based on SP1, the zkVM developed by the Succinct team. The existing zkEVM method requires developing circuits directly, which is time-consuming and reduces development productivity. Switching to a zkVM eliminates the need for direct circuit development and thereby increases productivity; however, zkVMs have drawbacks in proof generation speed and cost compared to zkEVMs. SP1 currently uses a library called Plonky3 as its backend, and we aim to improve speed and cost by generating its proofs with Tachyon.
To achieve this, we have introduced FieldMerkleTree, an implementation of the Mixed Matrix Commitment Scheme. Unlike a Vector Commitment Scheme or Polynomial Commitment Scheme, which takes a single vector or polynomial as input, the Mixed Matrix Commitment Scheme can commit to multiple vectors or polynomials at once. Plonky3 is STARK-based, and STARKs can use smaller fields than SNARKs. This allows SIMD to process multiple field elements simultaneously when computing the Merkle tree’s root.
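The following is a minimal sketch of one way a single Merkle tree can commit to several matrices of different heights: rows of the tallest matrices form the leaves, and rows of a shorter matrix are mixed in at the layer whose width equals that matrix's height. It is a simplified illustration, not Tachyon's FieldMerkleTree. It assumes non-empty matrices whose heights are powers of two, and it uses a toy FNV-style mixing function that is NOT cryptographically secure. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Row = std::vector<uint32_t>;
using Matrix = std::vector<Row>;  // Row-major; height = number of rows.

constexpr uint64_t kSeed = 14695981039346656037ULL;

// Toy mixing step (FNV-1a style). Stand-in for a real hash; NOT secure.
uint64_t Mix(uint64_t h, uint64_t v) { return (h ^ v) * 1099511628211ULL; }

uint64_t HashRow(uint64_t h, const Row& row) {
  for (uint32_t v : row) h = Mix(h, v);
  return h;
}

uint64_t Compress(uint64_t left, uint64_t right) {
  return Mix(Mix(kSeed, left), right);
}

// Returns one root committing to all |matrices|.
uint64_t CommitMixedMatrices(std::vector<Matrix> matrices) {
  // Process the tallest matrices first.
  std::sort(matrices.begin(), matrices.end(),
            [](const Matrix& a, const Matrix& b) { return a.size() > b.size(); });
  size_t next = 0;
  size_t height = matrices.front().size();
  // Leaf layer: hash row i of every tallest matrix into leaf i.
  std::vector<uint64_t> layer(height, kSeed);
  for (; next < matrices.size() && matrices[next].size() == height; ++next) {
    for (size_t i = 0; i < height; ++i) {
      layer[i] = HashRow(layer[i], matrices[next][i]);
    }
  }
  while (height > 1) {
    height /= 2;
    std::vector<uint64_t> parent(height);
    for (size_t i = 0; i < height; ++i) {
      parent[i] = Compress(layer[2 * i], layer[2 * i + 1]);
    }
    // Inject rows of any matrix whose height matches this layer's width.
    for (; next < matrices.size() && matrices[next].size() == height; ++next) {
      for (size_t i = 0; i < height; ++i) {
        parent[i] = Compress(parent[i], HashRow(kSeed, matrices[next][i]));
      }
    }
    layer = std::move(parent);
  }
  return layer[0];
}
```

Because each row is hashed independently of the others, the row-hashing loops are where SIMD can process several small field elements at once.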
Binary Field
The Binary Field is used in Binius, which recently gained attention through a blog post by Vitalik. The field used in KZG-based SNARKs is about 256 bits wide for security reasons, whereas GF(2), the base of the Binary Field, is 1 bit. Using the tower of binary fields technique, it can be efficiently extended up to 128 bits. This field is hardware-friendly and does not waste space when expressing hash functions such as Keccak, which is used in Ethereum.
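To give a feel for tower arithmetic, here is a small sketch of multiplication in a tower of binary fields. It assumes the construction in which each level doubles the bit width by adjoining an element X_k with X_k^2 = X_k * X_(k-1) + 1 (and X_0^2 = X_0 + 1 at the first level). It uses schoolbook recursion for clarity rather than an optimized algorithm, and `TowerMul` and `TowerPow` are hypothetical names, not Tachyon's API. Addition in these fields is simply XOR.

```cpp
#include <cassert>
#include <cstdint>

// Multiplies |a| and |b| in the tower field with |bits| bits per element.
// |bits| must be a power of two between 1 and 32.
uint32_t TowerMul(uint32_t a, uint32_t b, unsigned bits) {
  if (bits == 1) return a & b;  // GF(2): multiplication is AND.
  unsigned half = bits / 2;
  uint32_t mask = (1u << half) - 1;
  // Write a = a0 + a1 * X and b = b0 + b1 * X with halves in the subfield.
  uint32_t a0 = a & mask, a1 = a >> half;
  uint32_t b0 = b & mask, b1 = b >> half;
  // X^2 = alpha * X + 1, where alpha is the generator of the previous
  // level (or 1 at the first level, where half == 1).
  uint32_t alpha = 1u << (half / 2);
  uint32_t a0b0 = TowerMul(a0, b0, half);
  uint32_t a1b1 = TowerMul(a1, b1, half);
  uint32_t lo = a0b0 ^ a1b1;
  uint32_t hi = TowerMul(a0, b1, half) ^ TowerMul(a1, b0, half) ^
                TowerMul(a1b1, alpha, half);
  return lo | (hi << half);
}

// Square-and-multiply exponentiation in the same field.
uint32_t TowerPow(uint32_t base, uint32_t exponent, unsigned bits) {
  uint32_t result = 1;
  while (exponent > 0) {
    if (exponent & 1) result = TowerMul(result, base, bits);
    base = TowerMul(base, base, bits);
    exponent >>= 1;
  }
  return result;
}
```

In the 2-bit field, for instance, the element 2 represents X_0, so `TowerMul(2, 2, 2)` is X_0^2 = X_0 + 1, which is the element 3.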
Polygon also recently announced the development of a zkVM using Binius.
Vitalik highlighted the importance of multi-proofs in the YouTube video “Multi-proofs for rollups: A nice-to-have or a necessity?”. The figures above show how Taiko and Optimism implement their multi-proof systems. Tachyon is designed as a modular ZK backend with multi-proof systems in mind: even if the ZK protocol itself is flawless, the prover is software and might still have bugs. Binius is being developed in parallel with our other updates, and parts of it are included in this v0.3.0 release.
Conclusion
At the time of writing, Tachyon has seen rapid development, with approximately 2,850 commits. With this release, we integrated ICICLE, generating the first end-to-end proof using GPU acceleration. Together with optimized zkey and wtns parsing, this improved total Circom proof generation time by 3-4x. Tachyon has also become the only currently available ZK library to implement all the prime fields used in both SNARKs and STARKs. In the future, we plan to support more frontends and backends so that a single optimization benefits all of them. We currently support Scroll Halo2 v0.3, and in the next release, v0.4.0, we will support Scroll Halo2 v1.1 and PSE Halo2, contributing to the entire Halo2 ecosystem.
About Kroma
As Asia’s leading Layer 2 solution built on the Superchain, Kroma is the first OP Stack rollup with an active fault proof system utilizing zkEVM.
Kroma will transition to a universal ZK Rollup once the generation of ZK proofs becomes more cost-efficient and faster — using its original modular ZK backend library, Tachyon.
Kroma plans to push for a gamified web3 experience, backed by its strengths in gaming, consumer applications, the Asian market, and technical capabilities, for true universal web3 adoption.