Mira: Efficient Folding for Pairing-based Arguments

Succinct non-interactive arguments of knowledge (SNARKs) are powerful cryptographic primitives that enable a prover to convince a verifier of the validity of a claim with a short proof. These primitives enable verifiable computation, where a client can outsource a difficult task to an untrusted cloud provider and then check via a proof that the task was performed correctly. While theoretical SNARK constructions have been known since the 90s, there have been numerous efforts over the past decades to minimize the prover complexity and proof size for real-world applications.

Pairing-based SNARKs have achieved widespread adoption due to a remarkable property: the proof consists of only a few group elements, regardless of the complexity of the claim. A well-known example is the Groth16 SNARK, which can prove the validity of any NP statement. As SNARKs have grown in adoption, there have been efforts to develop domain-specific SNARKs for various applications, including zero-knowledge theorem proving, regular expression matching, and training and inference of deep neural networks.

In the domain of verifiable ML inference, pairing-based SNARKs have recently achieved impressive results. In deep neural networks, the primary operations are matrix-vector multiplication and nonlinear function evaluation. To prove the correctness of a matrix-vector multiplication, a prover may use cqlin, a pairing-based lincheck protocol with constant-size proofs that requires only $O(n)$ prover operations, where $n$ is the matrix order. To prove the correctness of a nonlinear function evaluation, a prover may employ a lookup argument for greater efficiency. In this case, all valid input-output pairs for the function are stored in a lookup table, and the prover argues that the selected pairs are present in this table. For example, cq is a pairing-based lookup argument where the prover complexity is independent of the table size. Other pairing-based arguments, such as KZG polynomial commitment schemes, have also seen broad adoption in decentralized systems and verifiable machine learning.

As SNARKs are applied to increasingly complex computations, it may be infeasible for a single machine to produce a proof with its limited computational resources. To address this issue, there have been efforts to distribute the work of proof generation across multiple machines. One strategy is to partition the work of existing SNARKs across multiple machines. DIZK developed an approach of this nature for distributing the workload of the Groth16 prover. Another strategy is to split a single large statement into $N$ smaller statements, which are proved in multiple steps. At each step, the prover generates a proof that: (1) the current statement is valid, and (2) the proof of the previous statement is valid. This strategy is formalized as incrementally verifiable computation (IVC). This strategy can be generalized to verify any distributed computation over a directed acyclic graph of nodes. In this more general setting, the technique is known as proof-carrying data (PCD). PCD has been highly impactful in real-world applications, enabling efficient zero-knowledge virtual machines, verifiable MapReduce computations, image authentication systems, succinct ledgers, append-only dictionaries, and verifiable financial disclosures. These techniques are useful not only for coordinating the efforts of multiple independent provers, but also for reducing the resource requirements for a single prover that aims to prove the correctness of a complex local computation. For example, to prove the correctness of inference of an ML model with $N$ layers, a prover could partition the work by layer. Alternatively, the prover could further partition the work, separately proving the correctness of each matrix-vector multiplication or nonlinear function evaluation and then aggregating the resulting proofs.

To achieve concrete efficiency improvements, it is critical to minimize the recursive overhead, i.e., the additional work that the prover performs at each step, beyond proving each statement’s validity. In traditional constructions of IVC/PCD, the recursive overhead was substantial due to the costs of representing the pairing operations in the step circuit. Halo pioneered a novel approach to minimize the recursive overhead by leveraging the additive property of the polynomial commitment scheme. Subsequent work further reduced the prover overhead by introducing accumulation/folding schemes. These elegant techniques avoid the costly verification of the previous proof at each step, instead performing a lightweight check for the validity of a related claim. There has been tremendous progress in reducing the complexity of this related claim. In recent schemes such as Nova and Protostar, the dominant cost of these checks is just a few group operations, which are far less expensive than pairings in arithmetic circuit format. However, if the main claim involves pairings, folding will still be resource-intensive.

Recently, there have been efforts to generalize the supported relations for folding schemes. For instance, the initial constructions of accumulation/folding schemes¹ only supported rank-one constraint systems (R1CS). Subsequently, Protostar generalized these techniques to handle a broader set of special-sound protocols with an algebraic verifier, while HyperNova generalized the construction in Nova to support Customizable Constraint Systems, which can represent a variety of arithmetizations, including R1CS, Plonkish and AIR, without significant overheads. However, none of these works have optimized for the important class of pairing-based protocols.

To address these significant limitations, we present a generic accumulation scheme for a broader class of special-sound protocols. In the class of special-sound protocols that are considered in Protostar, the prover sends messages consisting of field elements, and the verifier performs algebraic checks on these messages and other values in the transcript. In our expanded class of special-sound protocols, the prover messages and the public inputs are encoded with a linear-only encoding scheme. In these schemes, an encoding of a linear combination of elements can be obtained from the encodings of the underlying elements using a homomorphic evaluation algorithm. An algebraic test can be performed on the encoded elements to check an algebraic relation on the underlying elements. The pairing operation is an example of a degree-two algebraic test, where the encoded values are group elements. In this new framework, the verifier performs algebraic tests on the encoded values and the challenges.

Two constructions of linear-only encoding schemes are particularly relevant in this setting. The first construction is the trivial identity encoding. This yields a class of protocols where the prover messages are field elements and the verifier checks the output of an algebraic map defined over these field elements. In this case, the Protostar accumulation scheme is directly applicable. The second construction is based on bilinear maps, which allow for degree-two algebraic tests. This yields a class of protocols where the prover messages are group elements, and the verifier performs the algebraic test using pairings. This allows usage of the trivial identity commitment in the accumulation scheme since the prover messages are already succinct in this construction.

To construct an accumulation scheme for a specific protocol, it suffices to specify the relevant details of the protocol and apply the generic accumulation scheme in a black-box manner. The relevant details include the public inputs, the prover’s messages, the verifier’s challenges, and the verifier’s algebraic tests. We specify these details to evaluate the concrete efficiency and clarify the implementation aspects. However, the security of the accumulation scheme for a given protocol will follow directly from our result that establishes the security for the entire class of protocols.

The degree of the relation has a great impact on the efficiency of the folding scheme, since a higher degree relation requires the prover to compute additional cross terms. The verifier must then check that the linear combination of these cross terms is correct. As a result, it is important to design the special-sound protocol to minimize the complexity of the folding verifier. The lincheck protocol from cqlin has a verifier degree that is linear in the matrix order, which would create prohibitive overhead in folding. We propose a modified protocol that achieves a degree that is independent of the matrix order $n$, while only requiring $O(\log n)$ additional field elements to be sent by the prover.

To summarize, we propose a generic accumulation scheme for a broader class of special-sound protocols, building upon the framework in Protostar and Protogalaxy. This scheme enables folding of a variety of useful pairing-based arguments without the need to encode the expensive pairing-based checks in the step circuit, thereby improving the folding efficiency for this class of arguments. We implemented this scheme and demonstrated major performance improvements for two key applications:

The first application is proof aggregation for Groth16 SNARKs. Prior work achieves logarithmic-size proofs or incurs high recursive overhead. We are able to achieve constant-size proofs with low prover complexity via our SnarkStar protocol, a specific instantiation of our scheme that achieves 5.8x faster prover time and 9.7x lower memory usage than the state-of-the-art proof aggregation system. This protocol enables aggregation of proofs with different verification keys, offering a flexible approach to aggregation in prover markets.
The second application is verifiable ML inference. TensorPlonk is a recent SNARK for ML inference that uses cq, cqlin, and KZG polynomial commitments as building blocks. To reduce the prover’s space complexity, we apply our folding scheme to these pairing-based arguments. To further improve efficiency, we provide an optimized lincheck protocol with a folding verifier degree that is independent of the matrix order. We assemble these subprotocols to form the TensorStar protocol, which enables fast proof generation for ML inference on resource-constrained devices. We show that our scheme scales to larger models, whereas current solutions are unable to generate proofs for such models due to memory limitations.

To learn more, check out the full paper: https://eprint.iacr.org/2024/2025

The implementation is available open source: https://github.com/joshbeal/mira

Many thanks to Yale Ventures for supporting this project with a Roberts Innovation Fund Award and to the Snarkify team for their work on the base Protostar implementation.

As in prior work, we use these terms interchangeably, except when providing the precise technical definitions. ↩︎