CryptoDB
Jonas Bertels
Publications and invited talks
Year
Venue
Title
2025
TCHES
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Abstract
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving upon the Pippenger algorithm. A new iteration technique is introduced to decouple the required buckets from the window size, resulting in fewer EC computations for the same on-chip memory resources. Furthermore, we combine known optimizations from the literature for the first time to achieve additional latency improvements. Our enhanced MSM implementation significantly reduces computation time, achieving a speedup of up to x12.77 compared to recent FPGA implementations. Specifically, for the BLS12-381 curve, we reduce the computation time for an MSM of size 224 to 914 ms using a single compute unit on the U55C FPGA or to 231 ms using four U55C devices. These results indicate a substantial improvement in efficiency, paving the way for more scalable and efficient Zero-Knowledge proof systems.
2025
TCHES
FINAL bootstrap acceleration on FPGA using DSP-free constant-multiplier NTTs
Abstract
This work showcases Quatorze-bis, a state-of-the-art Number Theoretic Transform circuit for TFHE-like cryptosystems on FPGAs. It contains a novel modular multiplication design for modular multiplication with a constant for a constant modulus. This modular multiplication design does not require any DSP units or any dedicated multiplier unit, nor does it require extra logic when compared to the state-of-the-art modular multipliers. Furthermore, we present an implementation of a constant multiplier Number Theoretic Transform design for TFHE-like schemes. Lastly, we use this Number Theoretic Transform design to implement a FINAL hardware accelerator for the AMD Alveo U55c which improves the Throughput metric of TFHE-like cryptosystems on FPGAs by a factor 9.28x over Li et al.’s NFP CHES 2024 accelerator and by 10-25% over the absolute state-of-the-art design FPT [vBDTV23] while using one third of FPTs DSPs.
Coauthors
- Jonas Bertels (2)
- Thomas de Ruijter (1)
- Wouter Legiest (1)
- Hilder V. L. Pereira (1)
- Xander Pottier (1)
- Michiel Van Beirendonck (1)
- Ingrid Verbauwhede (2)