International Association for Cryptologic Research

CryptoDB

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

Authors:
Tao Lu , Zhejiang University, Hangzhou, China
Chengkun Wei , Zhejiang University, Hangzhou, China
Ruijing Yu , Zhejiang University, Hangzhou, China
Chaochao Chen , Zhejiang University, Hangzhou, China
Wenjing Fang , Ant Group, Hangzhou, China
Lei Wang , Ant Group, Hangzhou, China
Zeke Wang , Zhejiang University, Hangzhou, China
Wenzhi Chen , Zhejiang University, Hangzhou, China
Download:
DOI: 10.46586/tches.v2023.i3.194-220
URL: https://tches.iacr.org/index.php/TCHES/article/view/10961
Abstract: Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, the zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARKs such as Groth16 incur high overhead in the proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). This paper therefore presents cuZK, an efficient GPU implementation of zkSNARK that uses the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This MSM algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Together with our self-designed MSM scheme and a well-studied NTT scheme, cuZK parallelizes all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) eliminating redundant data transfer and 2) overlapping data transfer with device computation. The evaluation results show that our MSM module provides over 2.08× (up to 2.94×) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65× (up to 4.86×) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
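For readers unfamiliar with the serial baseline named in the abstract, the following is a minimal Python sketch of the Pippenger (bucket) method for MSM, which computes the sum of k_i * P_i over all inputs. It is not cuZK's parallel algorithm and is not taken from the paper. As a simplifying assumption, integers modulo Q under addition stand in for elliptic-curve points, so "point addition" is modular addition; the names Q, add, naive_msm, pippenger_msm, and the window width c are hypothetical and chosen only for illustration.

```python
# Toy group (assumption): integers mod Q under addition stand in for curve points.
Q = 2**61 - 1


def add(a, b):
    """Group operation of the toy group (stands in for point addition)."""
    return (a + b) % Q


def naive_msm(scalars, points):
    """Reference result: sum of k_i * P_i, computed directly."""
    return sum(k * p for k, p in zip(scalars, points)) % Q


def pippenger_msm(scalars, points, scalar_bits, c):
    """Serial bucket-method MSM using only the group operation `add`.

    scalar_bits: upper bound on the bit length of every scalar.
    c: window width in bits; each window uses 2**c - 1 non-zero buckets.
    """
    num_windows = (scalar_bits + c - 1) // c
    mask = (1 << c) - 1
    result = 0  # identity element of the toy group

    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Shift the accumulated result left by c bits via c doublings.
        for _ in range(c):
            result = add(result, result)

        # Bucket accumulation: points sharing a c-bit digit go to one bucket.
        buckets = [0] * (1 << c)
        for k, p in zip(scalars, points):
            digit = (k >> (w * c)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)

        # Bucket reduction: compute sum of d * buckets[d] with a running
        # suffix sum, so no scalar multiplication is needed.
        running = 0
        window_sum = 0
        for d in range(mask, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)

        result = add(result, window_sum)

    return result


if __name__ == "__main__":
    ks = [5, 12, 255, 0, 77]
    ps = [3, 10, 123456789, 42, 999]
    print(pippenger_msm(ks, ps, scalar_bits=8, c=3) == naive_msm(ks, ps))
```

In this sketch, the bucket accumulation for different windows and different points is independent work, which is the kind of structure a GPU implementation can try to exploit; how cuZK actually organizes that work is described in the paper itself.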
BibTeX
@article{tches-2023-33287,
  title={cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs},
  journal={IACR Transactions on Cryptographic Hardware and Embedded Systems},
  publisher={Ruhr-Universität Bochum},
  volume={2023, Issue 3},
  pages={194-220},
  url={https://tches.iacr.org/index.php/TCHES/article/view/10961},
  doi={10.46586/tches.v2023.i3.194-220},
  author={Tao Lu and Chengkun Wei and Ruijing Yu and Chaochao Chen and Wenjing Fang and Lei Wang and Zeke Wang and Wenzhi Chen},
  year=2023
}