Yu Dai

CryptoDB

Publications and invited talks

Year

Venue

Title

2024

TCHES

Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm

Yutian Chen, Cong Peng, Yu Dai, Min Luo, Debiao He

Multi-scalar multiplication (MSM) is an important building block in most elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation that further improves the performance of the MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, when memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates for the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means that the parallel speedup grows almost linearly as the number of device threads increases. Finally, we propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity is logarithmic in the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of $2^{21}$, $2^{22}$, and $2^{23}$, respectively.
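For readers unfamiliar with the two phases named in this abstract, the following is a minimal sequential sketch of the textbook Pippenger bucket method. It is not the paper's GPU algorithm and does not include its bucket-index mapping, homogeneous-coordinate accumulation, or layered reduction. As a loudly stated assumption, the group is replaced by the toy additive group of integers modulo `Q` so that the code runs without an elliptic-curve library; the names `Q`, `add`, and `pippenger_msm` are hypothetical.

```python
# Toy stand-in group: integers modulo Q under addition (NOT an elliptic curve).
# Q, add, ZERO and pippenger_msm are illustrative names, not from the paper.
Q = 2**61 - 1
ZERO = 0


def add(a: int, b: int) -> int:
    """Group operation of the stand-in group."""
    return (a + b) % Q


def pippenger_msm(scalars, points, scalar_bits: int, window: int) -> int:
    """Compute sum(k_i * P_i) with the textbook bucket method."""
    num_windows = (scalar_bits + window - 1) // window
    mask = (1 << window) - 1
    result = ZERO
    # Process windows from most significant to least significant.
    for w in reversed(range(num_windows)):
        # Shift the partial result left by `window` bits via doublings.
        for _ in range(window):
            result = add(result, result)
        # Bucket accumulation: point i goes into the bucket indexed by
        # the w-th window of its scalar (bucket 0 is never used).
        buckets = [ZERO] * (1 << window)
        for k, p in zip(scalars, points):
            idx = (k >> (w * window)) & mask
            if idx:
                buckets[idx] = add(buckets[idx], p)
        # Bucket aggregation: sum_j j * buckets[j] using a running sum,
        # which needs only group additions.
        running = ZERO
        window_sum = ZERO
        for j in range(mask, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


if __name__ == "__main__":
    ks = [5, 1023, 77, 0, 65535]
    ps = [3, 11, 1234567, 42, 987654321]
    print(pippenger_msm(ks, ps, scalar_bits=16, window=4))
```

In this sketch each window uses $2^c - 1$ buckets for window width $c$, and the aggregation loop is inherently sequential; these are the two costs that the abstract's bucket-index mapping and parallel reduction are said to address.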

2024

ASIACRYPT

Revisiting Pairing-Friendly Curves with Embedding Degrees 10 and 14

Yu Dai, Debiao He, Cong Peng, Zhijian Yang, Chang-an Zhao

Since 2015, there has been a significant decrease in the asymptotic complexity of computing discrete logarithms in finite fields. As a result, the key sizes of many mainstream pairing-friendly curves have to be updated to maintain the desired security level. At PKC'20, Guillevic conducted a comprehensive assessment of the security of a series of pairing-friendly curves with embedding degrees ranging from $9$ to $17$. In this paper, we focus on five pairing-friendly curves with embedding degrees $10$ and $14$ at the 128-bit security level, with BW14-351 emerging as the most competitive candidate. First, we extend the optimized formula for the optimal pairing on BW13-310, a 128-bit secure curve with a 310-bit prime $p$ and embedding degree $13$, to our target curves. This generalization allows us to compute the optimal pairing in approximately $\log r/(2\varphi(k))$ Miller iterations, where $r$ and $k$ are the order of the pairing groups and the embedding degree, respectively. Second, we develop optimized algorithms for cofactor multiplication for $\mathbb{G}_1$ and $\mathbb{G}_2$, as well as subgroup membership testing for $\mathbb{G}_2$ on these curves. Finally, we provide detailed performance comparisons between BW14-351 and other popular curves on a 64-bit platform in terms of pairing computation, hashing to $\mathbb{G}_1$ and $\mathbb{G}_2$, group exponentiations, and subgroup membership testing. Our results demonstrate that BW14-351 is a strong candidate for building pairing-based cryptographic protocols.
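To make the iteration estimate $\log r/(2\varphi(k))$ concrete, the sketch below evaluates Euler's totient for the embedding degrees mentioned in this abstract and plugs it into the formula. Two assumptions are made here and are not taken from the paper: the logarithm is read as base 2 (the bit length of $r$), and `r_bits=256` is a purely hypothetical group-order size chosen for illustration, not the actual parameter of any of these curves.

```python
import math


def euler_phi(k: int) -> int:
    """Euler's totient: count of integers in [1, k] coprime to k."""
    return sum(1 for i in range(1, k + 1) if math.gcd(i, k) == 1)


def miller_iterations(r_bits: int, k: int) -> float:
    """Approximate Miller loop length log2(r) / (2 * phi(k)).

    r_bits is an assumed bit length of the group order r.
    """
    return r_bits / (2 * euler_phi(k))


if __name__ == "__main__":
    # 256 is a hypothetical bit length, used only to show the scaling.
    for k in (10, 13, 14):
        print(k, euler_phi(k), miller_iterations(256, k))
```

Since $\varphi(10) = 4$, $\varphi(14) = 6$ and $\varphi(13) = 12$, the denominator $2\varphi(k)$ equals 8, 12 and 24 respectively, which is why the estimated loop length shrinks as $\varphi(k)$ grows.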

2023

TCHES

Don’t Forget Pairing-Friendly Curves with Odd Prime Embedding Degrees

Yu Dai, Fangguo Zhang, Chang-an Zhao

Pairing-friendly curves with odd prime embedding degrees at the 128-bit security level, such as BW13-310 and BW19-286, have sparked interest in the field of public-key cryptography because of the small sizes of their prime fields. However, compared to mainstream pairing-friendly curves at the same security level, i.e., BN446 and BLS12-446, pairing computation on BW13-310 and BW19-286 is usually considered inefficient. In this paper, we investigate high-performance software implementations of pairing computation on BW13-310 and of the corresponding building blocks used in pairing-based protocols, including hashing, group exponentiations and membership testing. First, we propose efficient explicit formulas for pairing computation on this curve. Moreover, we exploit state-of-the-art techniques to implement hashing to G1 and G2, group exponentiations and membership testing. In particular, for exponentiations in G2 and GT, we present new optimizations to improve computational efficiency. Our implementation results on a 64-bit processor show that the gap in the performance of pairing computation between BW13-310 and BN446 (resp. BLS12-446) is only up to 4.9% (resp. 26%). More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109.1% to 227.3%, 100% to 192.6%, 24.5% to 108.5% and 68.2% to 145.5% faster in terms of hashing to G1, exponentiations in G1 and GT, and membership testing for GT, respectively. These results reveal that BW13-310 would be an interesting candidate in pairing-based cryptographic protocols.