CryptoDB
Yunlei Zhao
Publications and invited talks
Year
Venue
Title
2025
TCHES
VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping
Abstract
Bit-wise Fully Homomorphic Encryption schemes like FHEW and TFHE offer efficient functional bootstrapping, enabling concurrent function evaluation and noise reduction. While advantageous for secure computations, these schemes suffer from high data expansion, posing significant performance challenges in practical ap- plications due to massive ciphertexts. To address these issues, we propose VeloFHE, a CUDA-accelerated design to enhance the efficiency of FHEW and TFHE schemes on GPUs. We develop a novel hybrid four-step Number Theoretic Transform (NTT) approach for fast polynomial multiplication. By decomposing large-scale NTTs into highly parallelizable submodules, incorporating cyclic and negacyclic convolutions, and introducing several memory-oriented optimizations, we significantly reduce both the computational complexity and memory requirements. For blind rotation, besides the gadget decomposition approach, we also apply a recent proposed modulus raising technique to both schemes to alleviate memory pressure. We further optimize it by refining computational flow to reduce noise from scaling and maintain accumulator compatibility. For key switching, we address input-output parallelism mismatches, and offloading suitable computations to the CPU, effectively hiding latency through asynchronous execution. Additionally, we explore batching in bootstrapping, de- veloping a general framework that accommodates both schemes with either gadget decomposition or modulus raising method.Our experimental results demonstrate significant performance improvements. The proposed NTT implementation shows over 35% improvement compared to recent GPU implementations. On an RTX 4090 GPU, we achieve speedups of 371.86x and 390.44x for FHEW and TFHE gate bootstrapping, respectively, compared to OpenFHE running on a 48-thread CPU at a 128-bit security level. The corresponding throughputs are 7,007 and 11,378 operations per second. Furthermore, relative to the state-of-the-art GPU implementation [XLK+25], our approach provides speedups of 2.56x, 2.24x, and 2.33x for TFHE gate bootstrapping, homomorphic evaluation of arbitrary functions, and homomorphic flooring operation, respectively. Our VeloFHE surpasses some current hardware designs, offering an effective solution for more practical and efficient privacy-preserving computations.
2024
ASIACRYPT
CPA-secure KEMs are also sufficient for Post-Quantum TLS 1.3
Abstract
In the post-quantum migration of TLS 1.3, an ephemeral Diffie-Hellman must be replaced with a post-quantum key encapsulation mechanism (KEM). At EUROCRYPT 2022, Huguenin-Dumittan and Vaudenay [HV22] demonstrated that KEMs with standard CPA security are sufficient for the security of the TLS1.3 handshake. However, their result is only proven in the random oracle model (ROM), and as the authors comment, their reduction is very much non-tight and not sufficient to
guarantee security in practice due to the $O(q^6)$-loss, where $q$ is the number of adversary’s queries to random oracles. Moreover, in order to analyze the post-quantum security of TLS 1.3 handshake with a KEM, it is necessary to consider the security in the quantum ROM (QROM). Therefore, they leave the tightness improvement of their ROM proof and the QROM proof of such a result as an interesting open question.
In this paper, we resolve this problem. We improve the ROM proof in [HV22] from an $O(q^6)$-loss to an $O(q)$-loss with standard CPA-secure KEMs which can be directly obtained from the underlying public-key encryption (PKE) scheme in CRYSTALS-Kyber. Moreover, we show that if the KEMs are constructed from rigid deterministic public-key encryption (PKE) schemes such as the ones in Classic McEliece and NTRU, this $O(q)$-loss can be further improved to an $O(1)$-loss. Hence, our reductions are sufficient to guarantee security in practice. According to our results, a CPA-secure KEM (which is more concise and efficient than the currently used CCA/1CCA-secure KEM) can be directly employed to construct a post-quantum TLS 1.3. Furthermore, we lift our ROM result into QROM and first prove that the CPA-secure KEMs are also sufficient for the post-quantum TLS 1.3 handshake. In particular, the techniques introduced to improve reduction tightness in this paper may be of independent interest.
Coauthors
- Ray C. C. Cheung (1)
- Wangchen Dai (1)
- Xiaotie Deng (1)
- Robert H. Deng (1)
- Goichiro Hanaoka (1)
- Haodong Jiang (1)
- Junzuo Lai (1)
- Chan H. Lee (1)
- Shengli Liu (1)
- Zhe Liu (1)
- Ying Liu (1)
- Xianhui Lu (1)
- Shiyu Shen (1)
- Jian Weng (2)
- Hao Yang (1)
- Andrew Chi-Chih Yao (1)
- Moti Yung (3)
- Yunlei Zhao (8)
- Lu Zhou (1)
- Biming Zhou (1)
- Hong Zhu (1)