International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Ying Liu

Publications and invited talks

Year
Venue
Title
2025
TCHES
VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping
Bit-wise Fully Homomorphic Encryption schemes like FHEW and TFHE offer efficient functional bootstrapping, enabling concurrent function evaluation and noise reduction. While advantageous for secure computations, these schemes suffer from high data expansion, posing significant performance challenges in practical ap- plications due to massive ciphertexts. To address these issues, we propose VeloFHE, a CUDA-accelerated design to enhance the efficiency of FHEW and TFHE schemes on GPUs. We develop a novel hybrid four-step Number Theoretic Transform (NTT) approach for fast polynomial multiplication. By decomposing large-scale NTTs into highly parallelizable submodules, incorporating cyclic and negacyclic convolutions, and introducing several memory-oriented optimizations, we significantly reduce both the computational complexity and memory requirements. For blind rotation, besides the gadget decomposition approach, we also apply a recent proposed modulus raising technique to both schemes to alleviate memory pressure. We further optimize it by refining computational flow to reduce noise from scaling and maintain accumulator compatibility. For key switching, we address input-output parallelism mismatches, and offloading suitable computations to the CPU, effectively hiding latency through asynchronous execution. Additionally, we explore batching in bootstrapping, de- veloping a general framework that accommodates both schemes with either gadget decomposition or modulus raising method.Our experimental results demonstrate significant performance improvements. The proposed NTT implementation shows over 35% improvement compared to recent GPU implementations. On an RTX 4090 GPU, we achieve speedups of 371.86x and 390.44x for FHEW and TFHE gate bootstrapping, respectively, compared to OpenFHE running on a 48-thread CPU at a 128-bit security level. The corresponding throughputs are 7,007 and 11,378 operations per second. Furthermore, relative to the state-of-the-art GPU implementation [XLK+25], our approach provides speedups of 2.56x, 2.24x, and 2.33x for TFHE gate bootstrapping, homomorphic evaluation of arbitrary functions, and homomorphic flooring operation, respectively. Our VeloFHE surpasses some current hardware designs, offering an effective solution for more practical and efficient privacy-preserving computations.
2024
TCHES
Faster NTRU-based Bootstrapping in less than 4 ms
Bootstrapping is a critical technique in constructing fully homomorphic encryption (FHE), which serves to refresh the noise in FHE ciphertexts, enabling an arbitrary number of homomorphic operations. Among published results, the TFHE-rs library [Zam22] offers the fastest bootstrapping implementation on CPU platforms by taking advantage of AVX-512 instructions.In this paper, we improve the efficiency of the bootstrapping algorithm based on the NTRU problem. First, we introduce the approximate gadget decomposition method tailored for NTRU ciphertext, reducing the number of NTT operations required for external products. Second, by integrating the approximate decomposition and key unrolling techniques, we improve the performance of CMux-based blind rotation. Third, for the automorphism-based blind rotation method, we present a hybrid window size technique that reduces the number of automorphisms by 34% compared to recent work [XZD+23](in Crypto23).Subsequently, we implement the proposed bootstrapping algorithm on the CPU platform with AVX instructions. Experimental results demonstrate that our method only takes 3.8ms, which achieves a 1.8× speedup compared to the TFHE-rs library. Finally, we propose an efficient FPGA accelerator based on the CMux method, which not only achieves the best performance but also exhibits high throughput advantages. Our accelerator can improve performance by 2x compared to state-of-the-art FPGA implementations (e.g., FPT).
2023
TCHES
Faster Bootstrapping via Modulus Raising and Composite NTT
FHEW-like schemes utilize exact gadget decomposition to reduce error growth and ensure that the bootstrapping incurs only polynomial error growth. However, the exact gadget decomposition method requires higher computation complexity and larger memory storage. In this paper, we improve the efficiency of the FHEWlike schemes by utilizing the composite NTT that performs the Number Theoretic Transform (NTT) with a composite modulus. Specifically, based on the composite NTT, we integrate modulus raising and gadget decomposition in the external product, which reduces the number of NTTs required in the blind rotation from 2(dg + 1)n to 2(⌈dg⌉/2 + 1)n. Furthermore, we develop a modulus packing technique that uses the Chinese Remainder Theorem (CRT) and composite NTT to bootstrap multiple LWE ciphertexts through one blind rotation process.We implement the bootstrapping algorithms and evaluate the performance on various benchmark computations using both binary and ternary secret keys. Our results of the single bootstrapping process indicate that the proposed approach achieves speedups of up to 1.7 x, and reduces the size of the blind rotation key by 50% under specific parameters. Finally, we instantiate two ciphertexts in the packing procedure, and the experimental results show that our technique is around 1.5 x faster than the two bootstrapping processes under the 127-bit security level.