International Association for Cryptologic Research

International Association
for Cryptologic Research


Khin Mi Mi Aung


Field Instruction Multiple Data 📺
Fully homomorphic encryption~(FHE) has flourished since it was first constructed by Gentry~(STOC 2009). Single instruction multiple data~(SIMD) gave rise to efficient homomorphic operations on vectors in \((\mathbb{F}_{t^d})^\ell\), for prime \(t\). RLWE instantiated with cyclotomic polynomials of the form \(X^{2^N}+1\) dominate implementations of FHE due to highly efficient fast fourier transformations. However, this choice yields very short SIMD plaintext vectors and high degree extension fields, e.g. \(\ell < 100, d > 100\) for small primes~(\(t = 3, 5, \dots\)). In this work, we describe a method to encode more data on top of SIMD, \emph{Field Instruction Multiple Data}, applying reverse multiplication friendly embedding~(RMFE) to FHE. With RMFE, length-\(k\) \(\mathbb{F}_{t}\) vectors can be encoded into \(\mathbb{F}_{t^d}\) and multiplied once. The results have to be recoded~(decoded and then re-encoded) before further multiplications can be done. We introduce an FHE-specific technique to additionally evaluate arbitrary linear transformations on encoded vectors for free during the FHE recode operation. On top of that, we present two optimizations to unlock high degree extension fields with small \(t\) for homomorphic computation: \(r\)-fold RMFE, which allows products of up to \(2^r\) encoded vectors before recoding, and a three-stage recode process for RMFEs obtained by composing two smaller RMFEs. Experiments were performed to evaluate the effectiveness of FIMD from various RMFEs compared to standard SIMD operations. Overall, we found that FIMD generally had \(>2\times\) better (amortized) multiplication times compared to FHE for the same amount of data, while using almost \(k/2 \times\) fewer ciphertexts required.
High-Performance FV Somewhat Homomorphic Encryption on GPUs: An Implementation using CUDA 📺
Homomorphic encryption (HE) offers great capabilities that can solve a wide range of privacy-preserving computing problems. This tool allows anyone to process encrypted data producing encrypted results that only the decryption key’s owner can decrypt. Although HE has been realized in several public implementations, its performance is quite demanding. The reason for this is attributed to the huge amount of computation required by secure HE schemes. In this work, we present a CUDAbased implementation of the Fan and Vercauteren (FV) Somewhat HomomorphicEncryption (SHE) scheme. We demonstrate several algebraic tools such as the Chinese Remainder Theorem (CRT), Residual Number System (RNS) and Discrete Galois Transform (DGT) to accelerate and facilitate FV computation on GPUs. We also show how the entire FV computation can be done on GPU without multi-precision arithmetic. We compare our GPU implementation with two mature state-of-the-art implementations: 1) Microsoft SEAL v2.3.0-4 and 2) NFLlib-FV. Our implementation outperforms them and achieves on average 5.37x, 7.37x, 22.22x, 5.11x and 13.18x (resp. 2.03x, 2.94x, 27.86x, 8.53x and 18.69x) for key generation, encryption, decryption, homomorphic addition and homomorphic multiplication against SEAL-FVRNS (resp. NFLlib-FV).