CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme
Number theoretic transform (NTT) is widely utilized to speed up polynomial multiplication, which is the critical computation bottleneck in a lot of cryptographic algorithms like lattice-based post-quantum cryptography (PQC) and homomorphic encryption (HE). One of the tendency for NTT hardware architecture is to support diverse security parameters and meet resource constraints on different computing platforms. Thus flexibility and Area-Time Product (ATP) become two crucial metrics in NTT hardware design. The flexibility of NTT in terms of different vector sizes and moduli can be obtained directly. Whereas the varying strides in memory access of in-place NTT render the design for different radix and number of parallel butterfly units a tough problem. This paper proposes an efficient conflict-free memory mapping scheme that supports the configuration for both multiple parallel butterfly units and arbitrary radix of NTT. Compared to other approaches, this scheme owns broader applicability and facilitates the parallelization of non-radix-2 NTT hardware design. Based on this scheme, we propose a scalable radix-2 and radix-4 NTT multiplication architecture by algorithm-hardware co-design. A dedicated schedule method is leveraged to reduce the number of modular additions/subtractions and modular multiplications in radix-4 butterfly unit by 20% and 33%, respectively. To avoid the bit-reversed cost and save memory footprint in arbitrary radix NTT/INTT, we put forward a general method by rearranging the loop structure and reusing the twiddle factors. The hardware-level optimization is achieved by excavating the symmetric operators in radix-4 butterfly unit, which saves almost 50% hardware resources compared to a straightforward implementation. Through experimental results and theoretical analysis, we point out that the radix-4 NTT with the same number of parallel butterfly units outperforms the radix-2 NTT in terms of area-time performance in the interleaved memory system. This advantage is enlarged when increasing the number of parallel butterfly units. For example, when processing 1024 14-bit points NTT with 8 parallel butterfly units, the ATP of LUT/FF/DSP/BRAM in radix-4 NTT core is approximately 2.2 × /1.2 × /1.1 × /1.9× less than that of the radix-2 NTT core on a similar FPGA platform.
Lattice-Based Group Encryption with Full Dynamicity and Message Filtering Policy 📺
Group encryption (GE) is a fundamental privacy-preserving primitive analog of group signatures, which allows users to decrypt speciﬁc ciphertexts while hiding themselves within a crowd. Since its ﬁrst birth, numerous constructions have been proposed, among which the schemes separately constructed by Libert et al. (Asiacrypt 2016) over lattices and by Nguyen et al. (PKC 2021) over coding theory are postquantum secure. Though the last scheme, at the ﬁrst time, achieved the full dynamicity (allowing group users to join or leave the group in their ease) and message ﬁltering policy, which greatly improved the state-of-aﬀairs of GE systems, its practical applications are still limited due to the rather complicated design, ineﬃciency and the weaker security (secure in the random oracles). In return, the Libert et al.’s scheme possesses a solid security (secure in the standard model), but it lacks the previous functions and still suﬀers from ineﬃciency because of extremely using lattice trapdoors. In this work, we re-formalize the model and security deﬁnitions of fully dynamic group encryption (FDGE) that are essentially equivalent to but more succinct than Nguyen et al.’s; Then, we provide a generic and eﬃcient zero-knowledge proof method for proving that a binary vector is non-zero over lattices, on which a proof for the Prohibitive message ﬁltering policy in the lattice setting is ﬁrst achieved (yet in a simple manner); Finally, by combining appropriate cryptographic materials and our presented zero-knowledge proofs, we achieve the ﬁrst latticebased FDGE schemes in a simpler manner, which needs no any lattice trapdoor and is proved secure in the standard model (assuming interaction during the proof phase), outweighing the existing post-quantum secure GE systems in terms of functions, eﬃciency and security.