IACR News item: 08 October 2025
Jipeng Zhang, Jiaheng Zhang
Falcon, an NTRU-based digital signature algorithm, has been selected by NIST as one of the post-quantum cryptography (PQC) standards. Falcon's signature generation is relatively slow compared to its verification. One of the core operations in signature generation is discrete Gaussian sampling, which involves a component known as the BaseSampler. The BaseSampler accounts for up to 30% of the time required for signature generation, making it a significant performance bottleneck. This work aims to address this bottleneck.
We design a vectorized version of the BaseSampler and provide optimized implementations across six different instruction sets: SSE2, AVX2, AVX-512F, NEON, RISC-V Vector (RVV), and RV64IM. The AVX2 implementation, for instance, achieves an 8.4× speedup over prior work. Additionally, we optimize the FFT/iFFT operations using RVV and RV64D. For the RVV implementation, we introduce a new method using strided load/store instructions, with 4+4 and 4+5 layer merging strategies for Falcon-512 and Falcon-1024, respectively, resulting in a speedup of more than 4×.
Finally, we present the results of our optimized implementations across eight different instruction sets for signature generation of Falcon. For instance, our AVX2, AVX-512F, and RV64GCVB implementations achieve performance improvements of 23%, 36%, and 59%, respectively, for signature generation of Falcon-512.
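For readers unfamiliar with the BaseSampler, the following is a minimal sketch of the general table-comparison idea behind such samplers: draw a uniform random integer and count how many entries of a reverse cumulative distribution table exceed it. It is an illustration only, not the implementation described in this work. The table `TOY_RCDT`, the width `TOY_BITS`, and the function names are hypothetical; the values are made up and are not Falcon's actual table, and Python gives no constant-time guarantees.

```python
import secrets

# Hypothetical toy table for illustration only; these are NOT Falcon's values.
# Entry i approximates 2^16 * P(z > i) for a made-up half-Gaussian-like
# distribution, so the entries are strictly decreasing.
TOY_RCDT = [40000, 15000, 3000, 300, 10]
TOY_BITS = 16


def base_sampler_sketch(u, table=TOY_RCDT):
    """Return the number of table entries strictly greater than u."""
    z = 0
    for threshold in table:
        # Every entry is compared, whatever the value of u; the comparison
        # results are summed rather than used to exit the loop early.
        z += int(u < threshold)
    return z


def sample(table=TOY_RCDT, bits=TOY_BITS):
    """Draw one sample using a fresh uniform random integer of `bits` bits."""
    u = secrets.randbits(bits)
    return base_sampler_sketch(u, table)
```

Because each comparison is independent of the others, and each sample is independent of the next, a loop of this shape is a natural candidate for SIMD lanes; how the authors actually map it onto each instruction set is described in the paper itself.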