International Association for Cryptologic Research

International Association
for Cryptologic Research


Bernhard Jungk

Affiliation: Hochschule RheinMain


Parameterized Hardware Accelerators for Lattice-Based Cryptography and Their Application to the HW/SW Co-Design of qTESLA 📺
This paper presents a set of efficient and parameterized hardware accelerators that target post-quantum lattice-based cryptographic schemes, including a versatile cSHAKE core, a binary-search CDT-based Gaussian sampler, and a pipelined NTT-based polynomial multiplier, among others. Unlike much of prior work, the accelerators are fully open-sourced, are designed to be constant-time, and can be parameterized at compile-time to support different parameters without the need for re-writing the hardware implementation. These flexible, publicly-available accelerators are leveraged to demonstrate the first hardware-software co-design using RISC-V of the post-quantum lattice-based signature scheme qTESLA with provably secure parameters. In particular, this work demonstrates that the NIST’s Round 2 level 1 and level 3 qTESLA variants achieve over a 40-100x speedup for key generation, about a 10x speedup for signing, and about a 16x speedup for verification, compared to the baseline RISC-V software-only implementation. For instance, this corresponds to execution in 7.7, 34.4, and 7.8 milliseconds for key generation, signing, and verification, respectively, for qTESLA’s level 1 parameter set on an Artix-7 FPGA, demonstrating the feasibility of the scheme for embedded applications.
Efficient Side-Channel Protections of ARX Ciphers
The current state of the art of Boolean masking for the modular addition operation in software has a very high performance overhead. Firstly, the instruction count is very high compared to a normal addition operation. Secondly, until recently, the entropy consumed by such protections was also quite high. Our paper significantly improves both aspects, by applying the Threshold Implementation (TI) methodology with two shares and by reusing internal values as randomness source in such a way that the uniformity is always preserved. Our approach performs considerably faster compared to the previously known masked addition and subtraction algorithms by Coron et al. and Biryukov et al. improving the state of the art by 36%, if we only consider the number of ARM assembly instructions. Furthermore, similar to the masked adder from Biryukov et al. we reduce the amount of randomness and only require one bit additional entroy per addition, which is a good trade-off for the improved performance. We applied our improved masked adder to ChaCha20, for which we provide two new first-order protected implementations and achieve a 36% improvement over the best published result for ChaCha20 using an ARM Cortex-M4 microprocessor.
On FPGA-based implementations of Gr\{o}stl
Bernhard Jungk Steffen Reith
The National Institute of Standards and Technology (NIST) has started a competition for a new secure hash standard. To make a significant comparison between the submitted candidates, third party implementations of all proposed hash functions are needed. This is one of the reasons why the SHA-3 candidate Gr\{o}stl has been chosen for a FPGA-based implementation. Mainly our work is motivated by actual and future developments of the automotive market (e.g. car-2-car communication systems), which will increase the necessity for a suitable cryptographic infrastructure in modern vehicles (cf. AUTOSAR project) even further. One core component of such an infrastructure is a secure cryptographic hash function, which is used for a lot of applications like challenge-response authentication systems or digital signature schemes. Another motivation to evaluate Gr\{o}stl is its resemblance to AES. The automotive market demands, like any mass market, low budget and therefore compact implementations, hence our evaluation of Gr\{o}stl focuses on area optimizations. It is shown that, while Gr\{o}stl is inherently quite large compared to AES, it is still possible to implement the Gr\{o}stl algorithm on small and low budget FPGAs like the second smallest available Spartan-3, while maintaining a reasonable high throughput.