International Association for Cryptologic Research

International Association
for Cryptologic Research


Ingrid Verbauwhede


Carry Your Fault: A Fault Propagation Attack on Side-Channel Protected LWE-based KEM
Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes – passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further attack surfaces is, therefore, an important step for eventually securing the deployment of these algorithms. Also, it is mportant to test the robustness of the already proposed countermeasures in this regard. In this work, we propose a new fault attack on side-channel secure masked implementation of LWE-based key-encapsulation mechanisms (KEMs) exploiting fault propagation. The attack typically originates due to an algorithmic modification widely used to enable masking, namely the Arithmetic-to-Boolean (A2B) conversion. We exploit the data dependency of the adder carry chain in A2B and extract sensitive information, albeit masking (of arbitrary order) being present. As a practical demonstration of the exploitability of this information leakage, we show key recovery attacks of Kyber, although the leakage also exists for other schemes like Saber. The attack on Kyber targets the decapsulation module and utilizes Belief Propagation (BP) for key recovery. To the best of our knowledge, it is the first attack exploiting an algorithmic component introduced to ease masking rather than only exploiting the randomness introduced by masking to obtain desired faults (as done by Delvaux [Del22]). Finally, we performed both simulated and electromagnetic (EM) fault-based practical validation of the attack for an open-source first-order secure Kyber implementation running on an STM32 platform.
BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a customized version of bootstrapping that can be instantiated with hardware multipliers optimized for area and power.BASALISC is a three-abstraction-layer RISC architecture, designed for a 1 GHz ASIC implementation and underway toward 150mm2 die tape-out in a 12nm GF process. BASALISC’s four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Its conflict-resolution permutation hardware is generalized and re-used to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching.The BASALISC toolchain comprises a custom compiler and a joint performance and correctness simulator. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and we study its performance on a set of benchmarks. Simulation results show a speedup of more than 5,000× over HElib – a popular software FHE library.
Semi-Automatic Locating of Cryptographic Operations in Side-Channel Traces
Locating a cryptographic operation in a side-channel trace, i.e. finding out where it is in the time domain, without having a template, can be a tedious task even for unprotected implementations. The sheer amount of data can be overwhelming. In a simple call to OpenSSL for AES-128 ECB encryption of a single data block, only 0.00028% of the trace relate to the actual AES-128 encryption. The rest is overhead. We introduce the (to our best knowledge) first method to locate a cryptographic operation in a side-channel trace in a largely automated fashion. The method exploits meta information about the cryptographic operation and requires an estimate of its implementation’s execution time.The method lends itself to parallelization and our implementation in a tool greatly benefits from GPU acceleration. The tool can be used offline for trace segmentation and for generating a template which can then be used online in real-time waveformmatching based triggering systems for trace acquisition or fault injection. We evaluate it in six scenarios involving hardware and software implementations of different cryptographic operations executed on diverse platforms. Two of these scenarios cover realistic protocol level use-cases and demonstrate the real-world applicability of our tool in scenarios where classical leakage-detection techniques would not work. The results highlight the usefulness of the tool because it reliably and efficiently automates the task and therefore frees up time of the analyst.The method does not work on traces of implementations protected by effective time randomization countermeasures, e.g. random delays and unstable clock frequency, but is not affected by masking, shuffling and similar countermeasures.
Masked Accelerators and Instruction Set Extensions for Post-Quantum Cryptography
Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we present a novel masked ciphertext compression algorithm for non-power-of-two moduli. To accelerate linear performance bottlenecks, we developed a generic Number Theoretic Transform (NTT) multiplier, which, in contrast to previously published accelerators, is also efficient and suitable for schemes not based on NTT. For the critical non-linear operations, masked HW accelerators were developed, allowing a secure execution using RISC-V instruction set extensions. With the proposed design, we achieved a cycle count of K:214k/E:298k/D:313k for Kyber and K:233k/E:312k/D:351k for Saber with NIST Level III parameter sets. For the same parameter sets, the masking overhead for the first-order secure decapsulation operation including randomness generation is a factor of 4.48 for Kyber (D:1403k)and 2.60 for Saber (D:915k).
Polynomial multiplication on embedded vector architectures
High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that despite being singleissue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), by careful register management and instruction scheduling, we can obtain a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining a low area and energy profile necessary for use in embedded market. Finally, as a real-world application we integrate our multiplication techniques to post-quantum key-encapsulation mechanism Saber
Higher-Order Masked Ciphertext Comparison for Lattice-Based Cryptography
Checking the equality of two arrays is a crucial building block of the Fujisaki-Okamoto transformation, and as such it is used in several post-quantum key encapsulation mechanisms including Kyber and Saber. While this comparison operation is easy to perform in a black box setting, it is hard to efficiently protect against side-channel attacks. For instance, the hash-based method by Oder et al. is limited to first-order masking, a higher-order method by Bache et al. was shown to be flawed, and a very recent higher-order technique by Bos et al. suffers in runtime. In this paper, we first demonstrate that the hash-based approach, and likely many similar first-order techniques, succumb to a relatively simple side-channel collision attack. We can successfully recover a Kyber512 key using just 6000 traces. While this does not break the security claims, it does show the need for efficient higher-order methods. We then present a new higher-order masked comparison algorithm based on the (insecure) higher-order method of Bache et al. Our new method is 4.2x, resp. 7.5x, faster than the method of Bos et al. for a 2nd, resp. 3rd, -order masking on the ARM Cortex-M4, and unlike the method of Bache et al., the new technique takes ciphertext compression into account. We prove correctness, security, and masking security in detail and provide performance numbers for 2nd and 3rd-order implementations. Finally, we verify our the side-channel security of our implementation using the test vector leakage assessment (TVLA) methodology.
An energy and area efficient, all digital entropy source compatible with modern standards based on jitter pipelining
Adriaan Peetermans Ingrid Verbauwhede
This paper proposes an energy and area efficient entropy source, suitable for true random number generation, accompanied with a stochastic model in a 28nm CMOS technology. The design uses a jitter pipelining architecture together with an increased timing resolution to achieve a maximal throughput of 298 Mbit/s and a best energy efficiency of 1.46 pJ/bit at a supply of 0.8V. The generated random bits pass the NIST SP 800-90B IID tests with a min entropy rate of 0.933 bit/bit, which is more than required by the AIS-31 standard. The all digital design allows for effortless transfer to other technology nodes, taking advantage of all benefits related to further technology scaling.
Analysis and Comparison of Table-based Arithmetic to Boolean Masking 📺
Masking is a popular technique to protect cryptographic implementations against side-channel attacks and comes in several variants including Boolean and arithmetic masking. Some masked implementations require conversion between these two variants, which is increasingly the case for masking of post-quantum encryption and signature schemes. One way to perform Arithmetic to Boolean (A2B) mask conversion is a table-based approach first introduced by Coron and Tchulkine, and later corrected and adapted by Debraize in CHES 2012. In this work, we show both analytically and experimentally that the table-based A2B conversion algorithm proposed by Debraize does not achieve the claimed resistance against differential power analysis due to a non-uniform masking of an intermediate variable. This non-uniformity is hard to find analytically but leads to clear leakage in experimental validation. To address the non-uniform masking issue, we propose two new A2B conversions: one that maintains efficiency at the cost of additional memory and one that trades efficiency for a reduced memory footprint. We give analytical and experimental evidence for their security, and will make their implementations, which are shown to be free from side-channel leakage in 100.000 power traces collected on the ARM Cortex-M4, available online. We conclude that when designing side-channel protection mechanisms, it is of paramount importance to perform both a theoretical analysis and an experimental validation of the method.
Scabbard: a suite of efficient learning with rounding key-encapsulation mechanisms 📺
In this paper, we introduce Scabbard, a suite of post-quantum keyencapsulation mechanisms. Our suite contains three different schemes Florete, Espada, and Sable based on the hardness of module- or ring-learning with rounding problem. In this work, we first show how the latest advancements on lattice-based cryptographycan be utilized to create new better schemes and even improve the state-of-the-art on post-quantum cryptography. We put particular focus on designing schemes that can optimally exploit the parallelism offered by certain hardware platforms and are also suitable for resource constrained devices. We show that this can be achieved without compromising the security of the schemes or penalizing their performance on other platforms.To substantiate our claims, we provide optimized implementations of our three new schemes on a wide range of platforms including general-purpose Intel processors using both portable C and vectorized instructions, embedded platforms such as Cortex-M4 microcontrollers, and hardware platforms such as FPGAs. We show that on each platform, our schemes can outperform the state-of-the-art in speed, memory footprint, or area requirements.
Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography 📺
Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the context of post-quantum cryptography.In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. Evaluation is reduced by a factor of 2, and we call this method precomputation, and interpolation is reduced from quadratic to linear, and we call this method lazy interpolation.As a practical application, we applied our algorithms to the Saber post-quantum key-encapsulation mechanism. We discuss in detail the various implementation aspects of applying our algorithms to Saber. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12−37% compared to previous methods on their respective platforms. Secondly, we propose different methods to reduce the memory footprint of Saber for Cortex-M4 microcontrollers. Our implementation shows between 2.6 and 5.7 KB reduction in the memory usage with respect to the smallest implementation in the literature.
Decryption Failure Attacks on IND-CCA Secure Lattice-Based Schemes
In this paper we investigate the impact of decryption failures on the chosen-ciphertext security of lattice-based primitives. We discuss a generic framework for secret key recovery based on decryption failures and present an attack on the NIST Post-Quantum Proposal ss-ntru-pke. Our framework is split in three parts: First, we use a technique to increase the failure rate of lattice-based schemes called failure boosting. Based on this technique we investigate the minimal effort for an adversary to obtain a failure in three cases: when he has access to a quantum computer, when he mounts a multi-target attack or when he can only perform a limited number of oracle queries. Secondly, we examine the amount of information that an adversary can derive from failing ciphertexts. Finally, these techniques are combined in an overall analysis of the security of lattice based schemes under a decryption failure attack. We show that an attacker could significantly reduce the security of lattice based schemes that have a relatively high failure rate. However, for most of the NIST Post-Quantum Proposals, the number of required oracle queries is above practical limits. Furthermore, a new generic weak-key (multi-target) model on lattice-based schemes, which can be viewed as a variant of the previous framework, is proposed. This model further takes into consideration the weak-key phenomenon that a small fraction of keys can have much larger decoding error probability for ciphertexts with certain key-related properties. We apply this model and present an attack in detail on the NIST Post-Quantum Proposal – ss-ntru-pke – with complexity below the claimed security level.
A Cautionary Note When Looking for a Truly Reconfigurable Resistive RAM PUF 📺
The reconfigurable physically unclonable function (PUF) is an advanced security hardware primitive, suitable for applications requiring key renewal or similar refresh functions. The Oxygen vacancies-based resistive RAM (RRAM), has been claimed to be a physically reconfigurable PUF due to its intrinsic switching variability. This paper first analyzes and compares various previously published RRAM-based PUFs with a physics-based RRAM model. We next discuss their possible reconfigurability assuming an ideal configuration-to-configuration behavior. The RRAM-to-RRAM variability, which mainly originates from a variable number of unremovable vacancies inside the RRAM filament, however, has been observed to have significant impact on the reconfigurability. We show by quantitative analysis on the clear uniqueness degradation from the ideal situation in all the discussed implementations. Thus we conclude that true reconfigurability with RRAM PUFs might be unachievable due to this physical phenomena.
Saber on ARM CCA-secure module lattice-based key encapsulation on ARM
The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST’s post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resourceconstrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access for a fast implementation of polynomial multiplication. We also use memory efficient Karatsuba and just-in-time strategy for generating the public matrix of the module lattice to reduce the memory footprint. We also show that our optimizations can be combined with each other seamlessly to provide various speed-memory trade-offs. Our speed optimized software takes just 1,147K, 1,444K, and 1,543K clock cycles on a Cortex-M4 platform for key generation, encapsulation and decapsulation respectively. Our memory efficient software takes 4,786K, 6,328K, and 7,509K clock cycles on an ultra resource-constrained Cortex-M0 platform for key generation, encapsulation, and decapsulation respectively while consuming only 6.2 KB of memory at most. These results show that lattice-based key encapsulation schemes are perfectly practical for securing IoT devices from quantum computing attacks.
ES-TRNG: A High-throughput, Low-area True Random Number Generator based on Edge Sampling
In this paper we present a novel true random number generator based on high-precision edge sampling. We use two novel techniques to increase the throughput and reduce the area of the proposed randomness source: variable-precision phase encoding and repetitive sampling. The first technique consists of encoding the oscillator phase with high precision in the regions around the signal edges and with low precision everywhere else. This technique results in a compact implementation at the expense of reduced entropy in some samples. The second technique consists of repeating the sampling at high frequency until the phase region encoded with high precision is captured. This technique ensures that only the high-entropy bits are sent to the output. The combination of the two proposed techniques results in a secure TRNG, which suits both ASIC and FPGA implementations. The core part of the proposed generator is implemented with 10 look-up tables (LUTs) and 5 flip-flops (FFs) of a Xilinx Spartan-6 FPGA, and achieves a throughput of 1.15 Mbps with 0.997 bits of Shannon entropy. On Intel Cyclone V FPGAs, this implementation uses 10 LUTs and 6 FFs, and achieves a throughput of 1.07 Mbps. This TRNG design is supported by a stochastic model and a formal security evaluation.
Fast Leakage Assessment
Oscar Reparaz Benedikt Gierlichs Ingrid Verbauwhede
We describe a fast technique for performing the computationally heavy part of leakage assessment, in any statistical moment (or other property) of the leakage samples distributions. The proposed technique outperforms by orders of magnitude the approach presented at CHES 2015 by Schneider and Moradi. We can carry out evaluations that before took 90 CPU-days in 4 CPU-hours (about a 500-fold speed-up). As a bonus, we can work with exact arithmetic, we can apply kernel-based density estimation methods, we can employ arbitrary pre-processing functions such as absolute value to power traces, and we can perform information-theoretic leakage assessment. Our trick is simple and elegant, and lends itself to an easy and compact implementation. We fit a prototype implementation in about 130 lines of C code.

Program Committees

Eurocrypt 2017
Asiacrypt 2017
CHES 2016
CHES 2014
CHES 2008
CHES 2007 (Program chair)
CHES 2006
CHES 2005
CHES 2003


David W. Archer (1)
Josep Balasch (2)
Lejla Batina (2)
Hanno Becker (1)
Arthur Beckers (1)
Jose Maria Bermudo Mera (3)
Begül Bilgin (1)
Andrey Bogdanov (1)
Donald Donglong Chen (1)
Ray C. C. Cheung (1)
Siddhartha Chowdhury (1)
Kai-Hsin Chuang (1)
Jan-Pieter D’Anvers (3)
Amitabh Das (1)
Robin Degraeve (1)
Jeroen Delvaux (2)
Vassil S. Dimitrov (1)
Georgios Dimou (1)
Sylvain DUQUESNE (1)
Junfeng Fan (4)
Andrea Fantini (1)
Sebastian Faust (1)
Tim Fritzmann (1)
Robin Geelen (1)
Santosh Ghosh (1)
Benedikt Gierlichs (6)
Guido Groeseneken (1)
Johann Großschädl (1)
Milos Grujic (1)
Dawu Gu (2)
Nicolas Guillermin (1)
Qian Guo (1)
Xu Guo (1)
Daniel Heinz (1)
Anthony Van Herrewege (1)
Matthias Hiller (1)
Alireza Hodjat (2)
Frank Hoornaert (1)
Brian Huffman (1)
David Hwang (2)
Kimmo U. Järvinen (2)
Thomas Johansson (1)
Patrick Karl (1)
Angshuman Karmakar (5)
Stefan Katzenbeisser (1)
Howon Kim (1)
Miroslav Knezevic (1)
Ünal Koçabas (1)
Amit Kumar (1)
Suparna Kundu (2)
Henry Kuo (1)
Bo-Cheng Lai (1)
Gregor Leander (1)
Zhenqi Li (1)
Dimitri Linten (1)
Zhe Liu (1)
Roel Maes (2)
Hugo De Man (1)
Tynan McAuley (1)
Nele Mentens (3)
Jose M. Bermudo Mera (1)
Debdeep Mukhopadhyay (1)
Svetla Nikova (1)
Alexander Nilsson (1)
Adriaan Peetermans (1)
Hilder V. L. Pereira (1)
Peter Pessl (1)
Bart Preneel (2)
Oscar Reparaz (5)
Debapriya Basu Roy (1)
Sujoy Sinha Roy (6)
Vladimir Rozic (2)
Ahmad-Reza Sadeghi (1)
Sayandeep Saha (1)
Kazuo Sakiyama (1)
Thomas Schamberger (1)
Patrick Schaumont (2)
Dries Schellekens (1)
Ben Selfridge (1)
Hwajeong Seo (1)
Georg Sigl (1)
Jürgen Teich (1)
Kris Tiri (2)
Deniz Toz (1)
Jens Trautmann (1)
Pim Tuyls (1)
Michiel Van Beirendonck (4)
Joos Vandewalle (1)
Kerem Varici (1)
Frederik Vercauteren (6)
Christian Wachsmann (1)
Daniel Wagner (1)
Stefan Wildermann (1)
Lennert Wouters (1)
Bohan Yang (1)
Shenglin Yang (1)
Gavin Xiaoxu Yao (1)
Joseph Yiu (1)
Meng-Day (Mandel) Yu (1)
Bin Zhang (1)