Ingrid Verbauwhede

CryptoDB

Ingrid Verbauwhede

Publications and invited talks

Year

Venue

Title

2025

TCHES

Higher-Order Time Sharing Masking Abstract

Dilip Kumar S. V. Siemen Dhooghe Josep Balasch Benedikt Gierlichs Ingrid Verbauwhede

At CHES 2024, Time Sharing Masking (TSM) was introduced as a novel low-latency masking technique for hardware circuits. TSM offers area and randomness efficiency, as well as glitch-extended PINI security, but it is limited to first-order security. We address this limitation and generalize TSM to higher-order security while maintaining all of TSM’s advantages. Additionally, we propose an area-latency tradeoff. We prove HO-TSM glitch-extended PINI security and successfully evaluate our circuits using formal verification tools. Furthermore, we demonstrate area- and latency-efficient implementations of the AES S-box, which do not exhibit leakage in TVLA on FPGA. Our proposed tradeoff enables a first-order secure implementation of a complete AES-128 encryption core with 92 kGE, 920 random bits per round, and 20 cycles of latency, which does not exhibit leakage in TVLA on FPGA.

2025

TCHES

OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM Abstract

Xander Pottier Thomas de Ruijter Jonas Bertels Wouter Legiest Michiel Van Beirendonck Ingrid Verbauwhede

The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving upon the Pippenger algorithm. A new iteration technique is introduced to decouple the required buckets from the window size, resulting in fewer EC computations for the same on-chip memory resources. Furthermore, we combine known optimizations from the literature for the first time to achieve additional latency improvements. Our enhanced MSM implementation significantly reduces computation time, achieving a speedup of up to x12.77 compared to recent FPGA implementations. Specifically, for the BLS12-381 curve, we reduce the computation time for an MSM of size 224 to 914 ms using a single compute unit on the U55C FPGA or to 231 ms using four U55C devices. These results indicate a substantial improvement in efficiency, paving the way for more scalable and efficient Zero-Knowledge proof systems.

2025

TCHES

Rudraksh: A compact and lightweight post-quantum key-encapsulation mechanism Abstract

Suparna Kundu Archisman Ghosh Angshuman Karmakar Shreyas Sen Ingrid Verbauwhede

Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. However, due to the impending threat of quantum computers on our existing public-key cryptographic schemes and the limited resources available on IoT devices, it is important to design lightweight post-quantum cryptographic (PQC) schemes suitable for these devices.In this work, we explored the design space of learning with error-based PQC schemes to design a lightweight key-encapsulation mechanism (KEM) suitable for resourceconstrained devices. We have done a scrupulous and extensive analysis and evaluation of different design elements, such as polynomial size, field modulus structure, reduction algorithm, and secret and error distribution of an LWE-based KEM. Our explorations led to the proposal of a lightweight PQC-KEM, Rudraksh, without compromising security. Our scheme provides security against chosen ciphertext attacks (CCA) with more than 100 bits of Core-SVP post-quantum security and belongs to the NIST-level-I security category (provide security at least as much as AES-128). We have also shown how ASCON can be used for lightweight pseudo-random number generation and hash function in the lattice-based KEMs instead of the widely used Keccak for lightweight design. Our FPGA results show that Rudraksh currently requires the least area among the PQC KEMs of similar security. Our implementation of Rudraksh provides a ~3x improvement in terms of the area requirement compared to the state-of-the-art areaoptimized implementation of Kyber, can operate at 63%-76% higher frequency with respect to high-throughput Kyber, and improves time-area-product ~2x compared to the state-of-the-art compact implementation of Kyber published in HPEC 2022.

2025

TCHES

FINAL bootstrap acceleration on FPGA using DSP-free constant-multiplier NTTs Abstract

Jonas Bertels Hilder V. L. Pereira Ingrid Verbauwhede

This work showcases Quatorze-bis, a state-of-the-art Number Theoretic Transform circuit for TFHE-like cryptosystems on FPGAs. It contains a novel modular multiplication design for modular multiplication with a constant for a constant modulus. This modular multiplication design does not require any DSP units or any dedicated multiplier unit, nor does it require extra logic when compared to the state-of-the-art modular multipliers. Furthermore, we present an implementation of a constant multiplier Number Theoretic Transform design for TFHE-like schemes. Lastly, we use this Number Theoretic Transform design to implement a FINAL hardware accelerator for the AMD Alveo U55c which improves the Throughput metric of TFHE-like cryptosystems on FPGAs by a factor 9.28x over Li et al.’s NFP CHES 2024 accelerator and by 10-25% over the absolute state-of-the-art design FPT [vBDTV23] while using one third of FPTs DSPs.

2025

TCHES

Entropy extractor based high-throughput post-processings for True Random Number Generators Abstract

Yifan Dang Miloš Grujić Bohan Yang Wenping Zhu Hanning Wang Min Zhu Ingrid Verbauwhede Leibo Liu

In cryptographic systems, true random number generation is essential, as a compromised TRNG could lead to a security catastrophe. The raw random numbers are discrete values that are derived at discrete points in time from a noise source of a TRNG. These values often exhibit statistical defects that require post-processing, also called conditioner, to improve uniformity. The two main types of post-processing are algorithmic post-processing and cryptographic post-processing, both of which have pros and cons in theories and applications. However, another type of postprocessing existing between these two types, named entropy extractor, has often been overlooked by the applied cryptographic community. Therefore, we implement two information-theoretically provable entropy extractors: Toeplitz extractor and Trevisan extractor catering to various performance requirements and applications of high-throughput TRNG post-processing. This paper proposes a combination of matrix chunking and FFT acceleration to boost the performance of the Toeplitz extractor, along with a modified Toeplitz matrix design to decrease the hardware consumption. In addition, we introduce a lightweight single-bit extractor to implement an efficient Trevisan extractor. Both algorithms are devised and verified through FPGA hardware simulations. The enhanced Toeplitz extractor achieves a throughput of 42 Gbps, while the Trevisan extractor attains 1.82 Gbps, representing an 84% and 73% improvement in throughput-to-area ratio over the previous best-performing design for each extractor. The standard statistical test suites, such as NIST SP800-22, NIST SP800-90B, and AIS-31, are adopted to evaluate the effectiveness of the proposed post-processing techniques. Naturally, this approach can only serve as a supplementary measure, as modern standards, such as AIS-31, necessitate formal analysis and stochastic models to account for randomness.

2024

TCHES

Carry Your Fault: A Fault Propagation Attack on Side-Channel Protected LWE-based KEM Abstract

Suparna Kundu Siddhartha Chowdhury Sayandeep Saha Angshuman Karmakar Debdeep Mukhopadhyay Ingrid Verbauwhede

Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes – passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further attack surfaces is, therefore, an important step for eventually securing the deployment of these algorithms. Also, it is mportant to test the robustness of the already proposed countermeasures in this regard. In this work, we propose a new fault attack on side-channel secure masked implementation of LWE-based key-encapsulation mechanisms (KEMs) exploiting fault propagation. The attack typically originates due to an algorithmic modification widely used to enable masking, namely the Arithmetic-to-Boolean (A2B) conversion. We exploit the data dependency of the adder carry chain in A2B and extract sensitive information, albeit masking (of arbitrary order) being present. As a practical demonstration of the exploitability of this information leakage, we show key recovery attacks of Kyber, although the leakage also exists for other schemes like Saber. The attack on Kyber targets the decapsulation module and utilizes Belief Propagation (BP) for key recovery. To the best of our knowledge, it is the first attack exploiting an algorithmic component introduced to ease masking rather than only exploiting the randomness introduced by masking to obtain desired faults (as done by Delvaux [Del22]). Finally, we performed both simulated and electromagnetic (EM) fault-based practical validation of the attack for an open-source first-order secure Kyber implementation running on an STM32 platform.

2024

TCHES

Time Sharing - A Novel Approach to Low-Latency Masking Abstract

Dilip Kumar S. V. Siemen Dhooghe Josep Balasch Benedikt Gierlichs Ingrid Verbauwhede

We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitchextended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools without sacrificing security. We provide concrete results of several case studies. Our low-latency implementation of a complete PRINCE core shows a 32% area improvement (44% with optimization) over the state-of-the-art. Our PRINCE S-Box passes formal verification with a tool and the complete core on FPGA shows no first-order leakage in TVLA with 100 million traces. Our low-latency implementation of the AES S-Box costs roughly one third (one quarter with optimization) of the area of state-of-the-art implementations. It shows no first-order leakage in TVLA with 250 million traces.

2024

TCHES

TRNG Entropy Model in the Presence of Flicker FM Noise Abstract

Adriaan Peetermans Ingrid Verbauwhede

Flicker Frequency Modulated (FM) noise, which influences free-running Ring Oscillators (ROs), can make a substantial contribution to the entropy generated by RO-based True Random Number Generators (TRNGs). While current TRNG stochastic models predominantly concentrate on white FM noise, the addition of flicker FM noise could remarkably enrich the analysis of the TRNG entropy production rate. This paper introduces an entropy model for TRNGs, employing Gaussian processes, to estimate entropy generation from both white FM and flicker FM noise. We analytically derive the flicker FM noise Auto-Correlation Function (ACF), enabling assessment of entropy contributions conditioned on partial knowledge of the TRNG’s internal state. Utilizing the developed model with commonly reported noise magnitudes found in literature, it is determined that flicker FM noise holds the potential to substantially enhance the TRNG’s entropy rate. However, due to considerable variation in reported magnitudes across limited available research on flicker FM noise, it cannot yet be universally accepted as a dependable source of TRNG entropy.

2023

TCHES

BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption Abstract

Robin Geelen Michiel Van Beirendonck Hilder V. L. Pereira Brian Huffman Tynan McAuley Ben Selfridge Daniel Wagner Georgios Dimou Ingrid Verbauwhede Frederik Vercauteren David W. Archer

Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a customized version of bootstrapping that can be instantiated with hardware multipliers optimized for area and power.BASALISC is a three-abstraction-layer RISC architecture, designed for a 1 GHz ASIC implementation and underway toward 150mm2 die tape-out in a 12nm GF process. BASALISC’s four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Its conflict-resolution permutation hardware is generalized and re-used to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching.The BASALISC toolchain comprises a custom compiler and a joint performance and correctness simulator. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and we study its performance on a set of benchmarks. Simulation results show a speedup of more than 5,000× over HElib – a popular software FHE library.

2022

TCHES

Semi-Automatic Locating of Cryptographic Operations in Side-Channel Traces Abstract

Jens Trautmann Arthur Beckers Lennert Wouters Stefan Wildermann Ingrid Verbauwhede Jürgen Teich

Locating a cryptographic operation in a side-channel trace, i.e. finding out where it is in the time domain, without having a template, can be a tedious task even for unprotected implementations. The sheer amount of data can be overwhelming. In a simple call to OpenSSL for AES-128 ECB encryption of a single data block, only 0.00028% of the trace relate to the actual AES-128 encryption. The rest is overhead. We introduce the (to our best knowledge) first method to locate a cryptographic operation in a side-channel trace in a largely automated fashion. The method exploits meta information about the cryptographic operation and requires an estimate of its implementation’s execution time.The method lends itself to parallelization and our implementation in a tool greatly benefits from GPU acceleration. The tool can be used offline for trace segmentation and for generating a template which can then be used online in real-time waveformmatching based triggering systems for trace acquisition or fault injection. We evaluate it in six scenarios involving hardware and software implementations of different cryptographic operations executed on diverse platforms. Two of these scenarios cover realistic protocol level use-cases and demonstrate the real-world applicability of our tool in scenarios where classical leakage-detection techniques would not work. The results highlight the usefulness of the tool because it reliably and efficiently automates the task and therefore frees up time of the analyst.The method does not work on traces of implementations protected by effective time randomization countermeasures, e.g. random delays and unstable clock frequency, but is not affected by masking, shuffling and similar countermeasures.

2022

TCHES

Masked Accelerators and Instruction Set Extensions for Post-Quantum Cryptography Abstract

Tim Fritzmann Michiel Van Beirendonck Debapriya Basu Roy Patrick Karl Thomas Schamberger Ingrid Verbauwhede Georg Sigl

Side-channel attacks can break mathematically secure cryptographic systems leading to a major concern in applied cryptography. While the cryptanalysis and security evaluation of Post-Quantum Cryptography (PQC) have already received an increasing research effort, a cost analysis of efficient side-channel countermeasures is still lacking. In this work, we propose a masked HW/SW codesign of the NIST PQC finalists Kyber and Saber, suitable for their different characteristics. Among others, we present a novel masked ciphertext compression algorithm for non-power-of-two moduli. To accelerate linear performance bottlenecks, we developed a generic Number Theoretic Transform (NTT) multiplier, which, in contrast to previously published accelerators, is also efficient and suitable for schemes not based on NTT. For the critical non-linear operations, masked HW accelerators were developed, allowing a secure execution using RISC-V instruction set extensions. With the proposed design, we achieved a cycle count of K:214k/E:298k/D:313k for Kyber and K:233k/E:312k/D:351k for Saber with NIST Level III parameter sets. For the same parameter sets, the masking overhead for the first-order secure decapsulation operation including randomness generation is a factor of 4.48 for Kyber (D:1403k)and 2.60 for Saber (D:915k).

2022

TCHES

Polynomial multiplication on embedded vector architectures Abstract

Hanno Becker Jose Maria Bermudo Mera Angshuman Karmakar Joseph Yiu Ingrid Verbauwhede

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that despite being singleissue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), by careful register management and instruction scheduling, we can obtain a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining a low area and energy profile necessary for use in embedded market. Finally, as a real-world application we integrate our multiplication techniques to post-quantum key-encapsulation mechanism Saber

2022

TCHES

Higher-Order Masked Ciphertext Comparison for Lattice-Based Cryptography Abstract

Jan-Pieter D’Anvers Daniel Heinz Peter Pessl Michiel Van Beirendonck Ingrid Verbauwhede

Checking the equality of two arrays is a crucial building block of the Fujisaki-Okamoto transformation, and as such it is used in several post-quantum key encapsulation mechanisms including Kyber and Saber. While this comparison operation is easy to perform in a black box setting, it is hard to efficiently protect against side-channel attacks. For instance, the hash-based method by Oder et al. is limited to first-order masking, a higher-order method by Bache et al. was shown to be flawed, and a very recent higher-order technique by Bos et al. suffers in runtime. In this paper, we first demonstrate that the hash-based approach, and likely many similar first-order techniques, succumb to a relatively simple side-channel collision attack. We can successfully recover a Kyber512 key using just 6000 traces. While this does not break the security claims, it does show the need for efficient higher-order methods. We then present a new higher-order masked comparison algorithm based on the (insecure) higher-order method of Bache et al. Our new method is 4.2x, resp. 7.5x, faster than the method of Bos et al. for a 2nd, resp. 3rd, -order masking on the ARM Cortex-M4, and unlike the method of Bache et al., the new technique takes ciphertext compression into account. We prove correctness, security, and masking security in detail and provide performance numbers for 2nd and 3rd-order implementations. Finally, we verify our the side-channel security of our implementation using the test vector leakage assessment (TVLA) methodology.

2022

TCHES

An energy and area efficient, all digital entropy source compatible with modern standards based on jitter pipelining Abstract

Adriaan Peetermans Ingrid Verbauwhede

This paper proposes an energy and area efficient entropy source, suitable for true random number generation, accompanied with a stochastic model in a 28nm CMOS technology. The design uses a jitter pipelining architecture together with an increased timing resolution to achieve a maximal throughput of 298 Mbit/s and a best energy efficiency of 1.46 pJ/bit at a supply of 0.8V. The generated random bits pass the NIST SP 800-90B IID tests with a min entropy rate of 0.933 bit/bit, which is more than required by the AIS-31 standard. The all digital design allows for effortless transfer to other technology nodes, taking advantage of all benefits related to further technology scaling.

2021

TCHES

Analysis and Comparison of Table-based Arithmetic to Boolean Masking 📺 Abstract

Michiel Van Beirendonck Jan-Pieter D’Anvers Ingrid Verbauwhede

Masking is a popular technique to protect cryptographic implementations against side-channel attacks and comes in several variants including Boolean and arithmetic masking. Some masked implementations require conversion between these two variants, which is increasingly the case for masking of post-quantum encryption and signature schemes. One way to perform Arithmetic to Boolean (A2B) mask conversion is a table-based approach first introduced by Coron and Tchulkine, and later corrected and adapted by Debraize in CHES 2012. In this work, we show both analytically and experimentally that the table-based A2B conversion algorithm proposed by Debraize does not achieve the claimed resistance against differential power analysis due to a non-uniform masking of an intermediate variable. This non-uniformity is hard to find analytically but leads to clear leakage in experimental validation. To address the non-uniform masking issue, we propose two new A2B conversions: one that maintains efficiency at the cost of additional memory and one that trades efficiency for a reduced memory footprint. We give analytical and experimental evidence for their security, and will make their implementations, which are shown to be free from side-channel leakage in 100.000 power traces collected on the ARM Cortex-M4, available online. We conclude that when designing side-channel protection mechanisms, it is of paramount importance to perform both a theoretical analysis and an experimental validation of the method.

2021

TCHES

Scabbard: a suite of efficient learning with rounding key-encapsulation mechanisms 📺 Abstract

Jose Maria Bermudo Mera Angshuman Karmakar Suparna Kundu Ingrid Verbauwhede

In this paper, we introduce Scabbard, a suite of post-quantum keyencapsulation mechanisms. Our suite contains three different schemes Florete, Espada, and Sable based on the hardness of module- or ring-learning with rounding problem. In this work, we first show how the latest advancements on lattice-based cryptographycan be utilized to create new better schemes and even improve the state-of-the-art on post-quantum cryptography. We put particular focus on designing schemes that can optimally exploit the parallelism offered by certain hardware platforms and are also suitable for resource constrained devices. We show that this can be achieved without compromising the security of the schemes or penalizing their performance on other platforms.To substantiate our claims, we provide optimized implementations of our three new schemes on a wide range of platforms including general-purpose Intel processors using both portable C and vectorized instructions, embedded platforms such as Cortex-M4 microcontrollers, and hardware platforms such as FPGAs. We show that on each platform, our schemes can outperform the state-of-the-art in speed, memory footprint, or area requirements.

2020

TCHES

Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography 📺 Abstract

Jose Maria Bermudo Mera Angshuman Karmakar Ingrid Verbauwhede

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the context of post-quantum cryptography.In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. Evaluation is reduced by a factor of 2, and we call this method precomputation, and interpolation is reduced from quadratic to linear, and we call this method lazy interpolation.As a practical application, we applied our algorithms to the Saber post-quantum key-encapsulation mechanism. We discuss in detail the various implementation aspects of applying our algorithms to Saber. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12−37% compared to previous methods on their respective platforms. Secondly, we propose different methods to reduce the memory footprint of Saber for Cortex-M4 microcontrollers. Our implementation shows between 2.6 and 5.7 KB reduction in the memory usage with respect to the smallest implementation in the literature.

2019

PKC

Decryption Failure Attacks on IND-CCA Secure Lattice-Based Schemes Abstract

Jan-Pieter D’Anvers Qian Guo Thomas Johansson Alexander Nilsson Frederik Vercauteren Ingrid Verbauwhede

In this paper we investigate the impact of decryption failures on the chosen-ciphertext security of lattice-based primitives. We discuss a generic framework for secret key recovery based on decryption failures and present an attack on the NIST Post-Quantum Proposal ss-ntru-pke. Our framework is split in three parts: First, we use a technique to increase the failure rate of lattice-based schemes called failure boosting. Based on this technique we investigate the minimal effort for an adversary to obtain a failure in three cases: when he has access to a quantum computer, when he mounts a multi-target attack or when he can only perform a limited number of oracle queries. Secondly, we examine the amount of information that an adversary can derive from failing ciphertexts. Finally, these techniques are combined in an overall analysis of the security of lattice based schemes under a decryption failure attack. We show that an attacker could significantly reduce the security of lattice based schemes that have a relatively high failure rate. However, for most of the NIST Post-Quantum Proposals, the number of required oracle queries is above practical limits. Furthermore, a new generic weak-key (multi-target) model on lattice-based schemes, which can be viewed as a variant of the previous framework, is proposed. This model further takes into consideration the weak-key phenomenon that a small fraction of keys can have much larger decoding error probability for ciphertexts with certain key-related properties. We apply this model and present an attack in detail on the NIST Post-Quantum Proposal – ss-ntru-pke – with complexity below the claimed security level.

2018

TCHES

A Cautionary Note When Looking for a Truly Reconfigurable Resistive RAM PUF 📺 Abstract

Kai-Hsin Chuang Robin Degraeve Andrea Fantini Guido Groeseneken Dimitri Linten Ingrid Verbauwhede

The reconfigurable physically unclonable function (PUF) is an advanced security hardware primitive, suitable for applications requiring key renewal or similar refresh functions. The Oxygen vacancies-based resistive RAM (RRAM), has been claimed to be a physically reconfigurable PUF due to its intrinsic switching variability. This paper first analyzes and compares various previously published RRAM-based PUFs with a physics-based RRAM model. We next discuss their possible reconfigurability assuming an ideal configuration-to-configuration behavior. The RRAM-to-RRAM variability, which mainly originates from a variable number of unremovable vacancies inside the RRAM filament, however, has been observed to have significant impact on the reconfigurability. We show by quantitative analysis on the clear uniqueness degradation from the ideal situation in all the discussed implementations. Thus we conclude that true reconfigurability with RRAM PUFs might be unachievable due to this physical phenomena.

2018

TCHES

Saber on ARM CCA-secure module lattice-based key encapsulation on ARM Abstract

Angshuman Karmakar Jose M. Bermudo Mera Sujoy Sinha Roy Ingrid Verbauwhede

The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST’s post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resourceconstrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access for a fast implementation of polynomial multiplication. We also use memory efficient Karatsuba and just-in-time strategy for generating the public matrix of the module lattice to reduce the memory footprint. We also show that our optimizations can be combined with each other seamlessly to provide various speed-memory trade-offs. Our speed optimized software takes just 1,147K, 1,444K, and 1,543K clock cycles on a Cortex-M4 platform for key generation, encapsulation and decapsulation respectively. Our memory efficient software takes 4,786K, 6,328K, and 7,509K clock cycles on an ultra resource-constrained Cortex-M0 platform for key generation, encapsulation, and decapsulation respectively while consuming only 6.2 KB of memory at most. These results show that lattice-based key encapsulation schemes are perfectly practical for securing IoT devices from quantum computing attacks.

2018

TCHES

ES-TRNG: A High-throughput, Low-area True Random Number Generator based on Edge Sampling Abstract

Bohan Yang Vladimir Rozic Milos Grujic Nele Mentens Ingrid Verbauwhede

In this paper we present a novel true random number generator based on high-precision edge sampling. We use two novel techniques to increase the throughput and reduce the area of the proposed randomness source: variable-precision phase encoding and repetitive sampling. The first technique consists of encoding the oscillator phase with high precision in the regions around the signal edges and with low precision everywhere else. This technique results in a compact implementation at the expense of reduced entropy in some samples. The second technique consists of repeating the sampling at high frequency until the phase region encoded with high precision is captured. This technique ensures that only the high-entropy bits are sent to the output. The combination of the two proposed techniques results in a secure TRNG, which suits both ASIC and FPGA implementations. The core part of the proposed generator is implemented with 10 look-up tables (LUTs) and 5 flip-flops (FFs) of a Xilinx Spartan-6 FPGA, and achieves a throughput of 1.15 Mbps with 0.997 bits of Shannon entropy. On Intel Cyclone V FPGAs, this implementation uses 10 LUTs and 6 FFs, and achieves a throughput of 1.07 Mbps. This TRNG design is supported by a stochastic model and a formal security evaluation.

2017

CHES

Fast Leakage Assessment Abstract

Oscar Reparaz Benedikt Gierlichs Ingrid Verbauwhede

We describe a fast technique for performing the computationally heavy part of leakage assessment, in any statistical moment (or other property) of the leakage samples distributions. The proposed technique outperforms by orders of magnitude the approach presented at CHES 2015 by Schneider and Moradi. We can carry out evaluations that before took 90 CPU-days in 4 CPU-hours (about a 500-fold speed-up). As a bonus, we can work with exact arithmetic, we can apply kernel-based density estimation methods, we can employ arbitrary pre-processing functions such as absolute value to power traces, and we can perform information-theoretic leakage assessment. Our trick is simple and elegant, and lends itself to an easy and compact implementation. We fit a prototype implementation in about 130 lines of C code.

2016

CHES

Efficient Fuzzy Extraction of PUF-Induced Secrets: Theory and Applications 📺

Jeroen Delvaux Dawu Gu Ingrid Verbauwhede Matthias Hiller Meng-Day (Mandel) Yu

2015

CRYPTO

Consolidating Masking Schemes

Oscar Reparaz Begül Bilgin Svetla Nikova Benedikt Gierlichs Ingrid Verbauwhede

2015

CHES

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation

Sujoy Sinha Roy Kimmo Järvinen Frederik Vercauteren Vassil S. Dimitrov Ingrid Verbauwhede