International Association for Cryptologic Research

International Association
for Cryptologic Research


Francesco Regazzoni


Compact Circuits for Efficient Möbius Transform
Subhadeep Banik Francesco Regazzoni
The Möbius transform is a linear circuit used to compute the evaluations of a Boolean function over all points on its input domain. The operation is very useful in finding the solution of a system of polynomial equations over GF(2) for obvious reasons. However the operation, although linear, needs exponential number of logic operations (around n · 2n−1 bit xors) for an n-variable Boolean function. As such, the only known hardware circuit to efficiently compute the Möbius Transform requires silicon area that is exponential in n. For Boolean functions whose algebraic degree is bound by some parameter d, recursive definitions of the Möbius Transform exist that requires only O(nd+1) space in software. However converting the mathematical definition of this space-efficient algorithm into a hardware architecture is a non-trivial task, primarily because the recursion calls notionally lead to a depth-first search in a transition graph that requires context switches at each recursion call for which straightforward mapping to hardware is difficult. In this paper we look to overcome these very challenges in an engineering sense. We propose a space efficient sequential hardware circuit for the Möbius Transform that requires only polynomial circuit area (i.e. O(nd+1)) provided the algebraic degree of the Boolean function is limited to d. We show how this circuit can be used as a component to efficiently solve polynomial equations of degree at most d by using fast exhaustive search. We propose three different circuit architectures for this, each of which uses the Möbius Transform circuit as a core component. We show that asymptotically, all the solutions of a system of m polynomials in n unknowns and algebraic degree d over GF(2) can be found using a circuit of silicon area proportional to m · nd+1 and circuit depth proportional to 2 · log2(n − d).In the second part of the paper we introduce a fourth hardware solver that additionally aims to achieve energy efficiency. The main idea is to reduce the solution space to a small enough value by parallel application of Möbius Transform circuits over the first few equations of the system. This is done so that one can check individually whether the vectors of this reduced solution space satisfy each of the remaining equations of the system using lower power consumption. The new circuit has area also bound by m · nd+1 and has circuit depth proportional to d · log2 n. We also show that further optimizations with respect to energy consumption may be obtained by using depth-bound Möbius circuits that exponentially decrease run time at the cost of additional logic area and depth.
The QARMAv2 Family of Tweakable Block Ciphers
We introduce the QARMAv2 family of tweakable block ciphers. It is a redesign of QARMA (from FSE 2017) to improve its security bounds and allow for longer tweaks, while keeping similar latency and area. The wider tweak input caters to both specific use cases and the design of modes of operation with higher security bounds. This is achieved through new key and tweak schedules, revised S-Box and linear layer choices, and a more comprehensive security analysis. QARMAv2 offers competitive latency and area in fully unrolled hardware implementations.Some of our results may be of independent interest. These include: new MILP models of certain classes of diffusion matrices; the comparative analysis of a full reflection cipher against an iterative half-cipher; our boomerang attack framework; and an improved approach to doubling the width of a block cipher.
An Instruction Set Extension to Support Software-Based Masking 📺
In both hardware and software, masking can represent an effective means of hardening an implementation against side-channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper explores the use of an Instruction Set Extension (ISE) to support masking in software-based implementations of a range of (symmetric) cryptographic kernels including AES: we design, implement, and evaluate such an ISE, using RISC-V as the base ISA. Our ISE-supported first-order masked implementation of AES, for example, is an order of magnitude more efficient than a software-only alternative with respect to both execution latency and memory footprint; this renders it comparable to an unmasked implementation using the same metrics, but also first-order secure.
Friet: an Authenticated Encryption Scheme with Built-in Fault Detection 📺
In this work we present a duplex-based authenticated encryption scheme Friet based on a new permutation called Friet-P. We designed Friet-P with a novel approach for cryptographic permutations and block ciphers that takes fault-attack resistance into account and that we introduce in this paper. In this method, we build a permutation f_C to be embedded in a larger one f. First, we define f as a sequence of steps that all abide a chosen error-correcting code C, i.e., that map C-codewords to C-codewords. Then, we embed f_C in f by first encoding its input to an element of C, applying f and then decoding back from C. This last step detects a fault when the output of f is not in C. We motivate the design of the permutation we use in Friet and report on performance in soft- and hardware. We evaluate the fault-detection capabilities of the software and simulated hardware implementations with attacks. Finally, we perform a leakage evaluation. Our code is available at
Swap and Rotate: Lightweight Linear Layers for SPN-based Blockciphers 📺
In CHES 2017, Jean et al. presented a paper on “Bit-Sliding” in which the authors proposed lightweight constructions for SPN based block ciphers like AES, PRESENT and SKINNY. The main idea behind these constructions was to reduce the length of the datapath to 1 bit and to reformulate the linear layer for these ciphers so that they require fewer scan flip-flops (which have built-in multiplexer functionality and so larger in area as compared to a simple flip-flop). In this paper, we develop their idea even further in few separate directions.First, we prove that given an arbitrary linear transformation, it is always possible to construct the linear layer using merely 2 scan flip-flops. This points to an optimistic venue to follow to gain further GE reductions, yet the straightforward application of the techniques in our proof to PRESENT and GIFT leads to inefficient implementations of the linear layer, as reducing ourselves to 2 scan flip-flops setting requires thousands of clock cycles and leads to very high latency.Equipped with the well-established formalism on permutation groups, we explore whether we can reduce the number of clock cycles to a practical level, i.e. few hundreds, by adding few more pairs of scan flip flops. For PRESENT, we show that 4 (resp. 8, 12) scan flip-flops are sufficient to complete the permutation layer in 384 (resp. 256, 128) clock cycles. For GIFT, we show that 4 (resp. 8, 10) scan flip flops correspond to 320 (resp. 192, 128) clock cycles. Finally, in order to provide the best of the two worlds (i.e. circuit area and latency), we push our scan flip-flop choices even further to completely eliminate the latency incurred by the permutation layer, without compromising our stringent GE budget. We show that not only 12 scan flip flops are sufficient to execute PRESENT permutation in 64 clock cycles, but also the same scan flip flops can be used readily in a combined encryption decryption circuit. Our final design of PRESENT and GIFT beat the record of Jean et al. and Banik et al. in both latency and in circuit-size metric. We believe that the techniques presented in our work can also be used at choosing bit-sliding-friendly linear layer permutations for the future SPN-based designs.
The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations 📺
We concentrate on machine learning techniques used for profiled sidechannel analysis in the presence of imbalanced data. Such scenarios are realistic and often occurring, for instance in the Hamming weight or Hamming distance leakage models. In order to deal with the imbalanced data, we use various balancing techniques and we show that most of them help in mounting successful attacks when the data is highly imbalanced. Especially, the results with the SMOTE technique are encouraging, since we observe some scenarios where it reduces the number of necessary measurements more than 8 times. Next, we provide extensive results on comparison of machine learning and side-channel metrics, where we show that machine learning metrics (and especially accuracy as the most often used one) can be extremely deceptive. This finding opens a need to revisit the previous works and their results in order to properly assess the performance of machine learning in side-channel analysis.
Towards Low Energy Stream Ciphers 📺
Energy optimization is an important design aspect of lightweight cryptography. Since low energy ciphers drain less battery, they are invaluable components of devices that operate on a tight energy budget such as handheld devices or RFID tags. At Asiacrypt 2015, Banik et al. presented the block cipher family Midori which was designed to optimize the energy consumed per encryption and which reduces the energy consumption by more than 30% compared to previous block ciphers. However, if one has to encrypt/decrypt longer streams of data, i.e. for bulk data encryption/decryption, it is expected that a stream cipher should perform even better than block ciphers in terms of energy required to encrypt. In this paper, we address the question of designing low energy stream ciphers. To this end, we analyze for common stream cipher design components their impact on the energy consumption. Based on this, we give arguments why indeed stream ciphers allow for encrypting long data streams with less energy than block ciphers and validate our findings by implementations. Afterwards, we use the analysis results to identify energy minimizing design principles for stream ciphers.

Program Committees

CHES 2021
CHES 2019
CHES 2018
CHES 2017
CHES 2016
CHES 2015
CHES 2014
CHES 2013