International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Papers from Transactions on Cryptographic Hardware and Embedded Systems 2023

Year
Venue
Title
2023
TCHES
“Whispering MLaaS”: Exploiting Timing Channels to Compromise User Privacy in Deep Neural Networks
While recent advancements of Deep Learning (DL) in solving complex real-world tasks have spurred their popularity, the usage of privacy-rich data for their training in varied applications has made them an overly-exposed threat surface for privacy violations. Moreover, the rapid adoption of cloud-based Machine-Learning-asa-Service (MLaaS) has broadened the threat surface to various remote side-channel attacks. In this paper, for the first time, we show one such privacy violation by observing a data-dependent timing side-channel (naming this to be Class-Leakage) originating from non-constant time branching operation in a widely popular DL framework, namely PyTorch. We further escalate this timing variability to a practical inference-time attack where an adversary with user level privileges and having hard-label black-box access to an MLaaS can exploit Class-Leakage to compromise the privacy of MLaaS users. DL models have also been shown to be vulnerable to Membership Inference Attack (MIA), where the primary objective of an adversary is to deduce whether any particular data has been used while training the model. Differential Privacy (DP) has been proposed in recent literature as a popular countermeasure against MIA, where inclusivity and exclusivity of a data-point in a dataset cannot be ascertained by definition. In this paper, we also demonstrate that the existence of a data-point within the training dataset of a DL model secured with DP can still be distinguished using the identified timing side-channel. In addition, we propose an efficient countermeasure to the problem by introducing constant-time branching operation that alleviates the Class-Leakage. We validate the approach using five pre-trained DL models trained on two standard benchmarking image classification datasets, CIFAR-10 and CIFAR-100, over two different computing environments having Intel Xeon and Intel i7 processors.
2023
TCHES
1LUTSensor: Detecting FPGA Voltage Fluctuations using LookUp Tables
Remote Power Analysis (RPA) attacks use transient voltage fluctuation side channels detected via delay sensors/ on-chip voltage sensors to reveal secret keys from cryptographic circuits. The state-of-the-art research proposed five on-chip voltage sensors for Field Programmable Gate Arrays (FPGAs). This paper proposes a novel on-chip voltage sensor, 1LUTSensor, which uses FPGA LookUp Table (LUT) structure to deduce voltage fluctuations. 1LUTSensor uses LUT multiplexers to create a run-time adjustable delay line to detect voltage fluctuations and uses dedicated paths which are fabricated signal connections and cannot be changed in the FPGA LUT to form the delay line. 1LUTSensor uses only a single LUT and a single flip-flop for the delay line to sense voltage fluctuations and uses a single tapped delay element for calibration. The output of the 1LUTSensor is a single bit. Compared to the state-of-the-art on-chip sensors, 1LUTSensor proposed in this paper is the smallest and fastest on-chip voltage sensor proposed thus far. 1LUTSensor is at least 3x smaller than the smallest on-chip sensor proposed in the literature. Compared to the state-of-the-art, the proposed 1LUTSensor can be operated at 600MHz. 1LUTSensor is evaluated using RPA attacks, and a complete secret key of an AES circuit can be extracted within 100,000 traces.
2023
TCHES
A Closer Look at the Chaotic Ring Oscillators based TRNG Design
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golic (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify this problem and estimate the consequential entropy loss using Lyapunov exponents and the test suite of the NIST SP 800-90B standard. To address the problem of periodic oscillations, we propose several implementation guidelines based on a gate-level model, a design methodology to build a reliable GARO-based TRNG, and an online test to improve the robustness of FIRO-/GARO-based TRNGs. The gate-level implementation guidelines illustrate the causes of periodic oscillations, which are verified by actual implementation and bifurcation diagram. Based on the design methodology, a suitable feedback polynomial can be selected by evaluating the feedback polynomials. The analysis and understanding of periodic oscillation and FIRO-/GARO-based TRNGs are deepened by delay adjustment. A TRNG with the selected feedback polynomial may occasionally enter periodic oscillations, due to active attacks and the delay inconstancy of implementations. This inconstancy might be caused by self-heating, temperature and voltage fluctuation, and the process variation among different silicon chips. Thus, an online test module, as one indispensable component of TRNGs, is proposed to detect periodic oscillations. The detected periodic oscillation can be eliminated by adjusting feedback polynomial or delays to improve the robustness. The online test module is composed of a lightweight and responsive detector with a high detection rate, outperforming the existing detector design and statistical tests. The areas, power consumptions and frequencies are evaluated based on the ASIC implementations of a GARO, the sampling circuit and the online test module. The gate-level implementation guidelines promote the future establishment of the stochastic model of FIRO-/GARO-based TRNGs with a deeper understanding.
2023
TCHES
A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps
We study the success probabilities of two variants of Correlation Power Analysis (CPA) to retrieve multiple secret bits. The target is a permutation-based symmetric cryptographic construction with a quadratic map as an S-box. More precisely, we focus on the nonlinear mapping X used in the Xoodoo and Keccak-p permutations, which is affine equivalent to the nonlinear mapping of Ascon. We thus consider three-bit and five-bit S-boxes. Our leakage model is the difference in power consumption of register cells before and after one round. It reflects that, >in hardware, the aforesaid cryptographic algorithms are usually implemented by deploying a round-based architecture. The power consumption difference depends on whether the targeted bits in the register flip. In particular, we describe two attacks based on the CPA methodology. First, we start with a standard CPA approach, for which, to the best of our knowledge, we are the first to point out the differences between attacking a three-bit and a five-bit S-box. For CPA, the highest correlation coefficient is the most likely secret hypothesis. We improve on this with our novel combined Correlation Power Analysis (combined CPA), or Snake attack, which uses quadratic map cryptanalysis (e.g., of the function X) to achieve a better attack in terms of the number of traces required and computational complexity. For the Snake attack, sums of absolute or squared values of correlation coefficients are used to determine the most likely guess. As a result, we effectively show that our proposed Snake attack can recover the secret in n22 (or n22+n) intermediate results, compared to 22n for the CPA, where n is the length of the targeted S-Box. We collected power measurements from a hardware setup to demonstrate practical attack success probabilities according to the rank of the correct secret hypothesis both for Xoodoo and Keccak-p. In addition, we explain our success probabilities thanks to the Henery model developed for horse races. In short, after performing 16,896 attacks, the Snake attack or combined CPA on Xoodoo consistently recovers the correct secret bits with each attack using 43,860 traces on average and with only 12 correlations, compared to 61,380 traces for the standard CPA attack with 64 correlations. The Snake attack requires about one-fifth as many correlation values as the standard CPA. For Keccak-p, the difference is more drastic: the Snake attack recovers the secret bits invariably after 771,600 traces with just 20 correlations. In contrast, the standard CPA attack operates on 1,024 correlations (about fifty times more than the Snake attack) and requires 1,223,400 traces.
2023
TCHES
All You Need Is Fault: Zero-Value Attacks on AES and a New λ-Detection M&M
Deploying cryptography on embedded systems requires security against physical attacks. At CHES 2019, M&M was proposed as a combined countermeasure applying masking against SCAs and information-theoretic MAC tags against FAs. In this paper, we show that one of the protected AES implementations in the M&M paper is vulnerable to a zero-value SIFA2-like attack. A practical attack is demonstrated on an ASIC board. We propose two versions of the attack: the first follows the SIFA approach to inject faults in the last round, while the second one is an extension of SIFA and FTA but applied to the first round with chosen plaintext. The two versions work at the byte level, but the latter version considerably improves the efficiency of the attack. Moreover, we show that this zero-value SIFA2 attack is specific to the AES tower-field decomposed S-box design. Hence, such attacks are applicable to any implementation featuring this AES S-box architecture.Then, we propose a countermeasure that prevents these attacks. We extend M&M with a fine-grained detection-based feature capable of detecting the zero-value glitch attacks. In this effort, we also solve the problem of a combined attack on the ciphertext output check of M&M scheme by using Kronecker’s delta function. We deploy the countermeasure on FPGA and verify its security against both fault and side-channel analysis with practical experiments.
2023
TCHES
Areion: Highly-Efficient Permutations and Its Applications to Hash Functions for Short Input
In the real-world applications, the overwhelming majority of cases require hashing with relatively short input, say up to 2K bytes. The length of almost all TCP/IP packets is between 40 to 1.5K bytes, and the maximum packet lengths of major protocols, e.g., Zigbee, Bluetooth low energy, and Controller Area Network (CAN) are less than 128 bytes. However, existing schemes are not well optimized for short input. To bridge the gap between real-world needs (in future) and limited performances of state-of-the-art hash functions for short input, we design a family of wide-block permutations Areion that fully leverages the power of AES instructions, which are widely deployed in many devices. As its applications, we propose several hash functions. Areion significantly outperforms existing schemes for short input and even competitive to relatively long message. Indeed, our hash function is surprisingly fast, and its performance is less than 3 cycles/byte in the latest Intel architecture for any message size. Especially, it is about 10 times faster than existing state-of-the-art schemes for short message up to around 100 bytes, which are most widely-used input size in real-world applications, on both the latest CPU architectures (IceLake, Tiger Lake, and Alder Lake) and mobile platforms (Pixel 6 and iPhone 13).
2023
TCHES
Automatic Search of Meet-in-the-Middle Differential Fault Analysis on AES-like Ciphers
Fault analysis is a powerful technique to retrieve secret keys by exploiting side-channel information. Differential fault analysis (DFA) is one of the most powerful threats utilizing differential information between correct and faulty ciphertexts and can recover keys for symmetric-key cryptosystems efficiently. Since DFA usually targets the first or last few rounds of the block ciphers, some countermeasures against DFA only protect the first and last few rounds for efficiency. Therefore, to explore how many rounds DFA can affect is very important to make sure how many rounds to protect in practice. At CHES 2011, Derbez et al. proposed an improved DFA on AES based on MitM approach, which covers one more round than previous DFAs. To perform good (or optimal) MitM DFA on block ciphers, the good (or optimal) attack configurations should be identified, such as the location where the faults inject, the matching point with differential relationship, and the two independent computation paths where two independent subsets of the key are involved. In this paper, we formulate the essential ideas of the construction of the attack, and translate the problem of searching for the best MitM DFA into optimization problems under constraints in Mixed-Integer-Linear-Programming (MILP) models. With the models, we achieve more powerful and practical DFA attacks on SKINNY, CRAFT, QARMA, PRINCE, PRINCEv2, and MIDORI with faults injected in 1 to 9 earlier rounds than the best previous DFAs.
2023
TCHES
BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption
Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a customized version of bootstrapping that can be instantiated with hardware multipliers optimized for area and power.BASALISC is a three-abstraction-layer RISC architecture, designed for a 1 GHz ASIC implementation and underway toward 150mm2 die tape-out in a 12nm GF process. BASALISC’s four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Its conflict-resolution permutation hardware is generalized and re-used to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching.The BASALISC toolchain comprises a custom compiler and a joint performance and correctness simulator. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and we study its performance on a set of benchmarks. Simulation results show a speedup of more than 5,000× over HElib – a popular software FHE library.
2023
TCHES
Belief Propagation Meets Lattice Reduction: Security Estimates for Error-Tolerant Key Recovery from Decryption Errors
In LWE-based KEMs, observed decryption errors leak information about the secret key in the form of equations or inequalities. Several practical fault attacks have already exploited such leakage by either directly applying a fault or enabling a chosen-ciphertext attack using a fault. When the leaked information is in the form of inequalities, the recovery of the secret key is not trivial. Recent methods use either statistical or algebraic methods (but not both), with some being able to handle incorrect information. Having in mind that integration of the side-channel information is a crucial part of several classes of implementation attacks on LWEbased schemes, it is an important question whether statistically processed information can be successfully integrated in lattice reduction algorithms.We answer this question positively by proposing an error-tolerant combination of statistical and algebraic methods that make use of the advantages of both approaches. The combination enables us to improve upon existing methods – we use both fewer inequalities and are more resistant to errors. We further provide precise security estimates based on the number of available inequalities.Our recovery method applies to several types of implementation attacks in which decryption errors are used in a chosen-ciphertext attack. We practically demonstrate the improved performance of our approach in a key-recovery attack against Kyber with fault-induced decryption errors.
2023
TCHES
Cache-Timing Attack Against HQC
In this paper, we present the first chosen-ciphertext (CC) cache-timing attacks on the reference implementation of HQC. We build a cache-timing based distinguisher for implementing a plaintext-checking (PC) oracle. The PC oracle uses side-channel information to check if a given ciphertext decrypts to a given message. This is done by identifying a vulnerability during the generating process of two vectors in the reference implementation of HQC. We also propose a new method of using PC oracles for chosen-ciphertext side-channel attacks against HQC, which may have independent interest.We show a general proof-of-concept attack, where we use the Flush+Reload technique and also derive, in more detail, a practical attack on an HQC execution on Intel SGX, where the Prime+Probe technique is used. We show the exact path to do key recovery by explaining the detailed steps, using the PC oracle. In both scenarios, the new attack requires 53, 857 traces on average with much fewer PC oracle calls than the timing attack of Guo et al. CHES 2022 on an HQC implementation.
2023
TCHES
CalyPSO: An Enhanced Search Optimization based Framework to Model Delay-based PUFs
Delay-based Physically Unclonable Functions (PUFs) are a popular choice for “keyless” cryptography in low-power devices. However, they have been subjected to modeling attacks using Machine Learning (ML) approaches, leading to improved PUF designs that resist ML-based attacks. On the contrary, evolutionary search (ES) based modeling approaches have garnered little attention compared to their ML counterparts due to their limited success. In this work, we revisit the problem of modeling delaybased PUFs using ES algorithms and identify drawbacks in present state-of-the-art genetic algorithms (GA) when applied to PUFs. This leads to the design of a new ES-based algorithm called CalyPSO, inspired by Particle Swarm Optimization (PSO) techniques, which is fundamentally different from classic genetic algorithm design rationale. This allows CalyPSO to avoid the pitfalls of textbook GA and mount successful modeling attacks on a variety of delay-based PUFs, including k-XOR APUF variants. Empirically, we show attacks for the parameter choices of k as high as 20, for which there are no reported ML or ES-based attacks without exploiting additional information like reliability or power/timing side-channels. We further show that CalyPSO can invade PUF designs like interpose-PUFs (i-PUFs) and (previously unattacked) LP-PUFs, which attempt to enhance ML robustness by obfuscating the input challenge. Furthermore, we evolve CalyPSO to CalyPSO++ by observing that the PUF compositions do not alter the input challenge dimensions, allowing the attacker to investigate cross-architecture modeling. This allows us to model a k-XOR APUF using a (k − 1)-XOR APUF as well as perform cross-architectural modeling of BRPUF and i-PUF using k-XOR APUF variants. CalyPSO++ provides the first modeling attack on 4 LP-PUF by reducing it to a 4-XOR APUF. Finally, we demonstrate the potency of CalyPSO and CalyPSO++ by successfully modeling various PUF architectures on noisy simulations as well as real-world hardware implementations.
2023
TCHES
Carry-based Differential Power Analysis (CDPA) and its Application to Attacking HMAC-SHA-2
In this paper, we introduce Carry-based Differential Power Analysis (CDPA), a novel methodology that allows for attacking schemes that use arithmetical addition. We apply this methodology to attacking HMAC-SHA-2. We provide full mathematical analysis of the method and show that under certain assumptions and with a sufficient amount of traces any key can be revealed. In the experimental part of the paper, we demonstrate successful application of the attack both in software simulation and on an FPGA board using power consumption measurements. With as few as 30K traces measured on the FPGA board, we recover the secrets that allow for forging the HMAC-SHA-2 signature of any message in 3% of the cases — while with 275K traces the success rate reaches 100%. This means that any implementation of HMAC-SHA-2, even in pure parallel hardware, is vulnerable to side-channel attacks, unless it is adequately protected. To the best of our knowledge, this is the first published full-fledged attack on pure hardware implementations of HMAC-SHA-2, which does not require a profiling stage.
2023
TCHES
Conditional Variational AutoEncoder based on Stochastic Attacks
Over the recent years, the cryptanalysis community leveraged the potential of research on Deep Learning to enhance attacks. In particular, several studies have recently highlighted the benefits of Deep Learning based Side-Channel Attacks (DLSCA) to target real-world cryptographic implementations. While this new research area on applied cryptography provides impressive result to recover a secret key even when countermeasures are implemented (e.g. desynchronization, masking schemes), the lack of theoretical results make the construction of appropriate and powerful models a notoriously hard problem. This can be problematic during an evaluation process where a security bound is required. In this work, we propose the first solution that bridges DL and SCA in order to get this security bound. Based on theoretical results, we develop the first Machine Learning generative model, called Conditional Variational AutoEncoder based on Stochastic Attacks (cVAE-SA), designed from the well-known Stochastic Attacks, that have been introduced by Schindler et al. in 2005. This model reduces the black-box property of DL and eases the architecture design for every real-world crypto-system as we define theoretical complexity bounds which only depend on the dimension of the (reduced) trace and the targeting variable over F2n . We validate our theoretical proposition through simulations and public datasets on a wide range of use cases, including multi-task learning, curse of dimensionality and masking scheme.
2023
TCHES
Cryptanalysis of ARX-based White-box Implementations
At CRYPTO’22, Ranea, Vandersmissen, and Preneel proposed a new way to design white-box implementations of ARX-based ciphers using so-called implicit functions and quadratic-affine encodings. They suggest the Speck block-cipher as an example target.In this work, we describe practical attacks on the construction. For the implementation without one of the external encodings, we describe a simple algebraic key recovery attack. If both external encodings are used (the main scenario suggested by the authors), we propose optimization and inversion attacks, followed by our main result - a multiple-step round decomposition attack and a decomposition-based key recovery attack.Our attacks only use the white-box round functions as oracles and do not rely on their description. We implemented and verified experimentally attacks on white-box instances of Speck-32/64 and Speck-64/128. We conclude that a single ARX-round is too weak to be used as a white-box round.
2023
TCHES
cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs
Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). Therefore, this paper presents cuZK, an efficient GPU implementation of zkSNARK with the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This MSM algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Along with our self-designed MSM scheme and well-studied NTT scheme, cuZK achieves the parallelization of all operations in the proof generation step. Third, cuZK reduces the latency overhead caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer and device computation. The evaluation results show that our MSM module provides over 2.08x (up to 2.94x) speedup versus the state-of-the-art GPU implementation. cuZK achieves over 2.65x (up to 4.86x) speedup on standard benchmarks and 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
2023
TCHES
Deep Learning Side-Channel Collision Attack
With the breakthrough of Deep Neural Networks, many fields benefited from its enormously increasing performance. Although there is an increasing trend to utilize Deep Learning (DL) for Side-Channel Analysis (SCA) attacks, previous works made specific assumptions for the attack to work. Especially the concept of template attacks is widely adapted while not much attention was paid to other attack strategies. In this work, we present a new methodology, that is able to exploit side-channel collisions in a black-box setting. In particular, our attack is performed in a non-profiled setting and requires neither a hypothetical power model (or let’s say a many-to-one function) nor details about the underlying implementation. While the existing non-profiled DL attacks utilize training metrics to distinguish the correct key, our attack is more efficient by training a model that can be applied to recover multiple key portions, e.g., bytes. In order to perform our attack on raw traces instead of pre-selected samples, we further introduce a DL-based technique that can localize input-dependent leakages in masked implementations, e.g., the leakages associated to one byte of the cipher state in case of AES. We validated our approach by targeting several publicly available power consumption datasets measured from implementations protected by different masking schemes. As a concrete example, we demonstrate how to successfully recover the key bytes of the ASCAD dataset with only a single trained model in a non-profiled setting.
2023
TCHES
Dlog is Practically as Hard (or Easy) as DH – Solving Dlogs via DH Oracles on EC Standards
Assume that we have a group G of known order q, in which we want to solve discrete logarithms (dlogs). In 1994, Maurer showed how to compute dlogs in G in poly time given a Diffie-Hellman (DH) oracle in G, and an auxiliary elliptic curve ˆÊ (Fq) of smooth order. The problem of Maurer’s reduction of solving dlogs via DH oracles is that no efficient algorithm for constructing such a smooth auxiliary curve is known. Thus, the implications of Maurer’s approach to real-world applications remained widely unclear.In this work, we explicitly construct smooth auxiliary curves for 13 commonly used, standardized elliptic curves of bit-sizes in the range [204, 256], including e.g., NIST P-256, Curve25519, SM2 and GOST R34.10. For all these curves we construct a corresponding cyclic auxiliary curve ˆÊ(Fq), whose order is 39-bit smooth, i.e., its largest factor is of bit-length at most 39 bits.This in turn allows us to compute for all divisors of the order of ˆÊ(Fq) exhaustively a codebook for all discrete logarithms. As a consequence, dlogs on ˆÊ(Fq) can efficiently be computed in a matter of seconds. Our resulting codebook sizes for each auxiliary curve are less than 29 TByte individually, and fit on our hard disk.We also construct auxiliary curves for NIST P-384 and NIST P-521 with a 65-bit and 110-bit smooth order.Further, we provide an efficient implementation of Maurer’s reduction from the dlog computation in G with order q to the dlog computation on its auxiliary curve ˆÊ (Fq). Let us provide a flavor of our results, e.g., when G is the NIST P-256 group, the results for other curves are similar. With the help of our codebook for the auxiliary curve Ê(Fq), and less than 24,000 calls to a DH oracle in G (that we simulate), we can solve discrete logarithms on NIST P-256 in around 30 secs.From a security perspective, our results show that for current elliptic curve standards< the difficulty of solving DH is practically tightly related to the difficulty of computing dlogs. Namely, unless dlogs are easy to compute on these curves G, we provide a very concrete security guarantee that DH in G must also be hard. From a cryptanalytic perspective, our results show a way to efficiently solve discrete logarithms in the presence of a DH oracle.
2023
TCHES
Don’t Forget Pairing-Friendly Curves with Odd Prime Embedding Degrees
Pairing-friendly curves with odd prime embedding degrees at the 128-bit security level, such as BW13-310 and BW19-286, sparked interest in the field of public-key cryptography as small sizes of the prime fields. However, compared to mainstream pairing-friendly curves at the same security level, i.e., BN446 and BLS12-446, the performance of pairing computations on BW13-310 and BW19-286 is usually considered inefficient. In this paper we investigate high performance software implementations of pairing computation on BW13-310 and corresponding building blocks used in pairing-based protocols, including hashing, group exponentiations and membership testings. Firstly, we propose efficient explicit formulas for pairing computation on this curve. Moreover, we also exploit the state-of-art techniques to implement hashing in G1 and G2, group exponentiations and membership testings. In particular, for exponentiations in G2 and GT , we present new optimizations to speed up computational efficiency. Our implementation results on a 64-bit processor show that the gap in the performance of pairing computation between BW13-310 and BN446 (resp. BLS12-446) is only up to 4.9% (resp. 26%). More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109.1% − 227.3%, 100% − 192.6%, 24.5%−108.5% and 68.2%−145.5% faster in terms of hashing to G1, exponentiations in G1 and GT , and membership testing for GT , respectively. These results reveal that BW13-310 would be an interesting candidate in pairing-based cryptographic protocols.
2023
TCHES
Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings
We propose a novel approach that generalizes interleaved modular multiplication algorithms for the computation of sums of products over large prime fields. This operation has widespread use and is at the core of many cryptographic applications. The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of “double-precision” operations. Moreover, it can be easily adapted to the different methods that exist to compute modular multiplication, producing algorithms that are significantly more efficient and memory-friendly. We showcase the performance of the proposed approach in the computation of multiplication over an extension field Fpk , and demonstrate its impact with record-breaking implementations of bilinear pairings. Specifically, we accomplish a full optimal ate pairing computation over the popular BLS12-381 curve, designed for the 128-bit security level, in under half a millisecond on a 3.2GHz Intel Coffee Lake processor, which is about 1.40× faster than the state-of-the-art. Similarly, we perform the same computation over the BLS24-509 curve, targeting the 192-bit security level, in ~ 2.6 milliseconds, achieving a speedup of more than 1.30x. We also report a significant impact on other applications, including protocols based on supersingular isogenies.
2023
TCHES
Efficient Persistent Fault Analysis with Small Number of Chosen Plaintexts
In 2018, Zhang et al. introduced the Persistent Fault Analysis (PFA) for the first time, which uses statistical features of ciphertexts caused by faulty Sbox to recover the key of block ciphers. However, for most of the variants of PFA, the prior knowledge of the fault (location and value) is required, where the corresponding analysis will get more difficult under the scenario of multiple faults. To bypass such perquisite and improve the analysis efficiency for multiple faults, we propose Chosen-Plaintext based Persistent Fault Analysis (CPPFA). CPPFA introduces chosen-plaintext to facilitate PFA and can reduce the key search space of AES-128 to extremely small. Our proposal requires 256 ciphertexts, while previous state-of-the-art work still requires 1509 and 1448 ciphertexts under 8 and 16 faults, respectively, at the only cost of requiring 256 chosen plaintexts. In particular, CPPFA can be applied to the multiple faults scenarios where all fault locations, values and quantity are unknown, and the worst time complexity of CPPFA is O(28+nf ) for AES-128, where nf represents the number of faults. The experimental results show that when nf > 4, 256 pairs of plaintext-ciphertext can recover the master key of AES-128. As for LED-64, only 16 pairs of plaintext-ciphertext reduce the remaining key search space to 210.
2023
TCHES
Efficient Private Circuits with Precomputation
At CHES 2022, Wang et al. described a new paradigm for masked implementations using private circuits, where most intermediates can be precomputed before the input shares are accessed, significantly accelerating the online execution of masked functions. However, the masking scheme they proposed mainly featured (and was designed for) the cost amortization, leaving its (limited) suitability in the above precomputation-based paradigm just as a bonus. This paper aims to provide an efficient, reliable, easy-to-use, and precomputation-compatible masking scheme. We propose a new masked multiplication over the finite field Fq suitable for the precomputation, and prove its security in the composable notion called Probing-Isolating Non-Inference (PINI). Particularly, the operations (e.g., AND and XOR) in the binary field can be achieved by assigning q = 2, allowing the bitsliced implementation that has been shown to be quite efficient for the software implementations. The new masking scheme is applied to leverage the masking of AES and SKINNY block ciphers on ARM Cortex M architecture. The performance results show that the new scheme contributes to a significant speed-up compared with the state-of-the-art implementations. For SKINNY with block size 64, the speed and RAM requirement can be significantly improved (saving around 45% cycles in the online-computation and 60% RAM space for precomputed values) from AES-128, thanks to its smaller number of AND gates. Besides the security proof by hand, we provide formal verifications for the multiplication and T-test evaluations for the masked implementations of AES and SKINNY. Because of the structure of the new masked multiplication, our formal verification can be performed for security orders up to 16.
2023
TCHES
Efficient Regression-Based Linear Discriminant Analysis for Side-Channel Security Evaluations: Towards Analytical Attacks against 32-bit Implementations
32-bit software implementations become increasingly popular for embedded security applications. As a result, profiling 32-bit target intermediate values becomes increasingly needed to evaluate their side-channel security. This implies the need of statistical tools that can deal with long traces and large number of classes. While there are good options to solve these issues separately (e.g., linear regression and linear discriminant analysis), the current state of the art lacks efficient tools to solve them jointly. To the best of our knowledge, the best-known option is to fragment the profiling in smaller parts, which is suboptimal from the information theoretic viewpoint. In this paper, we therefore revisit regression-based linear discriminant analysis, which combines linear regression and linear discriminant analysis, and improve its efficiency so that it can be used for profiling long traces corresponding to 32-bit implementations. Besides introducing the optimizations needed for this purpose, we show how to use regression-based linear discriminant analysis in order to obtain efficient bounds for the perceived information, an information theoretic metric characterizing the security of an implementation against profiled attacks. We also combine this tool with optimizations of soft analytical side-channel attack that apply to bitslice implementations. We use these results to attack a 32-bit implementation of SAP instantiated with Ascon’s permutation, and show that breaking the initialization of its re-keying in one trace is feasible for determined adversaries.
2023
TCHES
Enabling FrodoKEM on Embedded Devices
FrodoKEM is a lattice-based Key Encapsulation Mechanism (KEM) based on unstructured lattices. From a security point of view this makes it a conservative option to achieve post-quantum security, hence why it is favored by several European authorities (e.g., German BSI and French ANSSI). Relying on unstructured instead of structured lattices (e.g., CRYSTALS-Kyber) comes at the cost of additional memory usage, which is particularly critical for embedded security applications such as smart cards. For example, prior FrodoKEM-640 implementations (using AES) on Cortex-M4 require more than 80 kB of stack making it impossible to run on some embedded systems. In this work, we explore several stack reduction strategies and the resulting time versus memory trade-offs. Concretely, we reduce the stack consumption of FrodoKEM by a factor 2–3x compared to the smallest known implementations with almost no impact on performance. We also present various time-memory trade-offs going as low as 8 kB for all AES parameter sets, and below 4 kB for FrodoKEM-640. By introducing a minor tweak to the FrodoKEM specifications, we additionally reduce the stack consumption down to 8 kB for all the SHAKE versions. As a result, this work enables FrodoKEM on more resource constrained embedded systems.
2023
TCHES
Enhancing Quality and Security of the PLL-TRNG
Field Programmable Gate Arrays (FPGAs) are used more and more frequently to implement cryptographic systems, which need random number generators (RNGs) to be embedded in the same device. The main challenge related to the implementation of a generator running inside FPGAs is that the physical source of randomness, such as jittered clock generator, is implemented in the configurable logic area, i.e. in the close vicinity of noisy running algorithms, which can have significant impact on generated numbers or even serve to attack the generator. A possible approach to prevent such influence is the use of Phase-Lock Loops (PLLs), which are separated from the re-configurable logic area inside the FPGA chip. In this paper, we propose a new architecture of the PLL-based TRNG including a method to avoid correlation in the output through control of timing in the sampling process, as well as new embedded tests based on the enhanced stochastic model. We also propose a workflow to help find the best parameters, such as output bitrate and entropy rate. We show that bitrates of around 400 kb/s or more can be achieved, while guaranteeing min-entropy rates per bit higher than 0.98 as required by the latest security standards.
2023
TCHES
EstraNet: An Efficient Shift-Invariant Transformer Network for Side-Channel Analysis
Deep Learning (DL) based Side-Channel Analysis (SCA) has been extremely popular recently. DL-based SCA can easily break implementations protected by masking countermeasures. DL-based SCA has also been highly successful against implementations protected by various trace desynchronization-based countermeasures like random delay, clock jitter and shuffling. Over the years, many DL models have been explored to perform SCA. Recently, Transformer Network (TN) based model has also been introduced for SCA. Though the previously introduced TN-based model is successful against implementations jointly protected by masking and random delay countermeasures, it is not scalable to long traces (having a length greater than a few thousand) due to its quadratic time and memory complexity. This work proposes a novel shift-invariant TN-based model with linear time and memory complexity. he contributions of the work are two-fold. First, we introduce a novel TN-based model called EstraNet for SCA. EstraNet has linear time and memory complexity in trace length, significantly improving over the previously proposed TN-based model’s quadratic time and memory cost. EstraNet is also shift-invariant, making it highly effective against countermeasures like random delay and clock jitter. Secondly, we evaluated EstraNet on three SCA datasets of masked implementations with random delay and clock jitter effect. Our experimental results show that EstraNet significantly outperforms several benchmark models, demonstrating up to an order of magnitude reduction in the number of attack traces required to reach guessing entropy 1.
2023
TCHES
Exploiting Intermediate Value Leakage in Dilithium: A Template-Based Approach
This paper presents a new profiling side-channel attack on CRYSTALSDilithium, the new NIST primary standard for quantum-safe digital signatures. An open source implementation of CRYSTALS-Dilithium is already available, with constant-time property as a consideration for side-channel resilience. However, this implementation does not protect against attacks that exploit intermediate data leakage. We show how to exploit a new leakage on a vector generated during the signing process, for which the costly protection by masking is still a matter of debate. With a corpus of 700 000 messages, we design a template attack that enables us to efficiently predict whether a given coefficient in one coordinate of this vector is zero or not. By gathering signatures and being able to make the correct predictions for each index, and then using linear algebra methods, this paper demonstrates that one can recover part of the secret key that is sufficient to produce universal forgeries. While our paper deeply discusses the theoretical attack path, it also demonstrates the validity of the assumption regarding the required leakage model from practical experiments with the reference implementation on an ARM Cortex-M4. We need approximately a day to collect enough representatives and one more day to perform the traces acquisition on our target.
2023
TCHES
Fallen Sanctuary: A Higher-Order and Leakage-Resilient Rekeying Scheme
This paper presents a provably secure, higher-order, and leakage-resilient (LR) rekeying scheme named LR Rekeying with Random oracle Repetition (LR4), along with a quantitative security evaluation methodology. Many existing LR primitives are based on a concept of leveled implementation, which still essentially require a leak-free sanctuary (i.e., differential power analysis (DPA)-resistant component(s)) for some parts. In addition, although several LR pseudorandom functions (PRFs) based on only bounded DPA-resistant components have been developed, their validity and effectiveness for rekeying usage still need to be determined. In contrast, LR4 is formally proven under a leakage model that captures the practical goal of side-channel attack (SCA) protection (e.g., masking with a practical order) and assumes no unbounded DPA-resistant sanctuary. This proof suggests that LR4 resists exponential invocations (up to the birthday bound of key size) without using any unbounded leak-free component, which is the first of its kind. Moreover, we present a quantitative SCA success rate evaluation methodology for LR4 that combines the bounded leakage models for LR cryptography and a state-of-the-art information-theoretical SCA evaluation method. We validate its soundness and effectiveness as a DPA countermeasure through a numerical evaluation; that is, the number of secure calls of a symmetric primitive increases exponentially by increasing a security parameter under practical conditions.
2023
TCHES
Fast and Accurate: Efficient Full-Domain Functional Bootstrap and Digit Decomposition for Homomorphic Computation
The functional bootstrap in FHEW/TFHE allows for fast table lookups on ciphertexts and is a powerful tool for privacy-preserving computations. However, the functional bootstrap suffers from two limitations: the negacyclic constraint of the lookup table (LUT) and the limited ability to evaluate large-precision LUTs. To overcome the first limitation, several full-domain functional bootstraps (FDFB) have been developed, enabling the evaluation of arbitrary LUTs. Meanwhile, algorithms based on homomorphic digit decomposition have been proposed to address the second limitation. Although these algorithms provide effective solutions, they are yet to be optimized. This paper presents four new FDFB algorithms and two new homomorphic decomposition algorithms that improve the state-of-the-art. Our FDFB algorithms reduce the output noise, thus allowing for more efficient and compact parameter selection. Across all parameter settings, our algorithms reduce the runtime by up to 39.2%. Our homomorphic decomposition algorithms also run at 2.0x and 1.5x the speed of prior algorithms. We have implemented and benchmarked all previous FDFB and homomorphic decomposition algorithms and our methods in OpenFHE.
2023
TCHES
Fast and Clean: Auditable high-performance assembly via constraint solving
Handwritten assembly is a widely used tool in the development of highperformance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice.In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture.We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability.
2023
TCHES
Faster Bootstrapping via Modulus Raising and Composite NTT
FHEW-like schemes utilize exact gadget decomposition to reduce error growth and ensure that the bootstrapping incurs only polynomial error growth. However, the exact gadget decomposition method requires higher computation complexity and larger memory storage. In this paper, we improve the efficiency of the FHEWlike schemes by utilizing the composite NTT that performs the Number Theoretic Transform (NTT) with a composite modulus. Specifically, based on the composite NTT, we integrate modulus raising and gadget decomposition in the external product, which reduces the number of NTTs required in the blind rotation from 2(dg + 1)n to 2(⌈dg⌉/2 + 1)n. Furthermore, we develop a modulus packing technique that uses the Chinese Remainder Theorem (CRT) and composite NTT to bootstrap multiple LWE ciphertexts through one blind rotation process.We implement the bootstrapping algorithms and evaluate the performance on various benchmark computations using both binary and ternary secret keys. Our results of the single bootstrapping process indicate that the proposed approach achieves speedups of up to 1.7 x, and reduces the size of the blind rotation key by 50% under specific parameters. Finally, we instantiate two ciphertexts in the packing procedure, and the experimental results show that our technique is around 1.5 x faster than the two bootstrapping processes under the 127-bit security level.
2023
TCHES
Faster Montgomery multiplication and Multi-Scalar-Multiplication for SNARKs
The bottleneck in the proving algorithm of most of elliptic-curve-based SNARK proof systems is the Multi-Scalar-Multiplication (MSM) algorithm. In this paper we give an overview of a variant of the Pippenger MSM algorithm together with a set of optimizations tailored for curves that admit a twisted Edwards form. We prove that this is the case for SNARK-friendly chains and cycles of elliptic curves, which are useful for recursive constructions. Our contribution is twofold: first, we optimize the arithmetic of finite fields by improving on the well-known Coarsely Integrated Operand Scanning (CIOS) modular multiplication. This is a contribution of independent interest that applies to many different contexts. Second, we propose a new coordinate system for twisted Edwards curves tailored for the Pippenger MSM algorithm.Accelerating the MSM over these curves is critical for deployment of recursive proof< systems applications such as proof-carrying-data, blockchain rollups and blockchain light clients. We implement our work in Go and benchmark it on two different CPU architectures (x86 and arm64). We show that our implementation achieves a 40-47% speedup over the state-of-the-art implementation (which was implemented in Rust). This MSM implementation won the first place in the ZPrize competition in the open division “Accelerating MSM on Mobile” and will be deployed in two real-world applications: Linea zkEVM by ConsenSys and probably Celo network.
2023
TCHES
FaultMeter: Quantitative Fault Attack Assessment of Block Cipher Software
Fault attacks are a potent class of physical attacks that exploit a fault njected during device operation to steal secret keys from a cryptographic device. The success of a fault attack depends intricately on (a) the cryptographic properties of the cipher, (b) the program structure, and (c) the underlying hardware architecture. While there are several tools that automate the process of fault attack evaluation, none of them consider all three influencing aspects.This paper proposes a framework called FaultMeter that builds on the state-of-art by not just identifying fault vulnerable locations in a block cipher software, but also providing a quantification for each vulnerable location. The quantification provides a probability that an injected fault can be successfully exploited. It takes into consideration the cryptographic properties of the cipher, structure of the implementation, and the underlying Instruction Set Architecture’s (ISA) susceptibility to faults. We demonstrate an application of FaultMeter to automatically insert optimal amounts of countermeasures in a program to meet the user’s security requirements while minimizing overheads. We demonstrate the versatility of the FaultMeter framework by evaluating five cipher implementations on multiple hardware platforms, namely, ARM (32 and 64 bit), RISC-V (32 and 64 bit), TI MSP-430 (16-bit) and Intel x86 (64-bit).
2023
TCHES
Fiddling the Twiddle Constants - Fault Injection Analysis of the Number Theoretic Transform
In this work, we present the first fault injection analysis of the Number Theoretic Transform (NTT). The NTT is an integral computation unit, widely used for polynomial multiplication in several structured lattice-based key encapsulation mechanisms (KEMs) and digital signature schemes. We identify a critical single fault vulnerability in the NTT, which severely reduces the entropy of its output. This in turn enables us to perform a wide-range of attacks applicable to lattice-based KEMs as well as signature schemes. In particular, we demonstrate novel key recovery and message recovery attacks targeting the key generation and encryption procedure of Kyber KEM. We also propose novel existential forgery attacks targeting deterministic and probabilistic signing procedure of Dilithium, followed by a novel verification bypass attack targeting its verification procedure. All proposed exploits are demonstrated with high success rate using electromagnetic fault injection on optimized implementations of Kyber and Dilithium, from the open-source pqm4 library on the ARM Cortex-M4 microcontroller. We also demonstrate that our proposed attacks are capable of bypassing concrete countermeasures against existing fault attacks on lattice-based KEMs and signature schemes. We believe our work motivates the need for more research towards development of countermeasures for the NTT against fault injection attacks.
2023
TCHES
Formally verifying Kyber: Episode IV: Implementation correctness
In this paper we present the first formally verified implementations of Kyber and, to the best of our knowledge, the first such implementations of any post-quantum cryptosystem. We give a (readable) formal specification of Kyber in the EasyCrypt proof assistant, which is syntactically very close to the pseudocode description of the scheme as given in the most recent version of the NIST submission. We present high-assurance open-source implementations of Kyber written in the Jasmin language, along with machine-checked proofs that they are functionally correct with respect to the EasyCrypt specification. We describe a number of improvements to the EasyCrypt and Jasmin frameworks that were needed for this implementation and verification effort, and we present detailed benchmarks of our implementations, showing that our code achieves performance close to existing hand-optimized implementations in C and assembly.
2023
TCHES
From MLWE to RLWE: A Differential Fault Attack on Randomized & Deterministic Dilithium
The post-quantum digital signature scheme CRYSTALS-Dilithium has been recently selected by the NIST for standardization. Implementing CRYSTALSDilithium, and other post-quantum cryptography schemes, on embedded devices raises a new set of challenges, including ones related to performance in terms of speed and memory requirements, but also related to side-channel and fault injection attacks security. In this work, we investigated the latter and describe a differential fault attack on the randomized and deterministic versions of CRYSTALS-Dilithium. Notably, the attack requires a few instructions skips and is able to reduce the MLWE problem that Dilithium is based on to a smaller RLWE problem which can be practically solved with lattice reduction techniques. Accordingly, we demonstrated key recoveries using hints extracted on the secret keys from the same faulted signatures using the LWE with side-information framework introduced by Dachman-Soled et al. at CRYPTO’20. As a final contribution, we proposed algorithmic countermeasures against this attack and in particular showed that the second one can be parameterized to only induce a negligible overhead over the signature generation.
2023
TCHES
Gadget-based Masking of Streamlined NTRU Prime Decapsulation in Hardware
Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, the default algorithm in OpenSSH 9. Based on lattice assumptions, it is assumed to be secure also against attackers with access to< large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in recent years, challenges remain with respect to protection mechanisms against attackers that have additional side-channel information, such as the power consumption of a device processing secret data. As a countermeasure to such attacks, masking has been shown to be a promising and effective approach. For public-key schemes, including any recent PQC schemes, usually, a mixture of Boolean and arithmetic techniques is applied on an algorithmic level. Our generic hardware implementation of Streamlined NTRU Prime decapsulation, however, follows an idea that until now was assumed to be solely applicable efficiently to symmetric cryptography: gadget-based masking. The hardware design is transformed into a secure implementation by replacing each gate with a composable secure gadget that operates on uniform random shares of secret values. In our work, we show the feasibility of applying this approach also to PQC schemes and present the first Public-Key Cryptography (PKC) – pre- and post-quantum – implementation masked with the gadget-based approach considering several trade-offs and design choices. By the nature of gadget-based masking, the implementation can be instantiated at arbitrary masking order. We synthesize our implementation both for Artix-7 Field-Programmable Gate Arrays (FPGAs) and 45nm Application-Specific Integrated Circuits (ASICs), yielding practically feasible results regarding the area, randomness requirement, and latency. We verify the side-channel security of our implementation using formal verification on the one hand, and practically using Test Vector Leakage Assessment (TVLA) on the other. Finally, we also analyze the applicability of our concept to Kyber and Dilithium, which will be standardized by the National Institute of Standards and Technology (NIST).
2023
TCHES
Garbled Circuits from an SCA Perspective: Free XOR can be Quite Expensive. . .
Garbling schemes, invented in the 80’s by Yao (FOCS’86), have been a versatile and fundamental tool in modern cryptography. A prominent application of garbled circuits is constant round secure two-party computation, which led to a long line of study of this object, where one of the most influential optimizations is Free-XOR (Kolesnikov and Schneider ICALP’08), introducing a global offset Δ for all garbled wire values where XOR gates are computed locally without garbling them. To date, garbling schemes were not studied per their side-channel attacks (SCA) security characteristics, even though SCA pose a significant security threat to cryptographic devices. In this research we, demonstrate that adversaries utilizing advanced SCA tools such as horizontal attacks, mixed with advanced hypothesis building and standard (vertical) SCA tools, can jeopardize garbling implementations.Our main observation is that garbling schemes utilizing a global secret Δ open a door to quite trivial side-channel attacks. We model our side-channel attacks on the garbler’s device and discuss the asymmetric setting where various computations are not performed on the evaluator side. This enables dangerous leakage extraction on the garbler and renders our attack impossible on the evaluator’s side.Theoretically, we first demonstrate on a simulated environment, that such attacks are quite devastating. Concretely, our attack is capable of extracting Δ when the circuit embeds only 8 input non-linear gates with fifth/first-order attack Success-Rates of 0.65/0.7. With as little as 3 such gates, our attack reduces the first-order Guessing Entropy of Δ from 128 to ∼ 48-bits. We further demonstrate our attack via an implementation and power measurements data over an STM 32-bit processor software implementing circuit garbling, and discuss their limitations and mitigation tactics on logical, protocol and implementation layers.
2023
TCHES
High-assurance zeroization
In this paper we revisit the problem of erasing sensitive data from memory and registers during return from a cryptographic routine. While the problem and related attacker model is fairly easy to phrase, it turns out to be surprisingly hard to guarantee security in this model when implementing cryptography in common languages such as C/C++ or Rust. We revisit the issues surrounding zeroization and then present a principled solution in the sense that it guarantees that sensitive data is erased and it clearly defines when this happens. We implement our solution as extension to the formally verified Jasmin compiler and extend the correctness proof of the compiler to cover zeroization. We show that the approach seamlessly integrates with state-of-the-art protections against microarchitectural attacks by integrating zeroization into Libjade, a cryptographic library written in Jasmin with systematic protections against timing and Spectre-v1 attacks. We present benchmarks showing that in many cases the overhead of zeroization is barely measurable and that it stays below 2% except for highly optimized symmetric crypto routines on short inputs.
2023
TCHES
High-order masking of NTRU
The main protection against side-channel attacks consists in computing every function with multiple shares via the masking countermeasure. While the masking countermeasure was originally developed for securing block-ciphers such as AES, the protection of lattice-based cryptosystems is often more challenging, because of the diversity of the underlying algorithms. In this paper, we introduce new gadgets for the high-order masking of the NTRU cryptosystem, with security proofs in the classical ISW probing model. We then describe the first fully masked implementation of the NTRU Key Encapsulation Mechanism submitted to NIST, including the key generation. To assess the practicality of our countermeasures, we provide a concrete implementation on ARM Cortex-M3 architecture, and eventually a t-test leakage evaluation.
2023
TCHES
How Secure is Exponent-blinded RSA–CRT with Sliding Window Exponentiation?
This paper presents the first security evaluation of exponent-blinded RSA–CRT implementation with sliding window exponentiation against cache attacks. Our main contributions are threefold. (1) We demonstrate an improved cache attack using Flush+Reload on RSA–CRT to estimate the squaring–multiplication operational sequence. The proposed method can estimate a correct squaring–multiplication sequence from one Flush+Reload trace, while the existing Flush+Reload attacks always contain errors in the sequence estimation. This is mandatory for the subsequent steps in the proposed attack. (2) We present a new and first partial key exposure attack on exponent-blinded RSA–CRT with a random-bit leak. The proposed attack first estimates a random mask for blinding exponent using a modification of the Schindler–Wiemers continued fraction attack, and then recovers the secret key using an extension of the Heninger–Shacham branch-and-prune attack. We experimentally show that the proposed attack on RSA–CRT using a practical window size of 5 with 16-, 32-, and 64-bit masks is carried out with complexity of 225.6, 267.7, and 2161, respectively. (3) We then investigate the tradeoffs between mask bit length and implementation performance. The computational cost of exponent-blinded RSA–CRT using a sliding window with a 32- and 64-bit mask are 15% and 10% faster than that with a 128-bit mask, respectively, as we confirmed that 32- and 64-bit masks are sufficient to defeat the proposed attack. Our source code used in the experiment is publicly available.
2023
TCHES
Improved Attacks on (EC)DSA with Nonce Leakage by Lattice Sieving with Predicate
Lattice reduction algorithms have been proved to be one of the most powerful and versatile tools in public key cryptanalysis. In this work, we primarily concentrate on lattice attacks against (EC)DSA with nonce leakage via some sidechannel analysis. Previous works relying on lattice reduction algorithms such as LLL and BKZ will finally lead to the “lattice barrier”: lattice algorithms become infeasible when only fewer nonce is known. Recently, Albrecht and Heninger introduced lattice algorithms augmented with a predicate and broke the lattice barrier (Eurocrypt 2021). We improve their work in several aspects.We first propose a more efficient predicate algorithm which aims to search for the target lattice vector in a large database. Then, we combine sieving with predicate algorithm with the “dimensions for free” and “progressive sieving” techniques to further improve the performance of our attacks. Furthermore, we give a theoretic analysis on how to choose the optimal Kannan embedding factor.As a result, our algorithm outperforms the state-of-the-art lattice attacks for existing records such as 3-bit nonce leakage for a 256-bit curve and 2-bit nonce leakage for a 160-bit curve in terms of running time, sample numbers and success probability. We also break the lattice records on the 384-bit curve with 3-bit nonce leakage and the 256-bit curve with 2-bit nonce leakage which are thought infeasible previously. Finally, we give the first lattice attack against ECDSA with a single-bit nonce leakage, which enables us to break a 112-bit curve with 1-bit nonce leakage in practical time.
2023
TCHES
Improved Gadgets for the High-Order Masking of Dilithium
We present novel and improved high-order masking gadgets for Dilithium, a post-quantum signature scheme that has been standardized by the National Institute of Standards and Technologies (NIST). Our proposed gadgets include the ShiftMod gadget, which is used for efficient arithmetic shifts and serves as a component in other masking gadgets. Additionally, we propose a new algorithm for Boolean-to-arithmetic masking conversion of a μ-bit integer x modulo any integer q, with a complexity that is independent of both μ and q. This algorithm is used in Dilithium to mask the generation of the random variable y modulo q. Moreover, we describe improved techniques for masking the Decompose function in Dilithium. Our new gadgets are proven to be secure in the t-probing model.We demonstrate the effectiveness of our countermeasures by presenting a complete high-order masked implementation of Dilithium that utilizes the improved gadgets described above. We provide practical results obtained from a C implementation and compare the performance improvements provided by our new gadgets with those of previous work.
2023
TCHES
Information Bounds and Convergence Rates for Side-Channel Security Evaluators
Current side-channel evaluation methodologies exhibit a gap between inefficient tools offering strong theoretical guarantees and efficient tools only offering heuristic (sometimes case-specific) guarantees. Profiled attacks based on the empirical leakage distribution correspond to the first category. Bronchain et al. showed at Crypto 2019 that they allow bounding the worst-case security level of an implementation, but the bounds become loose as the leakage dimensionality increases. Template attacks and machine learning models are examples of the second category. In view of the increasing popularity of such parametric tools in the literature, a natural question is whether the information they can extract can be bounded.In this paper, we first show that a metric conjectured to be useful for this purpose, the hypothetical information, does not offer such a general bound. It only does when the assumptions exploited by a parametric model match the true leakage distribution. We therefore introduce a new metric, the training information, that provides the guarantees that were conjectured for the hypothetical information for practically-relevant models. We next initiate a study of the convergence rates of profiled side-channel distinguishers which clarifies, to the best of our knowledge for the first time, the parameters that influence the complexity of a profiling. On the one hand, the latter has practical consequences for evaluators as it can guide them in choosing the appropriate modeling tool depending on the implementation (e.g., protected or not) and contexts (e.g., granting them access to the countermeasures’ randomness or not). It also allows anticipating the amount of measurements needed to guarantee a sufficient model quality. On the other hand, our results connect and exhibit differences between side-channel analysis and statistical learning theory.
2023
TCHES
JitSCA: Jitter-based Side-Channel Analysis in Picoscale Resolution
In safety and security conscious environments, isolated communication channels are often deemed necessary. Galvanically isolated communication channels are typically expected not to allow physical side-channel attacks through that channel. However, in this paper, we show that they can inadvertently leak side channel information in the form of minuscule jitter on the communication signal. We observe worst-case signal jitter within 54 ± 45 ps using an FPGA-based receiver employing a time-to-digital converter (TDC), which is a higher time resolution than a typical oscilloscope can measure, while in many other systems such measurements are also possible. A transmitter device runs a cryptographic accelerator, while we connect an FPGA on the receiver side and measure the signal jitter using a TDC. We can indeed show sufficient side-channel leakage in the jitter of the signal by performing a key recovery of an AES accelerator running on the transmitter. Furthermore, we compare this leakage to a power side channel also measured with a TDC and prove that the timing jitter alone contains sufficient side-channel information. While for an on-chip power analysis attack about 27k traces are needed for key recovery, our cross-device jitter-based attack only needs as few as 47k traces, depending on the setup. Galvanic isolation does not change that significantly. That is an increase by only 1.7x, showing that fine-grained jitter timing information can be a very potent attack vector even under galvanic isolation. In summary, we introduce a new side-channel attack vector that can leak information in many presumably secure systems. Communication channels can inadvertently leak information through tiny timing variations, known as signal jitter. This could affect millions of devices and needs to be considered.
2023
TCHES
Kavach: Lightweight masking techniques for polynomial arithmetic in lattice-based cryptography
Lattice-based cryptography has laid the foundation of various modern-day cryptosystems that cater to several applications, including post-quantum cryptography. For structured lattice-based schemes, polynomial arithmetic is a fundamental part. In several instances, the performance optimizations come from implementing compact multipliers due to the small range of the secret polynomial coefficients. However, this optimization does not easily translate to side-channel protected implementations since masking requires secret polynomial coefficients to be distributed over a large range. In this work, we address this problem and propose two novel generalized techniques, one for the number theoretic transform (NTT) based and another for the non-NTT-based polynomial arithmetic. Both these proposals enable masked polynomial multiplication while utilizing and retaining the small secret property.For demonstration, we used the proposed technique and instantiated masked multipliers for schoolbook as well as NTT-based polynomial multiplication. Both of these can utilize the compact multipliers used in the unmasked implementations. The schoolbook multiplication requires an extra polynomial accumulation along with the two polynomial multiplications for a first-order protected implementation. However, this cost is nothing compared to the area saved by utilizing the existing cheap multiplication units. We also extensively test the side-channel resistance of the proposed design through TVLA to guarantee its first-order security.
2023
TCHES
Loop Aborts Strike Back: Defeating Fault Countermeasures in Lattice Signatures with ILP
At SAC 2016, Espitau et al. presented a loop-abort fault attack against lattice-based signature schemes following the Fiat–Shamir with aborts paradigm. Their attack recovered the signing key by injecting faults in the sampling of the commitment vector (also called masking vector) y, leaving its coefficients at their initial zero value. As possible countermeasures, they proposed to carry out the sampling of the coefficients of y in shuffled order, or to ensure that the masking polynomials in y are not of low degree. In this paper, we show that both of these countermeasures are insufficient. We demonstrate a new loop-abort fault injection attack against Fiat–Shamir with aborts lattice-based signatures that can recover the secret key from faulty signatures even when the proposed countermeasures are implemented. The key idea of our attack is that faulted signatures give rise to a noisy linear system of equations, which can be solved using integer linear programming. We present an integer linear program that recovers the secret key efficiently in practice, and validate the efficacy of our attack by conducting a practical end-to-end attack against a shuffled version of the Dilithium reference implementation, mounted on an ARM Cortex M4. We achieve a full (equivalent) key recovery in under 3 minutes total execution time (including signature generation), using only 5 faulted signatures. In addition, we conduct extensive theoretical simulations of the attack against Dilithium. We find that our method can achieve key recovery in under 5 minutes given a (sufficiently large) set of signatures where just one of the coefficients of y is zeroed out (or left at its initial value of zero). Furthermore, we find that our attack works against all security levels of Dilithium. Our attack shows that protecting Fiat–Shamir with aborts lattice-based signatures against fault injection attacks cannot be achieved using the simple countermeasures proposed by Espitau et al. and likely requires significantly more expensive countermeasures.
2023
TCHES
Low Cost and Precise Jitter Measurement Method for TRNG Entropy Assessment
Random number generators and specifically true random number generators (TRNGs) are essential in cryptography. TRNGs implemented in logic devices usually exploit the time instability of clock signals generated in freely running oscillators as source of randomness. To assess the performance and quality of oscillator-based TRNGs, accurate measurement of clock jitter originating from thermal noise is of paramount importance. We propose a novel jitter measurement method, in which the required jitter accumulation time can be reduced to around 100 reference clock periods. Reduction of the jitter accumulation time reduces the impact of the flicker noise on the measured jitter and increases the precision of the estimated contribution of thermal noise. In addition, the method can be easily embedded in logic devices. The fact that the jitter measurement can be placed in the same device as the TRNG is important since it can be used as a basis for efficient embedded statistical tests. In contrast to other methods, we propose a thorough theoretical analysis of the measurement error. This makes it possible to tune the parameters of the method to guarantee a relative error smaller than 12% even in the worst cases.
2023
TCHES
Low Trace-Count Template Attacks on 32-bit Implementations of ASCON AEAD
The recently adopted Ascon standard by NIST offers a lightweight authenticated encryption algorithm for use in resource-constrained cryptographic devices. To help assess side-channel attack risks of Ascon implementations, we present the first template attack based on analyzing power traces, recorded from an STM32F303 microcontroller board running Weatherley’s 32-bit implementations of Ascon-128. Our analysis combines a fragment template attack with belief-propagation and key-enumeration techniques. The main results are three-fold: (1) we reached 100% success rate from a single trace if the C compiler optimized the unmasked implementation for space, (2) the success rate was about 95% after three traces if the compiler optimized instead for time, and (3) we also attacked a masked version, where the success rate was over 90% with 20 traces of executions with the same key, all after enumerating up to 224 key candidates. These results show that suitably-designed template attacks can pose a real threat to Ascon implementations, even if protected by first-order masking, but we also learnt how some differences in programming style, and even compiler optimization settings, can significantly affect the result.
2023
TCHES
LPN-based Attacks in the White-box Setting
In white-box cryptography, early protection techniques have fallen to the automated Differential Computation Analysis attack (DCA), leading to new countermeasures and attacks. A standard side-channel countermeasure, Ishai-Sahai-Wagner’s masking scheme (ISW, CRYPTO 2003) prevents Differential Computation Analysis but was shown to be vulnerable in the white-box context to the Linear Decoding Analysis attack (LDA). However, recent quadratic and cubic masking schemes by Biryukov-Udovenko (ASIACRYPT 2018) and Seker-Eisenbarth-Liskiewicz (CHES 2021) prevent LDA and force to use its higher-degree generalizations with much higher complexity.In this work, we study the relationship between the security of these and related schemes to the Learning Parity with Noise (LPN) problem and propose a new automated attack by applying an LPN-solving algorithm to white-box implementations. The attack effectively exploits strong linear approximations of the masking scheme and thus can be seen as a combination of the DCA and LDA techniques. Different from previous attacks, the complexity of this algorithm depends on the approximation error, henceforth allowing new practical attacks on masking schemes which previously resisted automated analysis. We demonstrate it theoretically and experimentally, exposing multiple cases where the LPN-based method significantly outperforms LDA and DCA methods, including their higher-order variants.This work applies the LPN problem beyond its usual post-quantum cryptography boundary, strengthening its interest for the cryptographic community, while expanding the range of automated attacks by presenting a new direction for breaking masking schemes in the white-box model.
2023
TCHES
MMM: Authenticated Encryption with Minimum Secret State for Masking
We propose a new authenticated encryption (AE) mode MMM that achieves the minimum memory size with masking. Minimizing the secret state is the crucial challenge in the low-memory AE suitable for masking. Here, the minimum secret state is s + b bits, composed of s bits for a secret key and b bits for a plaintext block. HOMA appeared in CRYPTO 2022 achieved this goal with b = 64, but choosing a smaller b was difficult because b = s/2 is bound to the block size of the underlying primitive, meaning that a block cipher with an unrealistically small block size (e.g., 8 bits) is necessary for further improvement. MMM addresses the issue by making b independent of the underlying primitive while achieving the minimum (s + b)-bit secret state. Moreover, MMM provides additional advantages over HOMA, including (i) a better rate, (ii) the security under the multi-user model, (iii) and a smaller transmission cost. We instantiate two variants, MMM-8 (with b = 8) and MMM-64 (with b = 64), using the standard tweakable block cipher SKINNY-64/192. With a (d + 1)-masking scheme, MMM-8 (resp. MMM-64) is smaller by 56d + 184 (resp. 128) bits compared with HOMA. As a result of hardware performance evaluation, MMM-8 and MMM-64 achieved smaller circuit areas than HOMA with all the examined protection orders d ∈ [0, 5]. MMM-8’s circuit area is only 81% of HOMA with d = 5, and MMM-64 achieves more than x3 speed-up with a smaller circuit area.
2023
TCHES
ModHE: Modular Homomorphic Encryption Using Module Lattices: Potentials and Limitations
The promising field of homomorphic encryption enables functions to be evaluated on encrypted data and produce results for the same computations done on plaintexts. It, therefore, comes as no surprise that many ventures at constructing homomorphic encryption schemes have come into the limelight in recent years. Most popular are those that rely on the hard lattice problem, called the Ring Learning with Errors problem (RLWE). One major limitation of these homomorphic encryption schemes is that in order to securely increase the maximum multiplicative depth, they need to increase the polynomial-size (degree of the polynomial ring) thereby also ncreasing the complexity of the design. We aim to bridge this gap by proposing a homomorphic encryption (HE) scheme based on the Module Learning with Errors problem (MLWE), ModHE that allows us to break the big computations into smaller ones. Given the popularity of module lattice-based post-quantum schemes, it is an evidently interesting research endeavor to also formulate module lattice-based homomorphic encryption schemes. While our proposed scheme is general, as a case study, we port the well-known RLWE-based CKKS scheme to the MLWE setting. The module version of the scheme completely stops the polynomial-size blowups when aiming for a greater circuit depth. Additionally, it presents greater opportunities for designing flexible, reusable, and parallelizable hardware architecture. A hardware implementation is provided to support our claims. We also acknowledge that as we try to decrease the complexity of computations, the amount of computations (such as relinearizations) increases. We hope that the potential and limitations of using such a hardware-friendly scheme will spark further research.
2023
TCHES
Multiple-Valued Plaintext-Checking Side-Channel Attacks on Post-Quantum KEMs
In this paper, we present a side-channel analysis (SCA) on key encapsulation mechanisms (KEMs) based on the Fujisaki–Okamoto (FO) transformation and its variants. Many post-quantum KEMs usually perform re-encryption during key decapsulation to achieve chosen-ciphertext attack (CCA) security. The side-channel leakage of re-encryption can be exploited to mount a key-recovery plaintext-checking attack (KR-PCA), even if the chosen-plaintext attack (CCA) secure decryption constructing the KEM is securely implemented. Herein, we propose an efficient side-channel-assisted KR-PCA on post-quantum KEMs, and achieve a key recovery with significantly fewer attack traces than existing ones in TCHES 2022 and 2023. The basic concept of the proposed attack is to introduce a new KR-PCA based on a multiple-valued (MV-)PC oracle and then implement a dedicated MV-PC oracle based on a multi-classification neural network (NN). The proposed attack is applicable to the NIST PQC selected algorithm Kyber and the similar lattice-based Saber, FrodoKEM and NTRU Prime, as well as SIKE. We also present how to realize a sufficiently reliable MV-PC oracle from NN model outputs that are not 100% accurate, and analyze the tradeoff between the key recovery success rate and the number of attack traces. We assess the feasibility of the proposed attack through attack experiments on three typical symmetric primitives to instantiate a random oracle (SHAKE, SHA3, and AES software). The proposed attack reduces the number of attack traces required for a reliable key recovery by up to 87% compared to the existing attacks against Kyber and other lattice-based KEMs, under the condition of 99.9999% success rate for key recovery. The proposed attack can also reduce the number of attack traces by 85% for SIKE.
2023
TCHES
Oil and Vinegar: Modern Parameters and Implementations
Two multivariate digital signature schemes, Rainbow and GeMSS, made it into the third round of the NIST PQC competition. However, neither made its way to being a standard due to devastating attacks (in one case by Beullens, the other by Tao, Petzoldt, and Ding). How should multivariate cryptography recover from this blow? We propose that, rather than trying to fix Rainbow and HFEv- by introducing countermeasures, the better approach is to return to the classical Oil and Vinegar scheme. We show that, if parametrized appropriately, Oil and Vinegar still provides competitive performance compared to the new NIST standards by most measures (except for key size). At NIST security level 1, this results in either 128-byte signatures with 44 kB public keys or 96-byte signatures with 67 kB public keys. We revamp the state-of-the-art of Oil and Vinegar implementations for the Intel/AMD AVX2, the Arm Cortex-M4 microprocessor, the Xilinx Artix-7 FPGA, and the Armv8-A microarchitecture with the Neon vector instructions set.
2023
TCHES
On Protecting SPHINCS+ Against Fault Attacks
SPHINCS+ is a hash-based digital signature scheme that was selected by NIST in their post-quantum cryptography standardization process. The establishment of a universal forgery on the seminal scheme SPHINCS was shown to be feasible in practice by injecting a fault when the signing device constructs any non-top subtree. Ever since the attack has been made public, little effort was spent to protect the SPHINCS family against attacks by faults. This paper works in this direction in the context of SPHINCS+ and analyzes the current algorithms that aim to prevent fault-based forgeries.First, the paper adapts the original attack to SPHINCS+ reinforced with randomized signing and extends the applicability of the attack to any combination of faulty and valid signatures. Considering the adaptation, the paper then presents a thorough analysis of the attack. In particular, the analysis shows that, with high probability, the security guarantees of SPHINCS+ significantly drop when a single random bit flip occurs anywhere in the signing procedure and that the resulting faulty signature cannot be detected with the verification procedure. The paper shows both in theory and experimentally that the countermeasures based on caching the intermediate W-OTS+s offer a marginally greater protection against unintentional faults, and that such countermeasures are circumvented with a tolerable number of queries in an active attack. Based on these results, the paper recommends real-world deployments of SPHINCS+ to implement redundancy checks.
2023
TCHES
On Provable White-Box Security in the Strong Incompressibility Model
Incompressibility is a popular security notion for white-box cryptography and captures that a large encryption program cannot be compressed without losing functionality. Fouque, Karpman, Kirchner and Minaud (FKKM) defined strong incompressibility, where a compressed program should not even help to distinguish encryptions of two messages of equal length. Equivalently, the notion can be phrased as indistinguishability under chosen-plaintext attacks and key-leakage (LK-IND-CPA), where the leakage rate is high.In this paper, we show that LK-IND-CPA security with superlogarithmic-length leakage, and thus strong incompressibility, cannot be proven under standard (i.e. single-stage) assumptions, if the encryption scheme is key-fixing, i.e. a polynomial number of message-ciphertext pairs uniquely determine the key with high probability. Our impossibility result refutes a claim by FKKM that their big-key generation mechanism achieves strong incompressibility when combined with any PRG or any conventional encryption scheme, since the claim is not true for encryption schemes which are key-fixing (or for PRGs which are injective). In particular, we prove that the cipher block chaining (CBC) block cipher mode is key-fixing when modelling the cipher as a truly random permutation for each key. Subsequent to and inspired by our work, FKKM prove that their original big-key generation mechanism can be combined with a random oracle into an LK-IND-CPA-secure encryption scheme, circumventing the impossibility result by the use of an idealised model.Along the way, our work also helps clarifying the relations between incompressible white-box cryptography, big-key symmetric encryption, and general leakage resilient cryptography, and their limitations.
2023
TCHES
Pasta: A Case for Hybrid Homomorphic Encryption
The idea of hybrid homomorphic encryption (HHE) is to drastically reduce bandwidth requirements when using homomorphic encryption (HE) at the cost of more expensive computations in the encrypted domain. To this end, various dedicated schemes for symmetric encryption have already been proposed. However, it is still unclear if those ideas are already practically useful, because (1) no cost-benefit analysis was done for use cases and (2) very few implementations are publicly available. We address this situation in several ways. We build an open-source benchmarking r framework, we explore properties of the respective HHE proposals. It turns out that even medium-sized use cases are infeasible, especially when involving integer arithmetic. Next, we propose Pasta, a cipher thoroughly optimized for integer HHE use cases. Pasta is designed to minimize the multiplicative depth, while also leveraging the structure of two state-of-the-art integer HE schemes (BFV and BGV) to minimize the homomorphic evaluation latency. Using our new benchmarking environment, we extensively evaluate Pasta in SEAL and HElib and compare its properties to 8 existing ciphers in two use cases. Our evaluations show that Pasta outperforms its competitors for HHE both in terms of homomorphic evaluation time and noise consumption, showing its efficiency for applications in real-world HE use cases. Concretely, Pasta outperforms Agrasta by a factor of up to 82, Masta by a factor of up to 6 and Hera up to a factor of 11 when applied to the two use cases.
2023
TCHES
Peek into the Black-Box: Interpretable Neural Network using SAT Equations in Side-Channel Analysis
Deep neural networks (DNN) have become a significant threat to the security of cryptographic implementations with regards to side-channel analysis (SCA), as they automatically combine the leakages without any preprocessing needed, leading to a more efficient attack. However, these DNNs for SCA remain mostly black-box algorithms that are very difficult to interpret. Benamira et al. recently proposed an interpretable neural network called Truth Table Deep Convolutional Neural Network (TT-DCNN), which is both expressive and easier to interpret. In particular, a TT-DCNN has a transparent inner structure that can entirely be transformed into SAT equations after training. In this work, we analyze the SAT equations extracted from a TT-DCNN when applied in SCA context, eventually obtaining the rules and decisions that the neural networks learned when retrieving the secret key from the cryptographic primitive (i.e., exact formula). As a result, we can pinpoint the critical rules that the neural network uses to locate the exact Points of Interest (PoIs). We validate our approach first on simulated traces for higher-order masking. However, applying TT-DCNN on real traces is not straightforward. We propose a method to adapt TT-DCNN for application on real SCA traces containing thousands of sample points. Experimental validation is performed on software-based ASCADv1 and hardware-based AES_HD_ext datasets. In addition, TT-DCNN is shown to be able to learn the exact countermeasure in a best-case setting.
2023
TCHES
Pincering SKINNY by Exploiting Slow Diffusion: Enhancing Differential Power Analysis with Cluster Graph Inference
Lightweight cryptography is an emerging field where designers are testing the limits of symmetric cryptography. We investigate the resistance against sidechannel attacks of a new class of lighter blockciphers, which use a classic substitution–permutation network with slow diffusion and many rounds.Among these ciphers, we focus on SKINNY, a primitive used up to the final round ofNIST’s recent lightweight standardisation effort. We show that the lack of diffusion in the key scheduler allows an attacker to combine leakage from the first and the last rounds, effectively pincering its target. Furthermore, the slow diffusion used by its partial key-absorption and linear layers enable, on both sides, to target S-Boxes from several rounds deep.As some of these S-boxes leak on the same part of the key, full key recovery exploiting all leakage requires a clever combining strategy. We introduce the use of cluster graph inference (an established tool from probabilistic graphical model theory) to enhance both unprofiled or profiled differential power analysis, enabling us to handlethe increase of S-Boxes with their intertwined leakage.We evaluate the strength of our attack both in the Hamming weight model and against two implementations running on an STM32F303 ARM Cortex-M4 hosted on a ChipWhisperer target board, showing that our attack reduces the number of traces required to attack SKINNY by a factor of around 2.75.
2023
TCHES
PMFault: Faulting and Bricking Server CPUs through Management Interfaces: Or: A Modern Example of Halt and Catch Fire
Apart from the actual CPU, modern server motherboards contain other auxiliary components, for example voltage regulators for power management. Those are connected to the CPU and the separate Baseboard Management Controller (BMC) via the I2C-based PMBus. In this paper, using the case study of the widely used Supermicro X11SSL motherboard, we show how remotely exploitable software weaknesses in the BMC (or other processors with PMBus access) can be used to access the PMBus and then perform hardware-based fault injection attacks on the main CPU. The underlying weaknesses include insecure firmware encryption and signing mechanisms, a lack of authentication for the firmware upgrade process and the IPMI KCS control interface, as well as the motherboard design (with the PMBus connected to the BMC and SMBus by default). First, we show that undervolting through the PMBus allows breaking the integrity guarantees of SGX enclaves, bypassing Intel’s countermeasures against previous undervolting attacks like Plundervolt/V0ltPwn. Second, we experimentally show that overvolting outside the specified range has the potential of permanently damaging Intel Xeon CPUs, rendering the server inoperable. We assess the impact of our findings on other server motherboards made by Supermicro and ASRock. Our attacks, dubbed PMFault, can be carried out by a privileged software adversary and do not require physical access to the server motherboard or knowledge of the BMC login credentials. We responsibly disclosed the issues reported in this paper to Supermicro and discuss possible countermeasures at different levels. To the best of our knowledge, the 12th generation of Supermicro motherboards, which was designed before we reported PMFault to Supermicro, is not vulnerable.
2023
TCHES
Prime-Field Masking in Hardware and its Soundness against Low-Noise SCA Attacks
A recent study suggests that arithmetic masking in prime fields leads to stronger security guarantees against passive physical adversaries than Boolean masking. Indeed, it is a common observation that the desired security amplification of Boolean masking collapses when the noise level in the measurements is too low. Arithmetic encodings in prime fields can help to maintain an exponential increase of the attack complexity in the number of shares even in such a challenging context. In this work, we contribute to this emerging topic in two main directions. First, we propose novel masked hardware gadgets for secure squaring in prime fields (since squaring is non-linear in non-binary fields) which prove to be significantly more resource-friendly than corresponding masked multiplications. We then formally show their local and compositional security for arbitrary orders. Second, we attempt to >experimentally evaluate the performance vs. security tradeoff of prime-field masking. In order to enable a first comparative case study in this regard, we exemplarily consider masked implementations of the AES as well as the recently proposed AESprime. AES-prime is a block cipher partially resembling the standard AES, but based on arithmetic operations modulo a small Mersenne prime. We present cost and performance figures for masked AES and AES-prime implementations, and experimentally evaluate their susceptibility to low-noise side-channel attacks. We consider both the dynamic and the static power consumption for our low-noise analyses and emulate strong adversaries. Static power attacks are indeed known as a threat for side-channel countermeasures that require a certain noise level to be effective because of the adversary’s ability to reduce the noise through intra-trace averaging. Our results show consistently that for the noise levels in our practical experiments, the masked prime-field implementations provide much higher security for the same number of shares. This compensates for the overheads prime computations lead to and remains true even if / despite leaking each share with a similar Signal-to-Noise Ratio (SNR) as their binary equivalents. We hope our results open the way towards new cipher designs tailored to best exploit the advantages of prime-field masking.
2023
TCHES
PROLEAD_SW: Probing-Based Software Leakage Detection for ARM Binaries
A decisive contribution to the all-embracing protection of cryptographic software, especially on embedded devices, is the protection against Side-Channel Analysis (SCA) attacks. Masking countermeasures can usually be integrated into the software during the design phase. In theory, this should provide reliable protection against such physical attacks. However, the correct application of masking is a non-trivial task that often causes even experts to make mistakes. In addition to human-caused errors, micro-architectural Central Processing Unit (CPU) effects can lead even a seemingly theoretically correct implementation to fail to satisfy the desired level of security in practice. This originates from different components of< the underlying CPU which complicates the tracing of leakage back to a particular source and hence avoids making general and device-independent statements about its security.PROLEAD has recently been presented at CHES 2022 and has originally been developed as a simulation-based tool to evaluate masked hardware designs. In this work, we adapt PROLEAD for the evaluation of masked software, and enable the transfer of the already known benefits of PROLEAD into the software world. These include (1) evaluation of larger designs compared to the state of the art, e.g. a full Advanced Encryption Standard (AES) masked implementation, and (2) formal verification under our new generic leakage model for CPUs. Concretely, we formalize leakages, observed across different CPU architectures, into a generic abstraction model that includes all these leakages and is therefore independent of a specific CPU design. Our resulting tool PROLEAD_SW allows to provide a formal statement on the security based on the derived generic model. As a concrete result, using PROLEAD_SW we evaluated the security of several publicly available masked software implementations in our new generic leakage model and reveal multiple vulnerabilities.
2023
TCHES
Protecting Dilithium against Leakage: Revisited Sensitivity Analysis and Improved Implementations
CRYSTALS-Dilithium has been selected by the NIST as the new standard for post-quantum digital signatures. In this work, we revisit the side-channel countermeasures of Dilithium in three directions. First, we improve its sensitivity analysis by classifying intermediate computations according to their physical security requirements. Second, we provide improved gadgets dedicated to Dilithium, taking advantage of recent advances in masking conversion algorithms. Third, we combine these contributions and report performance for side-channel protected Dilithium implementations. Our benchmarking results additionally put forward that the randomized version of Dilithium can lead to significantly more efficient implementations (than its deterministic version) when side-channel attacks are a concern.
2023
TCHES
Provable Secure Parallel Gadgets
Side-channel attacks are a fundamental threat to the security of cryptographic implementations. One of the most prominent countermeasures against side-channel attacks is masking, where each intermediate value of the computation is secret shared, thereby concealing the computation’s sensitive information. An important security model to study the security of masking schemes is the random probing model, in which the adversary obtains each intermediate value of the computation with some probability p. To construct secure masking schemes, an important building block is the refreshing gadget, which updates the randomness of the secret shared intermediate values. Recently, Dziembowski, Faust, and Zebrowski (ASIACRYPT’19) analyzed the security of a simple refreshing gadget by using a new technique called the leakage diagram. In this work, we follow the approach of Dziembowski et al. and significantly improve its methodology. Concretely, we refine the notion of a leakage diagram via so-called dependency graphs, and show how to use this technique for arbitrary complex circuits via composition results and approximation techniques. To illustrate the power of our new techniques, as a case study, we designed provably secure parallel gadgets for the random probing model, and adapted the ISW multiplication such that all gadgets can be parallelized. Finally, we evaluate concrete security levels, and show how our new methodology can further improve the concrete security level of masking schemes. This results in a compiler provable secure up to a noise level of O(1) for affine circuits and O(1/√n) in general.
2023
TCHES
Pushing the Limits of Generic Side-Channel Attacks on LWE-based KEMs - Parallel PC Oracle Attacks on Kyber KEM and Beyond
In this work, we propose generic and novel adaptations to the binary Plaintext-Checking (PC) oracle based side-channel attacks for Kyber KEM. These attacks operate in a chosen-ciphertext setting, and are fairly generic and easy to mount on a given target, as the attacker requires very minimal information about the target device. However, these attacks have an inherent disadvantage of requiring a few thousand traces to perform full key recovery. This is due to the fact that these attacks typically work by recovering a single bit of information about the secret key per query/trace. In this respect, we propose novel parallel PC oracle based side-channel attacks, which are capable of recovering a generic P number of bits of information about the secret key in a single query/trace. We propose novel techniques to build chosen-ciphertexts so as to efficiently realize a parallel PC oracle for Kyber KEM. We also build a multi-class classifier, which is capable of realizing a practical side-channel based parallel PC oracle with very high success rate. We experimentally validated the proposed attacks (upto P = 10) on the fastest implementation of unprotected Kyber KEM in the pqm4 library. Our experiments yielded improvements in the range of 2.89× and 7.65× in the number of queries, compared to state-of-the-art binary PC oracle attacks, while arbitrarily higher improvements are possible for a motivated attacker, given the generic nature of the proposed attacks. We further conduct a thorough study on applicability to different scenarios, based on the presence/absence of a clone device, and also partial key recovery. Finally, we also show that the proposed attacks are able to achieve the lowest number of queries for key recovery, even for implementations protected with low-cost countermeasures such as shuffling. Our work therefore, concretely demonstrates the power of PC oracle attacks on Kyber KEM, thereby stressing the need for concrete countermeasures such as masking for Kyber and other lattice-based KEMs.
2023
TCHES
Quantile: Quantifying Information Leakage
The masking countermeasure is very effective against side-channel attacks such as differential power analysis. However, the design of masked circuits is a challenging problem since one has to ensure security while minimizing performance overheads. The security of masking is often studied in the t-probing model, and multiple formal verification tools can verify this notion. However, these tools generally cannot verify large masked computations due to computational complexity.We introduce a new verification tool named Quantile, which performs randomized simulations of the masked circuit in order to bound the mutual information between the leakage and the secret variables. Our approach ensures good scalability with the circuit size and results in proven statistical security bounds. Further, our bounds are quantitative and, therefore, more nuanced than t-probing security claims: by bounding the amount of information contained in the lower-order leakage, Quantile can evaluate the security provided by masking even when they are not 1-probing secure, i.e., when they are classically considered as insecure. As an example, we apply Quantile to masked circuits of Prince and AES, where randomness is aggressively reused.
2023
TCHES
Quasi-linear masking against SCA and FIA, with cost amortization
The implementation of cryptographic algorithms must be protected against physical attacks. Side-channel and fault injection analyses are two prominent such implementation-level attacks. Protections against either do exist. Against sidechannel attacks, they are characterized by SNI security orders: the higher the order, the more difficult the attack.In this paper, we leverage fast discrete Fourier transform to reduce the complexity of high-order masking. The security paradigm is that of code-based masking. Coding theory is amenable both to mask material at a prescribed order, by mixing the information, and to detect and/or correct errors purposely injected by an attacker. For the first time, we show that quasi-linear masking (pioneered by Goudarzi, Joux and Rivain at ASIACRYPT 2018) can be achieved alongside with cost amortisation. This technique consists in masking several symbols/bytes with the same masking material, therefore improving the efficiency of the masking. We provide a security proof, leveraging both coding and probing security arguments. Regarding fault detection, our masking is capable of detecting up to d faults, where 2d + 1 is the length of the code, at any place of the algorithm, including within gadgets. In addition to the theory, that makes use of the Frobenius Additive Fast Fourier Transform, we show performance results, in a C language implementation, which confirms in practice that the complexity is quasi-linear in the code length.
2023
TCHES
RAFA: Redundancies-assisted Algebraic Fault Analysis and its implementation on SPN block ciphers
Algebraic Fault Analysis (AFA) is a cryptanalysis for block ciphers proposed by Courtois et al., which incorporates algebraic cryptanalysis to overcome the complexity of manual analysis within the context of Differential Fault Analysis (DFA). The effectiveness of AFA on lightweight block ciphers has been demonstrated. However, the complexity of the algebraic systems prevents it from attacking heavyweight block ciphers efficiently. In this paper, we propose a novel cryptanalysis called Redundancies-assisted Algebraic Fault Analysis (RAFA) to facilitate the solution of algebraic systems in the setting of heavyweight block ciphers. The core idea of RAFA is to expedite SAT solvers by modifying the algebraic systems, which is accomplished via two methods. The first method introduces redundant constraints, which is proposed for the first time in the context of algebraic cryptanalysis. The second one is a sophisticated linearization of the nonlinear Algebraic Normal Form (ANF). It takes RAFA for about 9.68 hours to attack AES-128. To the best of our knowledge, this is the first work that uses a general SAT solver to attack AES with only a single injection of byte-fault. Moreover, RAFA can attack AES-128 in 50.92 and 27.54 minutes for nibble- and bit-based fault model, respectively. In comparison, the traditional DFA algorithm implemented by pure C takes 4 ~ 5 hours under all three fault models investigated in this work. Moreover, in order to show the generality of RAFA, we also apply it to other heavyweight block ciphers. The best results show that RAFA could recover the key of Serpent-256 and SPEEDY-r-192 in 20.7 and 1.5 hours using only three faults, respectively. In comparison, AFA could not break these two ciphers even when 30 bits and 50 bits of their keys are known, respectively. Furthermore, no DFA work on Serpent or SPEEDY is known using comparable fault models.
2023
TCHES
RDS: FPGA Routing Delay Sensors for Effective Remote Power Analysis Attacks
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing on-chip voltage sensors, in which the FPGA routing resources replace the carry logic. We present three variants of what we name a routing delay sensor (RDS): one vertically constrained, one horizontally constrained, and one free of any constraints. We perform a thorough experimental evaluation on both the Sakura-X side-channel evaluation board and the Alveo U200 datacenter card, to evaluate the performance of RDS sensors in the context of a remote power side-channel analysis attack. The results show that our best RDS implementation in most cases outperforms the TDC. On average, for breaking the full 128-bit key of an AES-128 cryptographic core, an adversary requires 35% fewer side-channel traces when using the RDS than when using the TDC. Besides making the attack more effective, given the absence of the placement and routing constraint, the RDS sensor is also easier to deploy.
2023
TCHES
Revisiting the Computation Analysis against Internal Encodings in White-Box Implementations
White-box implementations aim to prevent the key extraction of the cryptographic algorithm even if the attacker has full access to the execution environment. To obfuscate the round functions, Chow et al. proposed a pivotal principle of white-box implementations to convert the round functions as look-up tables which are encoded by random internal encodings. These encodings consist of a linear mapping and a non-linear nibble permutation. At CHES 2016, Bos et al. introduced differential computation analysis (DCA) to extract the secret key from the runtime information, such as accessed memory and registers. Following this attack, many computation analysis methods were proposed to break the white-box implementations by leveraging some properties of the linear internal encodings, such as Hamming weight and imbalance. Therefore, it becomes an alternative choice to use a non-linear byte encoding to thwart DCA. At CHES 2021, Carlet et al. proposed a structural attack and revealed the weakness of the non-linear byte encodings which are combined with a non-invertible linear mapping. However, such a structural attack requires the details of the implementation, which relies on extra reverse engineering efforts in practice. To the best of our knowledge, it still lacks a thorough investigation of whether the non-linear byte encodings can resist the computation analyses.In this paper, we revisit the proposed computation analyses by investigating their capabilities against internal encodings with different algebraic degrees. Particularly, the algebraic degree of encodings is leveraged to explain the key leakage on the non-linear encodings. Based on this observation, we propose a new algebraic degree computation analysis (ADCA), which targets the mappings from the inputs to each sample of the computation traces. Different from the previous computation analyses, ADCA is a higher-degree attack that can distinguish the correct key by matching the algebraic degrees of the mappings. The experimental results prove that ADCA can break the internal encodings from degree 1 to 6 with the lowest time complexity. nstead of running different computation analyses separately, ADCA can be used as a generic tool to attack the white-box implementations.
2023
TCHES
Separating Oil and Vinegar with a Single Trace: Side-Channel Assisted Kipnis-Shamir Attack on UOV
Due to recent cryptanalytical breakthroughs, the multivariate signature schemes that seemed to be most promising in the past years are no longer in the focus of the research community. Hence, the cryptographically mature UOV scheme is of great interest again. Since it has not been part of the NIST process for standardizing post-quantum cryptography so far, it has not been studied intensively for its physical security.In this work, we present a side-channel attack on the latest implementation of UOV. In the first part of the attack, a single side-channel trace of the signing process is used to learn all vinegar variables used in the computation. Then, we employ a combination of the Kipnis-Shamir attack and the reconciliation attack to reveal the complete secret key. Our attack, unlike previous work, targets the inversion of the central map and not the subsequent linear transformation. It further does not require the attacker to control the message to be signed.We have verified the practicality of our attack on a ChipWhisperer-Lite board with a 32-bit STM32F3 ARM Cortex-M4 target mounted on a CW308 UFO board. We publicly provide the code and both reference and target traces. Additionally, we discuss several countermeasures that can at least make our attack less efficient.
2023
TCHES
SEV-Step A Single-Stepping Framework for AMD-SEV
The ever increasing popularity and availability of Trusted Execution Environments (TEEs) had a stark influence on microarchitectural attack research in academia, as their strong attacker model both boosts existing attack vectors and introduces several new ones. While many works have focused on Intel SGX, other TEEs like AMD SEV have recently also started to receive more attention. A common technique when attacking SGX enclaves is single-stepping, where the system’s APIC timer is used to interrupt the enclave after every instruction. Single-stepping increases the temporal resolution of subsequent microarchitectural attacks to a maximum. A key driver in the proliferation of this complex attack technique was the SGX-Step framework, which offered a stable reference implementation for single-stepping and a relatively easy setup. In this paper, we demonstrate that SEV VMs can also be reliably single-stepped. To lay the foundation for further microarchitectural attack research against SEV, we introduce the reusable SEV-Step framework. Besides reliable single-stepping, SEV-Step provides easy access to common attack primitives like page fault tracking and cache attacks against SEV. All features can be used interactively from user space. We demonstrate SEV-Step’s capabilities by carrying out an end-toend cache attack against SEV that leaks the volume key of a LUKS2-encrypted disk. Finally, we show for the first time that SEV is vulnerable to Nemesis-style attacks, which allow to extract information about the type and operands of single-stepped instructions from SEV-protected VMs.
2023
TCHES
Silicon Echoes: Non-Invasive Trojan and Tamper Detection using Frequency-Selective Impedance Analysis
The threat of chip-level tampering and its detection has been widely researched. Hardware Trojan insertions are prominent examples of such tamper events. Altering the placement and routing of a design or removing a part of a circuit for side-channel leakage/fault sensitivity amplification are other instances of such attacks. While semi- and fully-invasive physical verification methods can confidently detect such stealthy tamper events, they are costly, time-consuming, and destructive. On the other hand, virtually all proposed non-invasive side-channel methods suffer from noise and, therefore, have low confidence. Moreover, they require activating the tampered part of the circuit (e.g., the Trojan trigger) to compare and detect the modifications. In this work, we introduce a non-invasive post-silicon tamper detection technique applicable to different classes of tamper events at the chip level without requiring the activation of the malicious circuit. Our method relies on the fact that physical modifications (regardless of their physical, activation, or action characteristics) alter the impedance of the chip. Hence, characterizing the impedance can lead to the detection of the tamper events. To sense the changes in the impedance, we deploy known RF tools, namely, scattering parameters, in which we inject sine wave signals with high frequencies to the power distribution network (PDN) of the system and measure the “echo” of the signal. The reflected signals in various frequency bands reveal different tamper events based on their impact size on the die. To validate our claims, we performed measurements on several proof-ofconcept tampered hardware implementations realized on FPGAs manufactured with a 28 nm technology. We further show that deploying the Dynamic Time Warping (DTW) distance can distinguish between tamper events and noise resulting from manufacturing process variation of different chips/boards. Based on the acquired results, we demonstrate that stealthy hardware Trojans, as well as sophisticated modifications of P&R, can be detected.
2023
TCHES
Smooth Passage with the Guards: Second-Order Hardware Masking of the AES with Low Randomness and Low Latency
Cryptographic devices in hostile environments can be vulnerable to physical attacks such as power analysis. Masking is a popular countermeasure against such attacks, which works by splitting every sensitive variable into d+1 randomized shares. The implementation cost of the masking countermeasure in hardware increases significantly with the masking order d, and protecting designs often results in a large overhead. One of the main drivers of the cost is the required amount of fresh randomness for masking the non-linear parts of a cipher. In the case of AES, first-order designs have been built without the need for any fresh randomness, but state-of-the-art higher-order designs still require a significant number of random bits per encryption. Attempts to reduce the randomness however often result in a considerable latency overhead, which is not favorable in practice. This raises the need for AES designs offering a decent performance tradeoff, which are efficient both in terms of required randomness and latency.In this work, we present a second-order AES design with the minimal number of three shares, requiring only 3 200 random bits per encryption at a latency of 5 cycles per round. Our design represents a significant improvement compared to state-of-the-art designs that require more randomness and/or have a higher latency. The core of the design is an optimized 5-cycle AES S-box which needs 78 bits of fresh randomness. We use this S-box to construct a round-based AES design, for which we present a concept for sharing randomness across the S-boxes based on the changing of the guards (COTG) technique. We assess the security of our design in the probing model using a formal verification tool. Furthermore, we evaluate the practical side-channel resistance on an FPGA.
2023
TCHES
Some New Methods to Generate Short Addition Chains
Modular exponentiation and scalar multiplication are important operations in most public-key cryptosystems, and their efficient computation is essential to cryptosystems. The shortest addition chain is one of the most important mathematical concepts to realize the optimization of computation. However, finding a shortest addition chain of length r is generally regarded as an NP-hard problem, whose time complexity is comparable to O(r!). This paper proposes some novel methods to generate short addition chains. We firstly present a Simplified Power-tree method by deeply deleting the power-tree whose time complexity is reduced to O(r2). In this paper, a Cross Window method and its variant are introduced by improving the Window method. The Cross Window method uses the cross correlation to deal with the windows and its pre-computation is optimized by a new Addition Sequence Algorithm. The theoretical analysis is conducted to show the correctness and effectiveness. Meanwhile, our experiments show that the new methods can obtain shorter addition chains compared to the existing methods. The Cross Window method with the Addition Sequence algorithm can attain 44.74% and 9.51% reduction of the addition chain length, in the best case, compared to the Binary method and the Window method respectively.
2023
TCHES
Speeding Up Multi-Scalar Multiplication over Fixed Points Towards Efficient zkSNARKs
The arithmetic of computing multiple scalar multiplications in an elliptic curve group then adding them together is called multi-scalar multiplication (MSM). MSM over fixed points dominates the time consumption in the pairing-based trusted setup zero-knowledge succinct non-interactive argument of knowledge (zkSNARK), thus for practical applications we would appreciate fast algorithms to compute it. This paper proposes a bucket set construction that can be utilized in the context of Pippenger’s bucket method to speed up MSM over fixed points with the help of precomputation. If instantiating the proposed construction over BLS12-381 curve, when computing n-scalar multiplications for n = 2e (10 ≤ e ≤ 21), theoretical analysis ndicates that the proposed construction saves more than 21% computational cost compared to Pippenger’s bucket method, and that it saves 2.6% to 9.6% computational cost compared to the most popular variant of Pippenger’s bucket method. Finally, our experimental result demonstrates the feasibility of accelerating the computation of MSM over fixed points using large precomputation tables as well as the effectiveness of our new construction.
2023
TCHES
StaTI: Protecting against Fault Attacks Using Stable Threshold Implementations
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a challenge.In this work, we propose a fault countermeasure, StaTI, based on threshold implementations and linear encoding techniques. The proposed countermeasure protects the implementations of cryptographic algorithms against both side-channel and fault adversaries in a non-combined attack setting. We present a new composable notion, stability, to protect a threshold implementation against a formal gate/register-faulting adversary. Stability ensures fault propagation, making a single error check of the output suffice. To illustrate the stability notion, first, we provide stable encodings of the XOR and AND gates. Then, we present techniques to encode threshold implementations of S-boxes, and provide stable encodings of some quadratic S-boxes together with their security and performance evaluation. Additionally, we propose general encoding techniques to transform a threshold implementation of any function (e.g., non-injective functions) to a stable one. We then provide an encoding technique to use in symmetric primitives which encodes state elements together significantly reducing the encoded state size. Finally, we used StaTI to implement a secure Keccak on FPGA and report on its efficiency.
2023
TCHES
TeeJam: Sub-Cache-Line Leakages Strike Back
The microarchitectural behavior of modern CPUs is mostly hidden from developers and users of computer software. Due to a plethora of attacks exploiting microarchitectural behavior, developers of security-critical software must, e.g., ensure their code is constant-time, which is cumbersome and usually results in slower programs. In practice, small leakages which are deemed not exploitable still remain in the codebase. For example, sub-cache-line leakages have previously been investigated in the CacheBleed and MemJam attacks, which are deemed impractical on modern platforms.In this work, we revisit and carefully analyze the 4k-aliasing effect and discover that the measurable delay introduced by this microarchitectural effect is higher than found by previous work and described by Intel. By combining the rediscovered effect with a high temporal resolution possible when single-stepping an SGX enclave, we construct a very precise, yet widely applicable attack with sub-cache-line leakage resolution. o demonstrate the significance of our findings, we apply the new attack primitive to break a hardened AES T-Table implementation that features constant cache line access patterns. The attack is up to three orders of magnitude more efficient than previous sub-cache-line attacks on AES in SGX. Furthermore, we improve upon the recent work of Sieck et al. which showed partial exploitability of very faint leakages in a utility function loading base64-encoded RSA keys. With reliable sub-cache-line resolution, we build an end-to-end attack exploiting the faint leakage that can recover 4096-bit keys in minutes on a laptop. Finally, we extend the key recovery algorithm to also work for RSA keys following the standard that uses Carmichael’s totient function, while previous attacks were restricted to RSA keys using Euler’s totient function.
2023
TCHES
Threshold Implementations in Software: Micro-architectural Leakages in Algorithms
This paper provides necessary properties to algorithmically secure firstorder maskings in scalar micro-architectures. The security notions of threshold implementations are adapted following micro-processor leakage effects which are known to the literature. The resulting notions, which are based on the placement of shares, are applied to a two-share randomness-free PRESENT cipher and Keccak-f. The assembly implementations are put on a RISC-V and an ARM Cortex-M4 core. All designs are validated in the glitch and transition extended probing model and their implementations via practical lab analysis.
2023
TCHES
Vectorized and Parallel Computation of Large Smooth-Degree Isogenies using Precedence-Constrained Scheduling
Strategies and their evaluations play important roles in speeding up the computation of large smooth-degree isogenies. The concept of optimal strategies for such computation was introduced by De Feo et al., and virtually all implementations of isogeny-based protocols have adopted this approach, which is provably optimal for single-core platforms. In spite of its inherent sequential nature, several recent works have studied ways of speeding up this isogeny computation by exploiting the rich parallelism available in vectorized and multi-core platforms. One obstacle to taking full advantage of this parallelism, however, is that De Feo et al.’s strategies are not necessarily optimal in multi-core environments. To illustrate how the speed of vectorized and parallel isogeny computation can be improved at the strategylevel, we present two novel software implementations that utilize a state-of-the-art evaluation technique, called precedence-constrained scheduling (PCS), presented by Phalakarn et al., with our proposed strategies crafted for these environments. Our first implementation relies only on the parallelism provided by multi-core processors. The second implementation targets multi-core processors supporting the latest generation of the Intel’s Advanced Vector eXtensions (AVX) technology, commonly known as AVX-512IFMA instructions. To better handle the computational concurrency associated with PCS, we equip both implementations with extensive synchronization techniques. Our first implementation outperforms the implementation of Cervantes-Vázquez et al. by yielding up to 14.36% reduction in the execution time, when targeting platforms with two- to four-core processors. Our second implementation, equipped with four cores, achieves up to 34.05% reduction in the execution time compared to the single-core implementation of Cheng et al. of CHES 2022.
2023
TCHES
Who Watches the Watchers: Attacking Glitch Detection Circuits
Over the last decades, fault injection attacks have been demonstrated to be an effective method for breaking the security of electronic devices. Some types of fault injection attacks, like clock and voltage glitching, require very few resources by the attacker and are practical and simple to execute. A cost-effective countermeasure against these attacks is the use of a detector circuit which detects timing violations - the underlying effect that glitch attacks rely on. In this paper, we take a closer look at three examples of such detectors that have been presented in the literature. We demonstrate four high-speed clock glitching attacks, which successfully inject faults in systems, where detectors have been implemented to protect. The attacks remain unnoticed by the glitch detectors. We verify our attacks with practical experiments on FPGA.