International Association for Cryptologic Research


CryptoDB

Recently updated IACR publications

CryptoDB is updated periodically by manual and automatic processes. Whenever a paper is added or modified (for example, when a video is attached to it), it appears in this list.

A separate change history tracks schema and process changes. Further information about CryptoDB is available in the documentation.

Year
Venue
Title
2024
ASIACRYPT
Practical Blind Signatures in Pairing-Free Groups
Blind signatures have garnered significant attention in recent years, with several efficient constructions in the random oracle model relying on well-understood assumptions. However, this progress does not apply to pairing-free cyclic groups: fully secure constructions over cyclic groups rely on pairings, remain inefficient, or depend on the algebraic group model or strong interactive assumptions. To address this gap, Chairattana-Apirom, Tessaro, and Zhu (CTZ, Crypto 2024) proposed a new scheme based on the CDH assumption. Unfortunately, their construction results in large signatures and high communication complexity. In this work, we propose a new blind signature construction in the random oracle model that significantly improves upon the CTZ scheme. Compared to CTZ, our scheme reduces communication complexity by a factor of more than 10 and decreases the signature size by a factor of more than 45, achieving a compact signature size of only 224 bytes. The security of our scheme is based on the DDH assumption over pairing-free cyclic groups, and we show how to generalize it to the partially blind setting.
2024
ASIACRYPT
Deletions and Dishonesty: Probabilistic Data Structures in Adversarial Settings
Probabilistic data structures (PDS) are compact representations of high-volume data that provide approximate answers to queries about the data. They are commonplace in today's computing systems, finding use in databases, networking and more. While PDS are designed to perform well under benign inputs, they are frequently used in applications where inputs may be adversarially chosen. This may lead to a violation of their expected behaviour, for example an increase in false positive rate. In this work, we focus on PDS that handle approximate membership queries (AMQ). We consider adversarial users with the capability of making adaptive insertions, deletions and membership queries to AMQ-PDS, and analyse the performance of AMQ-PDS under such adversarial inputs. We argue that deletions significantly empower adversaries, presenting a challenge to enforcing honest behaviour when compared to insertion-only AMQ-PDS. To address this, we introduce a new concept of an honest setting for AMQ-PDS with deletions. By leveraging simulation-based security definitions, we then quantify how much harm can be caused by adversarial users to the functionality of AMQ-PDS. Our resulting bounds only require calculating the maximal false positive probability and insertion failure probability achievable in our novel honest setting. We apply our results to Cuckoo filters and Counting filters. We show how to protect these AMQ-PDS at low cost, by replacing or composing the hash functions with keyed pseudorandom functions in their construction. This strategy involves establishing practical bounds for the probabilities mentioned above. Using our new techniques, we demonstrate that achieving security against adversarial users making both insertions *and* deletions remains practical.
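
A minimal sketch of the protection strategy described above, i.e., deriving the filter's indices from a keyed pseudorandom function instead of public hash functions. This is not the paper's construction or its analysis; the use of HMAC-SHA256 as the PRF, the parameters, and the class interface are illustrative assumptions.

```python
import hmac
import hashlib

class KeyedCountingFilter:
    """Toy Counting filter whose index functions are a keyed PRF (HMAC-SHA256)."""

    def __init__(self, key: bytes, m: int = 1024, k: int = 4):
        self.key = key            # secret PRF key, unknown to adversarial users
        self.m = m                # number of counters
        self.k = k                # number of indices per item
        self.counters = [0] * m

    def _indices(self, item: bytes):
        # Derive k counter indices from HMAC-SHA256(key, i || item).
        for i in range(self.k):
            tag = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(tag[:8], "big") % self.m

    def insert(self, item: bytes):
        for idx in self._indices(item):
            self.counters[idx] += 1

    def delete(self, item: bytes):
        # The caller must only delete items that were previously inserted.
        for idx in self._indices(item):
            self.counters[idx] = max(0, self.counters[idx] - 1)

    def query(self, item: bytes) -> bool:
        return all(self.counters[idx] > 0 for idx in self._indices(item))

f = KeyedCountingFilter(key=b"secret-prf-key")
f.insert(b"alice")
print(f.query(b"alice"), f.query(b"bob"))   # True, (almost certainly) False
f.delete(b"alice")
print(f.query(b"alice"))                    # False after deletion
```

Because the indices depend on a secret key, an adversarial user cannot predict which counters an insertion or deletion touches, which is the kind of property the paper's bounds build on.
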
2024
ASIACRYPT
Reducing the Number of Qubits in Quantum Information Set Decoding
This paper presents an optimization of the memory cost of the quantum \emph{Information Set Decoding} (ISD) algorithm proposed by Bernstein (PQCrypto 2010), obtained by combining Prange's ISD with Grover's quantum search. When the code has constant rate and length $n$, this algorithm essentially performs a quantum search which, at each iteration, solves a linear system of dimension $\mathcal{O}(n)$. The typical code lengths used in post-quantum public-key cryptosystems range from $10^3$ to $10^5$. Gaussian elimination, which was used in previous works, needs $\mathcal{O}(n^2)$ space to represent the matrix, resulting in millions or billions of (logical) qubits for these schemes. In this paper, we propose instead to use the algorithm for sparse matrix inversion of Wiedemann (IEEE Trans. Inf. Theory 1986). The advantage of Wiedemann's method is that it relies only on the implementation of a matrix-vector product, where the matrix can be represented implicitly, as is the case here. We propose two main trade-offs, which we have fully implemented, tested on small instances, and benchmarked for larger instances. The first one is a quantum circuit using $\mathcal{O}(n)$ qubits, $\mathcal{O}(n^3)$ Toffoli gates (as for Gaussian elimination), and depth $\mathcal{O}(n^2 \log n)$. The second one is a quantum circuit using $\mathcal{O}(n \log^2 n)$ qubits, $\mathcal{O}(n^3)$ gates in total but only $\mathcal{O}(n^2 \log^2 n)$ Toffoli gates, which relies on a different representation of the search space. As an example, for the smallest Classic McEliece parameters we estimate that quantum Prange's algorithm can run with 18098 qubits, while previous works would have required at least half a million qubits.
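
A back-of-the-envelope reading of the space claims above, with all hidden constants dropped; the only concrete figure taken from the abstract is the 18098-qubit estimate for the smallest Classic McEliece parameter set.

```python
# Orders of magnitude only: constants hidden by the O-notation are ignored.
for n in (10**3, 10**4, 10**5):
    gaussian_qubits = n * n      # O(n^2): the matrix is stored explicitly
    wiedemann_qubits = n         # O(n): the matrix is only applied implicitly
    print(f"n = {n:>6}:  O(n^2) ~ {gaussian_qubits:.1e} qubits,"
          f"  O(n) ~ {wiedemann_qubits:.1e} qubits")
# For comparison, the abstract's concrete estimate for the smallest
# Classic McEliece parameters is 18098 qubits (vs. at least half a million before).
```
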
2024
TOSC
FRAST: TFHE-Friendly Cipher Based on Random S-Boxes
A transciphering framework, also known as hybrid homomorphic encryption, is a practical method of combining a homomorphic encryption (HE) scheme with a symmetric cipher in the client-server model to reduce computational and communication overhead on the client side. As a server homomorphically evaluates a symmetric cipher in this framework, new design rationales are required for “HE-friendly” ciphers that take into account the specific properties of the HE schemes. In this paper, we propose a new TFHE-friendly cipher, dubbed FRAST, with a TFHE-friendly round function based on a random S-box to minimize the number of rounds. The round function of FRAST can be efficiently evaluated in TFHE by a new optimization technique, dubbed double blind rotation. Combined with our new WoP-PBS method, the double blind rotation allows computing multiple S-box calls in the round function of FRAST at the cost of a single S-box call. In this way, FRAST enjoys 2.768 (resp. 10.57) times higher throughput compared to Kreyvium (resp. Elisabeth) for TFHE keystream evaluation in the offline phase of the transciphering framework at the cost of slightly larger communication overhead.
2024
TOSC
Monolith: Circuit-Friendly Hash Functions with New Nonlinear Layers for Fast and Constant-Time Implementations
Hash functions are a crucial component in incrementally verifiable computation (IVC) protocols and applications. Among those, recursive SNARKs and folding schemes require hash functions to be both fast in native CPU computations and compact in algebraic descriptions (constraints). However, neither SHA-2/3 nor newer algebraic constructions, such as Poseidon, achieve both requirements. In this work we overcome this problem in several steps. First, for certain prime field domains we propose a new design strategy called Kintsugi, which explains how to construct nonlinear layers of high algebraic degree that allow fast native implementations and, at the same time, an efficient circuit description for zero-knowledge applications. Then we suggest another layer, based on the Feistel Type-3 scheme, and prove wide trail bounds for its combination with an MDS matrix. We propose a new permutation design named Monolith to be used as a sponge or compression function. It is the first arithmetization-oriented function with a native performance comparable to SHA3-256. At the same time, it outperforms Poseidon in a circuit using the Merkle tree prover in the Plonky2 framework. Contrary to previously proposed designs, Monolith also allows for efficient constant-time native implementations, mitigating the risk of side-channel attacks.
2024
TOSC
Finding Complete Impossible Differential Attacks on AndRX Ciphers and Efficient Distinguishers for ARX Designs
The impossible differential (ID) attack is one of the most important cryptanalytic techniques for block ciphers. There are two phases to finding an ID attack: searching for the distinguisher and building a key recovery upon it. Previous works only focused on automated distinguisher discovery, leaving key recovery as a manual post-processing task, which may lead to a suboptimal final complexity. At EUROCRYPT 2023, Hadipour et al. introduced a unified constraint programming (CP) approach based on satisfiability for finding optimal complete ID attacks in strongly aligned ciphers. While this approach was extended to weakly-aligned designs like PRESENT at ToSC 2024, its application to ARX and AndRX ciphers remained as future work. Moreover, this method only exploited ID distinguishers with direct contradictions at the junction of two deterministic transitions. In contrast, some ID distinguishers, particularly for ARX and AndRX designs, may not be detectable by checking only the existence of direct contradictions. This paper fills these gaps by extending Hadipour et al.’s method to handle indirect contradictions and adapting it for ARX and AndRX designs. We also present a similar method for identifying zero-correlation (ZC) distinguishers. Moreover, we extend our new model for finding ID distinguishers to a unified optimization problem that includes both the distinguisher and the key recovery for AndRX designs. Our method improves ID attacks and introduces new distinguishers for several ciphers, such as SIMON, SPECK, Simeck, ChaCha, Chaskey, LEA, and SipHash. For example, we achieve a one-round improvement in ID attacks against SIMON-64-96, SIMON-64-128, SIMON-128-128, SIMON-128-256 and a two-round improvement against SIMON-128-192. These results significantly contribute to our understanding of the effectiveness of automated tools in the cryptanalysis of different design paradigms.
2024
TOSC
Perfect Monomial Prediction for Modular Addition
Modular addition is often the most complex component of typical Addition-Rotation-XOR (ARX) ciphers, and the division property is the most effective tool for detecting integral distinguishers. Thus, having a precise division property model for modular addition is crucial in the search for integral distinguishers in ARX ciphers. Current division property models for modular addition either (a) express the operation as a Boolean circuit and apply standard propagation rules for basic operations (COPY, XOR, AND), or (b) treat it as a sequence of smaller functions with carry bits, modeling each function individually. Both approaches were originally proposed for the two-subset bit-based division property (2BDP), which is theoretically imprecise and may overlook some balanced bits. Recently, more precise versions of the division property, such as parity sets, three-subset bit-based division property without unknown subsets (3BDPwoU) or monomial prediction (MP), and algebraic transition matrices have been proposed. However, little attention has been given to modular addition within these precise models. The propagation rule for the precise division property of a vectorial Boolean function $f$ requires that $u$ can propagate to $v$ if and only if the monomial $\pi_u(x)$ appears in $\pi_v(f)$. Braeken and Semaev (FSE 2005) studied the algebraic structure of modular addition and showed that for $x \boxplus y = z$, the monomial $\pi_u(x)\pi_v(y)$ appears in $\pi_w(z)$ if and only if $u + v = w$. Their theorem directly leads to a precise division property model for modular addition. Surprisingly, this model has not been applied in division property searches, to the best of our knowledge. In this paper, we apply Braeken and Semaev’s theorem to search for integral distinguishers in ARX ciphers, leading to several new results. First, we improve the state-of-the-art integral distinguishers for all variants of the Speck family, significantly enhancing search efficiency for Speck-32/48/64/96 and detecting new integral distinguishers for Speck-48/64/96/128. Second, we determine the exact degrees of output bits for 7-round Speck-32 and all/16/2 output bits for 2/3/4-round Alzette for the first time. Third, we revisit the choice of rotation parameters in Speck instances, providing a criterion that enhances resistance against integral distinguishers. Additionally, we offer a simpler proof for Braeken and Semaev’s theorem using monomial prediction, demonstrating the potential of division property methods in the study of Boolean functions. We hope that the proposed methods will be valuable in the future design of ARX ciphers.
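
The quoted characterization can be checked by brute force for small word sizes. The sketch below is an independent sanity check rather than code from the paper, and it assumes that $u + v = w$ refers to ordinary integer addition.

```python
from itertools import product

def anf(truth_table, nvars):
    """Moebius transform: truth table (indexed by input) -> ANF coefficients."""
    coeffs = list(truth_table)
    for i in range(nvars):
        for mask in range(1 << nvars):
            if mask & (1 << i):
                coeffs[mask] ^= coeffs[mask ^ (1 << i)]
    return coeffs

def pi(u, x):
    """Monomial pi_u(x): product of the bits of x selected by u."""
    return int(x & u == u)

def holds(n):
    """Check: pi_u(x)pi_v(y) appears in pi_w(x + y mod 2^n)  <=>  u + v == w."""
    for w in range(1 << n):
        # Truth table of f(x, y) = pi_w((x + y) mod 2^n); input index = (y << n) | x.
        tt = [pi(w, (x + y) % (1 << n))
              for y, x in product(range(1 << n), repeat=2)]
        coeffs = anf(tt, 2 * n)
        for v, u in product(range(1 << n), repeat=2):
            if (coeffs[(v << n) | u] == 1) != (u + v == w):
                return False
    return True

for n in (1, 2, 3):
    print(f"n = {n}: characterization holds = {holds(n)}")
```
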
2024
TOSC
Reconstructing S-Boxes from Cryptographic Tables with MILP
Reconstructing an S-box from a cryptographic table such as the difference distribution table (DDT), linear approximation table (LAT), differential-linear connectivity table (DLCT) or boomerang connectivity table (BCT) is one of the fundamental problems in symmetric-key cryptography. To date, only a few methods are known that can reconstruct an S-box from a given table: the guess-and-determine algorithms of Boura et al. (DCC 2019) and Tian et al. (DCC 2020), the sign determination algorithm of Dunkelman et al. (ToSC 2019) and the STP-based approach of Lu et al. (DCC 2022). In this paper we consider the reconstruction problem in an even more challenging setup where one needs to reconstruct S-boxes from a partial cryptographic table. We are able to reconstruct S-boxes when only a few rows of a cryptographic table are given. This problem has never been studied in the literature. We apply mixed integer linear programming (MILP) as the key tool for solving this problem. Needless to say, our method also solves the reconstruction problem when the full table is given, and this is the first application of MILP to such fundamental problems. As a further application of our method, we provide generic MILP models which can search for S-boxes with a given cryptographic property such as differential uniformity, linearity, differential-linear uniformity or boomerang uniformity. Additionally, our method can recover a Boolean function from a given Walsh spectrum or a Boolean function with a given nonlinearity. We also introduce a new heuristic called Optimistic MILP objective that guides the model towards obtaining multiple S-boxes or Boolean functions with the same cryptographic property. We give detailed experimental results for up to 6-bit S-boxes showing the effectiveness of our technique.
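
For readers less familiar with the tables involved, the snippet below computes one of them, the DDT, for the 4-bit PRESENT S-box (an illustrative choice, not one of the paper's experiments); the MILP reconstruction itself does not fit in a short sketch, so only the input object of the problem is shown.

```python
# Difference distribution table: DDT[dx][dy] = #{x : S(x) XOR S(x XOR dx) = dy}.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox):
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for dx in range(n):
        for x in range(n):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table

T = ddt(PRESENT_SBOX)
# Differential uniformity: the largest entry outside the trivial dx = 0 row.
print("differential uniformity:", max(max(row) for row in T[1:]))
```

A partial-table instance of the reconstruction problem then corresponds to being handed only a few rows of T and asked to recover an S-box consistent with them.
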
2024
TOSC
Improved Quantum Rebound Attacks on Double Block Length Hashing with Round-Reduced AES-256 and ARIA-256
At EUROCRYPT 2020, Hosoyamada and Sasaki proposed the first dedicated quantum collision attacks on hash functions. Their proposal presented a quantum adaptation of the rebound attack and revealed that differential trails, which have too low probability for use in classical settings, might be exploitable in quantum settings. After their work, subsequent research has actively delved into analyzing the security of hash functions in the quantum setting. In this paper, we revisit the quantum rebound attacks on the double-block-length hash function Hirose instantiated with 10-round AES-256 (HCF-AES-256) and 7-round ARIA-256 (HCF-ARIA-256) proposed by Chauhan et al. and Baek et al., respectively. Initially, we identify the flaws in their work and reevaluate the complexity of the attacks. We reveal that the flaws stem from not considering the issue that the S-box differential equation has one solution on average. Earlier research addressed this problem by adding auxiliary bits to the search space. If this method is used to correct the flaws, the resulting time complexities are 2^17.36 and 2^20.94 times higher than their proposals. Consequently, in some settings, their attacks become less efficient than generic attacks. Subsequently, we propose improved quantum rebound attacks using nested quantum amplitude amplification and quantum state preparation. Our improved attack efficiently pre-filters the search space, leading to a reduction in overall time complexity. We first classically reduce the search space and employ quantum state preparation to generate a superposition state over the pre-filtered search space. We then use nested quantum amplitude amplification to further reduce the search space quantumly. As a result, we achieve a reduction in the time complexity of the quantum rebound attacks on HCF-AES-256 and HCF-ARIA-256 by factors of 2^11.2 and 2^19.5, respectively, making the attacks more efficient than generic attacks again.
2024
TOSC
Single-Query Quantum Hidden Shift Attacks
Quantum attacks using superposition queries are known to break many classically secure modes of operation. While these attacks do not necessarily threaten the security of the modes themselves, since they rely on a strong adversary model, they help us to draw limits on their provable security. Typically these attacks use the structure of the mode (stream cipher, MAC or authenticated encryption scheme) to embed a period-finding problem, which can be solved with a dedicated quantum algorithm. The hidden period can be recovered with a few superposition queries (e.g., O(n) for Simon’s algorithm), leading to state or key-recovery attacks. However, this strategy breaks down if the period changes at each query, e.g., if it depends on a nonce. In this paper, we focus on this case and give dedicated state-recovery attacks on the authenticated encryption schemes Rocca, Rocca-S, Tiaoxin-346 and AEGIS-128L. These attacks rely on a procedure to find a Boolean hidden shift with a single superposition query, which overcomes the change of nonce at each query. This approach has the drawback of a lower success probability, meaning multiple independent (and parallelizable) runs are needed. We stress that these attacks do not break any security claim of the authors, and do not threaten the schemes if the adversary only makes classical queries.
2024
TOSC
Addendum to Classification of All t-Resilient Boolean Functions with t + 4 Variables: Classification of Quadratic and Cubic t-Resilient Boolean Functions with t + 5 Variables
In ToSC 2023(3), Rasoolzadeh presented an algorithm for classifying (n−m)-resilient Boolean functions with n variables, up to extended variable-permutation equivalence, for a small given positive integer m and any positive integer n with n ≥ m. By applying this algorithm along with several speed-up techniques, he classified n-variable (n − 4)-resilient Boolean functions up to equivalence for any n ≥ 4. However, for m = 5, due to the large number of representative functions, he was unable to classify n-variable (n − 5)-resilient Boolean functions for n > 6. In this work, we apply this algorithm together with a technique to restrict the ANF degree to classify quadratic and cubic (n − 5)-resilient Boolean functions with n variables, up to the same equivalence. We show that there are only 131 quadratic representative functions for any n ≥ 8. Additionally, we show that there are 359 078 cubic representative functions for any n ≥ 14.
2024
ASIACRYPT
Jackpot: Non-Interactive Aggregatable Lotteries
In proof-of-stake blockchains, liveness is ensured by repeatedly selecting random groups of parties as leaders, who are then in charge of proposing new blocks and driving consensus forward. The lotteries that elect those leaders need to ensure that adversarial parties are not elected disproportionately often and that an adversary cannot tell who was elected before those parties decide to speak, as this would potentially allow for denial-of-service attacks. Whenever an elected party speaks, it needs to provide a winning lottery ticket, which proves that the party did indeed win the lottery. Current solutions require all published winning tickets to be stored individually on-chain, which introduces undesirable storage overheads. In this work, we introduce non-interactive aggregatable lotteries and show how these can be constructed efficiently. Our lotteries provide the same security guarantees as previous lottery constructions, but additionally allow any third party to take a set of published winning tickets and aggregate them into one short digest. We provide a formal model of our new primitive in the universal composability framework. As one of our technical contributions, which may be of independent interest, we introduce aggregatable vector commitments with simulation-extractability and present a concretely efficient construction thereof in the algebraic group model in the presence of a random oracle. We show how these commitments can be used to construct non-interactive aggregatable lotteries. We have implemented our construction, called Jackpot, and provide benchmarks that underline its concrete efficiency.
2024
ASIACRYPT
Tightly Secure Non-Interactive BLS Multi-Signatures
Due to their simplicity, compactness, and algebraic structure, BLS signatures are among the most widely used signatures in practice. For example, used as multi-signatures, they are integral in Ethereum's proof-of-stake consensus. From the perspective of concrete security, however, BLS (multi-)signatures suffer from a security loss linear in the number of signing queries. It is well known that this loss cannot be avoided using current proof techniques. In this paper, we introduce a new variant of BLS multi-signatures that achieves tight security while remaining fully compatible with regular BLS. In particular, our signatures can be seamlessly combined with regular BLS signatures, resulting in regular BLS signatures. Moreover, our scheme can easily be implemented using existing BLS implementations in a black-box way. Our scheme is also one of the most efficient non-interactive multi-signatures, and in particular more efficient than previous tightly secure schemes. We demonstrate the practical applicability of our scheme by showing how proof-of-stake protocols that currently use BLS can adopt our variant for fully compatible opt-in tight security.
2024
ASIACRYPT
NTRU-based Bootstrapping for MK-FHEs without using Overstretched Parameters
Recent attacks on NTRU lattices given by Ducas and van Woerden (ASIACRYPT 2021) showed that for moduli $q$ larger than the so-called fatigue point $n^{2.484+o(1)}$, the security of NTRU is noticeably less than that of (ring)-LWE. Unlike NTRU-based PKE with $q$ typically lying in the secure regime of NTRU lattices (i.e., $q<n^{2.484+o(1)}$), the security of existing NTRU-based multi-key FHEs (MK-FHEs) requiring $q=O(n^k)$ for $k$ keys could be significantly affected by those attacks. In this paper, we first propose a (matrix) NTRU-based MK-FHE for a super-constant number $k$ of keys without using overstretched NTRU parameters. Our scheme is essentially a combination of two components following the two-layer framework of TFHE/FHEW: (i) a simple first-layer matrix NTRU-based encryption which naturally supports multi-key NAND operations with moduli $q=O(k\cdot n^{1.5})$ only linear in the number $k$ of keys; and (ii) a crucial second-layer NTRU-based encryption which supports efficient hybrid product between a single-key ciphertext and a multi-key ciphertext for gate bootstrapping. Then, by replacing the first layer with a more efficient LWE-based multi-key encryption, we obtain an improved MK-FHE scheme with better performance. We also employ a light key-switching technique to reduce the key-switching key size from previous $O(n^2)$ bits to $O(n)$ bits. A proof-of-concept implementation shows that our two MK-FHE schemes outperform the state-of-the-art TFHE-like MK-FHE schemes in both computation efficiency and bootstrapping key size. Concretely, for $k=8$ at the same 100-bit security level, our improved MK-FHE scheme can bootstrap a ciphertext in 0.54 s on a laptop and only has a bootstrapping key of size 13.89 MB, which are respectively 2.2 times faster and 7.4 times smaller than the MK-FHE scheme (which relies on a second-layer encryption from the ring-LWE assumption) due to Chen, Chillotti and Song (ASIACRYPT 2019).
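
As a rough illustration of why the modulus bound matters, the following arithmetic compares the quantities from the abstract for an illustrative ring dimension; constants and the o(1) terms are dropped, and the chosen n is not a parameter from the paper.

```python
import math

n = 2**11                             # illustrative ring dimension only
log_fatigue = 2.484 * math.log2(n)    # fatigue point q ~ n^2.484
for k in (2, 4, 8):
    log_q_old = k * math.log2(n)                   # q = O(n^k): earlier NTRU-based MK-FHEs
    log_q_new = math.log2(k) + 1.5 * math.log2(n)  # q = O(k * n^1.5): this work's first layer
    print(f"k = {k}: old log2(q) ~ {log_q_old:.0f}, "
          f"new log2(q) ~ {log_q_new:.0f}, fatigue point ~ {log_fatigue:.0f}")
```

In this toy calculation the old moduli already exceed the fatigue point for the larger values of k, while the new bound stays well below it throughout; that is the overstretched-parameter regime the scheme avoids.
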
2024
ASIACRYPT
Revisiting Pairing-Friendly Curves with Embedding Degrees 10 and 14
Since 2015, there has been a significant decrease in the asymptotic complexity of computing discrete logarithms in finite fields. As a result, the key sizes of many mainstream pairing-friendly curves have to be updated to maintain the desired security level. In PKC'20, Guillevic conducted a comprehensive assessment of the security of a series of pairing-friendly curves with embedding degrees ranging from $9$ to $17$. In this paper, we focus on five pairing-friendly curves with embedding degrees 10 and 14 at the 128-bit security level, with BW14-351 emerging as the most competitive candidate. First, we extend the optimized formula for the optimal pairing on BW13-310, a 128-bit secure curve with a prime $p$ of 310 bits and embedding degree $13$, to our target curves. This generalization allows us to compute the optimal pairing in approximately $\log r/(2\varphi(k))$ Miller iterations, where $r$ and $k$ are the order of the pairing groups and the embedding degree, respectively. Second, we develop optimized algorithms for cofactor multiplication for $\mathbb{G}_1$ and $\mathbb{G}_2$, as well as subgroup membership testing for $\mathbb{G}_2$ on these curves. Finally, we provide detailed performance comparisons between BW14-351 and other popular curves on a 64-bit platform in terms of pairing computation, hashing to $\mathbb{G}_1$ and $\mathbb{G}_2$, group exponentiations, and subgroup membership testing. Our results demonstrate that BW14-351 is a strong candidate for building pairing-based cryptographic protocols.
2024
ASIACRYPT
Rare structures in tensor graphs - Bermuda triangles for cryptosystems based on the Tensor Isomorphism problem
Recently, there has been a lot of interest in improving the understanding of the practical hardness of the 3-Tensor Isomorphism (3-TI) problem, which, given two 3-tensors, asks for an isometry between the two. The current state-of-the-art for solving this problem is the algebraic algorithm of Ran et al. '23 and the graph-theoretic algorithm of Narayanan et al. '24 that have both slightly reduced the security of the signature schemes MEDS and ALTEQ, based on variants of the 3-TI problem (Matrix Code Equivalence (MCE) and Alternating Trilinear Form Equivalence (ATFE), respectively). In this paper, we propose a new combined technique for solving the 3-TI problem. Our algorithm, as typically done in graph-based algorithms, looks for an invariant in the graphs of the isomorphic tensors that can be used to recover the secret isometry. However, contrary to usual combinatorial approaches, our approach is purely algebraic. We model the invariant as a system of non-linear equations and solve it. Using this modelling we are able to find very rare invariant objects in the graphs of the tensors, namely cycles of length 3 (triangles), which exist with probability approximately 1/q. For solving the system of non-linear equations we use Gröbner-basis techniques adapted to tri-graded polynomial rings. We analyze the algorithm theoretically, and we provide lower and upper bounds on its complexity. We further provide experimental support for our complexity claims. Finally, we describe two dedicated versions of our algorithm tailored to the specifics of the MCE and the ATFE problems. The implications of our algorithm are improved cryptanalysis of both MEDS and ALTEQ for the cases when a triangle exists, i.e., in approximately 1/q of the cases. While for MEDS, we only marginally reduce the security compared to previous work, for ALTEQ our results are much more significant, with at least 60 bits of improvement compared to previous work for all security levels. For Level I parameters, our attack is practical, and we are able to recover the secret key in only 1501 seconds. The code is available for testing and verification of our results.
2024
TCHES
Faster Complete Addition Laws for Montgomery Curves
An addition law for an elliptic curve is complete if it is defined for all possible pairs of input points on the elliptic curve. In Elliptic Curve Cryptography (ECC), a complete addition law provides a natural protection against side-channel attacks based on Simple Power Analysis (SPA). Montgomery curves are a specific family of elliptic curves that play a crucial role in ECC because of their well-known Montgomery ladder, particularly in the Elliptic Curve Diffie-Hellman Key Exchange (ECDHKE) protocol and the Elliptic Curve factorization Method (ECM). However, the complete addition law for Montgomery curves, as stated in the literature, has a computational cost of 14M + 2D, where M and D denote the costs of a field multiplication and a field multiplication by a constant, respectively. The lack of a competitive complete addition law has led implementers towards twisted Edwards curves, which offer a complete addition law at a lower cost of 8M + 1D for appropriately chosen curve constants. In this paper, we introduce extended Montgomery coordinates as a novel representation for points on Montgomery curves. This coordinate system enables us to define birational multiplication-free maps between extended twisted Edwards coordinates and extended Montgomery coordinates. Using these maps, we can transfer the complete addition laws from twisted Edwards curves to Montgomery curves without incurring additional multiplications or squarings. In addition, we employ a technique known as scaling to refine the addition laws for twisted Edwards curves, which results in i) complete addition laws with costs varying between 8M + 1D and 9M + 1D for a broader range of twisted Edwards curves, and ii) incomplete addition laws for twisted Edwards curves with a cost of 8M. Consequently, by leveraging our birational multiplication-free maps, we present complete addition laws for Montgomery curves with a cost of 8M + 1D. This is a significant improvement for the complete addition law on Montgomery curves, reducing the computational cost by 6M + 1D, and it makes Montgomery curves a more attractive option for applications where an efficient complete addition law is essential.
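
For background, the classical affine birational equivalence between Montgomery and twisted Edwards curves (not the paper's new multiplication-free maps on extended coordinates) can be checked numerically; the toy prime and curve constants below are arbitrary illustrative choices.

```python
# Montgomery curve B*v^2 = u^3 + A*u^2 + u  <->  twisted Edwards a*x^2 + y^2 = 1 + d*x^2*y^2
# with a = (A+2)/B, d = (A-2)/B and (u, v) -> (x, y) = (u/v, (u-1)/(u+1)).
p = 1009                 # small toy prime, for illustration only
A, B = 6, 1
assert (B * (A * A - 4)) % p != 0      # the Montgomery curve is non-singular

def inv(x):
    return pow(x, -1, p)

a = (A + 2) * inv(B) % p
d = (A - 2) * inv(B) % p

checked = 0
for u in range(1, p):
    rhs = (u**3 + A * u * u + u) * inv(B) % p
    for v in range(1, p):
        if (v * v) % p == rhs and u != p - 1:      # the map is undefined at v = 0 or u = -1
            x, y = u * inv(v) % p, (u - 1) * inv(u + 1) % p
            assert (a * x * x + y * y) % p == (1 + d * x * x * y * y) % p
            checked += 1
print(f"verified the map on {checked} affine Montgomery points")
```
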
2024
TCHES
Fast Transciphering Via Batched And Reconfigurable LUT Evaluation
Fully homomorphic encryption provides a way to perform computations in a privacy-preserving manner. However, despite years of optimization, modern methods may still be too computationally expensive for devices limited by speed or memory constraints. A paradigm that may bridge this gap consists of transciphering: as fully homomorphic schemes can perform most computations obliviously, they can also execute the decryption circuit of any conventional block or stream cipher. Hence, less powerful systems may continue to encrypt their data using classical ciphers that may offer hardware support (e.g., AES) and outsource the task of transforming the ciphertexts into their homomorphic equivalent to more powerful systems. In this work, we advance transciphering methods that leverage accumulator-based schemes such as Torus-FHE (TFHE) or FHEW. To this end, we propose a novel method to homomorphically evaluate look-up tables in a setting in which encrypted digits are provided in base 2. At a high level, our method relies on the fact that functions with binary range, i.e., mapping values to {0, 1}, can be evaluated at the same computational cost as negacyclic functions, relying only on the default functionality of accumulator-based schemes. To test our algorithm, we implement the AES-128 encryption circuit in OPENFHE and report timings of 67 s for a single block, which is 25% faster than the state of the art and, in general, up to 300% faster than other recent works. Furthermore, we achieve this speedup without relying on an instantiation that leverages a power-of-2 modulus and can exploit the natural modulo arithmetic of modern processors.
2024
TCHES
Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3: Advancing from the 4x Area Era to the 3x Area Era
SHA-3, the latest hash standard from NIST, is utilized by numerous cryptographic algorithms to handle sensitive information. Consequently, SHA-3 has become a prime target for side-channel attacks, with numerous studies demonstrating successful breaches in unprotected implementations. Masking, a countermeasure capable of providing theoretical security, has been explored in various studies to protect SHA-3. However, masking for hardware implementations may significantly increase area costs and introduce additional delays, substantially impacting the speed and area of higher-level algorithms. In particular, current low-latency first-order masked SHA-3 hardware implementations require more than four times the area of unprotected implementations. To date, the specific structure of SHA-3 has not been thoroughly analyzed for exploitation in the context of masking design, leading to difficulties in minimizing the associated area costs using existing methods. We bridge this gap by conducting detailed leakage path and data dependency analyses on two-share masked SHA-3 implementations. Based on these analyses, we propose a compact and low-latency first-order SHA-3 masked hardware implementation, requiring only three times the area of unprotected implementations and almost no fresh random number demand. We also present a complete theoretical security proof for the proposed implementation in the glitch+register-transition-robust probing model. Additionally, we conduct leakage detection experiments using PROLEAD, TVLA and VerMI to complement the theoretical evidence. Compared to state-of-the-art designs, our implementation achieves a 28% reduction in area consumption. Our design can be integrated into first-order implementations of higher-level cryptographic algorithms, contributing to a reduction in overall area costs.
2024
TCHES
Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs
Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy-preserving applications such as cryptocurrencies and verifiable computation. Although state-of-the-art zkSNARKs are highly efficient for the verifier, the computational overhead for the prover is still orders of magnitude too high to warrant use in many applications. This overhead arises from several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and especially the multi-scalar multiplication (MSM), which constitutes the largest proportion. Therefore, further efficiency improvements are needed. In this paper, we focus on comprehensive optimization of the running time and storage space required by the MSM algorithm on GPUs. Specifically, we propose a novel, modular and adaptive parameter configuration technique, elastic MSM, that lets us adjust the scale of the MSM by performing a corresponding amount of preprocessing. This technique enables us to fully unleash the potential of various efficient parallel MSM algorithms. We have implemented and tested elastic MSM over three prevailing parallel Pippenger algorithms on GPUs. Across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 1.90×, 1.08× and 1.36× (2.58×, 1.39× and 1.91×) speedup versus three state-of-the-art parallel Pippenger algorithms on GPUs, respectively. From another perspective, elastic MSM could also be regarded as a preprocessing technique over the well-known Pippenger algorithm, which is modular and could be used to accelerate almost all the most advanced parallel Pippenger algorithms on GPUs. Meanwhile, elastic MSM provides an adaptive trade-off between the running time and the extra storage space needed by parallel Pippenger algorithms on GPUs. This is the first preprocessing technique to retain the improved MSM computation brought by preprocessing under varying storage space limitations. Specifically, across various preprocessing space limitations (across various MSM scales), our constructions achieve up to about 192× and 223× (159× and 174×) speedup versus two state-of-the-art preprocessing parallel Pippenger algorithms on GPUs, respectively.
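
For context, the bucket method that these parallel Pippenger variants build on can be sketched compactly. The toy below replaces elliptic-curve points by integers modulo a prime and contains none of the paper's GPU parallelism or preprocessing; the group, window size and test sizes are illustrative assumptions.

```python
import random

N = (1 << 61) - 1        # toy "group": integers mod a Mersenne prime (stand-in for EC points)

def msm_pippenger(scalars, points, c=8):
    """Compute sum(k_i * P_i) mod N with c-bit windows and bucket accumulation."""
    bits = max(s.bit_length() for s in scalars)
    acc = 0
    for w in reversed(range((bits + c - 1) // c)):
        acc = acc * (1 << c) % N                    # shift to the next window ("double" c times)
        buckets = [0] * (1 << c)                    # bucket 0 is never used
        for k, P in zip(scalars, points):
            b = (k >> (w * c)) & ((1 << c) - 1)
            if b:
                buckets[b] = (buckets[b] + P) % N
        running = window_sum = 0                    # window_sum = sum_b b * buckets[b]
        for b in range((1 << c) - 1, 0, -1):
            running = (running + buckets[b]) % N
            window_sum = (window_sum + running) % N
        acc = (acc + window_sum) % N
    return acc

random.seed(0)
ks = [random.getrandbits(256) for _ in range(1000)]
Ps = [random.randrange(N) for _ in range(1000)]
assert msm_pippenger(ks, Ps) == sum(k * P for k, P in zip(ks, Ps)) % N
print("toy Pippenger MSM matches the naive computation")
```
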
2024
TCHES
TRNG Entropy Model in the Presence of Flicker FM Noise
Flicker Frequency Modulated (FM) noise, which influences free-running Ring Oscillators (ROs), can make a substantial contribution to the entropy generated by RO-based True Random Number Generators (TRNGs). While current TRNG stochastic models predominantly concentrate on white FM noise, the addition of flicker FM noise could remarkably enrich the analysis of the TRNG entropy production rate. This paper introduces an entropy model for TRNGs, employing Gaussian processes, to estimate entropy generation from both white FM and flicker FM noise. We analytically derive the flicker FM noise Auto-Correlation Function (ACF), enabling assessment of entropy contributions conditioned on partial knowledge of the TRNG’s internal state. Utilizing the developed model with noise magnitudes commonly reported in the literature, we determine that flicker FM noise has the potential to substantially enhance the TRNG’s entropy rate. However, due to considerable variation in reported magnitudes across the limited available research on flicker FM noise, it cannot yet be universally accepted as a dependable source of TRNG entropy.
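
For orientation, white FM and flicker FM noise are the $\alpha = 0$ and $\alpha = -1$ terms of the standard power-law model for the one-sided PSD of an oscillator's fractional frequency fluctuations, $S_y(f) = \sum_{\alpha} h_{\alpha} f^{\alpha}$, i.e., a constant $h_0$ contribution for white FM and an $h_{-1}/f$ contribution for flicker FM; this is general background on the noise model, not the ACF derivation of the paper.
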
2024
TCHES
Combined Threshold Implementation
Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of masking and redundancy to counteract all reciprocal effects. In this work, we propose a new methodology to generate combined-secure circuits. We show how to transform Threshold Implementation (TI)-like constructions to resist any adversary with the capability to tamper with internal gates and probe internal wires. For the resulting protection scheme, we can prove the combined security in a well-established theoretical security model. Since the transformation preserves the advantages of TI-like structures, the resulting circuits prove to be more efficient in the number of required bits of randomness (up to 100%), the latency in clock cycles (up to 40%), and even the area for pipelined designs (up to 40%) than the state of the art for an adversary restricted to manipulating a single gate and probing a single wire.
2024
TCHES
Improved High-Order Masked Generation of Masking Vector and Rejection Sampling in Dilithium
We present improved high-order masked gadgets for Dilithium, the post-quantum signature scheme recently standardized by NIST. We improve the masked generation of the masking vector y, based on a fast Boolean-to-arithmetic conversion modulo q. We also describe an optimized gadget for the high-order masked rejection sampling, with a complexity independent of the size of the modulus q. We prove the security of our gadgets in the classical ISW t-probing model. Finally, we detail our open-source C implementation of these gadgets integrated into a fully masked Dilithium implementation, and provide an efficiency comparison with previous works.
2024
TCHES
pyecsca: Reverse engineering black-box elliptic curve cryptography via side-channel analysis
Side-channel attacks on elliptic curve cryptography (ECC) often assume a white-box attacker who has detailed knowledge of the implementation choices taken by the target implementation. Due to the complex and layered nature of ECC, there are many choices that a developer makes to obtain a functional and interoperable implementation. These include the curve model, coordinate system, addition formulas, and the scalar multiplier, or lower-level details such as the finite-field multiplication algorithm. This creates a gap between the attack requirements and a real-world attacker that often only has black-box access to the target – i.e., has no access to the source code nor knowledge of specific implementation choices made. Yet, when the gap is closed, even real-world implementations of ECC succumb to side-channel attacks, as evidenced by attacks such as TPM-Fail, Minerva, the Side Journey to Titan, or TPMScan [MSE+20; JSS+20; RLM+21; SDB+24]. We study this gap by first analyzing open-source ECC libraries for insight into real-world implementation choices. We then examine the space of all ECC implementations combinatorially. Finally, we present a set of novel methods for automated reverse engineering of black-box ECC implementations and release a documented and usable open-source toolkit for side-channel analysis of ECC called pyecsca. Our methods turn attacks around: instead of attempting to recover the private key, they attempt to recover the implementation configuration given control over the private and public inputs. We evaluate them on two simulation levels and study the effect of noise on their performance. Our methods are able to 1) reverse-engineer the scalar multiplication algorithm completely and 2) infer significant information about the coordinate system and addition formulas used in a target implementation. Furthermore, they can bypass coordinate and curve randomization countermeasures.
2024
TCHES
Phase Modulation Side Channels: Jittery JTAG for On-Chip Voltage Measurements
Measuring fluctuations of the clock phase was identified as a source of leakage in early electromagnetic side-channel investigations. Despite this, only recently was measuring the clock phase (or jitter) of digital signals (not electromagnetic signals) from a target used as a source of exploitable leakage. As the phase of a clock output will be related to signal propagation delay through the target, and this propagation delay is related to voltage, this means that most digital devices perform an unintended phase modulation (PM) of their internal voltage onto clock outputs. This paper first demonstrates an unprofiled CPA attack against a Cortex-M microcontroller using the phase of a clock output, observing the signal on both optically isolated and capacitively isolated paths. The unprofiled attack takes only 2–4x more traces than an attack using a classic shunt-resistor measurement. It is then demonstrated how the JTAG bypass mode can be used to force a clock through a digital device. This forced clock signal can then be used as a highly effective oscilloscope that is located on the target device. As the attack does not require modifications to the device (such as capacitor removal or heat spreader removal), it is difficult to detect using existing countermeasures. The example attack over JTAG uses an unprofiled CPA attack, requiring only about 5x more traces than an ideal shunt-resistor based measurement. In addition, a version of this attack using a fault correlation analysis attack is also demonstrated. Countermeasures are discussed, and a simple resampling countermeasure is tested. All tools, both offensive and defensive, presented in the paper have been released under open-source licenses.
2024
TCHES
Switching Off your Device Does Not Protect Against Fault Attacks
Physical attacks, and among them fault injection attacks, are a significant threat to the security of embedded systems. Among the means of fault injection, laser has the significant advantage of being extremely spatially accurate. Numerous state-of-the-art studies have investigated the use of lasers to inject faults into a target at run-time. However, the high precision of laser fault injection comes with requirements on the knowledge of the implementation and exact execution time of the victim code. The main contribution of this work is the experimental demonstration that it is also possible to perform laser fault injection on an unpowered device. Specifically, we targeted the Flash non-volatile memory of a 32-bit microcontroller. The advantage of this new attack path is that it does not require any synchronisation between the victim and the attacker. We provide an experimental characterization of this phenomenon with a description of the fault model from the physical level up to the software level. Finally, we applied these results to carry out a Persistent Fault Analysis (PFA) on a 128-bit AES with a particularly realistic attacker model, which reinforces the interest of the PFA.
2024
TCHES
Robust but Relaxed Probing Model
Masking has become a widely applied and heavily researched method to protect cryptographic implementations against Side-Channel Analysis (SCA) attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through so-called probing models. Specifically, the robust d-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when considering the imperfect nature of physical hardware, including the occurrence of physical defaults such as glitches. However, the generic strategy employed by the robust d-probing model comes with a downside: It tends to over-conservatively model the information leakage caused by glitches, meaning that the robust d-probing model considers glitches that can never occur in practice. This implies that in theory, an adversary could gain more information than she would obtain in practice. From a designer’s perspective, this entails that (1) securely designed hardware circuits may need to be withdrawn due to potential insecurity under the robust d-probing model and (2) designs that satisfy the security requirements of the robust d-probing model may incur unnecessary overhead, such as increased circuit size or latency. In this work, we refine the formal treatment of glitches within the robust d-probing model to address glitches more accurately within a formal adversary model. Unlike the robust d-probing model, our approach considers glitches based on the operations performed and the data processed, ensuring that only manifesting glitches are accounted for. As a result, we introduce the Robust but Relaxed (RR) d-probing model, a formal adversary model maintaining the same level of security as the robust d-probing model but without the overly conservative treatment of glitches. Leveraging our new model, we prove the security of LUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL) gadgets, a class of physically secure gadgets reported as insecure based on the robust d-probing model. We provide manual proofs and automated security evaluations employing an updated version of PROLEAD capable of verifying the security of masked circuits under our new model.
2024
TCHES
Masking FALCON’s Floating-Point Multiplication in Hardware
Floating-point arithmetic is a cornerstone in a wide array of computational domains, and it recently became a building block for the FALCON post-quantum digital signature algorithm. As a consequence, the side-channel security of these operations came under scrutiny. Recent works unveiled the first side-channel attack specifically targeting floating-point multiplication to steal secret cryptographic keys. Despite these new attacks on floating-point arithmetic, there is to date no hardware design that protects these operations against side-channel leakage. A concurrent work has applied masking of floating-point multiplication in software [CC24], but its empirical validation still demonstrated significant first-order leakages. This paper presents the first hardware masking scheme for floating-point multiplication to mitigate side-channel attacks. Our technique extends the cryptographic masking principles that split all intermediate computations into multiple, random shares while preserving the output functionality. Our innovation also provides a design-time configurable first-order masked multiplier gadget that carries out integer multiplication, which can support future designs. To that end, we propose new hardware gadgets including Integer Multiplier, Carry Calculator, Secure MUX, Zero Check, and Mantissa Selection, and we prove their security in the PINI model. Moreover, we validate the desired first-order side-channel security of our implementation on a Sakura-X FPGA board using 10 million measurements. We explore the design space with different architectural choices to trade off performance for area. Our implementation results show that the masking overhead ranges between 5.42x and 43.31x in area and between 2x and 440x in throughput.
2024
TCHES
Static Leakage in Dual-Rail Precharge Logics
In recent research studies, an observable dependency has been found between the static power consumption of a Complementary Metal-Oxide-Semiconductor (CMOS) chip and its internally stored and processed data. For the most part, these studies have focused on utilizing the leakage currents as a side channel to conduct key-recovery attacks on cryptographic devices. There are two main reasons why information leakage through the static power side channel is considered particularly harmful for the security of implementations, namely 1) the low influence of noise due to averaging over time and 2) the ability to target secrets even outside of the time window that they are actively computed upon (data is leaked for as long as it is saved anywhere in the circuit). Hence, developing effective countermeasures against this threat is of significant importance for the security of cryptographic hardware. Hiding techniques known as Dual-Rail Precharge (DRP) logic have been proposed and studied in the literature as an instrument to equalize a circuit’s dynamic power consumption irrespective of the processed data. The specific instance called improved Masked Dual-Rail Precharge Logic (iMDPL) is – despite its high overhead – known as one of the most potent and attractive DRP-based Side-Channel Analysis (SCA) countermeasures. While its ability to prevent data extraction through the dynamic power consumption is well studied and documented, we thoroughly analyze its susceptibility to Static Power Side-Channel Analysis (SPSCA) attacks in this work. To conduct our study we have taped out a custom Application-Specific Integrated Circuit (ASIC) prototype in 65nm CMOS technology which contains multiple cryptographic co-processors protected by iMDPL, partially combined with other countermeasures. Additionally, it contains circuits protected by a new variant of iMDPL that we specifically hardened against SPSCA, which we call Static Robust iMDPL (SRiMDPL). Our careful experiments performed in a controlled environment under exploitation of voltage and temperature dependencies show that SRiMDPL circuits combined with modern hardware masking offer an extremely high level of security against both dynamic and static power SCA attacks. While the cost of such combinations is admittedly significant (≈ 108 kGE post-layout area for a corresponding PRESENT core), we obtain the strongest combined resistance to both power side channels that has been experimentally demonstrated on real silicon so far. In summary, we believe that our analysis can assist hardware designers in making important decisions on the trade-offs between cost and security that such countermeasures facilitate.
2024
TCHES
Through the Looking-Glass: Sensitive Data Extraction by Optical Probing of Scan Chains
There is an inherent trade-off between an Integrated Circuit (IC)’s testability and its physical security. While Design for Test (DfT) techniques, such as scan chains, make the circuit’s physical behavior at runtime observable and easy to control, these techniques form a lucrative class of attack vectors with the potential to compromise the entire security architecture of the Device under Test (DuT). Moreover, with the rapid development of more complex technologies, the need for integration of DfT techniques only intensifies due to the requirement for faster time-to-market of cutting-edge ICs. In this work, we demonstrate that sensitive data can be extracted from registers once their locations on the chip are identified, by exploiting DfT structures, in this case scan chains, and optically probing them, even after access to test mode is restricted. Furthermore, we show that even an obfuscated scan chain architecture can be fully reconstructed using tools and techniques encountered in the Failure Analysis (FA) domain.
2024
TCHES
Bake It Till You Make It: Heat-induced Power Leakage from Masked Neural Networks
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been shown to be an important factor in inducing leakage, even in correctly masked designs. This has previously been verified using an external heat generator (i.e., a climate chamber). In this paper, we examine whether the leakage can be induced using the circuit components themselves without making any changes to the design. Specifically, we target masked neural networks (NNs) in FPGAs, one of the main building blocks of which is block random access memory (BRAM). In this respect, thanks to the inherent characteristics of NNs, our novel internal heat generators leverage solely the memories devoted to storing the user’s input, especially when frequently writing alternating patterns into BRAMs. The possibility of observing first-order leakage is evaluated by considering one of the most recent and successful first-order secure masked NNs, namely ModuloNET. ModuloNET is specifically designed for FPGAs, where BRAMs are used to store inputs and intermediate computations. Our experimental results demonstrate that undesirable first-order leakage can be observed and exploited by increasing the temperature when an alternating input is applied to the masked NN. To give a better understanding of the impact of extreme heat, we further perform a similar test on the design using an external heat generator, where a similar conclusion can be drawn.
2024
TCHES
FaultDetective: Explainable to a Fault, from the Design Layout to the Software
Hardware faults are a known source of security vulnerabilities. Fault injection in secure embedded systems leads to information leakage and privilege escalation, and countless fault attacks have been demonstrated both in simulation and in practice. However, there is a significant gap between simulated fault attacks and physical fault attacks. Simulations use idealized fault models such as single-bit flips with uniform distribution. These ideal fault models may not hold in practice. On the other hand, practical experiments lack the white-box visibility necessary to determine the true nature of the fault, leading to probabilistic vulnerability assessments and unexplained results. In embedded software, this problem is further exacerbated by the layered abstractions between the hardware (where the fault originates) and the application software (where the fault effect is observed). We present FaultDetective, a method to investigate the root cause of fault injection from fault detection in software. Our main insight is that fault detection in software is only the end-point of a chain of events that starts with a fault manifestation in hardware and propagates through the micro-architecture and architecture before reaching the software level. To understand the fault effects at the hardware level, we use a scan chain, a low-level hardware test structure. We then use white-box simulation to propagate and observe hardware faults in the embedded software. We efficiently visualize the fault propagation across abstraction levels using a hash-tree representation of the scan chain. We implement this concept in a multi-core MSP430 micro-controller that redundantly executes an application in lock-step. With this setup, we observe the fault effects for several different stressors, including clock glitching and thermal laser stimulation, and explain the root cause in each case.
2024
TCHES
Improved Circuit Synthesis with Multi-Value Bootstrapping for FHEW-like Schemes
In recent years, the research community has made great progress in improving techniques for privacy-preserving computation, such as fully homomorphic encryption (FHE). Despite the progress, there remain open challenges, mainly in performance and usability, to further advance the adoption of these technologies. This work provides multiple contributions that improve the current state-of-the-art in both areas. More specifically, we significantly simplify the multi-value bootstrapping by Carpov, Izabachène, and Mollimard [CIM19] for Boolean-based FHE schemes such as FHEW or TFHE, making the concept usable in practice. Based on our simplifications, we implement an easy-to-use interface for multi-value bootstrapping in the open-source library FHE-Deck [fhe23], derive new parameter sets for multi-bit encryptions with state-of-the-art security, and build a toolset that translates high-level code to multi-bit operations on encrypted data using circuit synthesis. We propose and integrate the first non-trivial FHE-specific optimizations for privacy-preserving circuit synthesis: look-up table (LUT) grouping and adder substitution. Using LUT grouping, we reduce the number of bootstrapping operations by almost 40% on average, while for adder substitution, we reduce the number of required bootstrappings by up to 85% for certain use cases. Overall, the execution time is up to 4.2x faster with all optimizations enabled compared to previous state-of-the-art circuit synthesis.
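The saving from LUT grouping can be illustrated with a toy counting sketch in Python (the gate list and the grouping rule below are invented for illustration; the actual synthesis flow is considerably more involved): single-output gates that share the same input support are merged into one multi-output LUT, so a single multi-value bootstrapping replaces several individual ones.

```python
from collections import defaultdict

# Toy netlist: each gate is (output_wire, input_support, truth_table).
# The truth tables are irrelevant for counting and left empty.
gates = [
    ("y0", frozenset({"a", "b", "c"}), {}),
    ("y1", frozenset({"a", "b", "c"}), {}),
    ("y2", frozenset({"a", "b", "c"}), {}),
    ("y3", frozenset({"c", "d"}), {}),
    ("y4", frozenset({"c", "d"}), {}),
    ("y5", frozenset({"e", "f", "g"}), {}),
]

def count_bootstrappings(gates, grouping=False):
    """Without grouping: one bootstrapping per gate. With grouping: gates sharing
    the same input support become one multi-output LUT, i.e. one multi-value
    bootstrapping per group."""
    if not grouping:
        return len(gates)
    groups = defaultdict(list)
    for out, support, table in gates:
        groups[support].append(out)
    return len(groups)

before = count_bootstrappings(gates)
after = count_bootstrappings(gates, grouping=True)
print(f"bootstrappings: {before} -> {after} ({100 * (before - after) // before}% saved)")
# bootstrappings: 6 -> 3 (50% saved); the ~40% average quoted above comes from
# real benchmark circuits, not from this toy netlist.
```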
2024
TCHES
An Algebraic Approach for Evaluating Random Probing Security With Application to AES
We employ an algebraic approach to estimate the success rate of a side-channel adversary attacking secrets of a masked circuit within the Random Probing Model (RPM), where intermediate variables of the implementation leak with a probability p. Our method efficiently handles masked linear circuits, enabling security bound estimation for practically large masking orders. For non-linear circuits, we employ a linearization technique. To reason about the security of complex structures like an S-box, we introduce a composition theorem, reducing the RPM security of a circuit to that of its constituent gadgets. Moreover, we lower the complexity of the multiplication gadget of CHES 2016 from O(n^2 log(n)) to O(n^2) while demonstrating its conjectured RPM security. Collectively, these novel methods enable the development of a practical masking scheme with O(n^2) complexity for AES, maintaining security for a considerably high leakage rate p ≤ 0.02 ≈ 2^(-5.6).
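For intuition only (this is not the bound proven in the paper), recall the basic random-probing argument for a plain additive sharing: if a secret is split into $n$ shares and each share wire leaks independently with probability $p$, the adversary reconstructs the secret from the share wires alone only if all of them leak,
\[
\Pr[\text{all } n \text{ shares leak}] = p^{\,n}, \qquad \text{e.g. } p = 0.02,\ n = 8 \;\Rightarrow\; p^{n} = 0.02^{8} \approx 2.6 \times 10^{-14}.
\]
Real masked circuits also leak through the intermediate wires of their gadgets, which is precisely what the algebraic estimation and the composition theorem described above account for.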
2024
TCHES
Prime Masking vs. Faults - Exponential Security Amplification against Selected Classes of Attacks
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then either exploits some knowledge about the position of the injected fault or about its value. The latter class of attacks, which can be applied without ever obtaining faulty outputs, such as Statistical Ineffective Fault Attacks (SIFA), then either exploits a dependency between the effectiveness of the fault injection and the value to be faulted (e.g., an LSB stuck-at-0 only affecting odd numbers), denoted as SIFA-1, or a conditional propagation of a faulted value based on a sensitive intermediate (e.g., multiplication of a faulted value by 0 prevents propagation), denoted as SIFA-2. The aptitude of additive masking schemes, which were designed to prevent side-channel analysis, to also thwart fault attacks is typically assumed to be limited. Common fault models, such as toggle/bit-flip, stuck-at-0 or stuck-at-1 survive the recombination of Boolean shares well enough for generic attacks to succeed. More precisely, injecting a fault into one or multiple Boolean shares often results in the same, or at least a predictable, error appearing in the sensitive variable after recombination. In this work, we show that additive masking in prime-order fields breaks such relationships, causing frequently exploited biases to decrease exponentially in the number of shares. As a result, prime masking offers surprisingly strong protection against generic statistical attacks, which require a dependency between the effectiveness of an injected fault and the secret variable that is manipulated, such as SIFA-1. Operation-dependent statistical attacks, such as SIFA-2 and Fault Template Attacks (FTA), may still be performed against certain prime-field structures, even if they are masked with many shares. Yet, we analyze the corresponding cases and are able to provide specific guidelines on how to avoid vulnerabilities either at the cipher design or implementation level by making informed decisions about the primes, non-linear mappings and masked gadgets used. Since prime-field masking appears to be one of the rare instances of affordable countermeasures that naturally provide sound protection against side-channel analysis and certain fault injection attacks, we believe there is a strong incentive for developing new ciphers to leverage these advantages.
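The contrast between Boolean and prime-field sharing under a bit-flip fault can be reproduced with a short Python experiment (a toy sketch with arbitrarily chosen parameters, not the paper's analysis): flipping one bit of a Boolean share always induces the same error on the recombined value, whereas the same physical fault on an additive share modulo a prime induces an error that depends on the random share value, which is the starting point of the bias argument above.

```python
import random
from collections import Counter

def boolean_errors(trials=10_000, bit=3, width=8):
    """Flip one bit of a random Boolean share; record the error on the recombined value."""
    errs = Counter()
    for _ in range(trials):
        s = random.randrange(2 ** width)                  # secret
        s0 = random.randrange(2 ** width); s1 = s ^ s0    # XOR sharing
        s0_faulty = s0 ^ (1 << bit)                       # bit-flip fault on one share
        errs[(s0_faulty ^ s1) ^ s] += 1                   # error after recombination
    return errs

def prime_errors(trials=10_000, bit=3, p=257):
    """The same bit-flip fault on one share of an additive sharing modulo a prime."""
    errs = Counter()
    for _ in range(trials):
        s = random.randrange(p)
        s0 = random.randrange(p); s1 = (s - s0) % p       # additive sharing mod p
        s0_faulty = (s0 ^ (1 << bit)) % p                 # the same physical bit-flip
        errs[((s0_faulty + s1) - s) % p] += 1             # induced additive error
    return errs

print(dict(boolean_errors()))   # {8: 10000}: a single, input-independent error
print(dict(prime_errors()))     # roughly half +8 and half -8 (i.e. 249) mod 257:
                                # the error already depends on the random share value,
                                # and the exploitable bias shrinks as shares are added
```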
2024
TCHES
SAT-based Formal Verification of Fault Injection Countermeasures for Cryptographic Circuits
Fault injection attacks represent a type of active, physical attack against cryptographic circuits. Various countermeasures have been proposed to thwart such attacks; however, their design and implementation are intricate, error-prone, and laborious. The current formal fault-resistance verification approaches are limited in efficiency and scalability. In this paper, we formalize the fault-resistance verification problem and show that it is coNP-complete. We then devise a novel approach for encoding the fault-resistance verification problem as the Boolean satisfiability (SAT) problem so that modern off-the-shelf SAT solvers can be utilized. The approach is implemented in an open-source tool FIRMER, which is evaluated extensively on realistic cryptographic circuit benchmarks. The experimental results show that FIRMER is able to verify fault-resistance of almost all (72/76) benchmarks in 3 minutes (the other three are verified in 35 minutes and the hardest one is verified in 4 hours). In contrast, the prior approach fails on 31 fault-resistance verification tasks even after 24 hours (per task).
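As a much smaller stand-in for the SAT encoding (the circuit, gate names, and fault model below are assumptions for illustration), the following Python sketch asks the same question the verifier answers: can any single stuck-at fault change the primary output of a protected design while its error flag stays silent? For circuits this tiny the question is settled by exhaustive enumeration; the contribution of a tool like FIRMER is answering it at scale via SAT.

```python
from itertools import product

def make_gate(fault):
    """Return a gate evaluator that applies an optional (gate_name, stuck_value) fault."""
    def gate(name, value):
        if fault is not None and fault[0] == name:
            return fault[1]
        return value
    return gate

def plain_and(a, b, fault=None):
    """Unprotected AND: no redundancy, the error flag is constantly 0."""
    g = make_gate(fault)
    return g("and1", a & b), 0

def duplicated_and(a, b, fault=None):
    """Duplicated AND with a comparator that raises an error flag on mismatch."""
    g = make_gate(fault)
    g1 = g("and1", a & b)
    g2 = g("and2", a & b)
    return g1, g("cmp", g1 ^ g2)

def is_fault_resistant(circuit, gate_names):
    """True iff no single stuck-at fault silently corrupts the output, i.e. changes
    it while the error flag stays 0, for any input."""
    for name, stuck in product(gate_names, (0, 1)):
        for a, b in product((0, 1), repeat=2):
            ref, _ = circuit(a, b)
            out, err = circuit(a, b, fault=(name, stuck))
            if out != ref and err == 0:
                return False, (name, stuck, (a, b))
    return True, None

print(is_fault_resistant(plain_and, ["and1"]))                      # (False, ('and1', 0, (1, 1)))
print(is_fault_resistant(duplicated_and, ["and1", "and2", "cmp"]))  # (True, None)
```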
2024
TCHES
SPA-GPT: General Pulse Tailor for Simple Power Analysis Based on Reinforcement Learning: - Long Paper -
In side-channel analysis of public-key algorithms, we usually classify operations based on the differences in power traces produced by different basic operations (such as modular square or modular multiplication) to recover secret information like private keys. The more accurate the segmentation of power traces, the higher the efficiency of their classification. There exist two commonly used methods: one is equidistant segmentation, which requires a fixed number of basic operations and similar trace lengths for each type of operation, leading to limited application scenarios; the other is peak-based segmentation, which relies on personal experience to configure parameters, resulting in insufficient flexibility and poor universality. In this paper, we propose an automated trace segmentation method based on reinforcement learning that is applicable to a wide range of common implementations of public-key algorithms. The introduction of reinforcement learning, which does not need labels, into trace processing for side-channel analysis marks its debut in this field. Our method generalizes well to traces with varying segment lengths and differing peak heights. By using a Deep Q-Network algorithm optimized with prioritized experience replay, we reduce the number of required parameters to one, namely the key length. We also employ various techniques to improve the segmentation effectiveness, such as a clustering algorithm and envelope-based feature enhancement. We validate the effectiveness of the new method in nine scenarios involving hardware and software implementations of different public-key algorithms executed on diverse platforms such as microcontrollers, SAKURA-G, and smart cards. Specifically, one of these implementations is protected by time randomization countermeasures. Experimental results show that a basic version of our method can correctly segment most traces. The enhanced version is capable of reconstructing the sequence of operations during trace segmentation, achieving an accuracy rate of 100% for the majority of the traces. For traces that cannot be entirely restored, we utilize the reward values of reinforcement learning to correct errors and achieve full recovery. We also conducted comparative experiments with supervised seq2seq methods, showing that our approach achieves 42% higher accuracy in operation recovery and is 96% faster. In addition, we applied our method to the post-quantum cryptography scheme Kyber, and successfully recovered an intermediate value crucial for deriving the secret key. Besides, power traces collected from these devices have been uploaded as open databases, which are available for researchers engaged in public-key algorithms to conduct related experiments or verify our method.
2024
TCHES
Enabling PERK and other MPC-in-the-Head Signatures on Resource-Constrained Devices
One category of the digital signatures submitted to the NIST Post-Quantum Cryptography Standardization Process for Additional Digital Signature Schemes comprises proposals constructed leveraging the MPC-in-the-Head (MPCitH) paradigm. Typically, this framework is characterized by the sequential computation and storage of large data structures in both the signing and verification algorithms, resulting in heavy memory consumption. While some research on the efficiency of these schemes on high-performance machines has been done, their performance and optimization on resource-constrained devices remain largely unexplored. In this work, we aim to address this gap by (1) introducing a general method to reduce the memory footprint of MPCitH schemes and analyzing its application to several MPCitH schemes proposed in the NIST Standardization Process. Additionally, (2) we conduct a detailed examination of potential memory optimizations in PERK, resulting in a streamlined version of the signing and verification algorithms with a reduced memory footprint ranging from 22 to 85 KB, down from the original 0.3 to 6 MB. Finally, (3) we introduce the first implementation of PERK tailored for the Arm Cortex M4, alongside extensive experiments and comparisons against reference implementations.
2024
TCHES
Closing the Gap: Leakage Contracts for Processors with Transitions and Glitches
Security verification of masked software implementations of cryptographic algorithms must account for microarchitectural side-effects of CPUs. Leakage contracts were proposed to provide a formal separation between hardware and software verification, ensuring interoperability and end-to-end security for independently verified components. However, previously proposed leakage contracts did not consider a class of ephemeral hardware effects called glitches, which leaves a considerable gap between security models and the capabilities of real-world attackers. We address this issue by extending the model for leakage contracts to account for glitches and transitions. We further present the first end-to-end verification tool for transient leakage contracts. Our hardware and software verification rely on the same contract as a single source of truth, facilitating fully machine-checked verification from the hardware gate level to the software. By allowing contracts to be written in the C programming language we make power contracts more accessible and intuitive for system-level engineers. To showcase the efficacy of our approach, we apply it to the RISC-V Ibex core. We show that it is possible to write a power contract for Ibex without any modifications to the hardware design. Using this contract, we prove end-to-end security between masked software and gate-level hardware.
2024
TCHES
Another Evidence to not Employ Customized Masked Hardware: Identifying and Fixing Flaws in SCARV
As a well-studied countermeasure against side-channel analysis attacks, there is a general interest in applying masking to different cryptographic functions executed on different platforms. On the one hand, despite their high performance, masked hardware implementations are dedicated to specific algorithms, making them inflexible. On the other hand, applying masking on software involves serious challenges including significant overhead in terms of efficiency and difficulties to maintain theoretical security guarantees in practice. As a result, a line of research has been devoted to enable masked operations in flexible platforms (i.e., microprocessors) by including some masked modules in their hardware, hence a combination of flexibility and performance. In such scenarios, RISC-V is a natural choice as hardware can be adjusted to the extended instruction set. One such attempt presented at CHES 2021 is known as SCARV, which extends the Instruction Set Architecture (ISA) of a RISC-V core with a rich number of first-order masked operations on both Boolean and arithmetic masked operands. In this work, we conduct a comprehensive analysis of SCARV. Instead of relying on empirical measurements to demonstrate security, we utilize tool-assisted evaluations. Through these evaluations, we identified a couple of design flaws that lead to leakage in the masked implementations hosted by the corresponding processor. These flaws are primarily due to the lack of composability of cascaded components. While heuristic and ad-hoc design principles can result in secure, small, and efficient designs, they lack formal security proofs, which may lead to security flaws, like those we report here. Consequently, this work serves as a motivation for using composable masked modules and tool-assisted evaluations when constructing complex circuits.
2024
TCHES
A Not So Discrete Sampler: Power Analysis Attacks on HAWK signature scheme
HAWK is a lattice-based signature scheme candidate in the fourth call of NIST's Post-Quantum standardization campaign. Since HAWK can be considered a cousin of Falcon (one of the future NIST post-quantum standards), one can wonder whether it shares the same drawbacks as Falcon in terms of side-channel attacks. Indeed, Falcon's signature algorithm, and particularly its Gaussian sampler, has been shown to be highly vulnerable to power-analysis attacks, and efficiently protecting Falcon's signature algorithm against these attacks seems a very challenging task. This work presents the first power analysis leakage review of the HAWK signature scheme: it extensively assesses the vulnerabilities of a central and sensitive brick of the scheme, the discrete Gaussian sampler. Knowing the output x of the sampler for a given signature leads to linear information about the private key of the scheme. This paper includes several demonstrations of simple power analysis attacks targeting this sample x with various attacker strengths, all of them performed on the reference implementation on a ChipWhisperer Lite with an STM32F3 target (ARM Cortex M4). We report being able to perform key recoveries with very low (to no) offline resources. As this reference implementation of HAWK is not claimed to be protected against side-channel attacks, the existence of such attacks is not surprising, but they still concretely warn against the use of this unprotected signature on physical devices. To go further, our study proposes a generic way of assessing the performance of a side-channel attack on x even when less information is recovered, in a setting where some protections are implemented or when the attacker has fewer measurement possibilities. While it is easy to see that x is a sensitive value, quantifying the residual complexity of the key recovery with some knowledge about x (like the parity or the sign of some coefficients) is not straightforward, as the underlying hardness assumption is the newly introduced Module-LIP problem. We propose to adapt the existing methodology of leaky LWE estimation tools (Dachman-Soled et al. at Crypto 2020) to exploit the retrieved information and lower the residual key-recovery complexity. Finally, we propose an ad-hoc technique to reduce the leakage at the identified vulnerability points. These modifications prevent our attacks on our platform and come with essentially no cost in terms of performance. They could be seen as a temporary solution and encourage further analysis towards proven side-channel protection of HAWK, such as masking.
2024
TCHES
Fault-Resistant Partitioning of Secure CPUs for System Co-Verification against Faults
Fault injection attacks are a serious threat to system security, enabling attackers to bypass protection mechanisms or access sensitive information. To evaluate the robustness of CPU-based systems against these attacks, it is essential to analyze the consequences of the fault propagation resulting from the complex interplay between the software and the processor. However, current formal methodologies combining hardware and software face scalability issues due to the monolithic approach used. To address this challenge, this work formalizes the k-fault-resistant partitioning notion to solve the fault propagation problem when assessing redundancy-based hardware countermeasures in a first step. Proven security guarantees can then reduce the remaining hardware attack surface when introducing the software in a second step. First, we validate our approach against previous work by reproducing known results on cryptographic circuits. In particular, we outperform state-of-the-art tools for evaluating AES under a three-fault-injection attack. Then, we apply our methodology to the OpenTitan secure element and formally prove the security of its CPU’s hardware countermeasure to single bit-flip injections. Besides that, we demonstrate that previously intractable problems, such as analyzing the robustness of OpenTitan running a secure boot process, can now be solved by a co-verification methodology that leverages a k-fault-resistant partitioning. We also report a potential exploitation of the register file vulnerability in two other software use cases. Finally, we provide a security fix for the register file, prove its robustness, and integrate it into the OpenTitan project.
2024
TCHES
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
The efficient execution of a Lightweight Cryptography (LWC) algorithm is essential for edge computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. We propose CrISA-X, a set of Cryptography Instruction Set Architecture eXtensions designed to improve cryptographic latency on extendable processors. CrISA-X provides enhanced speed for various algorithms simultaneously while optimizing ISA adaptability, a feat yet to be accomplished. The extensions, spanning several computation levels, are tailored explicitly both for individual algorithms and for sets of LWC algorithms, depending on performance, frequency, and area trade-offs. By diligently applying the Min-Max optimization technique, we have configured these extensions to achieve a balance between performance, area utilization, code size, and other factors. Our study presents empirical evidence of the performance enhancement achieved on a synthesized modular RISC processor. We offer a framework for creating optimized processor hardware and ISA extensions. CrISA-X outperforms existing ISA extensions, delivering significant performance boosts of between 3x and 17x while incurring a relative area cost increase of +12% to +47% in LUTs. Notably, as one important example, running the ASCON algorithm yields a 10x performance boost compared to the base ISA instruction implementation.
2024
TCHES
Faster NTRU-based Bootstrapping in less than 4 ms
Bootstrapping is a critical technique in constructing fully homomorphic encryption (FHE), which serves to refresh the noise in FHE ciphertexts, enabling an arbitrary number of homomorphic operations. Among published results, the TFHE-rs library [Zam22] offers the fastest bootstrapping implementation on CPU platforms by taking advantage of AVX-512 instructions. In this paper, we improve the efficiency of the bootstrapping algorithm based on the NTRU problem. First, we introduce the approximate gadget decomposition method tailored for NTRU ciphertexts, reducing the number of NTT operations required for external products. Second, by integrating the approximate decomposition and key unrolling techniques, we improve the performance of CMux-based blind rotation. Third, for the automorphism-based blind rotation method, we present a hybrid window size technique that reduces the number of automorphisms by 34% compared to recent work [XZD+23] (Crypto 2023). Subsequently, we implement the proposed bootstrapping algorithm on the CPU platform with AVX instructions. Experimental results demonstrate that our method takes only 3.8 ms, which achieves a 1.8× speedup compared to the TFHE-rs library. Finally, we propose an efficient FPGA accelerator based on the CMux method, which not only achieves the best performance but also exhibits high throughput advantages. Our accelerator can improve performance by 2x compared to state-of-the-art FPGA implementations (e.g., FPT).
2024
TCHES
On the (Im)possibility of Preventing Differential Computation Analysis with Internal Encodings
White-box cryptography aims at protecting implementations of cryptographic algorithms against a very powerful attacker who controls the execution environment. The first defensive brick traditionally embedded in such implementations consists of encodings, which are bijections supposed to conceal sensitive data manipulated by the white-box. Several previous works have sought to evaluate the relevance of encodings to protect white-box implementations against grey-box attacks such as Differential Computation Analysis (DCA). However, these works have been either probabilistic or partial in nature. In particular, while they showed that DCA succeeds with high probability against AES white-box implementations protected by random encodings, they did not refute the existence of a particular class of encodings that could prevent the attack. One could thus wonder if carefully crafting specific encodings instead of drawing random bijections could be a solution. This article bridges the gap between preceding research efforts and investigates this question. We first focus on the protection of the S-box output and we show that no 4-bit encoding can actually protect this sensitive value against side-channel attacks. We then argue that the use of random 8-bit encodings is both necessary and sufficient, but that this assertion holds exclusively for the S-box output. Indeed, while we define a class of 8-bit encodings that actually prevents a classical DCA targeting the MixColumns output, we also explain how to adapt this attack and exploit the correlation traces in order to defeat even these specific encodings. Our work thus rules out the existence of a set of practical encodings that could be used to protect an AES white-box implementation against DCA-like attacks.
2024
TCHES
Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning
To make cryptographic processors more resilient against side-channel attacks, engineers have developed various countermeasures. However, the effectiveness of these countermeasures is often uncertain, as it depends on the complex interplay between software and hardware. Assessing a countermeasure’s effectiveness using profiling techniques or machine learning so far requires significant expertise and effort to adapt to new targets, which makes those assessments expensive. We argue that including cost-effective automated attacks will help chip design teams to quickly evaluate their countermeasures during the development phase, paving the way to more secure chips. In this paper, we lay the foundations toward such an automated system by proposing GPAM, the first deep-learning system for power side-channel analysis that generalizes across multiple cryptographic algorithms, implementations, and side-channel countermeasures without the need for manual tuning or trace preprocessing. We demonstrate GPAM’s capability by successfully attacking four hardened hardware-accelerated elliptic-curve digital-signature implementations. We showcase GPAM’s ability to generalize across multiple algorithms by attacking a protected AES implementation and achieving comparable performance to state-of-the-art attacks, but without manual trace curation and within a limited budget. We release our data and models as an open-source contribution to allow the community to independently replicate our results and build on them.
2024
TCHES
Compress: Generate Small and Fast Masked Pipelined Circuits
Masking is an effective countermeasure against side-channel attacks. It replaces every logic gate in a computation by a gadget that performs the operation over secret sharings of the circuit’s variables. When masking is implemented in hardware, care should be taken to protect against leakage from glitches, which could otherwise undermine the security of masking. This is generally done by adding registers, which stop the propagation of glitches, but introduce additional latency and area cost. In masked pipeline circuits, a high latency further increases the area overheads of masking, due to the need for additional registers that synchronize signals between pipeline stages. In this work, we propose a technique to minimize the number of such pipeline registers, which relies on optimizing the scheduling of the computations across the pipeline stages. We release an implementation of this technique as an open-source tool, Compress. Further, we introduce other optimizations to deduplicate logic between gadgets, perform an optimal selection of masked gadgets, and introduce new gadgets with smaller area. Overall, our optimizations lead to circuits that improve the state-of-the-art in area and achieve state-of-the-art latency. For example, a masked AES based on an S-box generated by Compress reduces latency by 19% and area by 27% over a state-of-the-art implementation, or, for the same latency, reduces area by 45%.
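For readers unfamiliar with masked hardware gadgets, the following Python sketch shows the classic first-order domain-oriented masked AND (a generic textbook gadget, not a Compress-generated circuit): the refreshed cross-domain products are exactly the values that would be stored in registers before recombination, which is where the latency and register cost discussed above originate.

```python
import secrets

def share(x):
    """First-order Boolean sharing of a bit: x = x0 XOR x1."""
    x0 = secrets.randbits(1)
    return x0, x0 ^ x

def masked_and(a_sh, b_sh):
    """First-order domain-oriented masked AND. In hardware, the four partial
    products below would be registered before the final XORs to stop glitches."""
    a0, a1 = a_sh
    b0, b1 = b_sh
    r = secrets.randbits(1)        # fresh randomness refreshing the cross terms
    t00 = a0 & b0
    t01 = (a0 & b1) ^ r
    t10 = (a1 & b0) ^ r
    t11 = a1 & b1
    return t00 ^ t01, t10 ^ t11    # output shares of a AND b

# Exhaustive functional check over all inputs and many fresh maskings.
for a in (0, 1):
    for b in (0, 1):
        for _ in range(100):
            c0, c1 = masked_and(share(a), share(b))
            assert c0 ^ c1 == a & b
print("masked AND recombines correctly for all inputs")
```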
2024
TCHES
Thunderbird: Efficient Homomorphic Evaluation of Symmetric Ciphers in 3GPP by combining two modes of TFHE
Hybrid homomorphic encryption (a.k.a. transciphering) can alleviate the ciphertext size expansion inherent to fully homomorphic encryption by integrating a specific symmetric encryption scheme that can be efficiently evaluated homomorphically. While there has been a recent surge in the development of FHE-friendly ciphers, concerns have arisen regarding their security. A significant challenge for the transciphering community remains the efficient evaluation of symmetric encryption algorithms that have undergone extensive study and standardization. In this paper, we present an evaluation framework, dubbed Thunderbird, which for the first time presents efficient homomorphic implementations of the stream ciphers SNOW 3G and ZUC that are standardized in the 3rd Generation Partnership Project (3GPP). Specifically, Thunderbird combines the gate bootstrapping mode and the leveled evaluation mode of TFHE to cater to various function types within symmetric encryption algorithms. In the gate bootstrapping mode, we propose a variant of the homomorphic full adder that consumes only a single blind rotation, which may be of independent interest. In the leveled evaluation mode, we employ the CMux gate combined with the hybrid packing technique to efficiently realize lookup tables, significantly reducing the need for gate bootstrapping, and adapt the current optimal circuit bootstrapping to expedite the Thunderbird framework. We have implemented the Thunderbird framework in the TFHEpp public library. Experimental results demonstrate that SNOW 3G and ZUC can homomorphically generate a keyword in only 7 seconds and 9.5 seconds, which are 52x and 32x faster than the trivial gate bootstrapping mode, respectively. For the homomorphic evaluation of the AES-128 algorithm using Thunderbird, we achieve a speedup of 1.9x in terms of latency and use a smaller evaluation key compared to the state-of-the-art work.
2024
TCHES
1/0 Shades of UC: Photonic Side-Channel Analysis of Universal Circuits
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may obtain additional information about the secret when executing cryptographic protocols. This paper aims to answer the question of whether UCs leak information unintentionally, which can be leveraged by the adversary to disclose the configuration bits. In this regard, we propose the first photon emission analysis against UCs relying on computer vision-based approaches. We demonstrate that the adversary can utilize a cost-effective solution to take images to be processed by off-the-shelf algorithms to extract configuration bits. We examine the efficacy of our method in two scenarios: (1) the design is small enough to be captured in a single image during the attack phase, and (2) multiple images should be captured to launch the attack by deploying a divide-and-conquer strategy. To evaluate the effectiveness of our attack, we use metrics commonly applied in side-channel analysis, namely rank and success rate. By doing so, we show that our profiled photon emission analysis achieves a success rate of 1 by employing a few templates (concretely, only 18 images were used as templates).
2024
TCHES
Low-Latency Masked Gadgets Robust against Physical Defaults with Application to Ascon
Low-latency masked hardware implementations are known to be a difficult challenge. On the one hand, the propagation of glitches can falsify their independence assumption (that is required for security) and can only be stopped by registers. This implies that glitch-robust masked AND gates (maintaining a constant number of shares) require at least one cycle. On the other hand, Knichel and Moradi’s only known single-cycle multiplication gadget that ensures (composable) security against glitches for any number of shares requires additional care to maintain security against transition-based leakages. For example, it cannot be integrated in a single-cycle round-based architecture, which is a natural choice for low-latency implementations. In this paper, we therefore describe the first single-cycle masked multiplication gadget that is trivially composable and provides security against transitions and glitches, and prove its security in the robust probing model. We then analyze the interest of this new gadget for the secure implementation of the future lightweight cryptography standard Ascon, which has good potential for low latency. We show that it directly leads to improvements for uniformly protected implementations (where all computations are masked). We also show that it can be handy for integration in so-called leveled implementations (where only the key derivation and the tag generation are masked, which provides integrity with leakage in encryption and decryption and confidentiality with leakage in encryption only). Most importantly, we show that it is very attractive for implementations that we denote as multi-target, which can alternate between uniformly protected and leveled implementations, without latency overheads and at limited cost. We complete these findings by evaluating different protected implementations of Ascon, clarifying its hardware design space.
2024
TCHES
Unboxing ARX-Based White-Box Ciphers: Chosen-Plaintext Computation Analysis and Its Applications
It has been proven that white-box ciphers with small encodings are vulnerable to algebraic and computation attacks. By leveraging large encodings, self-equivalence and implicit implementations have been proposed for ARX-based white-box ciphers. Unfortunately, these two types of white-box implementations have been proven insecure under algebraic attacks. Different from algebraic attacks, computation analysis can extract the secret key from memory access traces without software reverse engineering. It is still an open problem whether the self-equivalence and implicit implementations can resist computation analysis. In this paper, we analyze the encoded structure of self-equivalence/implicit white-box ARX ciphers and discuss their resistance to computation analysis, such as differential computation analysis (DCA) and algebraic degree computation analysis (ADCA). The results reveal that the large input, encoding, and round key can practically mitigate DCA and ADCA. To deal with the large space, we introduce a new method named chosen-plaintext computation analysis (CP-CA). Based on a partial key guess and a deliberately chosen intermediate value, CP-CA constructs a reverse function to calculate a set of plaintexts. With the obtained plaintexts, the large affine and non-linear encodings are reduced to a small space. Subsequently, CP-CA mounts the computation analysis on the traces to recover the secret key. Following CP-CA, we propose a CP-DCA attack and reformulate ADCA as chosen-plaintext linear encoding analysis (CP-LEA). The experimental results indicate that the self-equivalence white-box SPECK32/48/64/96/128 and implicit white-box SPECK32/64 implementations are vulnerable to CP-DCA and CP-LEA attacks.
2024
TCHES
Polynomial sharings on two secrets: Buy one, get one free
While passive side-channel attacks and active fault attacks have been studied intensively in the last few decades, strong attackers combining these attacks have only been studied relatively recently. Due to its simplicity, most countermeasures against passive attacks are based on additive sharing. Unfortunately, extending these countermeasures against faults often leads to quite a significant performance penalty, either due to the use of expensive cryptographic operations or a large number of shares due to massive duplication. Just recently, Berndt, Eisenbarth, Gourjon, Faust, Orlt, and Seker thus proposed to use polynomial sharing against combined attackers (CRYPTO 2023). While they construct gadgets secure against combined attackers using only a linear number of shares, the overhead introduced might still be too large for practical scenarios. In this work, we show how the overhead of nearly all known constructions using polynomial sharing can be reduced by nearly half by embedding two secrets in the coefficients of one polynomial at the expense of increasing the degree of the polynomial by one. We present a very general framework that allows adapting these constructions to this new sharing scheme and prove the security of this approach against purely passive side-channel attacks, purely active fault attacks, and combined attacks. Furthermore, we present new gadgets allowing us to operate upon the different secrets in a number of useful ways.
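A minimal Python sketch of the 'two secrets in one polynomial' idea follows (field size, degree, and evaluation points are arbitrary choices for illustration, and nothing here reflects the paper's gadgets or security proofs): the two secrets sit in the two lowest coefficients, the higher coefficients are random, shares are evaluations at distinct non-zero points, and reconstruction solves the Vandermonde system.

```python
import random

P = 2**31 - 1   # a Mersenne prime used as the share field for this toy example

def share_two_secrets(s1, s2, degree, points):
    """f(x) = s1 + s2*x + r_2*x^2 + ... + r_degree*x^degree; shares are f(alpha_i)."""
    coeffs = [s1 % P, s2 % P] + [random.randrange(P) for _ in range(degree - 1)]
    return [sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P for x in points]

def reconstruct_two_secrets(shares, points):
    """Recover the coefficients by solving the Vandermonde system mod P (Gauss-Jordan)."""
    n = len(points)
    rows = [[pow(x, i, P) for i in range(n)] + [y] for x, y in zip(points, shares)]
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(v - f * w) % P for v, w in zip(rows[r], rows[col])]
    coeffs = [rows[i][n] for i in range(n)]
    return coeffs[0], coeffs[1]    # the two embedded secrets

degree = 3                         # degree d+1: one more than a single-secret sharing
points = [1, 2, 3, 4]              # need degree+1 shares to reconstruct
shares = share_two_secrets(41, 97, degree, points)
print(reconstruct_two_secrets(shares, points))   # -> (41, 97)
```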
2024
TCHES
White-box filtering attacks breaking SEL masking: from exponential to polynomial time
This work proposes a new white-box attack technique called filtering, which can be combined with any other trace-based attack method. The idea is to filter the traces based on the value of an intermediate variable in the implementation, aiming to fix a share of a sensitive value and degrade the security of an involved masking scheme. Coupled with LDA (filtered LDA, FLDA), it leads to an attack defeating the state-of-the-art SEL masking scheme (CHES 2021) of arbitrary degree and number of linear shares with quartic complexity in the window size. In comparison, the current best attacks have exponential complexities in the degree (higher degree decoding analysis, HDDA), in the number of linear shares (higher-order differential computation analysis, HODCA), or the window size (white-box learning parity with noise, WBLPN). The attack exploits the key idea of the SEL scheme - an efficient parallel combination of the nonlinear and linear masking schemes. We conclude that a proper composition of masking schemes is essential for security. In addition, we propose several optimizations for linear algebraic attacks: redundant node removal (RNR), optimized parity check matrix usage, and chosen-plaintext filtering (CPF), significantly improving the performance of security evaluation of white-box implementations.
2024
TCHES
HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures
We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared to Dilithium. We provide a portable, constant-time reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in IoT and other embedded systems.
2024
TCHES
A Deep Analysis of two Glitch-Free Hardware Masking Schemes SESYM and LMDPL
In the context of masking, which is the dominant technique for protecting cryptographic hardware designs against Side-Channel Analysis (SCA) attacks, the focus has long been on the design of masking schemes that guarantee provable security in the presence of glitches. Unfortunately, achieving this comes at the cost of increased latency, since registers are required to stop glitch propagation. Previous work has attempted to reduce latency by eliminating registers, but the exponential increase in area makes such approaches impractical. Some relatively new attempts have used Dual-Rail Pre-charge (DRP) logic styles to avoid glitches in algorithmically masked circuits. Promising approaches in this area include LUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL) and Self-Synchronized Masking (SESYM), presented at CHES 2020 and CHES 2022 respectively. Both schemes allow masking of arbitrary functions with only one cycle of latency. However, even if glitches no longer occur, there are other physical defaults that may violate the security of a glitch-free masked circuit. The imbalanced delay of dual rails is a known security problem for DRP logic styles such as Wave Dynamic Differential Logic (WDDL), but is not covered by the known security models, e.g., the robust probing model. In this work, we illustrate that imbalanced signal delays pose a threat to the security of algorithmically masked circuits implemented with DRP logic, both in theory and practice. Notably, we underscore the security of LMDPL even when delays are taken into account, contrasting with the vulnerability observed in SESYM under similar conditions. Consequently, our findings highlight the critical importance of addressing imbalanced delays in the design of masked circuits using DRP logic. In particular, our findings motivate the need for an appropriate security model, and imply that relying solely on the probing security model and avoiding glitches may be insufficient to construct secure circuits.
2024
TCHES
Optimized Hardware-Software Co-Design for Kyber and Dilithium on RISC-V SoC FPGA
Kyber and Dilithium are both lattice-based post-quantum cryptography (PQC) algorithms that have been selected for standardization by the American National Institute of Standards and Technology (NIST). NIST recommends them as two primary algorithms to be implemented for most use cases. As the applications of RISC-V processors move from specialized scenarios to general scenarios, efficient implementations of PQC algorithms on general-purpose RISC-V platforms are required. In this work, we present an optimized hardware-software co-design for Kyber and Dilithium on the industry’s first RISC-V System-on-Chip (SoC) Field Programmable Gate Array (FPGA) platform. The performance of both algorithms is enhanced through the utilization of hardware acceleration and software optimization, while a certain level of flexibility is still maintained. The polynomial arithmetic operations in Kyber and Dilithium are accelerated by the customized accelerators. We employ a unified high-level architecture to depict their shared characteristics and design dedicated underlying modular multipliers to explore their distinctive features. The hashing functions are optimized using RISC-V assembly instructions, resulting in improved performance and reduced code size without additional hardware resources. For other operations involving matrices and vectors, we present a multi-core acceleration scheme based on the multi-core RISC-V Microprocessor Sub-System (MSS). Combining these acceleration and optimization methods, experimental results show that the overall performance of Kyber and Dilithium across different security levels improves by 3 to 5 times, while the utilized FPGA resources account for less than 5% of the total resources provided by the platform.
2024
TCHES
Automated Generation of Fault-Resistant Circuits
Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with well-established and documented methods such as redundant computation, can be a time-consuming, error-prone, and expertise-demanding task. In this research, we present a comprehensive and fully-automated software solution for the Automated Generation of Fault-Resistant Circuits (AGEFA). Our application employs a generic and extensively researched methodology for the secure integration of countermeasures based on Error-Correcting Codes (ECCs) into cryptographic hardware circuits. Our software tool allows designers without hardware security expertise to develop fault-tolerant hardware circuits with pre-defined correction capabilities under a comprehensive fault adversary model. Moreover, our tool applies to masked designs without violating the masking security requirements, in particular to designs generated by the tool AGEMA. We evaluate the effectiveness of our approach through experiments on various block ciphers and demonstrate its ability to produce fault-tolerant circuits. Additionally, we assess the security of examples generated by AGEFA against Side-Channel Analysis (SCA) and FI using state-of-the-art leakage and fault evaluation tools.
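As a generic illustration of the ECC-based redundancy that such tools automate (a textbook Hamming(7,4) code, not AGEFA output, and real designs must additionally respect the masking constraints mentioned above), the Python sketch below protects a 4-bit register value so that any single bit-flip fault on the stored codeword is corrected.

```python
def hamming74_encode(d):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(c):
    """Return (corrected data bits, fault_position); position 0 means no fault seen."""
    p1, p2, d1, p4, d2, d3, d4 = c
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s4 = p4 ^ d2 ^ d3 ^ d4
    pos = s1 + 2 * s2 + 4 * s4      # syndrome = 1-indexed position of a single-bit fault
    c = list(c)
    if pos:
        c[pos - 1] ^= 1             # correct the faulted bit
    return [c[2], c[4], c[5], c[6]], pos

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
for i in range(7):                  # inject every possible single bit-flip fault
    faulty = list(codeword)
    faulty[i] ^= 1
    recovered, pos = hamming74_correct(faulty)
    assert recovered == data and pos == i + 1
print("every single-bit fault on the protected register is corrected")
```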
2024
TCHES
Correction Fault Attacks on Randomized CRYSTALS-Dilithium
After NIST’s selection of Dilithium as the primary future standard for quantum-secure digital signatures, increased efforts to understand its implementation security properties are required to enable widespread adoption on embedded devices. Concretely, there are still many open questions regarding the susceptibility of Dilithium to fault attacks. This is especially the case for Dilithium’s randomized (or hedged) signing mode, which, likely due to devastating implementation attacks on the deterministic mode, was selected as the default by NIST. This work takes steps towards closing this gap by presenting two new key-recovery fault attacks on randomized/hedged Dilithium. Both attacks are based on the idea of correcting faulty signatures after signing. A successful correction yields the value of a secret intermediate that carries information on the key. After gathering many faulty signatures and corresponding correction values, it is possible to solve for the signing key via either simple linear algebra or lattice-reduction techniques. Our first attack extends a previously published attack based on an instruction-skipping fault to the randomized setting. Our second attack injects faults in the matrix A, which is part of the public key. As such, it is not sensitive to side-channel leakage and has, potentially for this reason, not seen prior analysis regarding faults. We show that for Dilithium2, the attacks allow key recovery with as little as 1024 and 512 faulty signatures, with each signature generated by injecting a single targeted fault. We also demonstrate how our attacks can be adapted to circumvent several popular fault countermeasures with a moderate increase in the computational runtime and the number of required faulty signatures. These results are verified using both simulated faults and clock glitches on an ARM-based standard microcontroller. The presented attacks demonstrate that also randomized Dilithium can be subject to diverse fault attacks, that certain countermeasures might be easily bypassed, and that potential fault targets reach beyond side-channel sensitive operations. Still, many further operations are likely also susceptible, implying the need for increased analysis efforts in the future.
2024
TCHES
Hints from Hertz: Dynamic Frequency Scaling Side-Channel Analysis of Number Theoretic Transform in Lattice-Based KEMs
Number Theoretic Transform (NTT) has been widely used in accelerating computations in lattice-based cryptography. However, attackers can potentially launch power analysis targeting the NTT because it is one of the most time-consuming parts of the implementation. This extended time frame provides a natural window of opportunity for attackers. In this paper, we investigate the first CPU frequency leakage (Hertzbleed-like) attacks against NTT in lattice-based KEMs. Our key observation is that different inputs to NTT incur different Hamming weights in its output and intermediate layers. By measuring the CPU frequency during the execution of NTT, we propose a simple yet effective attack idea to find the input to NTT that triggers NTT processing data with significantly low Hamming weight. We further apply our attack idea to real-world applications that are built upon NTT: CPA-secure Kyber without Compression and Decompression functions, and CCA-secure NTTRU. This leads us to extract information or frequency hints about the secret key. Integrating these hints into the LWE-estimator framework, we estimate a minimum of 35% security loss caused by the leakage. The frequency and timing measurements on the Reference and AVX2 implementations of NTT in both Kyber and NTTRU align well with our theoretical analysis, confirming the existence of frequency side-channel leakage in NTT. It is important to emphasize that our observation is not limited to a specific implementation but rather the algorithm on which NTT is based. Therefore, our results call for more attention to the analysis of power leakage against NTT in lattice-based cryptography.
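The data dependence that the attack exploits can be illustrated with a toy NTT in Python (toy modulus and length rather than Kyber's or NTTRU's parameters; the real attack measures CPU frequency over full implementations instead of reading Hamming weights directly): different inputs produce transform outputs with very different Hamming weights.

```python
q, n = 257, 8    # toy parameters: q prime, n divides q - 1

def root_of_unity(q, n):
    """Brute-force search for an element of multiplicative order exactly n mod q."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n, q) == 1 and all(pow(w, n // p, q) != 1 for p in (2,) if n % p == 0):
            return w
    raise ValueError("no suitable root of unity")

def naive_ntt(a, w):
    """Quadratic-time cyclic NTT: A[k] = sum_j a[j] * w^(j*k) mod q."""
    return [sum(x * pow(w, j * k, q) for j, x in enumerate(a)) % q for k in range(len(a))]

def hamming_weight(vec):
    return sum(bin(x).count("1") for x in vec)

w = root_of_unity(q, n)
inputs = {
    "all-zero input": [0] * n,
    "sparse input": [1] + [0] * (n - 1),
    "random-looking input": [123, 7, 250, 1, 99, 42, 200, 13],
}
for label, a in inputs.items():
    print(f"{label:21s} -> Hamming weight of NTT output = {hamming_weight(naive_ntt(a, w))}")
# The transform's Hamming weight is strongly input-dependent (the all-zero input gives 0),
# and it is this data-dependent weight that a frequency/power side channel picks up.
```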
2024
TCHES
Evict+Spec+Time: Exploiting Out-of-Order Execution to Improve Cache-Timing Attacks
Speculative out-of-order execution is a strategy of masking execution latency by allowing younger instructions to execute before older instructions. While originally considered to be innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, a significant effort has been vested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks. In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code. Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-tables-based implementation of AES. We further show an Evict+Spec+Time attack on an S-box-based implementation of AES, recovering the key with as little as 14,815 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.
2024
TCHES
Time Sharing - A Novel Approach to Low-Latency Masking
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools without sacrificing security. We provide concrete results of several case studies. Our low-latency implementation of a complete PRINCE core shows a 32% area improvement (44% with optimization) over the state-of-the-art. Our PRINCE S-Box passes formal verification with a tool and the complete core on FPGA shows no first-order leakage in TVLA with 100 million traces. Our low-latency implementation of the AES S-Box costs roughly one third (one quarter with optimization) of the area of state-of-the-art implementations. It shows no first-order leakage in TVLA with 250 million traces.
2024
TCHES
Efficient Table-Based Masking with Pre-processing
Masking is one of the most investigated countermeasures against side-channel attacks. In a nutshell, it randomly encodes each sensitive variable into a number of shares, and compiles the cryptographic implementation into a masked one that operates over the shares instead of the original sensitive variables. Despite its provable security benefits, masking inevitably introduces additional overhead. Particularly, the software implementation of masking largely slows down the cryptographic implementations and requires a large number of random bits that need to be produced by a true random number generator. In this respect, reducing the overhead of masking is still an essential and challenging task. Among various known schemes, Table-Based Masking (TBM) stands out as a promising line of work enjoying the advantage of generality to any lookup tables. It also allows the pre-processing paradigm, wherein a pre-processing phase is executed independently of the inputs, and a much more efficient online phase (using the precomputed tables) takes place to calculate the result. Obviously, the practicality of the pre-processing paradigm relies heavily on the efficiency of the online phase and the size of the precomputed tables. In this paper, we investigate a TBM scheme that offers a combination of linear complexity (in terms of the security order, denoted as d) during the online phase and small precomputed tables. We then apply our new scheme to the AES-128, and provide an implementation on the ARM Cortex architecture. Particularly, for a security order d = 8, the online phase outperforms the current state-of-the-art AES implementations on embedded processors that are vulnerable to side-channel attacks. The security order of our scheme is proven in theory and verified by the T-test in practice. Moreover, we investigate the speed overhead associated with the random bit generation in our masking technique. Our findings indicate that the speed overhead can be effectively balanced. This is mainly because the true random number generator operates in parallel with the processor’s execution, ensuring a constant supply of fresh random bits for the masked computation at regular intervals.
2024
TCHES
Optimized Homomorphic Evaluation of Boolean Functions
We propose a new framework to homomorphically evaluate Boolean functions using the Torus Fully Homomorphic Encryption (TFHE) scheme. Compared to previous approaches focusing on Boolean gates, our technique can evaluate more complex Boolean functions with several inputs using a single bootstrapping. This allows us to greatly reduce the number of bootstrapping operations necessary to evaluate a Boolean circuit compared to previous works, thus achieving significant improvements in terms of performance. We define our approach theoretically, which consists in adding an intermediate homomorphic layer between the plain Boolean space and the ciphertext space. This layer relies on so-called p-encodings embedding bits into Z_p. We analyze the properties of these encodings to enable the evaluation of a given Boolean function and provide a deterministic algorithm (as well as an efficient heuristic) to find valid sets of encodings for a given function. We also propose a method to decompose any Boolean circuit into Boolean functions which are efficiently evaluable using our approach. We apply our framework to homomorphically evaluate various cryptographic primitives, and in particular the AES cipher. Our implementation results show significant improvements compared to the state of the art.
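The validity condition behind p-encodings can be checked with a few lines of Python (the encodings, moduli, and the example function are chosen for illustration and are unrelated to the paper's parameter sets): an encoding of input bits into Z_p supports a Boolean function if the sums of encodings reachable from inputs mapping to 0 never collide modulo p with those reachable from inputs mapping to 1, so a single programmed bootstrapping applied to the homomorphic sum can output the result.

```python
from itertools import product

def valid_encoding(f, enc, p, n_inputs):
    """Check whether the encoding (bit b of input i -> enc[i][b] in Z_p) lets f be
    computed from the sum of encodings alone; return (is_valid, lookup_table)."""
    sums = {0: set(), 1: set()}
    for bits in product((0, 1), repeat=n_inputs):
        s = sum(enc[i][b] for i, b in enumerate(bits)) % p
        sums[f(*bits)].add(s)
    if sums[0] & sums[1]:
        return False, None
    return True, {s: out for out in (0, 1) for s in sums[out]}

def AND(a, b):
    return a & b

# The trivial encoding (0 -> 0, 1 -> 1) cannot express AND modulo 2: the sums collide.
print(valid_encoding(AND, [(0, 1), (0, 1)], p=2, n_inputs=2))   # (False, None)
# Modulo 3 the same encoding works: one homomorphic sum plus one programmed
# bootstrapping implementing the returned lookup table evaluates AND.
print(valid_encoding(AND, [(0, 1), (0, 1)], p=3, n_inputs=2))   # (True, {0: 0, 1: 0, 2: 1})
```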
2024
TCHES
PoMMES: Prevention of Micro-architectural Leakages in Masked Embedded Software
Software solutions to address computational challenges are ubiquitous in our daily lives. One specific application area where software is often used is in embedded systems, which, like other digital electronic devices, are vulnerable to side-channel analysis attacks. Although masking is the most common countermeasure and provides a solid theoretical foundation for ensuring security, recent research has revealed a crucial gap between theoretical and real-world security. This shortcoming stems from the micro-architectural effects of the underlying micro-processor. Common security models used to formally verify masking schemes, such as the d-probing model, fully ignore the micro-architectural leakages that lead to a set of instructions that unintentionally recombine the shares. Manual generation of masked assembly code that remains secure in the presence of such micro-architectural recombinations often involves trial and error, and is non-trivial even for experts. Motivated by this, we present PoMMES, which enables inexperienced software developers to automatically compile masked functions written in a high-level programming language into assembly code, while preserving the theoretically proven security in practice. Compared to the state of the art, based on a general model for micro-architectural effects, our scheme allows the generation of practically secure masked software at arbitrary security orders for in-order processors. The major contribution of PoMMES is its micro-architecture aware register allocation algorithm, which is one of the crucial steps during the compilation process. In addition to simulation-based assessments that we conducted with open-source tools dedicated to evaluating masked software implementations, we confirm the effectiveness of the PoMMES-generated code through experimental analysis. We present the results of power consumption based leakage assessments of several case studies running on a Cortex M0+ micro-controller, which is commonly deployed in industry.
2024
EUROCRYPT
Practical Attack on All Parameters of the DME Signature Scheme
DME is a multivariate scheme submitted to the call for additional signatures recently launched by NIST. Its performance is one of the best among all the candidates. The public key is constructed from the alternation of very structured linear and non-linear components that constitute the private key, the latter being defined over an extension field. We exploit these structures by proposing an algebraic attack which is practical on all DME parameters.
2024
EUROCRYPT
The Hardness of LPN over Any Integer Ring and Field for PCG Applications
Learning parity with noise (LPN) has been widely studied and used in cryptography. It was recently brought to new prosperity since Boyle et al. (CCS'18), putting LPN in a central role in designing secure multi-party computation, zero-knowledge proofs, private set intersection, and many other protocols. In this paper, we thoroughly study the security of LPN problems in this particular context. We find that some important aspects have long been ignored and that many conclusions from classical LPN cryptanalysis do not apply to this new setting, due to the low noise rates, extremely high dimensions, various types (in addition to $\mathbb{F}_2$), and noise distributions. For LPN over a field, we give a parameterized reduction from exact-noise LPN to regular-noise LPN. Compared to the recent result by Feneuil, Joux and Rivain (Crypto'22), we significantly reduce the security loss by paying only a small additive price in dimension and number of samples. We analyze the security of LPN over the ring $\mathbb{Z}_{2^\lambda}$. Existing protocols based on LPN over integer rings use parameters as if they were over fields, but we found an attack that effectively reduces the weight of the noise by half compared to LPN over fields. Consequently, prior works that use LPN over $\mathbb{Z}_{2^\lambda}$ overestimate security by up to 40 bits. We provide a complete picture of the hardness of LPN over integer rings by showing: 1) the equivalence between its search and decisional versions; 2) an efficient reduction from LPN over $\mathbb{F}_{2}$ to LPN over $\mathbb{Z}_{2^\lambda}$; and 3) a generalization of our results to any integer ring. Finally, we provide an all-in-one estimator tool for the bit security of LPN parameters in the context of PCG, incorporating the recent advanced attacks.
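To fix the objects being compared, here is a minimal Python sketch that generates toy LPN samples with weight-w noise over $\mathbb{F}_2$ and over the ring $\mathbb{Z}_{2^\lambda}$; the dimensions, sample counts and helper names are illustrative assumptions and say nothing about concrete security.

import secrets

def lpn_samples_f2(n: int, m: int, weight: int):
    # Toy LPN over F_2: secret s and m samples (a, <a, s> + e) with e of Hamming weight `weight`.
    s = [secrets.randbelow(2) for _ in range(n)]
    noisy = set(secrets.SystemRandom().sample(range(m), weight))
    out = []
    for i in range(m):
        a = [secrets.randbelow(2) for _ in range(n)]
        b = (sum(ai * si for ai, si in zip(a, s)) + (i in noisy)) % 2
        out.append((a, b))
    return s, out

def lpn_samples_ring(n: int, m: int, weight: int, lam: int = 32):
    # Same shape over Z_{2^lam}; the noisy coordinates now carry arbitrary ring elements.
    q = 1 << lam
    s = [secrets.randbelow(q) for _ in range(n)]
    noisy = set(secrets.SystemRandom().sample(range(m), weight))
    out = []
    for i in range(m):
        a = [secrets.randbelow(q) for _ in range(n)]
        e = secrets.randbelow(q) if i in noisy else 0
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        out.append((a, b))
    return s, out

_, field_samples = lpn_samples_f2(n=16, m=64, weight=4)
_, ring_samples = lpn_samples_ring(n=16, m=64, weight=4)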
2024
EUROCRYPT
A Novel Framework for Explainable Leakage Assessment
Schemes such as Common Criteria or FIPS 140-3 require the assessment of cryptographic implementations with respect to side channels at high security levels. Instead of a ``penetration testing'' style approach where specific tests are carried out, FIPS 140-3 relies on non-specific ``leakage assessment'' to identify potential side-channel leaks in implementations of symmetric schemes. Leakage assessment, as it is understood today, is based on a simple leakage detection testing regime. Leakage assessment to date provides no evidence whether or not the potential leakage is exploitable in a concrete attack: if a device fails the test (and therefore certification under the FIPS 140-3 scheme), it remains unclear why it fails. We propose a novel assessment regime that is based on a different statistical rationale than the existing leakage detection tests. Our statistical approach enables non-specific detection (i.e., it does not require specifying intermediate values) whilst simultaneously generating evidence for designing an attack vector that exploits the identified leakage. We do this via an iterative approach, based on building and comparing nested regression models. We also provide, for the first time, concrete definitions for concepts such as key leakage, exploitable leakage and explainable leakage. Finally, we illustrate our novel leakage assessment framework in the context of two open-sourced masked software implementations on a processor that is known to exhibit micro-architectural leakage.
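For intuition on the nested-regression building block, the sketch below (our own toy example with simulated traces; it is not the paper's procedure) compares a restricted and a full ordinary-least-squares model of a leakage trace with a standard F-test: a significant improvement of the full model is evidence that the added explanatory column leaks.

import numpy as np
from scipy import stats

def f_test_nested(y, X_restricted, X_full):
    # F-test: does the full model explain the traces y significantly better
    # than the nested restricted model?
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r), X.shape[1]
    rss_r, p_r = rss(X_restricted)
    rss_f, p_f = rss(X_full)
    n = len(y)
    F = ((rss_r - rss_f) / (p_f - p_r)) / (rss_f / (n - p_f))
    return F, stats.f.sf(F, p_f - p_r, n - p_f)

rng = np.random.default_rng(0)
bit = rng.integers(0, 2, size=1000).astype(float)      # hypothetical intermediate bit
y = 0.3 * bit + rng.normal(scale=1.0, size=1000)       # simulated leakage samples
X_restricted = np.ones((1000, 1))                      # intercept-only model
X_full = np.column_stack([np.ones(1000), bit])         # adds the candidate column
print(f_test_nested(y, X_restricted, X_full))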
2024
EUROCRYPT
The supersingular Endomorphism Ring and One Endomorphism problems are equivalent
The supersingular Endomorphism Ring problem is the following: given a supersingular elliptic curve, compute all of its endomorphisms. The presumed hardness of this problem is foundational for isogeny-based cryptography. The One Endomorphism problem only asks to find a single non-scalar endomorphism. We prove that these two problems are equivalent, under probabilistic polynomial time reductions. We prove a number of consequences. First, assuming the hardness of the endomorphism ring problem, the Charles--Goren--Lauter hash function is collision resistant, and the SQIsign identification protocol is sound for uniformly random keys. Second, the endomorphism ring problem is equivalent to the problem of computing arbitrary isogenies between supersingular elliptic curves, a result previously known only for isogenies of smooth degree. Third, there exists an unconditional probabilistic algorithm to solve the endomorphism ring problem in time $\tilde O(p^{1/2})$, a result that previously required assuming the generalized Riemann hypothesis. To prove our main result, we introduce a flexible framework for the study of isogeny graphs with additional information. We prove a general and easy-to-use rapid mixing theorem.
2024
EUROCRYPT
The Complexity of Algebraic Algorithms for LWE
Arora & Ge introduced a noise-free polynomial system to compute the secret of a Learning With Errors (LWE) instance via linearization. Albrecht et al. later utilized the Arora-Ge polynomial model to study the complexity of Gröbner basis computations on LWE polynomial systems under the assumption of semi-regularity. In this paper we revisit the Arora-Ge polynomial model and prove that it satisfies a genericity condition recently introduced by Caminata & Gorla, called being in generic coordinates. For polynomial systems in generic coordinates one can always estimate the complexity of DRL Gröbner basis computations in terms of the Castelnuovo-Mumford regularity and hence also via the Macaulay bound. Moreover, we generalize the Gröbner basis algorithm of Semaev & Tenti to arbitrary polynomial systems with a finite degree of regularity. In particular, the existence of this algorithm yields another approach to estimating the complexity of DRL Gröbner basis computations in terms of the degree of regularity. In practice, the degree of regularity of LWE polynomial systems is not known, though one can always estimate the lowest achievable degree of regularity. Consequently, from a designer's worst-case perspective this approach yields sub-exponential complexity estimates for general, binary-secret and binary-error LWE. In recent works by Dachman-Soled et al. the hardness of LWE in the presence of side information was analyzed. Utilizing their framework we discuss how hints can be incorporated into LWE polynomial systems and how they affect the complexity of Gröbner basis computations.
2024
EUROCRYPT
Anamorphic Encryption: New Constructions and Homomorphic Realizations
The elegant paradigm of Anamorphic Encryption (Persiano et al., Eurocrypt 2022) considers the question of establishing a private communication in a world controlled by a dictator. The challenge is to allow two users, sharing some secret anamorphic key, to exchange covert messages without the dictator noticing, even when the latter has full access to the regular secret keys. Over the last year, several works considered this question and proposed constructions, novel extensions and strengthened definitions. In this work we make progress on the study of this primitive in three main directions. First, we show that two general and well-established encryption paradigms, namely hybrid encryption and the IBE-to-CCA transform, admit very simple and natural anamorphic extensions. Next, we show that anamorphism, far from being a phenomenon isolated to "basic" encryption schemes, extends also to homomorphic encryption. We show that some existing homomorphic schemes (and most notably the fully homomorphic one by Gentry, Sahai and Waters) can be made anamorphic, while retaining their homomorphic properties both with respect to the regular and the covert message. Finally, we refine the notion of anamorphic encryption by envisioning the possibility of splitting the anamorphic key into an encryption component (that only allows encrypting covert messages) and a decryption component. This makes it possible for a receiver to set up several, independent, covert channels associated with a single covert key.
2024
EUROCRYPT
SLAP: Succinct Lattice-Based Polynomial Commitments from Standard Assumptions
Recent works on lattice-based extractable polynomial commitments can be grouped into two classes: (i) non-interactive constructions that stem from the functional commitment by Albrecht, Cini, Lai, Malavolta and Thyagarajan (CRYPTO 2022), and (ii) lattice adaptations of the Bulletproofs protocol (S&P 2018). The former class enjoys security in the standard model, albeit a knowledge assumption is desired. In contrast, Bulletproof-like protocols can be made secure under falsifiable assumptions, but due to technical limitations regarding subtractive sets, they only offer inverse-polynomial soundness error. This issue becomes particularly problematic when transforming these protocols to the non-interactive setting using the Fiat-Shamir paradigm. In this work, we propose the first lattice-based non-interactive extractable polynomial commitment scheme which achieves polylogarithmic proof size and verifier runtime (in the length of the committed message) under standard assumptions. At the core of our work lies a new tree-based commitment scheme, along with an efficient proof of polynomial evaluation inspired by FRI (ICALP 2018). Natively, the construction is secure under a “multi-instance version” of the Power-Ring BASIS assumption (Eprint 2023/846). We then base security on the Module-SIS assumption by introducing several re-randomisation techniques which can be of independent interest.
2024
EUROCRYPT
Twinkle: Threshold Signatures from DDH with Full Adaptive Security
Sparkle is the first threshold signature scheme in the pairing-free discrete logarithm setting (Crites, Komlo, Maller, Crypto 2023) to be proven secure under adaptive corruptions. However, without using the algebraic group model, Sparkle's proof imposes an undesirable restriction on the adversary. Namely, for a signing threshold t<n, the adversary is restricted to corrupt at most t/2 parties. In addition, Sparkle's proof relies on a strong one-more assumption. In this work, we propose Twinkle, a new threshold signature scheme in the pairing-free setting which overcomes these limitations. Twinkle is the first pairing-free scheme to have a security proof under up to t adaptive corruptions without relying on the algebraic group model. It is also the first such scheme with a security proof under adaptive corruptions from a well-studied non-interactive assumption, namely, the Decisional Diffie-Hellman (DDH) assumption. We achieve our result in two steps. First, we design a generic scheme based on a linear function that satisfies several abstract properties and prove its adaptive security under a suitable one-more assumption related to this function. In the context of this proof, we also identify a gap in the security proof of Sparkle and develop new techniques to overcome this issue. Second, we give a suitable instantiation of the function for which the corresponding one-more assumption follows from DDH.
2024
EUROCRYPT
Toothpicks: More Efficient Fork-Free Two-Round Multi-Signatures
Tightly secure cryptographic schemes can be implemented with standardized parameters, while still having a sufficiently high security level backed up by their analysis. In a recent work, Pan and Wagner (Eurocrypt 2023) presented the first tightly secure two-round multi-signature scheme without pairings, called Chopsticks. While this is an interesting first theoretical step, Chopsticks is much less efficient than its non-tight counterparts. In this work, we close this gap by proposing a new tightly secure two-round multi-signature scheme that is as efficient as non-tight schemes. Our scheme is based on the DDH assumption without pairings. Compared to Chopsticks, we reduce the signature size by more than a factor of 3 and the communication complexity by more than a factor of 2. Technically, we achieve this as follows: (1) We develop a new pseudorandom path technique, as opposed to the pseudorandom matching technique in Chopsticks. (2) We construct a more efficient commitment scheme with suitable properties, which is an important primitive in both our scheme and Chopsticks. Surprisingly, we observe that the commitment scheme does not have to be binding, enabling our efficient construction.
2024
EUROCRYPT
Lower Bounds for Lattice-based Compact Functional Encryption
Functional encryption (FE) is a primitive where the holder of a master secret key can control which functions a user can evaluate on encrypted data. It is a powerful primitive that even implies indistinguishability obfuscation (iO), given sufficiently compact ciphertexts (Ananth-Jain, CRYPTO'15 and Bitansky-Vaikuntanathan, FOCS'15). However, despite being extensively studied, there are FE schemes, such as function-hiding inner-product FE (Bishop-Jain-Kowalczyk, AC'15, Abdalla-Catalano-Fiore-Gay-Ursu, CRYPTO'18) and compact quadratic FE (Baltico-Catalano-Fiore-Gay, Lin, CRYPTO'17), that can only be realized using pairings. This raises the question whether there are some mathematical barriers that hinder us from realizing these FE schemes from other assumptions. In this paper, we study the difficulty of constructing lattice-based compact FE. We generalize the impossibility results of Ünal (EC'20) for lattice-based function-hiding FE, and extend them to the case of compact FE. Concretely, we prove lower bounds for lattice-based compact FE schemes which meet some (natural) algebraic restrictions at encryption and decryption, and have ciphertexts of linear size and secret keys of minimal degree. We see our results as important indications of why it is hard to construct lattice-based FE schemes for new functionalities, and which mathematical barriers have to be overcome.
2024
EUROCRYPT
Provable Dual Attacks on Learning with Errors
Learning with Errors (LWE) is an important problem for post-quantum cryptography (PQC) that underlies the security of several NIST PQC selected algorithms. Several recent papers have claimed improvements on the complexity of so-called dual attacks on LWE. These improvements make dual attacks comparable to or even better than primal attacks in certain parameter regimes. Unfortunately, those improvements rely on a number of untested and hard-to-test statistical assumptions. Furthermore, a recent paper [Ducas and Pulles, 2023] claims that the whole premise of those improvements might be incorrect. The goal of this paper is to improve the situation by proving the correctness of a dual attack without relying on any statistical assumption. Although our attack is greatly simplified compared to the recent ones, it shares many important technical elements with those attacks and can serve as a basis for the analysis of more advanced attacks. We provide some rough estimates on the complexity of our simplified attack on Kyber using a Markov chain Monte Carlo discrete Gaussian sampler. Our main contribution is to clearly identify a set of parameters under which our attack (and presumably other recent dual attacks) can work. Furthermore, our analysis completely departs from the existing statistics-based analysis and is instead rooted in geometry. We also compare the regime in which our algorithm works to the ``contradictory regime'' of [Ducas and Pulles, 2023]. We observe that those two regimes are essentially complementary. Finally, we give a quantum version of our algorithm to speed up the computation. The algorithm is inspired by [Albrecht and Shen 2022] but is completely formal and does not rely on any heuristics.
2024
EUROCRYPT
Accelerating BGV Bootstrapping for Large $p$ Using Null Polynomials Over $\mathbb{Z}_{p^e}$
The BGV scheme is one of the most popular FHE schemes for computing homomorphic integer arithmetic. The bootstrapping technique of BGV is necessary to evaluate arbitrarily deep circuits homomorphically. However, the BGV bootstrapping performs poorly for large plaintext prime $p$ due to its digit removal procedure exhibiting a computational complexity of at least $O(\sqrt{p})$. In this paper, we propose optimizations for the digit removal procedure with large $p$ by leveraging the properties of null polynomials over the ring $\mathbb{Z}_{p^e}$. Specifically, we demonstrate that it is possible to construct low-degree null polynomials based on two observations of the input to the digit removal procedure: 1) the support size of the input can be upper-bounded by $(2B+1)^2$; 2) the size of the lower digits to be removed can be upper-bounded by $B$. Here $B$ can be controlled within a narrow interval $[22,23]$ in our parameter selection, making the degree of these null polynomials much smaller than $p$ for large values of $p$. These low-degree null polynomials can significantly reduce the polynomial degrees during homomorphic digit removal, thereby decreasing both running time and capacity consumption. Theoretically, our optimizations reduce the computational cost of extracting a single digit from $O(\sqrt{pe})$ (by Chen and Han) or $O(\sqrt{p}\sqrt[4]{e})$ (by Geelen et al.) to $\min(2B+1,\sqrt{\lceil e/t\rceil(2B+1)})$ for some $t\ge 1$. We implement and benchmark our method on HElib with $p=17,127,257,8191$ and $65537$. With our optimized digit removal, we achieve a bootstrapping throughput $1.38\sim151$ times that in HElib, with the speedup increasing with the value of $p$. For $p=65537$, we accelerate the digit removal step by 80 times and reduce the bootstrapping time from more than 12 hours to less than 14 minutes.
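The basic fact behind these low-degree null polynomials can be checked numerically: if the input's residues modulo $p^e$ all lie in a small set $S$, then $\prod_{i\in S}(x-i)$, of degree $|S|$, vanishes on that support modulo $p^e$. The Python sketch below verifies this for toy parameters of our own choosing (it is not the paper's parameter selection), and contrasts the degree $2B+1$ with the classical null polynomial $(x^p-x)^e$ of degree $pe$.

from functools import reduce

p, e, B = 17, 3, 5
mod = p ** e
support = range(-B, B + 1)   # assumed support of the digit-removal input

def null_poly(x: int) -> int:
    # Evaluate N(x) = prod_{i=-B}^{B} (x - i) modulo p^e.
    return reduce(lambda acc, i: acc * (x - i) % mod, support, 1)

# N has degree 2B+1 = 11, far below p^e, yet vanishes on the whole support,
# whereas the classical null polynomial (x^p - x)^e has degree p*e = 51 here.
for i in support:
    assert null_poly(i % mod) == 0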
2024
EUROCRYPT
A Holistic Security Analysis of Monero Transactions
Monero is a popular cryptocurrency with strong privacy guarantees for users' transactions. At the heart of Monero's privacy claims lies a complex transaction system called RingCT, which combines several building blocks such as linkable ring signatures, homomorphic commitments, and range proofs, in a unique fashion. In this work, we provide the first rigorous security analysis for RingCT (as given in Zero to Monero, v2.0.0, 2020) in its entirety. This is in contrast to prior works that only provided security arguments for parts of RingCT. To analyze Monero's transaction system, we introduce the first holistic security model for RingCT. We then prove the security of RingCT in our model. Our framework is modular: it allows viewing RingCT as a combination of various sub-protocols. Our modular approach has the benefit that these components can be easily updated in future versions of RingCT, with only minor modifications to our analysis. At a technical level, we split our analysis in two parts. First, we identify which security notions for building blocks are needed to imply security for the whole system. Interestingly, we observe that existing and well-established notions (e.g., for the linkable ring signature) are insufficient. Second, we analyze all building blocks as implemented in Monero and prove that they satisfy our new notions. Here, we leverage the algebraic group model to overcome subtle problems in the analysis of the linkable ring signature component. As another technical highlight, we show that our security goals can be mapped to a suitable graph problem, which allows us to take advantage of the theory of network flows in our analysis. This new approach is also useful for proving security of other cryptocurrencies.
2024
EUROCRYPT
Lower-Bounds on Public-Key Operations in PIR
Private information retrieval (PIR) is a fundamental cryptographic primitive that allows a user to fetch a database entry without revealing to the server which database entry it learns. PIR becomes non-trivial if the server communication is less than the database size. We show that building (even) very weak forms of PIR protocols requires linearly many public-key operations. We then use this bound to examine the related problem of communication efficient oblivious transfer (OT) extension. Oblivious transfer is a crucial building block in secure multi-party computation (MPC). In most MPC protocols, OT invocations are the main bottleneck in terms of computation and communication. OT extension techniques allow one to minimize the number of public-key operations in MPC protocols. One drawback of all existing OT extension protocols is their communication overhead. In particular, the sender's communication is roughly double what is information-theoretically optimal. We show that OT extension with close to optimal sender communication is impossible, illustrating that the communication overhead is inherent. Our techniques go much further; we can show many lower bounds on communication-efficient MPC. For example, we prove that to build high-rate string OT with generic groups, the sender needs to do linearly many group operations.
2024
EUROCRYPT
Two-Round Maliciously-Secure Oblivious Transfer with Optimal Rate
We give a construction of a two-round batch oblivious transfer (OT) protocol in the CRS model that is UC-secure against malicious adversaries and has (near) optimal communication cost. Specifically, to perform a batch of $k$ oblivious transfers where the sender's inputs are bits, the sender and the receiver need to communicate a total of $3k + o(k) \cdot \mathsf{poly}(\lambda)$ bits. We argue that $3k$ bits are required by any protocol with a black-box and straight-line simulator. The security of our construction is proven assuming the hardness of the Quadratic Residuosity (QR) and Learning Parity with Noise (LPN) problems.
2024
EUROCRYPT
Trapdoor Memory-Hard Functions
Memory-hard functions (MHF) are functions whose evaluation provably requires a lot of memory. While MHFs are an unkeyed primitive, it is natural to consider the notion of trapdoor MHFs (TMHFs). A TMHF is like an MHF, but when sampling the public parameters one also samples a trapdoor which allows evaluating the function much more cheaply. Biryukov and Perrin (Asiacrypt'17) were the first to consider TMHFs and put forth a candidate TMHF construction called Diodon that is based on the Scrypt MHF (Percival, BSDCan'09). To allow for a trapdoor, Scrypt's initial hash chain is replaced by a sequence of squares in a group of unknown order, where the order of the group is the trapdoor. For a length-$n$ sequence of squares and a group of order $N$, Diodon's cumulative memory complexity (CMC) is $O(n^2 \log N)$ without the trapdoor and $O(n \log(n) \log(N)^2)$ with knowledge of it. While Scrypt is proven to be optimally memory-hard in the random oracle model (Alwen et al., Eurocrypt'17), Diodon's memory-hardness has not been proven so far. In this work, we fill this gap by rigorously analyzing a specific instantiation of Diodon. We show that its CMC is lower bounded by $\Omega(\frac{n^2}{\log n} \log N)$, which almost matches the upper bound. Our proof is based on Alwen et al.'s lower bound on Scrypt's CMC but requires non-trivial modifications due to the algebraic structure of Diodon. Most importantly, our analysis involves a more elaborate compression argument and a solvability criterion for certain systems of Diophantine equations.
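To illustrate the trapdoor mechanism in isolation (a squaring chain in an RSA-style group, in the spirit of Diodon and of time-lock-puzzle constructions; the toy primes and chain length below are our own and far from cryptographic size), the Python sketch shows how knowing the group order collapses n sequential squarings into a single exponentiation:

# Toy parameters only; a real instantiation uses a large modulus of unknown factorization.
p, q = 104729, 104723          # two small known primes
N = p * q
phi = (p - 1) * (q - 1)        # the trapdoor: the order of the group (Z/NZ)*
n = 100_000                    # length of the squaring chain
x = 12345                      # chain input, coprime to N

# Public evaluation: n sequential modular squarings.
y_slow = x
for _ in range(n):
    y_slow = y_slow * y_slow % N

# Trapdoor evaluation: reduce the exponent 2^n modulo phi(N), then one pow.
y_fast = pow(x, pow(2, n, phi), N)
assert y_slow == y_fast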
2024
EUROCRYPT
A generic algorithm for efficient key recovery in differential attacks – and its associated tool
Differential cryptanalysis is an old and powerful attack against block ciphers. While different techniques have been introduced throughout the years to improve the complexity of this attack, the key recovery phase remains a tedious and error-prone procedure. In this work, we propose a new algorithm and its associated tool that, given a distinguisher, outputs an efficient key-guessing strategy. Our tool can be applied to SPN ciphers whose linear layer consists of a bit-permutation and whose key schedule is linear or almost linear. It can be used not only to help cryptanalysts find the best differential attack on a given cipher but also to assist designers in their security analysis. We applied our tool to four targets: RECTANGLE, PRESENT-80, SPEEDY-7-192 and GIFT-64. We extend the previous best attack on RECTANGLE-128 by one round and the previous best differential attack against PRESENT-80 by 2 rounds. We improve a previous key recovery step in an attack against SPEEDY and present more efficient key recovery strategies for RECTANGLE-80 and GIFT. Our tool outputs the results in only a second for most targets.
2024
EUROCRYPT
Succinct Functional Commitments for Circuits from k-Lin
A functional commitment allows a user to commit to an input x and later open the commitment to the evaluation y = f(x) of an arbitrary function f. The size of the commitment and the opening should be sublinear in |x| and |f|. In this work, we give the first pairing-based functional commitment for arbitrary circuits where the size of the commitment and the size of the opening consist of a constant number of group elements. Security relies on the standard bilateral k-Lin assumption. This is the first scheme with this level of succinctness from falsifiable bilinear map assumptions (previous approaches required SNARKs for NP). This is also the first functional commitment scheme for general circuits with poly(lambda)-size commitments and openings from any assumption that makes fully black-box use of cryptographic primitives and algorithms. Our construction relies on a new notion of projective chainable commitments which may be of independent interest.
2024
EUROCRYPT
Threshold Raccoon: Practical Threshold Signatures from Standard Lattice Assumptions
Threshold signatures improve both availability and security of digital signatures by splitting the signing key into $N$ shares handed out to different parties. Later on, any subset of at least $T$ parties can cooperate to produce a signature on a given message. While threshold signatures have been extensively studied in the pre-quantum setting, they remain scarce under quantum-resilient assumptions. We present the first efficient lattice-based threshold signatures with signature size 13~KiB and communication cost 40~KiB per user, supporting a threshold size as large as 1024~signers. We provide an accompanying high-performance implementation. The security of the scheme is based on the same assumptions as Dilithium, a signature scheme recently selected by NIST for standardisation which, as far as we know, cannot easily be made threshold efficiently. All operations used during signing rely on symmetric primitives and simple lattice operations; in particular our scheme does not need heavy tools such as threshold fully homomorphic encryption or homomorphic trapdoor commitments as in prior constructions. The key technical idea is to use _one-time additive masks_ to mitigate the leakage of the partial signing keys through partial signatures.
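The one-time additive masking idea can be seen in isolation with a few lines of Python (a generic sketch with an arbitrary toy modulus and fabricated shares; it is not the scheme's signing flow itself): zero-sum masks make each published contribution individually uniform while leaving the aggregate unchanged.

import secrets

Q = 2**61 - 1                  # toy modulus standing in for the scheme's ring modulus
N_PARTIES = 5

# Each party holds an additive share s_i of the signing secret (fabricated here).
shares = [secrets.randbelow(Q) for _ in range(N_PARTIES)]

# One-time additive masks: fresh random values that sum to zero modulo Q.
masks = [secrets.randbelow(Q) for _ in range(N_PARTIES - 1)]
masks.append((-sum(masks)) % Q)

# Published, masked contributions: individually uniform, but the sum is preserved.
masked = [(s + m) % Q for s, m in zip(shares, masks)]
assert sum(masked) % Q == sum(shares) % Q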
2024
EUROCRYPT
Leakage-Tolerant Circuits
A {\em leakage-resilient circuit} for $f:\{0,1\}^n\to\{0,1\}^m$ is a randomized Boolean circuit $C$ mapping a randomized encoding of an input $x$ to an encoding of $y=f(x)$, such that applying any leakage function $L\in \cal L$ to the wires of $C$ reveals essentially nothing about $x$. A {\em leakage-tolerant circuit} achieves the stronger guarantee that even when $x$ and $y$ are not protected by any encoding, the output of $L$ can be simulated by applying some $L'\in \cal L$ to $x$ and $y$ alone. Thus, $C$ is as secure as an ideal hardware implementation of $f$ with respect to leakage from $\cal L$. Leakage-resilient circuits were constructed for low-complexity classes $\cal L$, including (length-$t$ output) $\mathcal{AC}^0$ functions, parities, and functions with bounded communication complexity. In contrast, leakage-{\em tolerant} circuits were only known for the simple case of {\em probing} leakage, where $L$ outputs the values of $t$ wires in $C$. We initiate a systematic study of leakage-tolerant circuits for natural classes $\cal L$ of {\em global} leakage functions, obtaining the following main results. \begin{itemize} \item {\bf Leakage-tolerant circuits for depth-1 leakage.} Every circuit $C_f$ for $f$ can be efficiently compiled into an $\cal L$-tolerant circuit $C$ for $f$, where $\cal L$ includes all leakage functions $L$ that output either $t$ {\em parities} or $t$ {\em disjunctions} (alternatively, conjunctions) of any number of wires or their negations. In the case of parities, our simulator runs in $2^{O(t)}$ time. We provide partial evidence that this may be inherent. \item {\bf Application to stateful leakage-resilient circuits.} Using a general transformation from leakage-tolerant circuits, we obtain the first construction of {\em stateful} $t$-leakage-resilient circuits that tolerate a {\em continuous} parity leakage, and the first such construction for disjunction/conjunction leakage in which the circuit size grows sub-quadratically with $t$. Interestingly, here we can obtain $\mathrm{poly}(t)$-time simulation even in the case of parities. \end{itemize}
2024
EUROCRYPT
Signatures with Memory-Tight Security in the Quantum Random Oracle Model
Auerbach, Cash, Fersch, and Kiltz (CRYPTO 2017) initiated the study of memory tightness of reductions in cryptography in addition to the standard tightness related to advantage and running time and showed the importance of memory tightness when the underlying problem can be solved efficiently with large memory. Diemert, Geller, Jager, and Lyu (ASIACRYPT 2021) and Ghoshal, Ghosal, Jaeger, and Tessaro (EUROCRYPT 2022) gave memory-tight proofs for the multi-challenge security of digital signatures in the random oracle model. This paper studies memory-tight reductions for _post-quantum_ signature schemes in the _quantum_ random oracle model. Concretely, we show that signature schemes from lossy identification are multi-challenge secure in the quantum random oracle model via memory-tight reductions. Moreover, we show that the signature schemes from lossy identification achieve enhanced security notions with respect to _quantum_ signing oracles, as proposed by Boneh and Zhandry (CRYPTO 2013) and Alagic, Majenz, Russell, and Song (EUROCRYPT 2020). We additionally show that signature schemes from preimage-sampleable functions achieve those notions via memory-tight reductions.
2024
EUROCRYPT
Early Stopping for Any Number of Corruptions
Minimizing the round complexity of Byzantine broadcast is a fundamental question in distributed computing and cryptography. In this work, we present the first \emph{early stopping} Byzantine broadcast protocol that tolerates up to $t=n-1$ malicious corruptions and terminates in $O(\min\{f^2,t+1\})$ rounds for any execution with $f\leq t$ \emph{actual corruptions}. Our protocol is deterministic, adaptively secure, and works assuming a plain public key infrastructure. Prior early-stopping protocols all either require honest majority or tolerate only up to $t=(1-\epsilon)n$ malicious corruptions while requiring either trusted setup or strong number theoretic hardness assumptions. As our key contribution, we show a novel tool called a \emph{polarizer} that allows us to transfer certificate-based strategies from the honest majority setting to settings with a dishonest majority.
2024
EUROCRYPT
The NISQ Complexity of Collision Finding
Collision-resistant hashing, a fundamental primitive in modern cryptography, ensures that there is no efficient way to find distinct inputs that produce the same hash value. This property underpins the security of various cryptographic applications, making it crucial to understand its complexity. The complexity of this problem is well-understood in the classical setting and $\Theta(N^{1/2})$ queries are needed to find a collision. However, the advent of quantum computing has introduced new challenges since quantum adversaries --- equipped with the power of quantum queries --- can find collisions much more efficiently. Brassard, Høyer, and Tapp, and Aaronson and Shi established that full-scale quantum adversaries require $\Theta(N^{1/3})$ queries to find a collision, prompting a need for longer hash outputs, which impacts efficiency in terms of the key lengths needed for security. This paper explores the implications of quantum attacks in the Noisy-Intermediate Scale Quantum (NISQ) era. In this work, we investigate three different models for NISQ algorithms and achieve **tight bounds for all of them**: (1) A hybrid algorithm making adaptive quantum or classical queries but with a limited quantum query budget, or (2) A quantum algorithm with access to a noisy oracle, subject to a dephasing or depolarizing channel, or (3) A hybrid algorithm with an upper bound on its maximum quantum depth; i.e., a classical algorithm aided by low-depth quantum circuits. In fact, our results handle all regimes between NISQ and full-scale quantum computers. Previously, only results for the pre-image search problem were known for these models by Sun and Zheng, Rosmanis, Chen, Cotler, Huang and Li while nothing was known about the collision finding problem.
2024
EUROCRYPT
Registered Functional Encryptions from Pairings
This work initiates the study of \emph{concrete} registered functional encryption (Reg-FE) beyond ``all-or-nothing'' functionalities: - We build the first Reg-FE for linear functions, i.e., inner-product evaluation (Reg-IPFE), from pairings. The scheme achieves adaptive IND-security under the $k$-Lin assumption in prime-order bilinear groups. A minor modification yields the first Registered Inner-Product Encryption (Reg-IPE) scheme from the $k$-Lin assumption. Prior work achieves the same security in the generic group model. - We build the first Reg-FE for quadratic functions (Reg-QFE) from pairings. The scheme achieves \emph{very selective} simulation-based security (SIM-security) under the bilateral $k$-Lin assumption in prime-order bilinear groups. Here, ``very selective'' means that the adversary claims challenge messages, all quadratic functions to be registered and all corrupted users at the beginning. Besides focusing on the compactness of the master public key and helper keys, we also aim for compact ciphertexts in Reg-FE. Let $L$ be the number of slots and $n$ be the input size. Our first Reg-IPFE has \emph{weakly compact} ciphertexts of size $O(n\cdot\log L)$ while our second Reg-QFE has \emph{compact} ciphertexts of size $O(n+\log L)$. Technically, for our first Reg-IPFE, we employ a \emph{nested} dual-system method within the context of Reg-IPFE; for our second Reg-QFE, we follow Wee's ``IPFE-to-QFE'' transformation [TCC'20] but devise a set of new techniques that make our \emph{pairing-based} Reg-IPFE compatible. Along the way, we introduce a new notion named \emph{Pre-Constrained Registered IPFE} which generalizes slotted Reg-IPFE by constraining the form of functions that can be registered.
2024
EUROCRYPT
Constant-Size zk-SNARKs in ROM from Falsifiable Assumptions
We prove that the seminal KZG polynomial commitment scheme (PCS) is black-box extractable under a simple falsifiable assumption ARSDH. To create an interactive argument, we construct a compiler that combines a black-box extractable non-interactive PCS and a polynomial IOP (PIOP). The compiler incurs a minor cost per committed polynomial. Applying the Fiat-Shamir transformation, we obtain slightly less efficient variants of well-known PIOP-based zk-SNARKs, such as Plonk, that are knowledge-sound in the ROM under the ARSDH assumption. Importantly, there is no need for idealized group models or knowledge assumptions. This results in the first known zk-SNARKs in the ROM from falsifiable assumptions with both an efficient prover and constant-size argument.
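For orientation, the KZG scheme whose black-box extractability is established here is, in its textbook form (standard notation, not necessarily the paper's): given an SRS $(g, g^{\tau}, \dots, g^{\tau^d})$ in $G_1$ and $(h, h^{\tau})$ in $G_2$ with a pairing $e: G_1 \times G_2 \to G_T$, the commitment to a polynomial $f$ is $\mathsf{Com}(f) = g^{f(\tau)}$ (computable from the SRS without knowing $\tau$); an opening at $z$ is $\pi_z = g^{q(\tau)}$ with $q(X) = \frac{f(X)-f(z)}{X-z}$; and verification checks $e\big(\mathsf{Com}(f)\cdot g^{-f(z)},\, h\big) = e\big(\pi_z,\, h^{\tau}\cdot h^{-z}\big)$.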
2024
EUROCRYPT
Closing the Efficiency Gap between Synchronous and Network-Agnostic Consensus
In the consensus problem, $n$ parties want to agree on a common value, even if some of them are corrupt and arbitrarily misbehave. If the parties have a common input $m$, then they must agree on $m$. Protocols solving consensus assume either a synchronous communication network, where messages are delivered within a known time, or an asynchronous network with arbitrary delays. Asynchronous protocols only tolerate $t_a < n/3$ corrupt parties. Synchronous ones can tolerate $t_s < n/2$ corruptions with setup, but their security completely breaks down if the synchrony assumptions are violated. Network-agnostic consensus protocols, as introduced by Blum, Katz, and Loss [TCC'19], are secure regardless of network conditions, tolerating up to $t_s$ corruptions with synchrony and $t_a$ without, under provably optimal assumptions $t_a \leq t_s$ and $2t_s + t_a < n$. Despite efforts to improve their efficiency, all known network-agnostic protocols fall short of the asymptotic complexity of state-of-the-art purely synchronous protocols. In this work, we introduce a novel technique to compile any synchronous and any asynchronous consensus protocols into a network-agnostic one. This process only incurs a small constant number of overhead rounds, so that the compiled protocol matches the optimal round complexity for synchronous protocols. Our compiler also preserves, under a variety of assumptions, the asymptotic communication complexity of state-of-the-art synchronous and asynchronous protocols. Hence, it closes the current efficiency gap between synchronous and network-agnostic consensus. As a plus, our protocols support $\ell$-bit inputs, and can be extended to achieve communication complexity $O(n^2\kappa + \ell n)$ under the assumptions for which this is known to be possible for purely synchronous protocols.
2024
EUROCRYPT
M&M'S: Mix and Match Attacks on Schnorr-type Blind Signatures with Repetition
Blind signatures allow the issuing of signatures on messages chosen by the user so that they ensure blindness of the message against the signer. Moreover, a malicious user cannot output l+1 signatures while only finishing l signing sessions. This notion, called one-more unforgeability, comes in two flavors supporting either sequential or concurrent sessions. In this paper, we investigate the security of a class of blind signatures constructed from Sigma-protocols with small challenge space C (i.e., polynomial in the security parameter), using k repetitions of the protocol to decrease the chances of a cheating prover. This class of schemes includes, among others, the Schnorr blind signature scheme with bit challenges and the recently proposed isogeny-based scheme CSI-Otter (Crypto'23). For this class of blind signatures, we show a polynomial-time attack that breaks one-more unforgeability for any l ≥ k concurrent sessions in time O(k·|C|). Contrary to the ROS attack, ours is generic and does not require any particular algebraic structure. We also propose a computational trade-off, where for any t ≤ k, our attack works for l = k/t in time O(k/t·|C|^t). The consequences of our attack are as follows. Schemes in the investigated class of blind signatures should not be used concurrently without applying specific transformations to boost the security to support more signing sessions. Moreover, for the parameters proposed for CSI-Otter (k = 128 and |C| = 2), the scheme becomes forgeable after 128 concurrent signing sessions for the basic attack and with only eight sessions in our optimized attack. We also show that for those parameters, it is even possible to compute two signatures in around 10 minutes with just one signing session using the computation power of the Bitcoin network. Thus, we show that for sequential security, the parameter k must be at least doubled in the security parameter for any of the investigated schemes.
2024
EUROCRYPT
Key Exchange with Tight (Full) Forward Secrecy via Key Confirmation
Weak forward secrecy (wFS) of authenticated key exchange (AKE) protocols is a passive variant of (full) forward secrecy (FS). A natural mechanism to upgrade from wFS to FS is the use of key confirmation messages which compute a message authentication code (MAC) over the transcript. Unfortunately, Gellert, Gjøsteen, Jacobsen and Jager (GGJJ, CRYPTO 2023) show that this mechanism inherently incurs a loss proportional to the number of users, leading to an overall non-tight reduction, even if wFS was established using a tight reduction. Inspired by GGJJ, we propose a new notion, called one-way verifiable weak forward secrecy (OW-VwFS), and prove that OW-VwFS can be transformed tightly to FS using key confirmation in the random oracle model (ROM). To implement our generic transformation, we show that several tightly wFS AKE protocols additionally satisfy our OW-VwFS notion tightly. We highlight that using the recent lattice-based protocol from Pan, Wagner, and Zeng (CRYPTO 2023) can give us the first lattice-based tightly FS AKE via key confirmation in the classical random oracle model. Besides this, we also obtain a Decisional-Diffie-Hellman-based protocol that is considerably more efficient than the previous ones. Finally, we lift our study on FS via key confirmation to the quantum random oracle model (QROM). While our security reduction is overall non-tight, it matches the best existing bound for wFS in the QROM (Pan, Wagner, and Zeng, ASIACRYPT 2023), namely, it is square-root- and session-tight. Our analysis is in the multi-challenge setting, which is more realistic than the single-challenge setting of Pan et al.
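As a reminder of the generic mechanism being analyzed, here is a minimal key-confirmation sketch in Python (the key-derivation label and the transcript encoding are illustrative assumptions, not the protocols' actual message formats): both parties derive a MAC key from the session key and exchange a MAC over the transcript before the key is used.

import hashlib, hmac

def derive(session_key: bytes, label: bytes) -> bytes:
    # Toy KDF: separate the confirmation MAC key from the final session key.
    return hashlib.sha256(label + session_key).digest()

def key_confirmation_tag(session_key: bytes, transcript: bytes) -> bytes:
    mac_key = derive(session_key, b"confirm")
    return hmac.new(mac_key, transcript, hashlib.sha256).digest()

# Both parties compute the tag over the same transcript and compare;
# a match confirms they agree on the key before it is ever used.
k = hashlib.sha256(b"shared secret from the wFS-secure AKE").digest()
transcript = b"msg1 || msg2 || identities"
assert hmac.compare_digest(key_confirmation_tag(k, transcript),
                           key_confirmation_tag(k, transcript))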
2024
EUROCRYPT
Fully Homomorphic Encryption beyond IND-CCA1 Security: Integrity through Verifiability
We focus on the problem of constructing fully homomorphic encryption (FHE) schemes that achieve some meaningful notion of adaptive chosen-ciphertext security beyond CCA1. Towards this, we propose a new notion, called security against \textit{verified chosen-ciphertext attack} (vCCA). The idea behind it is to ascertain the integrity of the ciphertext by imposing a strong control on the evaluation algorithm. Essentially, we require that a ciphertext obtained by the use of homomorphic evaluation must be "linked" to the original input ciphertexts. We precisely formalize the vCCA notion in two equivalent formulations; the first is in the indistinguishability paradigm, the second follows the non-malleability simulation-based approach, and is a generalization of the targeted malleability introduced by Boneh et al. in 2012. We strengthen the credibility of our definitions by exploring relations to existing security notions for homomorphic schemes, namely CCA1, RCCA, FuncCPA, CCVA, and HCCA. We prove that vCCA security is the strongest notion known so far that can be achieved by an FHE scheme; in particular, vCCA is strictly stronger than CCA1. Finally, we provide a generic transformation that takes \textit{any} CPA-secure FHE scheme and makes it vCCA-secure. Our transformation first turns an FHE scheme into a CCA2-secure scheme where a part of the ciphertext retains the homomorphic properties and then extends it with a succinct non-interactive argument of knowledge to control the evaluation algorithm. In fact, we obtain \emph{four} variations of the vCCA-secure FHE construction, as we give two public-key variations and two symmetric-key ones. As a direct implication, we get the \emph{first} CCA1-secure FHE schemes that are based on \emph{bootstrapping} techniques.
2024
EUROCRYPT
Key Recovery Attack on the Partial Vandermonde Knapsack Problem
The Partial Vandermonde (PV) Knapsack problem is an algebraic variant of the low-density inhomogeneous SIS problem. The problem has been used as a building block for various lattice-based constructions, including signatures (ACNS'14, ACISP'18), encryptions (DCC'15,DCC'20), and signature aggregation (Eprint'20). At Crypto'22, Boudgoust, Gachon, and Pellet-Mary proposed a key distinguishing attack on the PV Knapsack exploiting algebraic properties of the problem. Unfortunately, their attack doesn't offer key recovery, except for worst-case keys. In this paper, we propose an alternative attack on the PV Knapsack problem, which provides key recovery for a much larger set of keys. Like the Crypto'22 attack, it is based on lattice reduction and uses a dimension reduction technique to speed-up the underlying lattice reduction algorithm and enhance its performance. As a side bonus, our attack transforms the PV Knapsack problem into uSVP instances instead of SVP instances in the Crypto'22 attack. This also helps the lattice reduction algorithm, both from a theoretical and practical point of view. We use our attack to re-assess the hardness of the concrete parameters used in the literature. It appears that many contain a non-negligible fraction of weak keys, which are easily identified and extremely susceptible to our attack. For example, a fraction of $2^{-19}$ of the public keys of a parameter set from ACISP'18 can be solved in about $30$ hours on a moderate server using off-the-shelf lattice reduction. This parameter set was initially claimed to have a $129$-bit security against key recovery attack. Its security was reduced to $87$-bit security using the distinguishing attack from Crypto'22. Similarly, the ACNS'14 proposal also includes a parameter set containing a fraction of $2^{-19}$ of weak keys; those can be solved in about $17$ hours.
2024
EUROCRYPT
SPRINT: High-Throughput Robust Distributed Schnorr Signatures
We describe robust high-throughput threshold protocols for generating Schnorr signatures in an asynchronous setting with potentially hundreds of parties. The protocols run a single message-independent interactive ephemeral randomness generation procedure (i.e., DKG) followed by \emph{non-interactive} signature generation for multiple messages, at a communication cost similar to one execution of a synchronous non-robust protocol in prior work (e.g., Gennaro et al.) and with a large number of parties (ranging from a few tens to hundreds and more). Our protocols extend seamlessly to the dynamic/proactive setting where each run of the protocol uses a new committee with refreshed shares of the secret key; in particular, they support large committees periodically sampled from among the overall population of parties and the required secret state is transferred to the selected parties. The protocols work over a broadcast channel and are robust (provide guaranteed output delivery) even over asynchronous networks. The combination of these features makes our protocols a good match for implementing a signature service over a public blockchain with many validators, where guaranteed output delivery is an absolute must. In that setting, there is a system-wide public key, where the corresponding secret signature key is distributed among the validators. Clients can submit messages (under suitable controls, e.g., smart contracts), and authorized messages are signed relative to the global public key. Asymptotically, when running with committees of $n$ parties, our protocols can generate $\Omega(n^2)$ signatures per run, while providing resilience against $\Omega(n)$ corrupted nodes and broadcasting only $O(n^2)$ group elements and scalars (hence $O(1)$ elements per signature). We prove the security of our protocols via a reduction to the hardness of the discrete logarithm problem in the random oracle model.
2024
EUROCRYPT
Best-of-Both-Worlds Multiparty Quantum Computation with Publicly Verifiable Identifiable Abort
Alon et al. (CRYPTO 2021) introduced a multiparty quantum computation protocol that is secure with identifiable abort (MPQC-SWIA). However, their protocol allows only inside MPQC parties to know the identity of malicious players. This becomes problematic when two groups of people disagree and need a third party, like a jury, to verify who the malicious party is. This issue takes on heightened significance in the quantum setting, given that quantum states may exist in only a single copy. Thus, we emphasize the necessity of a protocol with publicly verifiable identifiable abort (PVIA), enabling outside observers with only classical computational power to agree on the identity of the malicious party in case of an abort. However, achieving MPQC with PVIA poses significant challenges due to the no-cloning theorem, and previous works proposed by Mahadev (STOC 2018) and Chung et al. (Eurocrypt 2022) for classical verification of quantum computation fall short. In this paper, we obtain the first MPQC-PVIA protocol assuming post-quantum oblivious transfer and a classical broadcast channel. The core component of our construction is a new authentication primitive called auditable quantum authentication (AQA) that identifies the malicious sender with overwhelming probability. Additionally, we provide the first MPQC protocol with best-of-both-worlds (BoBW) security, which guarantees output delivery with an honest majority and remains secure with abort even if the majority is dishonest. Our best-of-both-worlds MPQC protocol also satisfies PVIA upon abort.
2024
EUROCRYPT
Proof-of-Work-based Consensus in Expected-Constant Time
In the traditional consensus problem (aka Byzantine agreement), parties are required to agree on a common value despite the malicious behavior of some of them, subject to the condition that if all the honest parties start the execution with the same value, then that should be the outcome. This problem has been extensively studied by both the distributed computing and cryptographic protocols communities. With the advent of blockchains, whose main application---a distributed ledger---essentially requires that miners agree on their views, new techniques have been proposed to solve the problem, and in particular in so-called ``permissionless'' environments, where parties are not authenticated or have access to point-to-point channels and, further, may come and go as they please. So far, the fastest way to achieve consensus in the proof-of-work (PoW)-based setting of Bitcoin takes $O(\mathrm{polylog}\,\kappa)$ rounds, where $\kappa$ is the security parameter. We present the first protocol in this setting that requires an expected-constant number of rounds. Furthermore, we show how to securely apply sequential composition in order to yield a fast distributed ledger protocol that settles all transactions in expected-constant time. Our result is based on a novel instantiation of ``m-for-1 PoWs'' on parallel chains that facilitates our basic building block, Chain-King Consensus. The techniques we use, via parallel chains, to port classical protocol design elements (such as Phase-King Consensus, super-phase sequential composition and others) into the permissionless setting may be of independent interest.
2024
EUROCRYPT
Tight Security of TNT and Beyond: Attacks, Proofs and Possibilities for the Cascaded LRW Paradigm
Liskov, Rivest and Wagner laid the theoretical foundations for tweakable block ciphers (TBC). In a seminal paper, they proposed two (up to) birthday-bound secure design strategies --- LRW1 and LRW2 --- to convert any block cipher into a TBC. Several of the follow-up works consider cascading of LRW-type TBCs to construct beyond-the-birthday-bound (BBB) secure TBCs. Landecker et al. demonstrated that just two-round cascading of LRW2 can already give BBB security. Bao et al. undertook a similar exercise in the context of LRW1 with TNT --- a three-round cascading of LRW1 --- which has been shown to achieve BBB security as well. In this paper, we present a CCA distinguisher on TNT that achieves a non-negligible advantage with $ O(2^{n/2}) $ queries, directly contradicting the security claims made by the designers. We provide a rigorous and complete advantage calculation coupled with experimental verification that further supports our claim. Next, we provide new and simple proofs of birthday-bound CCA security for both TNT and its single-key variant, which confirm the tightness of our attack. On a more positive note, we show that adding just one more block cipher call, referred to as 4-LRW1, does not just re-establish BBB security, but also amplifies it up to $ 2^{3n/4} $ queries. As a side-effect of this endeavour, we propose a new abstraction of the cascaded LRW design philosophy, referred to as the LRW+ paradigm, comprising two block cipher calls sandwiched between a pair of tweakable universal hashes. This helps us to provide a modular proof covering all cascaded LRW constructions with at least $ 2 $ rounds, including 4-LRW1, and its more established relative, the well-known CLRW2, or more aptly, 2-LRW2.
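For orientation, the constructions discussed above are, in their usual textbook notation (with $E$ a block cipher, $h$ an XOR-universal hash and $T$ the tweak; this is the standard presentation, not a claim about the paper's exact conventions): LRW1 is $\widetilde{E}_K(T, M) = E_K(T \oplus E_K(M))$, LRW2 is $\widetilde{E}_{K,h}(T, M) = E_K(M \oplus h(T)) \oplus h(T)$, and TNT is the three-round LRW1 cascade $\widetilde{E}_{k_1,k_2,k_3}(T, M) = E_{k_3}(T \oplus E_{k_2}(T \oplus E_{k_1}(M)))$.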
2024
EUROCRYPT
Algorithms for matrix code and alternating trilinear form equivalences via new isomorphism invariants
We devise algorithms for finding equivalences of trilinear forms over finite fields modulo linear group actions. Our focus is on two problems under this umbrella, Matrix Code Equivalence (MCE) and Alternating Trilinear Form Equivalence (ATFE), since their hardness is the foundation of the NIST round-1 signature candidates MEDS and ALTEQ respectively. We present new algorithms for MCE and ATFE, which are further developments of the algorithms for polynomial isomorphism and alternating trilinear form equivalence, in particular by Bouillaguet, Fouque, and Véber (Eurocrypt 2013), and Beullens (Crypto 2023). Key ingredients in these algorithms are new easy-to-compute distinguishing invariants under the respective group actions. For MCE, we associate easy-to-compute isomorphism invariants to corank-1 points of matrix codes, which lead to a birthday-type algorithm. We present empirical justifications that these isomorphism invariants are easy to compute and distinguishing, and provide an implementation of this algorithm. This algorithm has some implications for the security of MEDS. The invariant function for ATFE is similar, except that it is associated with lower-rank points. Modulo certain assumptions on turning the invariant function into canonical forms, our algorithm for ATFE improves on the runtime of the previously best known algorithm of Beullens (Crypto 2023). Finally, we present quantum variants of our classical algorithms with cubic runtime improvements.
2024
EUROCRYPT
Efficient Arithmetic in Garbled Circuits
Garbled Circuit (GC) techniques usually work with Boolean circuits. Despite intense interest, efficient arithmetic generalizations of GC were only known from strong assumptions, such as LWE. We construct symmetric-key-based arithmetic garbled circuits from circular correlation robust hashes, the assumption underlying the celebrated Free XOR garbling technique. Let $\lambda$ denote a security parameter, and consider the integers $\mathbb{Z}_m$ for any $m \geq 2$. Let $\ell = \lceil \log_2 m \rceil$ be the bit length of $\mathbb{Z}_m$ values. We garble arithmetic circuits over $\mathbb{Z}_m$ where the garbling of each gate has size $O(\ell \cdot \lambda)$ bits. Contrast this with Boolean-circuit-based arithmetic, requiring $O(\ell^2\cdot \lambda)$ bits via the schoolbook multiplication algorithm, or $O(\ell^{1.585}\cdot \lambda)$ bits via Karatsuba's algorithm. Our arithmetic gates are compatible with Boolean operations and with Garbled RAM, allowing us to garble complex programs over arithmetic values.