International Association for Cryptologic Research

International Association
for Cryptologic Research

IACR News

Updates on the COVID-19 situation are on the Announcement channel.

Here you can see all recent updates to the IACR webpage. These updates are also available:

RSS symbol icon
via RSS feed
Twitter bird icon
via Twitter
Weibo icon
via Weibo
Facebook icon
via Facebook

16 May 2023

Saleh Khalaj Monfared, Tahoura Mosavirik, Shahin Tajik
ePrint Report ePrint Report
The threat of physical side-channel attacks and their countermeasures is a widely researched field. Most physical side-channel attacks rely on the unavoidable influence of computation or storage on voltage or current fluctuations. Such data-dependent influence can be exploited by, for instance, power or electromagnetic analysis. In this work, we introduce a novel non-invasive physical side-channel attack, which exploits the data-dependent changes in the impedance of the chip. Our attack relies on the fact that the temporarily stored contents in registers alter the physical characteristics of the circuit, which results in changes in the die's impedance. To sense such impedance variations, we deploy a well-known RF/microwave method called scattering parameter analysis, in which we inject sine wave signals with high frequencies into the system's power distribution network (PDN) and measure the echo of the signals. We demonstrate that according to the content bits and physical location of a register, the reflected signal is modulated differently at various frequency points enabling the simultaneous and independent probing of individual registers. Such side-channel leakage violates the $t$-probing security model assumption used in masking, which is a prominent side-channel countermeasure. To validate our claims, we mount non-profiled and profiled impedance analysis attacks on hardware implementations of unprotected and high-order masked AES. We show that in the case of profiled attack, only a single trace is required to recover the secret key. Finally, we discuss how a specific class of hiding countermeasures might be effective against impedance leakage.
Expand
Yupu Hu, Dong Siyue, Wang Baocang, Dong Xingting
ePrint Report ePrint Report
Indistinguishability obfuscation (IO) is at the frontier of cryptography research for several years. LV16/Lin17 obfuscation schemes are famous progresses towards simplifying obfuscation mechanism. In fact, these two schemes only constructed two compact functional encryption (CFE) algorithms, while other things were taken to AJ15 IO frame or BV15 IO frame. That is, CFE algorithms are inserted into AJ15 IO frame or BV15 IO frame to form a complete IO scheme. The basic structure of two CFE algorithms can be described in the following way. The polynomial-time-computable Boolean function is transformed into a group of low-degree low-locality component functions by using randomized encoding, while some public combination of values of component functions is the value of original Boolean function. The encryptor uses constant-degree multilinear maps (rather than polynomial-degree multilinear maps) to encrypt independent variables of component functions. The decryptor uses zero-testing tool of multilinear maps to obtain values of component functions (rather than to obtain values of independent variables), and then uses public combination to obtain the value of original Boolean function.

In this paper we restrict IO to be a real white box (RWB). Under such restriction we point out that LV16/Lin17 CFE algorithms being inserted into AJ15 IO frame are invalid. More detailedly, such insertion makes the adversary gradually learn the shape of the function, therefore the scheme is not secure. In other words, such scheme is not a real IO scheme, but rather a garbling scheme. It needs to be said that RWB restriction is reasonable, which means the essential contribution of IO for cryptography research.
Expand
Quang Dao, Jim Miller, Opal Wright, Paul Grubbs
ePrint Report ePrint Report
A flurry of excitement amongst researchers and practitioners has produced modern proof systems built using novel technical ideas and seeing rapid deployment, especially in cryptocurrencies. Most of these modern proof systems use the Fiat-Shamir (F-S) transformation, a seminal method of removing interaction from a protocol with a public-coin verifier. Some prior work has shown that incorrectly applying F-S (i.e., using the so-called "weak" F-S transformation) can lead to breaks of classic protocols like Schnorr's discrete log proof; however, little is known about the risks of applying F-S incorrectly for modern proof systems seeing deployment today.

In this paper, we fill this knowledge gap via a broad theoretical and practical study of F-S in implementations of modern proof systems. We perform a survey of open-source implementations and find 36 weak F-S implementations affecting 12 different proof systems. For four of these---Bulletproofs, Plonk, Spartan, and Wesolowski's VDF---we develop novel knowledge soundness attacks accompanied by rigorous proofs of their efficacy. We perform case studies of applications that use vulnerable implementations, and demonstrate that a weak F-S vulnerability could have led to the creation of unlimited currency in a private blockchain protocol. Finally, we discuss possible mitigations and takeaways for academics and practitioners.
Expand
Ginevra Giordani, Lorenzo Grassi, Silvia Onofri, Marco Pedicini
ePrint Report ePrint Report
The construction of invertible non-linear layers over $\mathbb F_p^n$ that minimize the multiplicative cost is crucial for the design of symmetric primitives targeting Multi Party Computation (MPC), Zero-Knowledge proofs (ZK), and Fully Homomorphic Encryption (FHE). At the current state of the art, only few non-linear functions are known to be invertible over $\mathbb F_p$, as the power maps $x\mapsto x^d$ for $\gcd(d,p-1)=1$. When working over $\mathbb F_p^n$ for $n\ge2$, a possible way to construct invertible non-linear layers $\mathcal S$ over $\mathbb F_p^n$ is by making use of a local map $F:\mathbb F_p^m\rightarrow \mathbb F_p$ for $m\le n$, that is, $\mathcal S_F(x_0, x_1, \ldots, x_{n-1}) = y_0\|y_1\|\ldots \|y_{n-1}$ where $y_i = F(x_i, x_{i+1}, \ldots, x_{i+m-1})$. This possibility has been recently studied by Grassi, Onofri, Pedicini and Sozzi at FSE/ToSC 2022. Given a quadratic local map $F:\mathbb F_p^m \rightarrow \mathbb F_p$ for $m\in\{1,2,3\}$, they proved that the shift-invariant non-linear function $\mathcal S_F$ over $\mathbb F_p^n$ defined as before is never invertible for any $n\ge 2\cdot m-1$. In this paper, we face the problem by generalizing such construction. Instead of a single local map, we admit multiple local maps, and we study the creation of nonlinear layers that can be efficiently verified and implemented by a similar shift-invariant lifting. After formally defining the construction, we focus our analysis on the case $\mathcal S_{F_0, F_1}(x_0, x_1, \ldots, x_{n-1}) = y_0\|y_1\|\ldots \|y_{n-1}$ for $F_0, F_1 :\mathbb F_p^2\rightarrow \mathbb F_p$ of degree at most 2. This is a generalization of the previous construction using two alternating functions $F_0,F_1$ instead of a single $F$. As main result, we prove that (i) if $n\ge3$, then $\mathcal S_{F_0, F_1}$ is never invertible if both $F_0$ and $F_1$ are quadratic, and that (ii) if $n\ge 4$, then $\mathcal S_{F_0, F_1}$ is invertible if and only if it is a Type-II Feistel scheme.
Expand
Erica Blum, Jonathan Katz, Julian Loss, Kartik Nayak, Simon Ochsenreither
ePrint Report ePrint Report
Protocols for state-machine replication (SMR) often trade off performance for resilience to network delay. In particular, protocols for asynchronous SMR tolerate arbitrary network delay but sacrifice throughput/latency when the network is fast, while partially synchronous protocols have good performance in a fast network but fail to make progress if the network experiences high delay. Existing hybrid protocols are resilient to arbitrary network delay and have good performance when the network is fast, but suffer from high overhead (``thrashing'') if the network repeatedly switches between being fast and slow (e.g., in a network that is typically fast but has intermittent message delays).

We propose Abraxas, a generic approach for constructing a hybrid protocol based on any protocol $\Pi_\mathsf{fast}$ and any asynchronous protocol $\Pi_\mathsf{slow}$ to achieve (1)~security and performance equivalent to $\Pi_\mathsf{slow}$ under arbitrary network behavior; (2)~performance equivalent to $\Pi_\mathsf{fast}$ when conditions are favorable. We instantiate Abraxas with the best existing protocols for $\Pi_\mathsf{fast}$ (Jolteon) and $\Pi_\mathsf{slow}$ (2-chain VABA), and show experimentally that the resulting protocol significantly outperforms Ditto, the previous state-of-the-art hybrid protocol.
Expand
Angelique Faye Loe, Liam Medley, Christian O'Connell, Elizabeth A. Quaglia
ePrint Report ePrint Report
A whistleblower is a person who leaks sensitive information on a prominent individual or organisation engaging in an unlawful or immoral activity. Whistleblowing has the potential to mitigate corruption and fraud by identifying the misuse of capital. In extreme cases whistleblowing can also raise awareness about unethical practices to individuals by highlighting dangerous working conditions. Obtaining and sharing the sensitive information associated with whistleblowing can carry great risk to the individual or party revealing the data. In this paper we extend the notion of timed-release encryption to include a new security property which we term implicit authentication, with the goal of making the practice of whistleblowing safer.

We formally define the new primitive of timed-release encryption with implicit authentication (TRE-IA), providing rigorous game-base definitions. We then build a practical TRE-IA construction that satisfies the security requirements of this primitive, using repeated squaring in an RSA group, and the RSA-OAEP encryption scheme. We formally prove our construction secure and provide a performance analysis of our implementation in Python along with recommendations for practical deployment and integration with an existing whistleblowing tool SecureDrop.
Expand
Liam Medley, Angelique Faye Loe, Elizabeth A. Quaglia
ePrint Report ePrint Report
In this work, we provide a systematisation of knowledge of delay-based cryptography, in which we discuss and compare the existing primitives within cryptography that utilise a time-delay. We start by considering the role of time within cryptography, explaining broadly what a delay aimed to achieve at its inception and now, in the modern age. We then move on to describing the underlying assumptions used to achieve these goals, and analyse topics including trust, decentralisation and concrete methods to implement a delay. We then survey the existing primitives, discussing their security properties, instantiations and applications. We make explicit the relationships between these primitives, identifying a hierarchy and the theoretical gaps that exist. We end this systematisation of knowledge by highlighting relevant future research directions within the field of delay-based cryptography, from which this area would greatly benefit.
Expand
Raziyeh Salarifard, Hadi Soleimany
ePrint Report ePrint Report
The Number Theoretic Transform (NTT) is used to efficiently execute polynomial multiplication. It has become an important part of lattice-based post-quantum methods and the subsequent generation of standard cryptographic systems. However, implementing post-quantum schemes is challenging since they rely on intricate structures. This paper demonstrates how to develop a high-speed NTT multiplier highly optimized for FPGAs with few logical resources. We describe a novel architecture for NTT that leverages unique precomputation. Our method efficiently maps these specific pre-computed values into the built-in Block RAMs (BRAMs), which greatly reduces the area and time required for implementation when compared to previous works. We have chosen Kyber parameters to implement the proposed architectures. Compared to the most well-known approach for implementing Kyber’s polynomial multiplication using NTT, the time is reduced by 31%, and AT (area × time) is improved by 25% as a result of the pre computation we suggest in this study. It is worth mentioning that we obtained these improvements while our method does not require DSP.
Expand

15 May 2023

Foo Yee Yeo, Jason H. M. Ying
ePrint Report ePrint Report
Private set intersection (PSI) enables two parties, each holding a private set to compute their intersection without revealing other information in the process. We introduce a generalized extension of conventional PSI termed as third-party PSI, whereby the intersection output of the two parties is only known to an inputless third party. In this setting, the two parties who participate in the protocol have no knowledge of the intersection result or any information of the set content of the other party. In general, third-party PSI settings arise where there is a need for an external party to obtain the intersection outcome without leakage of additional information to any other party. This setting is motivated by an increasing importance in several real-world applications. We describe protocols which achieve this functionality with minimal communication overhead. To the best of knowledge, our work is the first of its kind to explore this variant of PSI.
Expand
Zhengjun Cao, Lihua Liu
ePrint Report ePrint Report
We show that the key agreement scheme [Comput. Commun., 2021(170): 1--18] is insecure against impersonation attacks, because there is a trivial equality which results in the loss of data confidentiality.
Expand
Hannah Keller, Claudio Orlandi, Anat Paskin-Cherniavsky, Divya Ravi
ePrint Report ePrint Report
The bottleneck-complexity (BC) of secure multiparty computation (MPC) protocols is a measure of the maximum number of bits which are sent and received by any party in a protocol. As the name suggests, the goal of studying BC-efficient protocols is to increase overall efficiency by making sure that the workload in the protocol is somehow "amortized'' by the protocol participants.

Orlandi et al. (PKC 2022) initiated the study of BC-efficient protocols from simple assumptions in the correlated randomness model and for semi-honest adversaries. In this work, we extend the study of Orlandi et al. in two primary directions: (a) to a larger and more general class of functions and (b) to the information-theoretic setting.

In particular, we offer semi-honest secure protocols for the useful function classes of abelian programs, 'read-$k$' non-abelian programs, and 'read-$k$' generalized formulas.

Our constructions use a novel abstraction, called 'incremental function secret-sharing' (IFSS), that can be instantiated with unconditional security or from one-way functions (with different efficiency trade-offs).
Expand
Anup Kumar Kundu, Shibam Ghosh, Dhiman Saha, Mostafizar Rahman
ePrint Report ePrint Report
The division property introduced by Todo in Crypto 2015 is one of the most versatile tools in the arsenal of a cryptanalyst which has given new insights into many ciphers primarily from an algebraic perspective. On the other end of the spectrum we have fault attacks which have evolved into the deadliest of all physical attacks on cryptosystems. The current work aims to combine these seemingly distant tools to come up with a new type of fault attack. We show how fault invariants are formed under special input division multi-sets and are independent of the fault injection location. It is further shown that the same division trail can be exploited as a multi-round Zero-Sum distinguisher to reduce the key-space to practical limits. As a proof of concept division trails of PRESENT and GIFT are exploited to mount practical key-recovery attacks based on the random nibble fault model. For GIFT-64, we are able to recover the unique master-key with 30 nibble faults with faults injected at rounds 21 and 19. For PRESENT-80, DiFA reduces the key-space from $2^{80}$ to $2^{16}$ with 15 faults in round 25 while for PRESENT-128, the unique key is recovered with 30 faults in rounds 25 and 24. This constitutes the best fault attacks on these ciphers in terms of fault injection rounds. We also report an interesting property pertaining to fault induced division trails which shows its inapplicability to attack GIFT-128. Overall, the usage of division trails in fault based cryptanalysis showcases new possibilities and reiterates the applicability of classical cryptanalytic tools in physical attacks.
Expand
Colin Steidtmann, Sanjay Gollapudi
ePrint Report ePrint Report
Zero-knowledge proofs and arithmetic circuits are essential building blocks in modern cryptography, but comparing their efficiency across different implementations can be challenging. In this paper, we address this issue by presenting comprehensive benchmarking results for a range of signature schemes and hash functions implemented in Circom, a popular circuit language that has not been extensively benchmarked before. Our benchmarking statistics include prover time, verifier time, and proof size, and cover a diverse set of schemes including Poseidon, Pedersen, MiMC, SHA-256, ECDSA, EdDSA, Sparse Merkle Tree, and Keccak-256. We also introduce a new Circom circuit and a full JavaScript test suite for the Schnorr signature scheme. Our results offer valuable insights into the relative strengths and weaknesses of different schemes and frameworks, and confirm the theoretical predictions with precise real-world data. Our findings can guide researchers and practitioners in selecting the most appropriate scheme for their specific applications, and can serve as a benchmark for future research in this area.
Expand
Rishabh Bhadauria, Carmit Hazay, Muthuramakrishnan Venkitasubramaniam, Wenxuan Wu, Yupeng Zhang
ePrint Report ePrint Report
Polynomial commitment schemes allow a prover to commit to a polynomial and later reveal the evaluation of the polynomial on an arbitrary point along with proof of validity. This object is central in the design of many cryptographic schemes such as zero-knowledge proofs and verifiable secret sharing. In the standard definition, the polynomial is known to the prover whereas the evaluation points are not private. In this paper, we put forward the notion of private polynomial commitments that capture additional privacy guarantees, where the evaluation points are hidden from the verifier while the polynomial is hidden from both. We provide concretely efficient constructions that allow simultaneously batch the verification of many evaluations with a small additive overhead. As an application, we design a new concretely efficient multi-party private set-intersection with malicious security and improved asymptotic communication and space complexities. We demonstrate the concrete efficiency of our construction via an implementation. Our scheme can prove $2^{10}$ evaluations of a private polynomial of degree $2^{10}$ in 157s. The proof size is only 169KB and the verification time is 11.8s. Moreover, we also implemented the multi-party private set intersection protocol and scale it to 1000 parties (which has not been shown before). The total running time for $2^{14}$ elements per party is 2,410 seconds. While existing protocols offer better computational complexity, our scheme offers significantly smaller communication and better scalability (in the number of parties) owing to better memory usage.
Expand
Xiaohai Dai, Bolin Zhang, Hai Jin, Ling Ren
ePrint Report ePrint Report
To reduce latency and communication overhead of asynchronous Byzantine Fault Tolerance (BFT) consensus, an optimistic path is often added, with Ditto and BDT as state-of-the-art representatives. These protocols first attempt to run an optimistic path that is typically adapted from partially-synchronous BFT and promises good performance in good situations. If the optimistic path fails to make progress, these protocols switch to a pessimistic path after a timeout, to guarantee liveness in an asynchronous network. This design crucially relies on an accurate estimation of the network delay Δ to set the timeout parameter correctly. A wrong estimation of Δ can lead to either premature or delayed switching to the pessimistic path, hurting the protocol's efficiency in both cases.

To address the above issue, we propose ParBFT, which employs a parallel optimistic path. As long as the leader of the optimistic path is non-faulty, ParBFT ensures low latency without requiring an accurate estimation of the network delay. We propose two variants of ParBFT, namely ParBFT1 and ParBFT2, with a trade-off between latency and communication. ParBFT1 simultaneously launches the two paths, achieves lower latency under a faulty leader, but has a quadratic message complexity even in good situations. ParBFT2 reduces the message complexity in good situations by delaying the pessimistic path, at the cost of a higher latency under a faulty leader. Experimental results demonstrate that ParBFT outperforms Ditto or BDT. In particular, when the network condition is bad, ParBFT can reach consensus through the optimistic path, while Ditto and BDT suffer from path switching and have to make progress using the pessimistic path.
Expand
Archisman Ghosh, Jose Maria Bermudo Mera, Angshuman Karmakar, Debayan Das, Santosh Gosh, Ingrid Verbauwhede, Shreyas Sen
ePrint Report ePrint Report
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears rendering them ineffective for use in the quantum era. Lattice based cryptography is a novel approach to public key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard \& Technology (NIST) follows this approach. The multiplication of polynomials is the central bottleneck in the computation of lattice based cryptography. Because public key cryptography is mostly used to establish common secret keys, focus is on compact area, power and energy budget and to a lesser extent on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT) based multiplications, in this paper we highly optimize a Toom-Cook based multiplier. We demonstrate that a memory-efficient striding Toom-Cook with lazy interpolation, results in a highly compact, low power implementation, which on top enables a very regular memory access scheme. To demonstrate the efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovation to reduce active memory, timely clock gating and shift-add multiplier has helped to achieve 38\% less power than state-of-the art PQC core, 4 $\times$ less memory, 36.8\% reduction in multiplier energy and 118$\times$ reduction in active power with respect to state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes $0.158mm^2$ active area which is lowest reported till date despite process disadvantages of the state-of-the-art designs.
Expand
Barbara Gigerl, Robert Primas, Stefan Mangard
ePrint Report ePrint Report
Cryptographic software running on embedded devices requires protection against physical side-channel attacks such as power analysis. Masking is a widely deployed countermeasure against these attacksand is directly implemented on algorithmic level. Many works study the security of masked cryptographic software on CPUs, pointing out potential problems on algorithmic/microarchitecture-level, as well as corresponding solutions, and even show masked software can be implemented efficiently and with strong (formal) security guarantees. However, these works also make the implicit assumption that software is executed directly on the CPU without any abstraction layers in-between, i.e., they focus exclusively on the bare-metal case. Many practical applications, including IoT and automotive/industrial environments, require multitasking embedded OSs on which masked software runs as one out of many concurrent tasks. For such applications, the potential impact of events like context switches on the secure execution of masked software has not been studied so far at all.

In this paper, we provide the first security analysis of masked cryptographic software spanning all three layers (SW, OS, CPU). First, we apply a formal verification approach to identify leaks within the execution of masked software that are caused by the embedded OS itself, rather than on algorithmic or microarchitecture level. After showing that these leaks are primarily caused by context switching, we propose several different strategies to harden a context switching routine against such leakage, ultimately allowing masked software from previous works to remain secure when being executed on embedded OSs. Finally, we present a case study focusing on FreeRTOS, a popular embedded OS for embedded devices, running on a RISC-V core, allowing us to evaluate the practicality and ease of integration of each strategy.
Expand
Jikang Lin, Jiahui He, Yanhong Fan, Meiqin Wang
ePrint Report ePrint Report
Low energy is an important aspect of hardware implementation. For energy-limited battery-powered devices, low energy stream ciphers can play an important role. In \texttt{IACR ToSC 2021}, Caforio et al. proposed the Perfect Tree energy model for stream cipher that links the structure of combinational logic circuits with state update functions to energy consumption. In addition, a metric given by the model shows a negative correlation with energy consumption, i.e., the higher the balance of the perfect tree, the lower the energy consumption. However, Caforio et al. didn't give a method that eliminate imbalances of the unrolled strand tree for the existing stream ciphers.

In this paper, based on the Perfect Tree energy model, we propose a new redundant design model that improve the balances of the unrolled strand tree for the purpose of reducing energy consumption. In order to obtain the redundant design, we propose a search algorithm for returning the corresponding implementation scheme. For the existing stream ciphers, the proposed model and search method can be used to provide a low-power redundancy design scheme. To verify the effectiveness, we apply our redundant model and search method in the stream ciphers (e.g., \texttt{Trivium} and \texttt{Kreyvium}) and conducted a synthetic test. The results of the energy measurement demonstrate that the proposed model and search method can obtain lower energy consumption.
Expand
Xiao Lan, Hongjian Jin, Hui Guo, Xiao Wang
ePrint Report ePrint Report
Computing the quantile of a massive data stream has been a crucial task in networking and data management. However, existing solutions assume a centralized model where one data owner has access to all data. In this paper, we put forward a study of secure quantile aggregation between private data streams, where data streams owned by different parties would like to obtain a quantile of the union of their data without revealing anything else about their inputs. To this end, we designed efficient cryptographic protocols that are secure in the semi-honest setting as well as the malicious setting. By incorporating differential privacy, we further improve the efficiency by 1.1× to 73.1×. We implemented our protocol, which shows practical efficiency to aggregate real-world data streams efficiently.
Expand
Kazuma Taka, Tatusya Ishikawa, Kosei Sakamoto, Takanori Isobe
ePrint Report ePrint Report
As low-latency designs tend to have a small number of rounds to decrease latency, the differential-type cryptanalysis can become a significant threat to them. In particular, since a multiple-branch-based design, such as Orthros can have the strong clustering effect on differential attacks due to its large internal state, it is crucial to investigate the impact of the clustering effect in such a design. In this paper, we present a new SAT-based automatic search method for evaluating the clustering effect in the multiple-branch-based design. By exploiting an inherent trait of multiple-branch-based designs, our method enables highly efficient evaluations of clustering effects on this-type designs. % that a conventional method by automatic search tools. We apply our method to the low-latency PRF Orthros, and show a best differential distinguisher reaching up to 7 rounds of Orthros with $2^{116.806}$ time/data complexity and 9-round distinguisher for each underlying permutation which is 2 more rounds than known longest distinguishers. Besides, we update the designer's security bound for differential attacks based on the lower bounds for the number of active S-boxes, and obtain the optimal differential characteristic of Orthros, Branch 1, and Branch 2 for the first time. Consequently, we improve the designer's security bound from 9/12/12 to 7/10/10 rounds for Orthros/Branch 1/Branch 2 based on a single differential characteristic.
Expand
◄ Previous Next ►