Weijia Wang

CryptoDB

ORCID: 0000-0001-6982-2537

Publications and invited talks

Year

Venue

Title

2025

EUROCRYPT

Tighter Security Notions for a Modular Approach to Private Circuits

Bohan Wang Juelin Zhang Yu Yu Weijia Wang

To counteract side-channel attacks, a masking scheme splits each intermediate variable into $n$ shares and transforms each elementary operation (e.g., field addition and multiplication) into its masked counterpart, called a gadget, such that intrinsic noise in the leakages renders secret recovery infeasible in practice. A simple and efficient security notion is the probing model, which ensures that any $n-1$ shares are distributed independently of the secret input. One requirement of the probing model is that the noise in the leakages should increase with the number of shares, largely restricting the side-channel security in the low-noise scenario. Another security notion for masking, called the random probing model, allows each variable to leak with a probability $p$. While this model reflects the physical reality of side channels much better, it brings significant overhead. At Crypto 2018, Ananth et al. proposed a modular approach that can provide random probing security for any security level by recursively expanding small base gadgets with $n$ shares, such that the tolerable leakage probability $p$ decreases with $n$ while the security increases exponentially with the recursion depth of expansion. Then, at Crypto 2020, Belaïd et al. provided a formal security definition called Random Probing Expandability (RPE) and an explicit framework using the modular approach to construct masking schemes. In this paper, we investigate how to tighten the RPE definition by allowing dependent failure probabilities of multiple inputs, which results in a new definition called related RPE. It can be directly used for the expansion of multiplication gates and reduces the complexity of the base multiplication gadget from $\mathcal{O}(n^2\log n)$, proposed at Asiacrypt 2021, to $\mathcal{O}(n^2)$ while maintaining the same security level. Furthermore, we describe a method to expand any gate (rather than only multiplication) with the related RPE gadgets.
Besides, we define another new RPE notion, called Multiple inputs RPE, used for the expansion of multiple-input gates composed of any gates. Utilizing these methods, we reduce the complexity of the 3-share circuit compiler to $\mathcal{O}(|C|\cdot\kappa^{3.2})$, where $|C|$ is the size of the unprotected circuit and the protection failure probability of the global circuit is $2^{-\kappa}$. In comparison, the complexity of the state-of-the-art work, proposed at Eurocrypt 2021, is $\mathcal{O}(|C|\cdot\kappa^{3.9})$ for the same value of $n$. Additionally, we provide the construction of a 5-share circuit compiler with a complexity of $\mathcal{O}(|C|\cdot\kappa^{2.8})$.

2025

PKC

Thorough Power Analysis on Falcon Gaussian Samplers and Practical Countermeasure

Xiuhan Lin Shiduo Zhang Yang Yu Weijia Wang Qidi You Ximing Xu Xiaoyun Wang

Falcon is one of the post-quantum signature schemes selected by NIST for standardization. With the deployment underway, its implementation security is of great importance. In this work, we focus on the side-channel security of Falcon and our contributions are threefold. First, by exploiting the symplecticity of NTRU and a recent decoding technique, we dramatically improve the key recovery using power leakages within Falcon Gaussian samplers. Compared to the state of the art (Zhang, Lin, Yu and Wang, EUROCRYPT 2023), the number of traces required by our attack for a full key recovery is reduced by at least 85%. Secondly, we present a complete power analysis for two exposed power leakages within Falcon’s integer Gaussian sampler. We identify new sources of these leakages, which have not been identified by previous works, and conduct detailed security evaluations within the reference implementation of Falcon on Chipwhisperer. Thirdly, we propose effective and easy-to-implement countermeasures against both leakages to protect Falcon’s entire integer Gaussian sampler. Configured with our countermeasures, we provide security evaluations on Chipwhisperer and report the performance of the protected implementation. Experimental results highlight that our countermeasures admit a practical trade-off between efficiency and side-channel security.

2025

TCHES

Rejected Signatures’ Challenges Pose New Challenges: Key Recovery of CRYSTALS-Dilithium via Side-Channel Attacks

Yuanyuan Zhou Weijia Wang Yiteng Sun Yu Yu

Rejection sampling is a crucial security mechanism in lattice-based signature schemes that follow the Fiat-Shamir with aborts paradigm, such as ML-DSA/CRYSTALS-Dilithium. This technique transforms secret-dependent signature samples into ones that are statistically close to a secret-independent distribution (in the random oracle model). While many side-channel attacks have directly targeted sensitive data such as nonces, secret keys, and decomposed commitments, fewer studies have explored the potential leakage associated with rejection sampling. Notably, at HOST 2021, Karabulut et al. showed that leakage from rejected signatures’ challenges can undermine, but not entirely break, the security of the Dilithium scheme. Motivated by the above, we convert the problem of key recovery (from the leakage of rejection sampling) to an integer linear programming (ILP) problem, where rejected responses of unique Hamming weights set upper/lower constraints on the product between the challenge and the private key. We formally study the worst-case complexity of the problem as well as empirically confirm the practicality of the rejected signature’s challenge attack. For all three security levels of Dilithium-2/3/5, our attack recovers the private key in seconds or minutes with a 100% Success Rate (SR). Our attack leverages knowledge of the rejected signature’s challenge and response, and thus we propose methods to extract this information by exploiting single-trace side-channel leakage from Number Theoretic Transform (NTT) operations and functions associated with the response generation procedure. We demonstrate the practicality of this rejected signature’s challenge attack by using real power consumption on an ARM Cortex-M4 microcontroller. To the best of our knowledge, it is the first practical and efficient side-channel key recovery attack on ML-DSA/Dilithium that targets the rejection sampling procedure. Furthermore, we discuss some countermeasures to mitigate this security issue.

2024

TCHES

eLIMInate: a Leakage-focused ISE for Masked Implementation

Hao Cheng Daniel Page Weijia Wang

Even given a state-of-the-art masking scheme, masked software implementation of some cryptographic functionality can pose significant challenges stemming, e.g., from simultaneous requirements for efficiency and security. In this paper we design an Instruction Set Extension (ISE) to address a specific element of said challenge, namely the elimination of leakage stemming from architectural and micro-architectural overwriting. Conceptually, the ISE allows a leakage-focused behavioural hint to be communicated from software to the micro-architecture: using it informs how computation is realised when applied to masking-specific data, which then offers an opportunity to eliminate associated leakage. We develop prototype, latency- and area-optimised implementations of the ISE design based on the RISC-V Ibex core. Using them, we demonstrate that use of the ISE can close the gap between assumptions about and actual behaviour of a device and thereby deliver an improved security guarantee.

2024

TCHES

Efficient Table-Based Masking with Pre-processing

Juelin Zhang Taoyun Wang Yiteng Sun Fanjie Ji Bohan Wang Lu Li Yu Yu Weijia Wang

Masking is one of the most investigated countermeasures against side-channel attacks. In a nutshell, it randomly encodes each sensitive variable into a number of shares, and compiles the cryptographic implementation into a masked one that operates over the shares instead of the original sensitive variables. Despite its provable security benefits, masking inevitably introduces additional overhead. Particularly, the software implementation of masking largely slows down the cryptographic implementations and requires a large number of random bits that need to be produced by a true random number generator. In this respect, reducing the overhead of masking is still an essential and challenging task. Among various known schemes, Table-Based Masking (TBM) stands out as a promising line of work enjoying the advantage of generality to any lookup tables. It also allows the pre-processing paradigm, wherein a pre-processing phase is executed independently of the inputs, and a much more efficient online phase (using the precomputed tables) takes place to calculate the result. Obviously, the practicality of the pre-processing paradigm relies heavily on the efficiency of the online phase and the size of the precomputed tables. In this paper, we investigate the TBM scheme that offers a combination of linear complexity (in terms of the security order, denoted as d) during the online phase and small precomputed tables. We then apply our new scheme to the AES-128, and provide an implementation on the ARM Cortex architecture. Particularly, for a security order d = 8, the online phase outperforms the current state-of-the-art AES implementations on embedded processors that are vulnerable to side-channel attacks. The security order of our scheme is proven in theory and verified by the T-test in practice. Moreover, we investigate the speed overhead associated with the random bit generation in our masking technique. Our findings indicate that the speed overhead can be effectively balanced.
This is mainly because the true random number generator operates in parallel with the processor’s execution, ensuring a constant supply of fresh random bits for the masked computation at regular intervals.
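As a reading aid for the share encoding described in this abstract, the following minimal Python sketch shows plain Boolean masking: a value is split into d + 1 random shares, and a linear operation (XOR) is computed share by share. It is not the table-based scheme of the paper; the function names `encode`, `decode`, and `masked_xor` and the 8-bit word size are assumptions made for illustration.

```python
import secrets
from functools import reduce
from operator import xor


def encode(value: int, d: int, bits: int = 8) -> list[int]:
    """Split `value` into d + 1 Boolean shares whose XOR equals `value`."""
    shares = [secrets.randbits(bits) for _ in range(d)]
    # The last share is chosen so that all shares XOR back to `value`.
    shares.append(reduce(xor, shares, value))
    return shares


def decode(shares: list[int]) -> int:
    """Recombine shares by XORing them together."""
    return reduce(xor, shares, 0)


def masked_xor(a: list[int], b: list[int]) -> list[int]:
    """XOR is linear, so it can be applied to each pair of shares independently."""
    return [x ^ y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = encode(0x3C, d=8)
    b = encode(0x55, d=8)
    print(hex(decode(masked_xor(a, b))))  # recombines to 0x3C ^ 0x55
```

Non-linear operations such as table lookups or AND gates cannot be handled share by share in this way, which is where the overhead discussed in the abstract comes from.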

2024

TCHES

Extending Randomness-Free First-Order Masking Schemes and Applications to Masking-Friendly S-boxes

Lixuan Wu Yanhong Fan Weijia Wang Bart Preneel Meiqin Wang

Masking has emerged as a widely adopted countermeasure against side-channel attacks. However, the implementation of masking schemes faces several challenges, including hardware area, latency and the overhead associated with fresh randomness generation. To eliminate the implementation cost caused by fresh randomness, Shahmirzadi et al. introduced a methodology for constructing 2-share first-order masking schemes without randomness at CHES 2021. In this work, we extend Shahmirzadi et al.’s method to find masked implementations for more S-boxes and further reduce the hardware overhead. We propose the concept of a non-linear compression layer, a comprehensive share assignment strategy based on a linear compression layer, and corresponding optimization techniques. Based on these techniques, we construct the first randomness-free first-order masking schemes for the PRINCE S-box and its inverse, reduce the hardware overhead of masking schemes for multiple S-boxes, and design new masking-friendly S-boxes. Particularly for the SKINNY S-box, the reduction is 21% and 15% in area and power consumption, respectively. To validate the security of masked implementations, we not only employ the automated tools SILVER and PROLEAD but also conduct FPGA-based experiments.

2024

TCHES

Random Probing Security with Precomputation

Bohan Wang Fanjie Ji Yiteng Sun Weijia Wang

At Eurocrypt 2014, Duc, Dziembowski and Faust proposed the random probing model to bridge the gap between the probing model proposed at Crypto 2003 and the noisy model proposed at Eurocrypt 2013. Compared with the probing model, whose noise in the leakages should (linearly) increase with the number of shares, the random probing model allows each variable to leak its value with a probability p, which reflects the physical reality of side channels much better. At Crypto 2020, Belaïd et al. proposed the Random Probing Expandability (RPE) security notion, which ensures random probing security for arbitrary-order masking algorithms with constant leakage probability. However, the complexity of existing RPE algorithms is much higher than that of probing-secure algorithms, which limits their practical use. In this paper, we investigate random probing security with precomputation, where a masked cryptographic implementation can be divided into two phases. The first phase, called preprocessing, takes random bits and returns a number of precomputed values. The second phase, called online computation, takes input (e.g., plaintext and shares of secret) and precomputed values to calculate output (e.g., ciphertext) efficiently. We describe a random probing secure precomputable scheme, which transforms an arbitrary circuit compiler with tolerable leakage probability p into a precomputable one by adding a public (but random) share that is calculated in the online phase, and the tolerable leakage probability of the new compiler is $\min\{p, 2^{-5.01}\}$. Then, we apply the new scheme to the bitsliced AES. Notably, the implementation on the ARM Cortex M architecture shows that the performance of the online phase is significantly improved and even comparable to masking schemes only secure in the probing model.

2023

EUROCRYPT

Improved Power Analysis Attacks on Falcon

Shiduo Zhang Xiuhan Lin Yang Yu Weijia Wang

Falcon is one of the three post-quantum signature schemes selected for standardization by NIST. Due to its low bandwidth and high efficiency, Falcon is seen as an attractive option for quantum-safe embedded systems. In this work, we study Falcon’s side-channel resistance by analysing its Gaussian samplers. Our results are mainly twofold. The first result is an improved key recovery exploiting the leakage within the base sampler investigated by Guerreau et al. (CHES 2022). Instead of resorting to the fourth moment as in former parallelepiped-learning attacks, we work with the second order statistics covariance and use its spectral decomposition to recover the secret information. Our approach substantially reduces the requirement of measurements and computation resources: 220 000 traces are sufficient to recover the secret key of Falcon-512 within half an hour with a probability of ≈ 25%. As a comparison, even with $10^6$ traces, the former attack still needs about 1000 hours of CPU time for lattice reduction to achieve a full key recovery. In addition, our approach is robust to inaccurate leakage classification, which is another advantage over parallelepiped-learning attacks. Our second result is a practical power analysis targeting the integer Gaussian sampler of Falcon. The analysis relies on the leakage of the random sign flip within the integer Gaussian sampling. This leakage was exposed in 2018 by Kim and Hong, but it is not considered in Falcon’s implementation and has remained unexploited for side-channel analysis until now. We identify the leakage within the reference implementation of Falcon on an ARM Cortex-M4 STM32F407IGT6 microprocessor. We also show that this single bit of leakage is in effect enough for practical key recovery: with 170 000 traces one can fully recover the key of Falcon-512 within half an hour. Furthermore, combining the sign leakage and the aforementioned leakage, one can recover the key with only 45 000 signature measurements in a short time.
As a by-product, we also extend our power analysis to Mitaka, a recent variant of Falcon. The same leakages exist within the integer Gaussian samplers of Mitaka, and they can also be used to mount key recovery attacks. Nevertheless, the key recovery in Mitaka requires many more traces than it does in Falcon, due to their different lattice Gaussian samplers.

2023

TCHES

Efficient Private Circuits with Precomputation

Weijia Wang Fanjie Ji Juelin Zhang Yu Yu

At CHES 2022, Wang et al. described a new paradigm for masked implementations using private circuits, where most intermediates can be precomputed before the input shares are accessed, significantly accelerating the online execution of masked functions. However, the masking scheme they proposed mainly featured (and was designed for) the cost amortization, leaving its (limited) suitability in the above precomputation-based paradigm just as a bonus. This paper aims to provide an efficient, reliable, easy-to-use, and precomputation-compatible masking scheme. We propose a new masked multiplication over the finite field $\mathbb{F}_q$ suitable for precomputation, and prove its security in the composable notion called Probing-Isolating Non-Interference (PINI). Particularly, the operations (e.g., AND and XOR) in the binary field can be achieved by setting q = 2, allowing the bitsliced implementation that has been shown to be quite efficient for software implementations. The new masking scheme is applied to mask the AES and SKINNY block ciphers on the ARM Cortex M architecture. The performance results show that the new scheme contributes to a significant speed-up compared with the state-of-the-art implementations. For SKINNY with block size 64, the speed and RAM requirement can be significantly improved compared with AES-128 (saving around 45% of the cycles in the online computation and 60% of the RAM space for precomputed values), thanks to its smaller number of AND gates. Besides the security proof by hand, we provide formal verifications for the multiplication and T-test evaluations for the masked implementations of AES and SKINNY. Because of the structure of the new masked multiplication, our formal verification can be performed for security orders up to 16.

2022

TOSC

Towards Low-Latency Implementation of Linear Layers

Qun Liu Weijia Wang Yanhong Fan Lixuan Wu Ling Sun Meiqin Wang

Lightweight cryptography features a small footprint and/or low computational complexity. Low-cost implementations of linear layers usually play an important role in lightweight cryptography. Although it has been shown by Boyar et al. that finding the optimal implementation of a linear layer is a Shortest Linear Program (SLP) problem and NP-hard, there exist a variety of heuristic methods to search for near-optimal solutions. This paper considers the low-latency criterion and focuses on the heuristic search of lightweight implementations for linear layers. Most prior approaches iteratively combine the inputs (of linear layers) to reach the output, which can be regarded as a forward search. To better adapt to the low-latency criterion, we propose a new framework of backward search that attempts to iteratively split every output (into an XORing of two bits) until all inputs appear. By bounding the time of splitting, the new framework can find a sub-optimal solution with a minimized depth of circuits. We apply our new search algorithm to linear layers of block ciphers and find many low-latency candidates for implementations. Notably, for AES MixColumns, we provide an implementation with 103 XOR gates with a depth of 3, which is among the best hardware implementations of the AES linear layer. Besides, we obtain better implementations in XOR gates for 54.3% of 4256 Maximum Distance Separable (MDS) matrices proposed by Li et al. at FSE 2019. We also achieve an involutory MDS matrix (in $M_4(GL(8, \mathbb{F}_2))$) whose implementation uses the lowest number (i.e., 86, saving 2 from the state-of-the-art result) of XORs with the minimum depth.

2022

TCHES

Side-Channel Masking with Common Shares

Weijia Wang Chun Guo Yu Yu Fanjie Ji Yang Su

To counter side-channel attacks, a masking scheme randomly encodes key-dependent variables into several shares, and transforms operations into their masked counterparts (called gadgets) operating on shares. This provably achieves the de facto standard notion of probing security. We continue the long line of works seeking to reduce the overhead of masking. Our main contribution is a new masking scheme over finite fields in which shares of different variables have a part in common. This enables the reuse of randomness / variables across different gadgets, and reduces the total cost of masked implementation. For security order d and circuit size l, the randomness requirement and computational complexity of our scheme are $\tilde{O}(d^2)$ and $\tilde{O}(ld^2)$ respectively, strictly improving upon the state-of-the-art $\tilde{O}(d^2)$ and $\tilde{O}(ld^3)$ of Coron et al. at Eurocrypt 2020. A notable feature of our scheme is that it enables a new paradigm in which many intermediates can be precomputed before executing the masked function. The precomputation consumes $\tilde{O}(ld^2)$ and produces $\tilde{O}(ld)$ variables to be stored in RAM. The cost of subsequent (online) computation is reduced to $\tilde{O}(ld)$, effectively speeding up, e.g., challenge-response authentication protocols. We showcase our method on the AES on the ARM Cortex M architecture and perform a T-test evaluation. Our results show a speed-up during the online phase compared with state-of-the-art implementations, at the cost of acceptable RAM consumption and precomputation time. To prove security for our scheme, we propose a new security notion intrinsically supporting the reuse of randomness / variables across gadgets, and bridging the security of parallel compositions of gadgets to general compositions, which may be of independent interest.
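To make the notion of a gadget operating on shares concrete, here is a minimal Python sketch of the classic ISW masked AND (Ishai, Sahai and Wagner, Crypto 2003) over Boolean shares. It is a textbook baseline, not the common-shares scheme of this paper; the helper names `share`, `unshare`, and `isw_and` and the 8-bit word size are assumptions made for illustration. The gadget draws one fresh random word per pair of shares, i.e. d(d+1)/2 words for d + 1 shares, which is the quadratic randomness cost that this line of work tries to reduce.

```python
import secrets
from functools import reduce
from operator import xor


def share(value: int, d: int, bits: int = 8) -> list[int]:
    """Encode `value` into d + 1 Boolean shares."""
    shares = [secrets.randbits(bits) for _ in range(d)]
    shares.append(reduce(xor, shares, value))
    return shares


def unshare(shares: list[int]) -> int:
    """Recombine Boolean shares."""
    return reduce(xor, shares, 0)


def isw_and(a: list[int], b: list[int], bits: int = 8) -> list[int]:
    """ISW gadget: output shares encode (XOR of a) AND (XOR of b), bitwise."""
    n = len(a)
    # Start with the "diagonal" products a_i & b_i.
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbits(bits)  # fresh randomness for the pair (i, j)
            c[i] ^= r
            # The bracketing matters for security: r is added before the
            # second cross product so no intermediate depends on both inputs
            # in the clear.
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c


if __name__ == "__main__":
    x, y = 0xC5, 0x6E
    out = isw_and(share(x, d=3), share(y, d=3))
    print(unshare(out) == (x & y))
```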

2022

TOSC

More Inputs Makes Difference: Implementations of Linear Layers Using Gates with More Than Two Inputs

Qun Liu Weijia Wang Ling Sun Yanhong Fan Lixuan Wu Meiqin Wang

Lightweight cryptography enables cryptographic applications on devices with limited resources. Low-area implementations of linear layers usually play an essential role in lightweight cryptography. Previous works have provided plenty of methods to generate low-area implementations using 2-input xor gates for various linear layers. However, it is still challenging to search for smaller implementations using xor gates with more than two inputs. This paper, inspired by Banik et al., proposes a novel approach to construct a number of lower-area implementations with (n + 1)-input gates based on the given implementations with n-input gates. Based on the novel algorithm, we present the corresponding search algorithms for n = 2 and n = 3, which means that we can efficiently convert an implementation with 2-input xor gates and 3-input xor gates to lower-area implementations with 3-input xor gates and 4-input xor gates, respectively. With these search algorithms, we improve the area of previous implementations of linear layers for many block ciphers. For example, we achieve a better implementation with 4-input xor gates for AES MixColumns, which only requires 243 GE in the STM 130 nm library, while the previous public result is 258.9 GE. Besides, we obtain better implementations for all 5500 lightweight matrices proposed by Li et al. at FSE 2019, and the area for them is decreased by about 21% on average.

2021

TOSC

Provable Security of SP Networks with Partial Non-Linear Layers

Chun Guo François-Xavier Standaert Weijia Wang Xiao Wang Yu Yu

Motivated by the recent trend towards low multiplicative complexity blockciphers (e.g., Zorro, CHES 2013; LowMC, EUROCRYPT 2015; HADES, EUROCRYPT 2020; MALICIOUS, CRYPTO 2020), we study their underlying structure, partial SPNs, i.e., Substitution-Permutation Networks (SPNs) with parts of the substitution layer replaced by an identity mapping, and put forward the first provable security analysis for such partial SPNs built upon dedicated linear layers. For different instances of partial SPNs using MDS linear layers, we establish strong pseudorandom security as well as practical provable security against impossible differential attacks. By extending the well-established MDS code-based idea, we also propose the first principled design of linear layers that ensures optimal differential propagation. Our results formally confirm the conjecture that partial SPNs achieve the same security as normal SPNs while consuming less non-linearity, in a well-established framework.

2020

TOSC

Efficient Side-Channel Secure Message Authentication with Better Bounds

Chun Guo François-Xavier Standaert Weijia Wang Yu Yu

We investigate constructing message authentication schemes from symmetric cryptographic primitives, with the goal of achieving security when most intermediate values during tag computation and verification are leaked (i.e., mode-level leakage-resilience). Existing efficient proposals typically follow the plain Hash-then-MAC paradigm $T = \mathsf{TGen}_K(H(M))$. When the domain of the MAC function $\mathsf{TGen}_K$ is $\{0, 1\}^{128}$, e.g., when instantiated with the AES, forgery is possible within time $2^{64}$ and data complexity 1. To dismiss such cheap attacks, we propose two modes: LRW1-based Hash-then-MAC (LRWHM) that is built upon the LRW1 tweakable blockcipher of Liskov, Rivest, and Wagner, and Rekeying Hash-then-MAC (RHM) that employs internal rekeying. Built upon secure AES implementations, LRWHM is provably secure up to (beyond-birthday) $2^{78.3}$ time complexity, while RHM is provably secure up to $2^{121}$ time. Thus in practice, their main security threat is expected to be side-channel key recovery attacks against the AES implementations. Finally, we benchmark the performance of instances of our modes based on the AES and SHA3 and confirm their efficiency.
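The birthday-bound forgery against plain Hash-then-MAC mentioned in this abstract can be illustrated at toy scale. The sketch below is an assumption-laden illustration, not the paper's construction: the hash is SHA3-256 truncated to a hypothetical 20-bit width (so a collision takes roughly 2^10 hash evaluations instead of 2^64), and HMAC-SHA256 stands in for the AES-based tag generation function, since the attack does not depend on how the tag function is built. Once two messages with the same hash are found offline, a single tag query on one of them yields a valid tag for the other.

```python
import hashlib
import hmac

HASH_BITS = 20  # toy digest width (assumption); the abstract discusses 128 bits


def toy_hash(msg: bytes) -> bytes:
    """SHA3-256 truncated to HASH_BITS bits, standing in for H."""
    n_bytes = (HASH_BITS + 7) // 8
    digest = hashlib.sha3_256(msg).digest()[:n_bytes]
    value = int.from_bytes(digest, "big") >> (8 * n_bytes - HASH_BITS)
    return value.to_bytes(n_bytes, "big")


def tgen(key: bytes, h: bytes) -> bytes:
    """Stand-in for the tag function TGen_K (HMAC here, an assumption)."""
    return hmac.new(key, h, hashlib.sha256).digest()[:16]


def tag(key: bytes, msg: bytes) -> bytes:
    """Plain Hash-then-MAC: T = TGen_K(H(M))."""
    return tgen(key, toy_hash(msg))


def find_collision() -> tuple[bytes, bytes]:
    """Offline, key-independent birthday search for two messages with equal hash."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        counter += 1


if __name__ == "__main__":
    key = b"\x01" * 16           # unknown to the attacker
    m1, m2 = find_collision()    # needs no key and no queries
    t = tag(key, m1)             # a single tag query (data complexity 1)
    print(hmac.compare_digest(t, tag(key, m2)))  # the same tag verifies for m2
```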

2020

TCHES

Efficient and Private Computations with Code-Based Masking

Weijia Wang Pierrick Méaux Gaëtan Cassiers François-Xavier Standaert

Code-based masking is a very general type of masking scheme that covers Boolean masking, inner product masking, direct sum masking, and so on. The merits of the generalization are twofold. Firstly, the higher algebraic complexity of the sharing function decreases the information leakage in “low noise conditions” and may increase the “statistical security order” of an implementation (with linear leakages). Secondly, the underlying error-correction codes can offer improved fault resistance for the encoded variables. Nevertheless, this higher algebraic complexity also implies additional challenges. On the one hand, a generic multiplication algorithm applicable to any linear code is still unknown. On the other hand, masking schemes with higher algebraic complexity usually come with implementation overheads, as for example witnessed by inner-product masking. In this paper, we contribute to these challenges in two directions. Firstly, we propose a generic algorithm that allows us (to the best of our knowledge for the first time) to compute on data shared with linear codes. Secondly, we introduce a new amortization technique that can significantly mitigate the implementation overheads of code-based masking, and illustrate this claim with a case study. Precisely, we show that, although performing every single code-based masked operation is relatively complex, processing multiple secrets in parallel leads to much better performances. This property enables code-based masked implementations of the AES to compete with the state-of-the-art in randomness complexity. Since our masked operations can be instantiated with various linear codes, we hope that these investigations open new avenues for the study of code-based masking schemes, by specializing the codes for improved performances, better side-channel security or improved fault tolerance.

2020

TOSC

Beyond-Birthday-Bound Security for 4-round Linear Substitution-Permutation Networks

Yuan Gao Chun Guo Meiqin Wang Weijia Wang Jiejing Wen

Recent works of Cogliati et al. (CRYPTO 2018) have initiated provable treatments of Substitution-Permutation Networks (SPNs), one of the most popular approaches to construct modern blockciphers. Such theoretical SPN models may employ non-linear diffusion layers, which enable beyond-birthday-bound provable security. However, for the model of real world blockciphers, i.e., SPN models with linear diffusion layers, existing provable results are capped at birthday security up to $2^{n/2}$ adversarial queries, where $n$ is the size of the idealized S-boxes. In this paper, we overcome this birthday barrier and prove that a 4-round SPN with linear diffusion layers and independent round keys is secure up to $2^{2n/3}$ queries. For this, we identify conditions on the linear layers that are sufficient for such security, which, unsurprisingly, turn out to be slightly stronger than Cogliati et al.'s conditions for birthday security. This provides additional theoretical support for real world SPN blockciphers.

2020

ASIACRYPT

Packed Multiplication: How to Amortize the Cost of Side-channel Masking?

Weijia Wang Chun Guo François-Xavier Standaert Yu Yu Gaëtan Cassiers

Higher-order masking countermeasures provide strong provable security against side-channel attacks at the cost of incurring significant overheads, which largely hinders their applicability. Previous works towards remedying this cost mostly concentrated on "local" calculations, i.e., optimizing the cost of computation units such as a single AND gate or a field multiplication. This paper explores a complementary "global" approach, i.e., considering multiple operations in the masked domain as a batch and reducing randomness and computational cost via amortization. In particular, we focus on the amortization of $\ell$ parallel field multiplications for appropriate integer $\ell > 1$, and design a kit named packed multiplication for implementing such a batch. For $\ell+d\leq2^m$, when $\ell$ parallel multiplications over $\mathbb{F}_{2^{m}}$ with $d$-th order probing security are implemented, packed multiplication consumes $d^2+2\ell d + \ell$ bilinear multiplications and $2d^2 + d(d+1)/2$ random field variables, outperforming the state-of-the-art results with $O(\ell d^2)$ multiplications and $\ell \left \lfloor d^2/4\right \rfloor + \ell d$ randomness.
To prove $d$-probing security for packed multiplications, we introduce some weaker security notions for multiple-inputs-multiple-outputs gadgets and use them as intermediate steps, which may be of independent interest. As parallel field multiplications exist almost everywhere in symmetric cryptography, lifting optimizations from "local" to "global" substantially enlarges the space of improvements. To demonstrate, we showcase the method on the AES SubBytes step, GCM and TET (a popular disk encryption mode). Notably, when $d=8$, our implementation of AES SubBytes on the ARM Cortex M architecture achieves a gain of up to $33\%$ in total speed and saves up to $68\%$ of the random bits compared with the state-of-the-art bitsliced implementation reported at ASIACRYPT 2018.
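The cost formulas quoted in this abstract can be evaluated directly to see how the amortization behaves. The short Python sketch below only plugs numbers into those formulas; the function names and the sample parameters ($\ell = 16$, $d = 8$, $m = 8$) are illustrative assumptions, and the baseline multiplication count is omitted because the abstract gives it only asymptotically as $O(\ell d^2)$.

```python
def packed_cost(l: int, d: int, m: int) -> tuple[int, int]:
    """Return (bilinear multiplications, random field variables) for packed
    multiplication, using the formulas from the abstract. Requires l + d <= 2^m."""
    if l + d > 2 ** m:
        raise ValueError("the abstract's formulas assume l + d <= 2^m")
    mults = d * d + 2 * l * d + l
    randoms = 2 * d * d + d * (d + 1) // 2
    return mults, randoms


def baseline_randomness(l: int, d: int) -> int:
    """Randomness of the prior state of the art quoted in the abstract:
    l * floor(d^2 / 4) + l * d."""
    return l * (d * d // 4) + l * d


if __name__ == "__main__":
    l, d, m = 16, 8, 8
    mults, randoms = packed_cost(l, d, m)
    print(mults, randoms, baseline_randomness(l, d))  # 336 164 384
```

Note that the packed randomness cost does not depend on $\ell$, so the advantage only appears once the batch is large enough: for a single multiplication ($\ell = 1$, $d = 8$) the quoted baseline needs 24 random variables while the packed formula gives 164.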

2015

CHES

Evaluation and Improvement of Generic-Emulating DPA Attacks

Weijia Wang Yu Yu Junrong Liu Zheng Guo François-Xavier Standaert Dawu Gu Sen Xu Rong Fu