A Faster Third-Order Masking of Lookup Tables
Masking of S-boxes using lookup tables is an effective countermeasure to thwart side-channel attacks on block ciphers implemented in software. At first and second orders, the Table-based Masking (TBM) schemes can be very efficient and even faster than circuit-based masking schemes. Ever since the customised second-order TBM schemes were proposed, the focus has been on designing and optimising Higher-Order Table-based Masking (HO-TBM) schemes that facilitate masking at arbitrary order. One of the reasons for this trend is that at large orders HO-TBM schemes are significantly slower and consume a prohibitive amount of RAM memory compared to circuit-based masking schemes such as bit-sliced masking, and hence efforts were targeted in this direction. However, a recent work due to Valiveti and Vivek (TCHES 2021) has demonstrated that the HO-TBM scheme of Coron et al. (TCHES 2018) is feasible to be implemented on memory-constrained devices with pre-processing capability and a competitive online execution time. Yet, currently, there are no customised designs for third-order TBM that are more efficient than instantiating a HO-TBM scheme at third order.In this work, we propose a third-order TBM scheme for arbitrary S-boxes that is secure in the probing model and under compositions, i.e., 3-SNI secure. It is very efficient in terms of the overall running time, compared to the third-order instantiations of state-of-the-art HO-TBM schemes. It also supports the pre-processing functionality. For example, the overall running time of a single execution of the third-order masked AES-128 on a 32-bit ARM-Cortex M4 micro-controller is reduced by about 80% without any overhead on the online execution time. This implies that the online execution time of the proposed scheme is approximately eight times faster than the bit-sliced masked implementation at third order, and it is comparable to the recent scheme of Wang et al. (TCHES 2022) that makes use of reuse of shares. We also present the implementation results for the third-order masked PRESENT cipher. Our work suggests that there is a significant scope for tuning the performance of HO-TBM schemes at lower orders.
Higher-Order Lookup Table Masking in Essentially Constant Memory 📺
Masking using randomised lookup tables is a popular countermeasure for side-channel attacks, particularly at small masking orders. An advantage of this class of countermeasures for masking S-boxes compared to ISW-based masking is that it supports pre-processing and thus significantly reducing the amount of computation to be done after the unmasked inputs are available. Indeed, the “online” computation can be as fast as just a table lookup. But the size of the randomised lookup table increases linearly with the masking order, and hence the RAM memory required to store pre-processed tables becomes infeasible for higher masking orders. Hence demonstrating the feasibility of full pre-processing of higher-order lookup table-based masking schemes on resource-constrained devices has remained an open problem. In this work, we solve the above problem by implementing a higher-order lookup table-based scheme using an amount of RAM memory that is essentially independent of the masking order. More concretely, we reduce the amount of RAM memory needed for the table-based scheme of Coron et al. (TCHES 2018) approximately by a factor equal to the number of shares. Our technique is based upon the use of pseudorandom number generator (PRG) to minimise the randomness complexity of ISW-based masking schemes proposed by Ishai et al. (ICALP 2013) and Coron et al. (Eurocrypt 2020). Hence we show that for lookup table-based masking schemes, the use of a PRG not only reduces the randomness complexity (now logarithmic in the size of the S-box) but also the memory complexity, and without any significant increase in the overall running time. We have implemented in software the higher-order table-based masking scheme of Coron et al. (TCHES 2018) at tenth order with full pre-processing of a single execution of all the AES S-boxes on a ARM Cortex-M4 device that has 256 KB RAM memory. Our technique requires only 41.2 KB of RAM memory, whereas the original scheme would have needed 440 KB. Moreover, our 8-bit implementation results demonstrate that the online execution time of our variant is about 1.5 times faster compared to the 8-bit bitsliced masked implementation of AES-128.
Second-Order Masked Lookup Table Compression Scheme 📺
Masking by lookup table randomisation is a well-known technique used to achieve side-channel attack resistance for software implementations, particularly, against DPA attacks. The randomised table technique for first- and second-order security requires about m•2n bits of RAM to store an (n,m)-bit masked S-box lookup table. Table compression helps in reducing the amount of memory required, and this is useful for highly resource-constrained IoT devices. Recently, Vadnala (CT-RSA 2017) proposed a randomised table compression scheme for first- and second-order security in the probing leakage model. This scheme reduces the RAM memory required by about a factor of 2l, where l is a compression parameter. Vivek (Indocrypt 2017) demonstrated an attack against the second-order scheme of Vadnala. Hence achieving table compression at second and higher orders is an open problem.In this work, we propose a second-order secure randomised table compression scheme which works for any (n,m)-bit S-box. Our proposal is a variant of Vadnala’s scheme that is not only secure but also significantly improves the time-memory trade-off. Specifically, we improve the online execution time by a factor of 2n−l. Our proposed scheme is proved 2-SNI secure in the probing leakage model. We have implemented our method for AES-128 on a 32-bit ARM Cortex processor. We are able to reduce the memory required to store a randomised S-box table for second-order AES-128 implementation to 59 bytes.