Towards Low-Latency Implementation of Linear Layers 📺
Lightweight cryptography features a small footprint and/or low computational complexity. Low-cost implementations of linear layers usually play an important role in lightweight cryptography. Although it has been shown by Boyar et al. that finding the optimal implementation of a linear layer is a Shortest Linear Program (SLP) problem and NP-hard, there exist a variety of heuristic methods to search for near-optimal solutions. This paper considers the low-latency criteria and focuses on the heuristic search of lightweight implementation for linear layers. Most of the prior approach iteratively combines the inputs (of linear layers) to reach the output, which can be regarded as the forward search. To better adapt the low-latency criteria, we propose a new framework of backward search that attempts to iteratively split every output (into an XORing of two bits) until all inputs appear. By bounding the time of splitting, the new framework can find a sub-optimal solution with a minimized depth of circuits.We apply our new search algorithm to linear layers of block ciphers and find many low-latency candidates for implementations. Notably, for AES Mixcolumns, we provide an implementation with 103 XOR gates with a depth of 3, which is among the best hardware implementations of the AES linear layer. Besides, we obtain better implementations in XOR gates for 54.3% of 4256 Maximum Distance Separable (MDS) matrices proposed by Li et al. at FSE 2019. We also achieve an involutory MDS matrix (in M4(GL(8, F2))) whose implementation uses the lowest number (i.e., 86, saving 2 from the state-of-the-art result) of XORs with the minimum depth.
More Inputs Makes Difference: Implementations of Linear Layers Using Gates with More Than Two Inputs
Lightweight cryptography ensures cryptography applications to devices with limited resources. Low-area implementations of linear layers usually play an essential role in lightweight cryptography. The previous works have provided plenty of methods to generate low-area implementations using 2-input xor gates for various linear layers. However, it is still challenging to search for smaller implementations using two or more inputs xor gates. This paper, inspired by Banik et al., proposes a novel approach to construct a quantity of lower area implementations with (n + 1)- input gates based on the given implementations with n-input gates. Based on the novel algorithm, we present the corresponding search algorithms for n = 2 and n = 3, which means that we can efficiently convert an implementation with 2-input xor gates and 3-input xor gates to lower-area implementations with 3-input xor gates and 4-input xor gates, respectively.We improve the previous implementations of linear layers for many block ciphers according to the area with these search algorithms. For example, we achieve a better implementation with 4-input xor gates for AES MixColumns, which only requires 243 GE in the STM 130 nm library, while the previous public result is 258.9 GE. Besides, we obtain better implementations for all 5500 lightweight matrices proposed by Li et al. at FSE 2019, and the area for them is decreased by about 21% on average.