## CryptoDB

### Patrick Longa

#### Publications

Year
Venue
Title
2018
TCHES
We present high-speed implementations of the post-quantum supersingular isogeny Diffie-Hellman key exchange (SIDH) and the supersingular isogeny key encapsulation (SIKE) protocols for 32-bit ARMv7-A processors with NEON support. The high performance of our implementations is mainly due to carefully optimized multiprecision and modular arithmetic that finely integrates both ARM and NEON instructions in order to reduce the number of pipeline stalls and memory accesses, and a new Montgomery reduction technique that combines the use of the UMAAL instruction with a variant of the hybrid-scanning approach. In addition, we present efficient implementations of SIDH and SIKE for 64-bit ARMv8-A processors, based on a high-speed Montgomery multiplication that leverages the power of 64-bit instructions. Our experimental results consolidate the practicality of supersingular isogeny-based protocols for many real-world applications. For example, a full key-exchange execution of SIDHp503 is performed in about 176 million cycles on an ARM Cortex-A15 from the ARMv7-A family (i.e., 88 milliseconds @2.0GHz). On an ARM Cortex-A72 from the ARMv8-A family, the same operation can be carried out in about 90 million cycles (i.e., 45 milliseconds @1.992GHz). All our software is protected against timing and cache attacks. The techniques for modular multiplication presented in this work have broad applications to other cryptographic schemes.
2017
EUROCRYPT
2017
CHES
This work deals with the energy-efficient, high-speed and high-security implementation of elliptic curve scalar multiplication and elliptic curve Diffie-Hellman (ECDH) key exchange on embedded devices using Four$\mathbb {Q}$ and incorporating strong countermeasures to thwart a wide variety of side-channel attacks. First, we set new speed records for constant-time curve-based scalar multiplication and DH key exchange at the 128-bit security level with implementations targeting 8, 16 and 32-bit microcontrollers. For example, our software computes a static ECDH shared secret in $\sim$6.9 million cycles (or 0.86 s @8 MHz) on a low-power 8-bit AVR microcontroller which, compared to the fastest Curve25519 and genus-2 Kummer implementations on the same platform, offers 2$\times$ and 1.4$\times$ speedups, respectively. Similarly, it computes the same operation in $\sim$496 thousand cycles on a 32-bit ARM Cortex-M4 microcontroller, achieving a factor-2.9 speedup when compared to the fastest Curve25519 implementation targeting the same platform. Second, we engineer a set of side-channel countermeasures taking advantage of Four$\mathbb {Q}$’s rich arithmetic and propose a secure implementation that offers protection against a wide range of sophisticated side-channel attacks. Finally, we perform a differential power analysis evaluation of our software running on an ARM Cortex-M4, and report that no leakage was detected with up to 10 million traces. These results demonstrate the potential of deploying Four$\mathbb {Q}$ on low-power applications such as protocols for IoT.
2016
CRYPTO
2016
CHES
2015
EPRINT
2015
ASIACRYPT
2014
EPRINT
2014
JOFC
2012
ASIACRYPT
2011
EUROCRYPT
2010
CHES
2010
EPRINT
In this paper, a thorough bottom-up optimization process (field, point and scalar arithmetic) is used to speed up the computation of elliptic curve point multiplication and report new speed records on modern x86-64 based processors. Our different implementations include elliptic curves using Jacobian coordinates, extended Twisted Edwards coordinates and the recently proposed Galbraith-Lin-Scott (GLS) method. Compared to state-of-the-art implementations on identical platforms the proposed techniques provide up to 30% speed improvements. Additionally, compared to the best previous published results on similar platforms improvements up to 31% are observed. This research is crucial for advancing high speed cryptography on new emerging processor architectures.
2010
EPRINT
In this work, we analyze and present experimental data evaluating the efficiency of several techniques for speeding up the computation of elliptic curve point multiplication on emerging x86-64 processor architectures. In particular, we study the efficient combination of such techniques as elimination of conditional branches and incomplete reduction to achieve fast field arithmetic over GF(p). Furthermore, we study the impact of (true) data dependencies on these processors and propose several generic techniques to reduce the number of pipeline stalls, memory reads/writes and function calls. We also extend these techniques to field arithmetic over GF(p^2), which is utilized as underlying field by the recently proposed Galbraith-Lin-Scott (GLS) method to achieve higher performance in the point multiplication. By efficiently combining all these methods with state-of-the-art elliptic curve algorithms we obtain high-speed implementations of point multiplication that are up to 31% faster than the best previous published results on similar platforms. This research is crucial for advancing high-speed cryptography on new emerging processor architectures.
2009
PKC
2008
PKC
2008
EPRINT
We present a new methodology to derive faster composite operations of the form dP+Q, where d is a small integer >= 2, for generic ECC scalar multiplications over prime fields. In particular, we present an efficient Doubling-Addition (DA) operation that can be exploited to accelerate most scalar multiplication methods, including multiscalar variants. We also present a new precomputation scheme useful for window-based scalar multiplications that is shown to achieve the lowest cost among all known methods using only one inversion. In comparison to the remaining approaches that use none or several inversions, our scheme offers higher performance for most common I/M ratios. By combining the benefits of our precomputation scheme and the new DA operation, we can save up to 6.2% in the scalar multiplication using fractional wNAF.
2008
EPRINT
In this paper we present a new method for scalar multiplication that uses a generic multibase representation to reduce the number of required operations. Further, a multibase NAF-like algorithm that efficiently converts numbers to such representation without impacting memory or speed performance is developed and showed to be sublinear in terms of the number of nonzero terms. Additional representation reductions are discussed with the introduction of window-based variants that use an extended set of precomputations. To realize the proposed multibase scalar multiplication with or without precomputations in the setting of Elliptic Curve Cryptosystems (ECC) over prime fields, we also present a methodology to derive fast composite operations such as tripling or quintupling of a point that require less memory than previous point formulae. Point operations are then protected against simple side-channel attacks using a highly efficient atomic structure. Extensive testing is carried out to show that our multibase scalar multiplication is the fastest method to date in the setting of ECC and exhibits a small footprint, which makes it ideal for implementation on constrained devices.
2008
EPRINT
Elliptic curve cryptography (ECC), independently introduced by Koblitz and Miller in the 80's, has attracted increasing attention in recent years due to its shorter key length requirement in comparison with other public-key cryptosystems such as RSA. Shorter key length means reduced power consumption and computing effort, and less storage requirement, factors that are fundamental in ubiquitous portable devices such as PDAs, cellphones, smartcards, and many others. To that end, a lot of research has been carried out to speed-up and improve ECC implementations, mainly focusing on the most important and time-consuming ECC operation: scalar multiplication. In this thesis, we focus in optimizing such ECC operation at the point and scalar arithmetic levels, specifically targeting standard curves over prime fields. At the point arithmetic level, we introduce two innovative methodologies to accelerate ECC formulae: the use of new composite operations, which are built on top of basic point doubling and addition operations; and the substitution of field multiplications by squarings and other cheaper operations. These techniques are efficiently exploited, individually or jointly, in several contexts: to accelerate computation of scalar multiplications, and the computation of pre-computed points for window-based scalar multiplications (up to 30% improvement in comparison with previous best method); to speed-up computations of simple side-channel attack (SSCA)-protected implementations using innovative atomic structures (up to 22% improvement in comparison with scalar multiplication using original atomic structures); and to develop parallel formulae for SIMD-based applications, which are able to execute three and four operations simultaneously (up to 72% of improvement in comparison with a sequential scalar multiplication). At the scalar arithmetic level, we develop new sublinear (in terms of Hamming weight) multibase scalar multiplications based on NAF-like conversion algorithms that are shown to be faster than any previous scalar multiplication method. For instance, proposed multibase scalar multiplications reduce computing times in 10.9% and 25.3% in comparison with traditional NAF for unprotected and SSCA-protected scenarios, respectively. Moreover, our conversion algorithms overcome the problem of converting any integer to multibase representation, solving an open problem that was defined as hard. Thus, our algorithms make the use of multiple bases practical for applications as ECC scalar multiplication for first time.
2008
EPRINT
We present an innovative technique to add elliptic curve points with the form P+-Q, and discuss its application to the generation of precomputed tables for the scalar multiplication. Our analysis shows that the proposed schemes offer, to the best of our knowledge, the lowest costs for precomputing points on both single and multiple scalar multiplication and for various elliptic curve forms, including the highly efficient Jacobi quartics and Edwards curves.
2008
EPRINT
In this paper, we introduce the Fractional Window-w Multibase Non-Adjacent Form (Frac-wmbNAF) method to perform the scalar multiplication. This method generalizes the recently developed Window-w mbNAF (wmbNAF) method by allowing an unrestricted number of precomputed points. We then make a comprehensive analysis of the most recent and relevant methods existent in the literature for the ECC scalar multiplication, including the presented generalization and its original non-window version known as Multibase Non-Adjacent Form (mbNAF). Moreover, we present new improvements in the point operation formulae. Specifically, we reduce further the cost of composite operations such as doubling-addition, tripling, quintupling and septupling of a point, which are relevant for the speed up of methods using multiple bases. Following, we also analyze the precomputation stage in scalar multiplications and present efficient schemes for the different studied scenarios. Our analysis includes the standard elliptic curves using Jacobian coordinates, and also Edwards curves, which are gaining growing attention due to their high performance. We demonstrate with extensive tests that mbNAF is currently the most efficient method without precomputations not only for the standard curves but also for the faster Edwards form. Similarly, Frac-wmbNAF is shown to attain the highest performance among window-based methods for all the studied curve forms.

CHES 2019
CHES 2018