Patrick Longa

Publications and invited talks

Year

Venue

Title

2023

EUROCRYPT

New algorithms for the Deuring correspondence: Towards practical and secure SQISign signatures

Luca De Feo Antonin Leroux Patrick Longa Benjamin Wesolowski

The Deuring correspondence defines a bijection between isogenies of supersingular elliptic curves and ideals of maximal orders in a quaternion algebra. We present a new algorithm to translate ideals of prime-power norm to their corresponding isogenies, a central task of the effective Deuring correspondence. The new method improves upon the algorithm introduced in 2021 by De Feo, Kohel, Leroux, Petit and Wesolowski as a building block of the SQISign signature scheme. SQISign is the most compact post-quantum signature scheme currently known, but is several orders of magnitude slower than competitors, the main bottleneck of the computation being the ideal-to-isogeny translation. We implement the new algorithm and apply it to SQISign, achieving a more than two-fold speedup in key generation and signing with a new choice of parameter. Moreover, after adapting the state-of-the-art GF(p^2) multiplication algorithms by Longa to implement SQISign's underlying extension field arithmetic and adding various improvements, we push the total speedups to over three times for signing and four times for verification. In a second part of the article, we advance cryptanalysis by showing a very simple distinguisher against one of the assumptions used in SQISign. We present a way to impede the distinguisher through a few changes to the generic KLPT algorithm. We formulate a new assumption capturing these changes, and provide an analysis together with experimental evidence for its validity.

2023

TCHES

Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings

Patrick Longa

We propose a novel approach that generalizes interleaved modular multiplication algorithms for the computation of sums of products over large prime fields. This operation has widespread use and is at the core of many cryptographic applications. The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of “double-precision” operations. Moreover, it can be easily adapted to the different methods that exist to compute modular multiplication, producing algorithms that are significantly more efficient and memory-friendly. We showcase the performance of the proposed approach in the computation of multiplication over an extension field $\mathbb{F}_{p^k}$, and demonstrate its impact with record-breaking implementations of bilinear pairings. Specifically, we accomplish a full optimal ate pairing computation over the popular BLS12-381 curve, designed for the 128-bit security level, in under half a millisecond on a 3.2GHz Intel Coffee Lake processor, which is about 1.40× faster than the state-of-the-art. Similarly, we perform the same computation over the BLS24-509 curve, targeting the 192-bit security level, in ~2.6 milliseconds, achieving a speedup of more than 1.30×. We also report a significant impact on other applications, including protocols based on supersingular isogenies.
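To make the idea of an interleaved sum of products concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a word-serial Montgomery multiplication in which the partial products of all operand pairs are accumulated together and reduced one word at a time, so that no full double-width intermediate is ever formed. The function name, the word size `w`, the word count `n` and the test prime are illustrative assumptions.

```python
def mont_sum_of_products(pairs, p, w=64, n=2):
    """Return sum(a*b for a, b in pairs) * R^{-1} mod p, where R = 2^(w*n).

    Assumes p is odd, p < 2^(w*n), and all operands are in [0, p).
    """
    mask = (1 << w) - 1
    p_neg_inv = (-pow(p, -1, 1 << w)) % (1 << w)  # -p^{-1} mod 2^w
    t = 0
    for i in range(n):
        # Accumulate word i of every product before reducing.
        for a, b in pairs:
            b_i = (b >> (w * i)) & mask
            t += a * b_i
        # One Montgomery reduction step shared by all products:
        # choose m so that t + m*p is divisible by 2^w, then shift.
        m = ((t & mask) * p_neg_inv) & mask
        t = (t + m * p) >> w
    # The accumulator stays within a small multiple of p.
    while t >= p:
        t -= p
    return t


# Toy usage with the Mersenne prime 2^127 - 1 (an illustrative choice).
p = 2**127 - 1
result = mont_sum_of_products([(3, 5), (7, 11)], p)
```

The result is in Montgomery form, so it equals `(3*5 + 7*11) * R^{-1} mod p`; multiplying by `R mod p` recovers the plain value. Python integers hide the word-level storage that the abstract is concerned with, so the sketch only shows the order of operations.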

2021

CRYPTO

The Cost to Break SIKE: A Comparative Hardware-Based Analysis with AES and SHA-3

Patrick Longa Wen Wang Jakub Szefer

This work presents a detailed study of the classical security of the post-quantum supersingular isogeny key encapsulation (SIKE) protocol using a realistic budget-based cost model that considers the actual computing and memory costs that are needed for cryptanalysis. In this effort, we design specially tailored hardware accelerators for the time-critical multiplication and isogeny computations that we use to model an ASIC-powered instance of the van Oorschot-Wiener (vOW) parallel collision search algorithm. We then extend the analysis to AES and SHA-3 in the context of the NIST post-quantum cryptography standardization process to carry out a parameter analysis based on our cost model. This analysis, together with the state-of-the-art quantum security analysis of SIKE, indicates that the current SIKE parameters offer higher practical security than currently believed, closing an open issue on the suitability of the parameters to match NIST's security levels. In addition, we explore the possibility of using significantly smaller primes to enable more efficient and compact implementations with reduced bandwidth. Our improved cost model and analysis can be applied to other cryptographic settings and primitives, and can have implications for other post-quantum candidates in the NIST process.

2020

TCHES

A Compact and Scalable Hardware/Software Co-design of SIKE

Pedro Maat C. Massolino Patrick Longa Joost Renes Lejla Batina

We present efficient and compact hardware/software co-design implementations of the Supersingular Isogeny Key Encapsulation (SIKE) protocol on field-programmable gate arrays (FPGAs). In order to be better equipped for different post-quantum scenarios, our architectures were designed to feature high flexibility by covering all the currently available parameter sets and with support for primes up to 1016 bits. In particular, any of the current SIKE parameters equivalent to the post-quantum security of AES-128/192/256 and SHA3-256 can be selected and run on-the-fly. This security scalability property, together with the small footprint and efficiency of our architectures, makes them ideal for embedded applications in a post-quantum world. In addition, the proposed implementations exhibit regular, constant-time execution, which provides protection against timing and simple side-channel attacks. Our results demonstrate that supersingular isogeny-based primitives such as SIDH and SIKE can indeed be deployed for embedded applications featuring competitive performance. For example, our smallest architecture based on a 128-bit MAC unit takes only 3415 slices, 21 BRAMs and 57 DSPs on a Virtex 7 690T and can perform key generation, encapsulation and decapsulation in 14.4, 24.4 and 26.0 milliseconds for SIKEp434 and in 52.3, 86.4 and 93.2 milliseconds for SIKEp751, respectively.

2020

PKC

Improved Classical Cryptanalysis of SIKE in Practice

Craig Costello Patrick Longa Michael Naehrig Joost Renes Fernando Virdia

The main contribution of this work is an optimized implementation of the van Oorschot-Wiener (vOW) parallel collision finding algorithm. As is typical for cryptanalysis against conjectured hard problems (e.g., factoring or discrete logarithms), challenges can arise in the implementation that are not captured in the theory, making the performance of the algorithm in practice a crucial element of estimating security. We present a number of novel improvements, both to generic instantiations of the vOW algorithm finding collisions in arbitrary functions, and to its instantiation in the context of the supersingular isogeny key encapsulation (SIKE) protocol, that culminate in an improved classical cryptanalysis of the computational supersingular isogeny (CSSI) problem. In particular, we present a scalable implementation that can be applied to the Round-2 parameter sets of SIKE and used to give confidence in their security levels.
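For readers unfamiliar with vOW, the core mechanism is collision search with distinguished points. The sketch below is a single-threaded toy illustration under several assumptions: the random function is built from SHA-256, memory is unbounded, and none of the paper's optimizations or the SIKE-specific instantiation are modeled. All names and parameters are hypothetical.

```python
import hashlib
import random


def make_f(nbits, seed=b"demo"):
    """Build a pseudo-random function on nbits-bit integers (toy stand-in)."""
    mask = (1 << nbits) - 1

    def f(x):
        h = hashlib.sha256(seed + x.to_bytes(8, "big")).digest()
        return int.from_bytes(h[:8], "big") & mask

    return f


def trail(f, start, dist_mask, max_len):
    """Walk from start until a distinguished point; return (point, steps)."""
    x = start
    for steps in range(max_len):
        if x & dist_mask == 0:
            return x, steps
        x = f(x)
    return None  # abandoned: likely stuck in a cycle


def locate(f, s1, len1, s2, len2):
    """Given two trails ending at the same point, find x != y with f(x) == f(y)."""
    while len1 > len2:
        s1, len1 = f(s1), len1 - 1
    while len2 > len1:
        s2, len2 = f(s2), len2 - 1
    if s1 == s2:
        return None  # one trail is a suffix of the other: no collision
    while len1 > 0:
        n1, n2 = f(s1), f(s2)
        if n1 == n2:
            return s1, s2
        s1, s2, len1 = n1, n2, len1 - 1
    return None


def find_collision(f, nbits, dist_bits, rng, max_trails=100000):
    dist_mask = (1 << dist_bits) - 1
    max_len = 20 << dist_bits  # cap at 20x the expected trail length
    table = {}  # distinguished point -> (trail start, trail length)
    for _ in range(max_trails):
        start = rng.randrange(1 << nbits)
        res = trail(f, start, dist_mask, max_len)
        if res is None:
            continue
        end, length = res
        if end in table:
            col = locate(f, start, length, *table[end])
            if col is not None:
                return col
        table[end] = (start, length)
    return None


f = make_f(20)
collision = find_collision(f, nbits=20, dist_bits=4, rng=random.Random(1))
```

In the real algorithm many processors generate trails in parallel and share a memory of limited size, which is what makes the practical performance hard to predict from theory alone.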

2020

TCHES

Parameterized Hardware Accelerators for Lattice-Based Cryptography and Their Application to the HW/SW Co-Design of qTESLA

Wen Wang Shanquan Tian Bernhard Jungk Nina Bindel Patrick Longa Jakub Szefer

This paper presents a set of efficient and parameterized hardware accelerators that target post-quantum lattice-based cryptographic schemes, including a versatile cSHAKE core, a binary-search CDT-based Gaussian sampler, and a pipelined NTT-based polynomial multiplier, among others. Unlike much of prior work, the accelerators are fully open-sourced, are designed to be constant-time, and can be parameterized at compile-time to support different parameters without the need for re-writing the hardware implementation. These flexible, publicly-available accelerators are leveraged to demonstrate the first hardware-software co-design using RISC-V of the post-quantum lattice-based signature scheme qTESLA with provably secure parameters. In particular, this work demonstrates that NIST’s Round 2 level 1 and level 3 qTESLA variants achieve a 40-100x speedup for key generation, about a 10x speedup for signing, and about a 16x speedup for verification, compared to the baseline RISC-V software-only implementation. For instance, this corresponds to execution in 7.7, 34.4, and 7.8 milliseconds for key generation, signing, and verification, respectively, for qTESLA’s level 1 parameter set on an Artix-7 FPGA, demonstrating the feasibility of the scheme for embedded applications.
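As background on what an NTT-based polynomial multiplier computes, here is a small Python sketch of multiplication in Z_q[x]/(x^n + 1) via a number-theoretic transform. It is an illustration only: the transform is evaluated directly in O(n^2) rather than with butterflies, it is unrelated to the paper's pipelined hardware design, and the parameters q = 17, n = 8, psi = 3 are toy values chosen for readability, not qTESLA parameters.

```python
def ntt_negacyclic_mul(a, b, q, psi):
    """Multiply a and b modulo (x^n + 1, q).

    psi must be a primitive 2n-th root of unity mod q, where n = len(a).
    """
    n = len(a)
    omega = psi * psi % q  # primitive n-th root of unity

    def forward(v):
        # Evaluate v at the odd powers of psi, i.e. the roots of x^n + 1.
        return [
            sum(v[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)
        ]

    # Pointwise product in the transform domain.
    c_hat = [x * y % q for x, y in zip(forward(a), forward(b))]

    n_inv = pow(n, -1, q)
    omega_inv = pow(omega, -1, q)
    psi_inv = pow(psi, -1, q)
    # Inverse transform, then undo the psi^j twist on coefficient j.
    return [
        n_inv * pow(psi_inv, j, q)
        * sum(c_hat[i] * pow(omega_inv, i * j, q) for i in range(n)) % q
        for j in range(n)
    ]


def schoolbook_negacyclic_mul(a, b, q):
    """Reference O(n^2) product modulo (x^n + 1, q), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
product = ntt_negacyclic_mul(a, b, q=17, psi=3)
```

The schoolbook routine serves as a cross-check: both functions should agree on any inputs with valid parameters.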

2018

TCHES

SIDH on ARM: Faster Modular Multiplications for Faster Post-Quantum Supersingular Isogeny Key Exchange

Hwajeong Seo Zhe Liu Patrick Longa Zhi Hu

We present high-speed implementations of the post-quantum supersingular isogeny Diffie-Hellman key exchange (SIDH) and the supersingular isogeny key encapsulation (SIKE) protocols for 32-bit ARMv7-A processors with NEON support. The high performance of our implementations is mainly due to carefully optimized multiprecision and modular arithmetic that finely integrates both ARM and NEON instructions in order to reduce the number of pipeline stalls and memory accesses, and a new Montgomery reduction technique that combines the use of the UMAAL instruction with a variant of the hybrid-scanning approach. In addition, we present efficient implementations of SIDH and SIKE for 64-bit ARMv8-A processors, based on a high-speed Montgomery multiplication that leverages the power of 64-bit instructions. Our experimental results consolidate the practicality of supersingular isogeny-based protocols for many real-world applications. For example, a full key-exchange execution of SIDHp503 is performed in about 176 million cycles on an ARM Cortex-A15 from the ARMv7-A family (i.e., 88 milliseconds @2.0GHz). On an ARM Cortex-A72 from the ARMv8-A family, the same operation can be carried out in about 90 million cycles (i.e., 45 milliseconds @1.992GHz). All our software is protected against timing and cache attacks. The techniques for modular multiplication presented in this work have broad applications to other cryptographic schemes.
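The cycle counts and wall-clock times quoted above are related by simple arithmetic. The helper below is a hypothetical convenience function, not part of the paper's software, that reproduces the conversion.

```python
def cycles_to_ms(cycles, freq_hz):
    """Convert a cycle count to milliseconds at a given clock frequency."""
    return cycles / freq_hz * 1e3


# Figures taken from the abstract: cycles and clock frequency per platform.
cortex_a15_ms = cycles_to_ms(176e6, 2.0e9)    # ARMv7-A example
cortex_a72_ms = cycles_to_ms(90e6, 1.992e9)   # ARMv8-A example
```

These evaluate to roughly 88 ms and 45 ms, matching the rounded values in the abstract.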

2017

EUROCRYPT

Efficient Compression of SIDH Public Keys

Craig Costello David Jao Patrick Longa Michael Naehrig Joost Renes David Urbanik

2017

CHES

Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks

Zhe Liu Patrick Longa Geovandro C. C. F. Pereira Oscar Reparaz Hwajeong Seo

This work deals with the energy-efficient, high-speed and high-security implementation of elliptic curve scalar multiplication and elliptic curve Diffie-Hellman (ECDH) key exchange on embedded devices using Fourℚ and incorporating strong countermeasures to thwart a wide variety of side-channel attacks. First, we set new speed records for constant-time curve-based scalar multiplication and DH key exchange at the 128-bit security level with implementations targeting 8, 16 and 32-bit microcontrollers. For example, our software computes a static ECDH shared secret in ~6.9 million cycles (or 0.86 s @8 MHz) on a low-power 8-bit AVR microcontroller which, compared to the fastest Curve25519 and genus-2 Kummer implementations on the same platform, offers 2× and 1.4× speedups, respectively. Similarly, it computes the same operation in ~496 thousand cycles on a 32-bit ARM Cortex-M4 microcontroller, achieving a factor-2.9 speedup when compared to the fastest Curve25519 implementation targeting the same platform. Second, we engineer a set of side-channel countermeasures taking advantage of Fourℚ’s rich arithmetic and propose a secure implementation that offers protection against a wide range of sophisticated side-channel attacks. Finally, we perform a differential power analysis evaluation of our software running on an ARM Cortex-M4, and report that no leakage was detected with up to 10 million traces. These results demonstrate the potential of deploying Fourℚ on low-power applications such as protocols for IoT.

2016

CRYPTO

Efficient Algorithms for Supersingular Isogeny Diffie-Hellman

Craig Costello Patrick Longa Michael Naehrig

2016

CHES

Fourℚ on FPGA: New Hardware Speed Records for Elliptic Curve Cryptography over Large Prime Characteristic Fields

Kimmo Järvinen Andrea Miele Reza Azarderakhsh Patrick Longa

2015

ASIACRYPT

Fourℚ: Four-Dimensional Decompositions on a ℚ-curve over the Mersenne Prime

Craig Costello Patrick Longa