IACR News item: 03 October 2025
Minjoo Sim, Hyunjun Kim, Minwoo Lee, Hwajeong Seo
Polynomial multiplication over $\mathbb{F}_2[x]$ is a fundamental building block in code-based and lattice-based cryptography, particularly on lightweight embedded devices where dedicated carry-less multiply instructions are unavailable. This paper presents a high-speed, constant-time implementation of radix-16 polynomial multiplication on the ARM Cortex-M4, combining zero-padding with recursive Karatsuba layers. Building on the radix-16 decomposition proposed by Chen et al. in TCHES’21, we replace the conventional schoolbook inner multiplier with a multi-level Karatsuba scheme. This optimization reduces cycle counts on the ARM Cortex-M4 while preserving constant-time execution. To further optimize efficiency, the design minimizes packing and unpacking overhead by operating at 128-bit granularity and employs a five-stage pipeline—Decomposition, Padding, Multiplication, Unpadding, and Reassembly—implemented entirely with data-independent shifts, XORs, and masks. Experimental results on the Cortex-M4 show that our optimized $ct\_poly32\_mul\_64\_bit$ implementation achieves up to 31\% improvement over the best existing constant-time baseline, demonstrating the efficiency and scalability of recursive Karatsuba for resource-constrained cryptographic applications.
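The two ideas named in the abstract, zero-padding and Karatsuba over $\mathbb{F}_2[x]$, can be illustrated with a small sketch. This is not the authors' Cortex-M4 code: the function names (`pad4`, `unpad4`, `clmul_padded`, `karatsuba_gf2`), the 8-bit base case, and the use of Python integers are all assumptions made for illustration. The padding step places each coefficient in its own 4-bit slot, so an ordinary integer multiply accumulates at most 15 partial products per slot without carrying into the next one; the low bit of each slot is then the coefficient modulo 2. The Karatsuba layer uses the identity $a b = a_1 b_1 x^{2n} + \big((a_0 + a_1)(b_0 + b_1) + a_1 b_1 + a_0 b_0\big) x^{n} + a_0 b_0$, where addition is XOR.

```python
def clmul_ref(a, b):
    """Reference shift-and-XOR product in F_2[x] (branches on data, for checking only)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pad4(x, nbits):
    """Zero-padding: move bit i of x to bit 4*i, leaving three zero bits between coefficients."""
    out = 0
    for i in range(nbits):
        out |= ((x >> i) & 1) << (4 * i)
    return out


def unpad4(x, nbits):
    """Unpadding: keep the low bit of each 4-bit slot and pack the bits back together."""
    out = 0
    for i in range(nbits):
        out |= ((x >> (4 * i)) & 1) << i
    return out


def clmul_padded(a, b, nbits=8):
    """Carry-less product via one integer multiply on padded operands.

    Each 4-bit slot of the integer product counts the contributing coefficient
    pairs; with at most 15 coefficients per operand the count never exceeds 15,
    so no slot overflows into its neighbour.
    """
    assert 1 <= nbits <= 15
    prod = pad4(a, nbits) * pad4(b, nbits)
    return unpad4(prod, 2 * nbits - 1)


def karatsuba_gf2(a, b, nbits, base_bits=8):
    """Recursive Karatsuba over F_2[x]; three half-size products instead of four."""
    if nbits <= base_bits:
        return clmul_padded(a, b, nbits)
    half = nbits // 2
    hi_bits = nbits - half
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba_gf2(a0, b0, half, base_bits)
    hi = karatsuba_gf2(a1, b1, hi_bits, base_bits)
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, hi_bits, base_bits)
    mid ^= lo ^ hi  # subtraction equals XOR in characteristic 2
    return lo ^ (mid << half) ^ (hi << (2 * half))
```

For example, `karatsuba_gf2(a, b, 64)` recurses through 32-bit and 16-bit halves down to the 8-bit padded base case and agrees with `clmul_ref(a, b)`. The sketch only demonstrates the arithmetic: Python integers are arbitrary precision and not constant-time, whereas an embedded implementation would realise the padding and unpadding with fixed sequences of shifts, XORs, and masks on machine words, as the abstract describes.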