



# A Reconfigurable System on Chip Implementation for Elliptic Curve Cryptography over GF(2<sup>n</sup>)

Michael Jung<sup>1</sup>, M. Ernst<sup>1</sup>, F. Madlener<sup>1</sup>, S. Huss<sup>1</sup>, R. Blümel<sup>2</sup>

<sup>1</sup> Integrated Circuits and Systems Lab Computer Science Department Darmstadt University of Technology Germany

<sup>2</sup> cv cryptovision GmbHGelsenkirchenGermany



# ECC over GF(2<sup>n</sup>)

- GF(2<sup>n</sup>) operations implemented in HW
  - Square, Multiply, Add
  - Inversion with Fermat's little Theorem
  - Polynomial Base
  - Multiplier based on a novel, generalized version of the Karatsuba Ofman Algorithm<sup>1</sup>
- EC level algorithms implemented in SW
  - 2P Algorithm<sup>2</sup> for EC point multiplication
  - Projective Point Coordinates
- Algorithmic flexibility combined with performance



#### **Reconfigurable System on Chip**

• 8-Bit RISC Microcontroller

• 20+ MIPS @ 25 MHz



# **Reconfigurable System on Chip**

8-Bit RISC Microcontroller





#### **Reconfigurable System on Chip**

8-Bit RISC Microcontroller

• FPGA

36 kByte RAM

Dual Ported

 Simultaneously accessible by MCU and FPGA





# **Reconfigurable System on Chip**

• 8-Bit RISC Microcontroller





#### **Reconfigurable System on Chip**

- 8-Bit RISC Microcontroller
- FPGA
- 36 kByte RAM
- Peripherals
- Low cost, low complexity device





#### **Polynomial Karatsuba Multiplication**

$$A = \bigoplus_{i=0}^{n-1} a_i x^i \qquad B = \bigoplus_{i=0}^{n-1} b_i x^i$$

$$A \cdot B = (A_1 \hat{x} \oplus A_0) \cdot (B_1 \hat{x} \oplus B_0) \quad \text{with} \quad \hat{x} = x^{\frac{n}{2}}$$
$$= (A_1 \odot B_1) \hat{x}^2 \oplus (A_1 \odot B_0 \oplus A_0 \odot B_1) \hat{x} \oplus (A_0 \odot B_0)$$

$$T_{1} = (A_{1} \odot B_{1})$$

$$T_{2} = (A_{1} \oplus A_{0}) \odot (B_{1} \oplus B_{0}) = A_{1}B_{0} \oplus A_{0}B_{1} \oplus A_{1}$$

$$T_{3} = (A_{0} \odot B_{0})$$

$$A \cdot B = T_{1}\hat{x}^{2} \oplus (T_{2} \oplus T_{1} \oplus T_{3})\hat{x} \oplus T_{3}$$

$$\hat{x}^{3} \quad \hat{x}^{2} \quad \hat{x}^{1} \quad \hat{x}^{0}$$

$$0$$

$$0$$

$$A \cdot B = A_{0} \odot B_{0}$$

$$A \cdot B = T_{1}\hat{x}^{2} \oplus (T_{2} \oplus T_{1} \oplus T_{3})\hat{x} \oplus T_{3}$$

# Multi-Segment Karatsuba (MSK)

- Generalization of the Karatsuba Multiplication Algorithm
- Polynomials are split into an arbitrary number of segments

$$MSK_{k}(A,B) = \left( \bigoplus_{i=1}^{k} S_{i,0}(A,B) \cdot \hat{x}^{i-1} \right) \oplus \left( \bigoplus_{i=1}^{k-1} S_{k-i,i}(A,B) \cdot \hat{x}^{i-1+k} \right)$$

with

$$S_{m,l}(A,B) = \left(\bigoplus_{i=1}^{m-1} S_{i,l}(A,B)\right) \oplus \left(\bigoplus_{i=1}^{m-1} S_{i,l+m-i}(A,B)\right) \oplus M_{m,l}(A,B),$$

$$S_{1,l}(A,B) = M_{1,l}(A,B) \quad \text{and} \quad M_{m,l}(A,B) = \begin{pmatrix} l+m-1 \\ \bigoplus_{i=l} A_i \end{pmatrix} \cdot \begin{pmatrix} l+m-1 \\ \bigoplus_{i=l} B_i \end{pmatrix}$$



# $MSK_k$ for k=3

$$MSK_{3} = \begin{pmatrix} M_{1,2} \end{pmatrix} & \cdot \hat{x}^{4} \oplus \\ \begin{pmatrix} M_{1,1} \oplus M_{1,2} \oplus M_{2,1} \end{pmatrix} & \cdot \hat{x}^{3} \oplus \\ \begin{pmatrix} M_{2,0} \oplus M_{2,1} \oplus M_{3,0} \end{pmatrix} & \cdot \hat{x}^{2} \oplus \\ \begin{pmatrix} M_{1,0} \oplus M_{1,1} \oplus M_{2,0} \end{pmatrix} & \cdot \hat{x}^{1} \oplus \\ \begin{pmatrix} M_{1,0} \end{pmatrix} & \cdot \hat{x}^{0} \end{pmatrix}$$
 with

$$M_{1,0} = A_0 \cdot B_0$$

$$M_{1,1} = A_1 \cdot B_1$$

$$M_{1,2} = A_2 \cdot B_2$$

$$M_{2,0} = (A_0 \oplus A_1) \cdot (B_0 \oplus B_1)$$

$$M_{2,1} = (A_1 \oplus A_2) \cdot (B_1 \oplus B_2)$$

$$M_{3,0} = (A_0 \oplus A_1 \oplus A_2) \cdot (B_0 \oplus B_1 \oplus B_2)$$





1) Pattern Grouping



- 2) Reorder pattern by decreasing x<sup>n</sup>
- 3) Minimize differences in the set of indices of adjacent pattern

















1) Pattern Grouping



- 2) Reorder pattern by decreasing x<sup>n</sup>
- 3) Minimize differences in the set of indices of adjacent pattern



2.) Ordering of the pattern in a top-left to bottom-right fashion



2.) Ordering of the pattern in a top-left to bottom-right fashion





1) Pattern Grouping



- 2) Reorder pattern by decreasing x<sup>n</sup>
- 3) Minimize differences in the set of indices of adjacent pattern





3.) Ensuring that the set of added up segments in the partial products differs only by one element between two adjacent patterns





3.) Ensuring that the set of added up segments in the partial products differs only by one element between two adjacent patterns

















































#### **Coprocessor Interface**



# ISS

#### **Results**

#### **Prototype Implementation:**

- 5-Segment Karatsuba (MSK<sub>5</sub>)
- 23-Bit combinational Multiplier
- GF(2<sup>113</sup>)

| FF-Level  | Clock Cycles |            |
|-----------|--------------|------------|
| Operation | best case    | worst case |
| FF-Mult   | 32           | 152        |
| FF-Add    | 16           | 136        |
| FF-Square | 1            | 91         |

| Operation | Clock Cycles |
|-----------|--------------|
| EC-Double | 493          |
| EC-Add    | 615          |
| k∙P       | 130,200      |

- One EC point multiplication takes 10.9 ms @ 12 MHz
- Speed-up factor of about 40 compared to an assembler optimized software implementation