In this paper, we propose an efficient and compact hardware implementation of a polynomial multiplier based on the Fast Fourier Transform (FFT) for use in ring-LWE cryptosystems. We optimize the forward wrapped convolution by merging the pre-processing and the FFT, and we propose an advanced memory access scheme that reduces both the number of memory accesses and the number of RAM slices used in the design.
These techniques result in a hardware implementation of a polynomial multiplier for the ring-LWE cryptosystem of dimension 256 that uses only 281 slices and one block RAM on a Virtex V.
Finally, we also propose a modification of a ring-LWE encryption system
that reduces the number of FFT operations from five to four, resulting
in a speed-up of nearly 20%.
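
To illustrate what merging the pre-processing with the transform means mathematically, the following is a minimal Python sketch. It assumes that the wrapped convolution is the standard negative wrapped convolution in Z_q[x]/(x^n + 1), where each coefficient a_j is scaled by psi^j (psi a primitive 2n-th root of unity) before the transform. Folding that scaling into the transform amounts to evaluating the polynomial at the odd powers of psi. The dimension n = 256 is taken from the text; the modulus q = 7681 and all function names are assumptions made for illustration. The transforms are written as direct O(n^2) sums for clarity and do not reflect the hardware dataflow or the memory access scheme.

```python
# Illustrative sketch only, not the hardware design described above.
# N = 256 comes from the text; Q = 7681 is an ASSUMED prime with 2N | Q - 1.
N = 256
Q = 7681


def find_psi(n, q):
    """Search for a primitive 2n-th root of unity modulo q (n a power of two)."""
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * n), q)
        # psi^n == -1 (mod q) means the order of psi is exactly 2n.
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def merged_forward(a, psi, q):
    """Transform with the psi^j pre-scaling folded in: A_k = sum_j a_j * psi^((2k+1) j)."""
    n = len(a)
    return [
        sum(a[j] * pow(psi, (2 * k + 1) * j, q) for j in range(n)) % q
        for k in range(n)
    ]


def merged_inverse(A, psi, q):
    """Inverse of merged_forward, with the psi^(-j) post-scaling folded in."""
    n = len(A)
    n_inv = pow(n, -1, q)      # modular inverse, Python 3.8+
    psi_inv = pow(psi, -1, q)
    return [
        n_inv * sum(A[k] * pow(psi_inv, (2 * k + 1) * j, q) for k in range(n)) % q
        for j in range(n)
    ]


def negacyclic_mul(a, b, q, psi):
    """Multiply a and b modulo (x^n + 1, q) via pointwise products in the transform domain."""
    A = merged_forward(a, psi, q)
    B = merged_forward(b, psi, q)
    C = [(x * y) % q for x, y in zip(A, B)]
    return merged_inverse(C, psi, q)


def schoolbook_negacyclic(a, b, q):
    """Reference O(n^2) product modulo (x^n + 1, q), using x^n = -1 for wrap-around terms."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```

Calling `negacyclic_mul(a, b, Q, find_psi(N, Q))` on two length-256 coefficient lists should agree with `schoolbook_negacyclic(a, b, Q)`. A fast implementation would replace the direct sums with butterfly stages whose twiddle factors already contain the powers of psi.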