Authenticated encryption (AE) has recently gained renewed interest due to the ongoing CAESAR competition. This paper deals with the performance of block cipher modes of operation for AE in parallel software. We consider the example of the AES on Intel\'s new Haswell microarchitecture that has improved intructions for AES rounds and finite field multiplication.
As opposed to most previous high-performance software implementations of operation modes -- that have considered the encryption of single messages -- we propose to process multiple messages in parallel. We demonstrate that this message scheduling is of significant advantage for most modes. As a baseline for longer messages, the performance of AES-CBC encryption on a single core increases by factor 6.8 when adopting this approach.
For the first time, we report optimized AES-NI implementations of the novel AE modes OTR, McOE-G, COBRA, and POET -- both with single and multiple messages. For almost all AE modes considered, we obtain a consistent speed-up when processing multiple messages in parallel. Notably, among the nonce-based modes, AES-CCM gets by factor 3.5 faster and its performance is about 1.2 cpb which is close to that of AES-GCM (the latter, however, possessing classes of weak keys), with AES-OCB3 still performing at only 0.69 cpb. Among the nonce-misuse resistant modes, AES-McOE-G receives a speed-up by factor 4 and its performance is about 1.44 cpb, which is faster than AES-COBRA with its 1.55 cpb but slower than AES-COPA with 1.29 cpb.