*21:17*[Pub][ePrint] Using Random Error Correcting Codes in Near-Collision Attacks on Generic Hash-Functions, by Inna Polak, Adi Shamir

In this paper we consider the problem of finding a near-collision

with Hamming distance bounded by $r$ in a generic cryptographic hash

function $h$ whose outputs can be modeled as random $n$-bit strings.

In 2011, Lamberger suggested a modified version of Pollard\'s rho method

which computes a chain of values by alternately applying the hash

function $h$ and an error correcting code $e$ to a random starting

value $x_{0}$ until it cycles. This turns some (but not all) of the

near-collisions in $h$ into full collisions in $f=e\\circ h$, which

are easy to find. In 2012, Leurent improved Lamberger\'s memoryless

algorithm by using any available amount of memory to store the endpoints

of multiple chains of $f$ values, and using Van Oorschot and Wiener\'s

algorithm to find many full collisions in $f$, hoping that one of

them will be an $r$-near-collision in $h$. This is currently the

best known time/memory tradeoff algorithm for the problem.

The efficiency of both Lamberger\'s and Leurent\'s algorithms depend

on the quality of their error correction code. Since they have to

apply error correction to \\emph{any} bit string, they want to use

perfect codes, but all the known constructions of such codes can correct

only $1$ or $3$ errors. To deal with a larger number of errors,

they recommend using a concatenation of many Hamming codes, each capable

of correcting a single error in a particular subset of the bits, along

with some projections. As we show in this paper, this is a suboptimal

choice, which can be considerably improved by using randomly chosen

linear codes instead of Hamming codes and storing a precomputed lookup

table to make the error correction process efficient. We show both

theoretically and experimentally that this is a better way to utilize

the available memory, instead of devoting all the memory to the storage

of chain endpoints. Compared to Leurent\'s algorithm, we demonstrate

an improvement ratio which grows with the size of the problem. In

particular, we experimentally verified an improvement ratio of about

$3$ in a small example with $n=160$ and $r=33$ which we implemented

on a single PC, and mathematically predicted an improvement ratio

of about $730$ in a large example with $n=1024$ and $r=100$, using

$2^{40}$ memory.