Nearly Optimal Property Preserving Hashing 📺
Property-preserving hashing (PPH) consists of a family of compressing hash functions $h$ such that, for any two inputs $x,y$, we can correctly identify whether some property $P(x,y)$ holds given only the digests $h(x),h(y)$. In a basic PPH, correctness should hold with overwhelming probability over the choice of $h$ when $x,y$ are worst-case values chosen a-priori and independently of $h$. In an adversarially robust PPH (RPPH), correctness must hold even when $x,y$ are chosen adversarially and adaptively depending on $h$. Here, we study (R)PPH for the property that the Hamming distance between $x$ and $y$ is at most $t$. The notion of (R)PPH was introduced by Boyle, LaVigne and Vaikuntanathan (ITCS '19), and further studied by Fleischhacker, Simkin (Eurocrypt '21) and Fleischhacker, Larsen, Simkin (Eurocrypt '22). In this work, we obtain improved constructions that are conceptually simpler, have nearly optimal parameters, and rely on more general assumptions than prior works. Our results are: * We construct information-theoretic non-robust PPH for Hamming distance via syndrome list-decoding of linear error-correcting codes. We provide a lower bound showing that this construction is essentially optimal. * We make the above construction robust with little additional overhead, by relying on homomorphic collision-resistant hash functions, which can be constructed from either the discrete-logarithm or the short-integer-solution assumptions. The resulting RPPH achieves improved compression compared to prior constructions, and is nearly optimal. * We also show an alternate construction of RPPH for Hamming distance under the minimal assumption that standard collision-resistant hash functions exist. The compression is slightly worse than our optimized construction using homomorphic collision-resistance, but essentially matches the prior state of the art constructions from specific algebraic assumptions. * Lastly, we study a new notion of randomized robust PPH (R2P2H) for Hamming distance, which relaxes RPPH by allowing the hashing algorithm itself to be randomized. We give an information-theoretic construction with optimal parameters.