## CryptoDB

### Mark Simkin

#### ORCID: 0000-0002-7325-5261

#### Publications

**Year**

**Venue**

**Title**

2024

CRYPTO

FRIDA: Data Availability Sampling from FRI
Abstract

As blockchains like Ethereum continue to grow, clients with limited resources can no longer store the entire chain.
Light nodes that want to use the blockchain, without verifying that it is in a good state overall, can just download the block headers without the corresponding block contents.
As those light nodes may eventually need some of the block contents, they would like to ensure that they are in principle available.
Data availability sampling, introduced by Al-Bassam et al., is a process that allows light nodes to check the availability of data without downloading it.
In a recent effort, Hall-Andersen, Simkin, and Wagner have introduced formal definitions and analyzed several constructions.
While their work thoroughly lays the formal foundations for data availability sampling, the constructions are either prohibitively expensive, use a trusted setup, or have a download complexity for light clients that scales with the square root of the data size.
In this work, we make a significant step forward by proposing an efficient data availability sampling scheme without a trusted setup and only polylogarithmic overhead.
To this end, we find a novel connection with interactive oracle proofs of proximity (IOPPs).
Specifically, we prove that any IOPP meeting an additional consistency criterion can be turned into an erasure code commitment, and then, leveraging a compiler due to Hall-Andersen, Simkin, and Wagner, into a data availability sampling scheme.
This new connection enables data availability sampling to benefit from future results on IOPPs.
We then show that the widely used FRI IOPP satisfies our consistency criterion and demonstrate that the resulting data availability sampling scheme outperforms the state-of-the-art asymptotically and concretely in multiple parameters.
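The sketch below illustrates only the erasure-coding and sampling intuition behind data availability sampling. It is not FRIDA: there is no FRI commitment and no proof of proximity. It assumes a systematic Reed-Solomon code over the prime field $\mathbb{F}_P$ with $P = 2^{31}-1$, and the helper names (`rs_extend`, `rs_recover`, `light_client_check`, `fooling_bound`) are hypothetical. Since any $k$ of the $n$ codeword symbols suffice to recover the data, an adversary who wants to make the data unrecoverable must withhold at least $n-k+1$ symbols. Each uniform sample then hits a served symbol with probability at most $(k-1)/n$.

```python
import random

P = 2**31 - 1  # prime modulus for the sketch's field (an assumption, not FRIDA's field)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def rs_extend(data, n):
    """Systematic Reed-Solomon: data[i] is the value at x = i; return n evaluations."""
    points = list(enumerate(data))
    return [lagrange_eval(points, x) for x in range(n)]


def rs_recover(available, k):
    """Recover the k data symbols from any k served positions, given as {index: value}."""
    if len(available) < k:
        return None  # too few symbols: the data is unavailable
    points = list(available.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


def light_client_check(available, n, samples, rng):
    """Query `samples` uniform positions; accept only if every queried symbol is served."""
    return all(rng.randrange(n) in available for _ in range(samples))


def fooling_bound(n, k, samples):
    """If at most k - 1 of n symbols are served (so recovery is impossible), each independent
    query succeeds with probability at most (k - 1) / n."""
    return ((k - 1) / n) ** samples
```

For example, `rs_extend([5, 17, 42, 7], 8)` produces a rate-1/2 codeword, and any four of its eight positions passed to `rs_recover` give back `[5, 17, 42, 7]`. At rate $k/n = 1/2$ the bound drops below $2^{-s}$ after $s$ samples, regardless of the data size. In the abstract's terms, what this sketch omits is the erasure code commitment binding the sampled symbols to one codeword, which FRIDA obtains from the FRI IOPP.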

2023

PKC

Threshold Private Set Intersection with Better Communication Complexity
Abstract

Given $\ell$ parties with sets $X_1, \dots, X_\ell$ of size $n$, we would like to securely compute the intersection $\cap_{i=1}^\ell X_i$, if it is larger than $n-t$ for some threshold $t$, without revealing any additional information.
It has previously been shown (Ghosh and Simkin, Crypto 2019) that this function can be securely computed with a communication complexity that only depends on $t$ and in particular does not depend on $n$.
For small values of $t$, this results in protocols that have a communication complexity that is sublinear in the size of the inputs.
Current protocols either rely on fully homomorphic encryption or have an at least quadratic dependency on the parameter $t$.
In this work, we construct protocols with a quasilinear dependency on $t$ from simple assumptions like additively homomorphic encryption and oblivious transfer.
All existing approaches, including ours, rely on protocols for computing a single bit, which indicates whether the intersection is larger than $n-t$ without actually computing it.
Our key technical contribution, which may be of independent interest, takes any such protocol with secret shared outputs and communication complexity $\mathcal{O}(\lambda \ell \mathsf{poly}(t))$, where $\lambda$ is the security parameter, and transforms it into a protocol with communication complexity $\mathcal{O}(\lambda^2 \ell t \mathsf{polylog}(t))$.
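To pin down what these protocols compute, here is a plaintext reference of the functionality with no security at all: it behaves like a trusted party that sees every set. The function names are hypothetical. `intersection_is_large` is the single-bit test described above, and `threshold_intersection` reveals the intersection only when that bit is set.

```python
def _common_size(sets):
    """Every party must hold a set of the same size n; return n."""
    n = len(sets[0])
    if any(len(s) != n for s in sets):
        raise ValueError("all parties must hold sets of the same size n")
    return n


def intersection_is_large(sets, t):
    """Single-bit functionality: is |X_1 ∩ ... ∩ X_ell| > n - t?"""
    n = _common_size(sets)
    return len(set.intersection(*map(set, sets))) > n - t


def threshold_intersection(sets, t):
    """Reveal the intersection only if it is larger than n - t; otherwise reveal nothing."""
    if not intersection_is_large(sets, t):
        return None
    return set.intersection(*map(set, sets))
```

With the sets `{1, 2, 3, 4}`, `{1, 2, 3, 5}`, and `{1, 2, 3, 6}` ($n = 4$), the intersection `{1, 2, 3}` is revealed for $t = 2$ but not for $t = 1$, since $3 > 4 - 1$ fails.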

2023

EUROCRYPT

How to Compress Encrypted Data
Abstract

We study the task of obliviously compressing a vector comprised of $n$ ciphertexts of size $\xi$ bits each, where at most $t$ of the corresponding plaintexts are non-zero.
This problem commonly features in applications involving encrypted outsourced storage, such as searchable encryption or oblivious message retrieval.
We present two new algorithms with provable worst-case guarantees, solving this problem by using only homomorphic additions and multiplications by constants.
Both of our new constructions improve upon the state of the art asymptotically and concretely.
Our first construction, based on sparse polynomials, is perfectly correct and the first to achieve an asymptotically optimal compression rate by compressing the input vector into $\mathcal{O}(t \xi)$ bits.
Compression can be performed homomorphically by performing $\mathcal{O}(n \log n)$ homomorphic additions and multiplications by constants.
The main drawback of this construction is a decoding complexity of $\Omega(\sqrt{n})$.
Our second construction is based on a novel variant of invertible bloom lookup tables and is correct with probability $1-2^{-\kappa}$.
It has a slightly worse compression rate compared to our first construction as it compresses the input vector into $\mathcal{O}(\xi\kappa t /\log t)$ bits, where $\kappa \geq \log t$.
In exchange, both compression and decompression of this construction are highly efficient.
The compression complexity is dominated by $\mathcal{O}(n \kappa/\log t)$ homomorphic additions and multiplications by constants.
The decompression complexity is dominated by $\mathcal{O}(\kappa t /\log t)$ decryption operations and equally many inversions of a pseudorandom permutation.
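A plaintext sketch of the general idea follows. It builds a linear sketch whose compression step uses only additions and multiplications by public constants, so each step could in principle be applied to additively homomorphic ciphertexts, and it decodes by peeling as in an invertible Bloom lookup table. This is not the paper's construction, which uses a novel IBLT variant together with a pseudorandom permutation. The cell layout, the hash derivation in the hypothetical `cell_positions`, and the assumption that non-zero entries are positive integers are all choices made for this illustration. Under the positivity assumption, a cell holding $(s_0, s_1, s_2) = (\sum v, \sum i v, \sum i^2 v)$ contains a single entry exactly when $s_1^2 = s_0 s_2$, which is the equality case of the Cauchy-Schwarz inequality.

```python
import random


def cell_positions(index, m, k, seed):
    """Hypothetical hashing: k distinct cells out of m for `index`, fixed by `seed`."""
    rng = random.Random(seed * 1_000_003 + index)
    return rng.sample(range(m), k)


def compress(vec, m, k=3, seed=0):
    """Each cell accumulates (sum v, sum i*v, sum i*i*v) over the entries hashed to it.
    Only additions and multiplications by public constants (i, i*i) touch the values."""
    cells = [[0, 0, 0] for _ in range(m)]
    for i, v in enumerate(vec):
        for c in cell_positions(i, m, k, seed):
            cells[c][0] += v
            cells[c][1] += i * v
            cells[c][2] += i * i * v
    return cells


def decompress(cells, k=3, seed=0):
    """Peeling decoder. Assumes every non-zero entry is a positive integer, so a cell
    holds exactly one entry iff s1 * s1 == s0 * s2. Returns {index: value} or None."""
    cells = [cell[:] for cell in cells]  # work on a copy
    m = len(cells)
    found = {}
    progress = True
    while progress:
        progress = False
        for cell in cells:
            s0, s1, s2 = cell
            if s0 > 0 and s1 * s1 == s0 * s2:
                i, v = s1 // s0, s0  # pure cell: index and value of its only entry
                found[i] = v
                for c in cell_positions(i, m, k, seed):  # remove the entry everywhere
                    cells[c][0] -= v
                    cells[c][1] -= i * v
                    cells[c][2] -= i * i * v
                progress = True
    if any(cell != [0, 0, 0] for cell in cells):
        return None  # peeling got stuck before all entries were recovered
    return found
```

Because the sketch is linear, compressing the sum of two vectors gives the cell-wise sum of their sketches. Decoding can fail when peeling gets stuck, and then `decompress` returns `None`. How likely that is depends on the number of cells $m$, the number of hash functions $k$, and the number of non-zero entries $t$.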

2022

EUROCRYPT

Property-Preserving Hash Functions for Hamming Distance from Standard Assumptions
Abstract

Property-preserving hash functions allow for compressing long inputs $x_0$ and $x_1$ into short hashes $h(x_0)$ and $h(x_1)$ in a manner that allows for computing a predicate $P(x_0, x_1)$ given only the two hash values without having access to the original data.
Such hash functions are said to be adversarially robust if an adversary that gets to pick $x_0$ and $x_1$ after the hash function has been sampled, cannot find inputs for which the predicate evaluated on the hash values outputs the incorrect result.
In this work, we construct robust property-preserving hash functions for the Hamming-distance predicate, which distinguishes inputs with Hamming distance at least some threshold $t$ from those with distance less than $t$. The security of the construction is based on standard lattice hardness assumptions.
Our construction has several advantages over the best known previous construction by Fleischhacker and Simkin (Eurocrypt 2021).
Our construction relies on a single well-studied hardness assumption from lattice cryptography whereas the previous work relied on a newly introduced family of computational hardness assumptions.
In terms of computational effort, our construction only requires a small number of modular additions per input bit, whereas the work of Fleischhacker and Simkin required several exponentiations per bit as well as the interpolation and evaluation of high-degree polynomials over large fields.
An additional benefit of our construction is that the description of the hash function can be compressed to $\lambda$ bits assuming a random oracle.
Previous work has descriptions of length $\mathcal{O}(\ell \lambda)$ bits for input bit-length $\ell$.
We prove a lower bound on the output size of any property-preserving hash function for the Hamming distance predicate.
The bound shows that the size of our hash value is not far from optimal.

2021

EUROCRYPT

The Mother of All Leakages: How to Simulate Noisy Leakages via Bounded Leakage (Almost) for Free
Abstract

We show that noisy leakage can be simulated in the information-theoretic setting using a single query of bounded leakage, up to a small statistical simulation error and a slight loss in the leakage parameter. The latter holds true in particular for one of the most used noisy-leakage models, where the noisiness is measured using the conditional average min-entropy (Naor and Segev, CRYPTO'09 and SICOMP'12).
Our reductions between noisy and bounded leakage are achieved in two steps. First, we put forward a new leakage model (dubbed the dense leakage model) and prove that dense leakage can be simulated in the information-theoretic setting using a single query of bounded leakage, up to small statistical distance. Second, we show that the most common noisy-leakage models fall within the class of dense leakage, with good parameters. We also provide a complete picture of the relationships between different noisy-leakage models, and prove lower bounds showing that our reductions are nearly optimal.
Our result finds applications to leakage-resilient cryptography, where we are often able to lift security in the presence of bounded leakage to security in the presence of noisy leakage, both in the information-theoretic and in the computational setting. Additionally, we show how to use lower bounds in communication complexity to prove that bounded-collusion protocols (Kumar, Meka, and Sahai, FOCS'19) for certain functions do not only require long transcripts, but also necessarily need to reveal enough information about the inputs.

2021

EUROCRYPT

Robust Property-Preserving Hash Functions for Hamming Distance and More
Abstract

Robust property-preserving hash (PPH) functions, recently introduced by Boyle, Lavigne, and Vaikuntanathan [ITCS 2019], compress large inputs $x$ and $y$ into short digests $h(x)$ and $h(y)$ in a manner that allows for computing a predicate $P$ on $x$ and $y$ while only having access to the corresponding hash values. In contrast to locality-sensitive hash functions, a robust PPH function guarantees to correctly evaluate a predicate on $h(x)$ and $h(y)$ even if $x$ and $y$ are chosen adversarially *after* seeing $h$.
Our main result is a robust PPH function for the exact Hamming distance predicate
\[
\mathsf{HAM}^t(x, y) =
\begin{cases}
1 &\text{if } d( x, y) \geq t \\
0 & \text{Otherwise}\\
\end{cases}
\]
where $d(x, y)$ is the Hamming distance between $x$ and $y$.
Our PPH function compresses $n$-bit strings into $\mathcal{O}(t \lambda)$-bit digests, where $\lambda$ is the security parameter.
The construction is based on the $q$-strong bilinear discrete logarithm assumption.
Along the way, we construct a robust PPH function for the set intersection predicate
\[
\mathsf{INT}^t(X, Y) =
\begin{cases}
1 &\text{if } \vert X \cap Y\vert > n - t \\
0 & \text{Otherwise}\\
\end{cases}
\]
which compresses sets $X$ and $Y$ of size $n$ with elements from some arbitrary universe $U$ into $\mathcal{O}(t\lambda)$-bit long digests.
This PPH function may be of independent interest.
We present an almost matching lower bound of $\Omega(t \log t)$ on the digest size of any PPH function for the intersection predicate, which indicates that our compression rate is close to optimal.
Finally, we also show how to extend our PPH function for the intersection predicate to more than two inputs.
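The two predicates above can be written as plain reference functions (hypothetical names, no hashing involved). The `encode` helper shows one natural way to relate them, offered as an illustration rather than as the paper's reduction: mapping a string $x$ to the set $\{(i, x_i)\}$ yields sets of size $n$ whose intersection has $n - d(x, y)$ elements, so $\mathsf{HAM}^t(x, y) = 1 - \mathsf{INT}^t$ evaluated on the encoded sets.

```python
def hamming_distance(x, y):
    """Number of positions at which the equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def ham_t(x, y, t):
    """HAM^t(x, y) = 1 if d(x, y) >= t, else 0."""
    return 1 if hamming_distance(x, y) >= t else 0


def int_t(X, Y, t):
    """INT^t(X, Y) = 1 if |X ∩ Y| > n - t, else 0, for sets of size n."""
    n = len(X)
    if len(Y) != n:
        raise ValueError("both sets must have size n")
    return 1 if len(X & Y) > n - t else 0


def encode(x):
    """Map a string to the set {(i, x_i)}; it has exactly len(x) elements."""
    return {(i, b) for i, b in enumerate(x)}
```

For instance, `"10110"` and `"10011"` differ in two positions, so `ham_t` returns 1 for $t \le 2$ and 0 for $t \ge 3$, matching `1 - int_t(encode(x), encode(y), t)` for every $t$.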

2021

PKC

On Publicly-Accountable Zero-Knowledge and Small Shuffle Arguments
Abstract

Constructing interactive zero-knowledge arguments from simple assumptions with small communication complexity and good computational efficiency is an important, but difficult problem.
In this work, we study interactive arguments with noticeable soundness error in their full generality and for the specific purpose of constructing concretely efficient shuffle arguments.
To counterbalance the effects of a larger soundness error, we show how to transform such three-move arguments into publicly-accountable ones which allow the verifier to convince third parties of detected misbehavior by a cheating prover.
This may be particularly interesting for applications where a malicious prover has to balance the profits it can make from cheating successfully and the losses it suffers from being caught.
We construct interactive, public-coin, zero-knowledge arguments with noticeable soundness error for proving that a target vector of commitments is a pseudorandom permutation of a source vector.
Our arguments do not rely on any trusted setup and only require the existence of collision-resistant hash functions.
The communication complexity of our arguments is *independent* of the length of the shuffled vector.
For a soundness error of $2^{-5}=1/32$, the communication cost is $153$ bytes without and $992$ bytes with public accountability, meaning that our arguments are shorter than shuffle arguments realized using Bulletproofs (IEEE S\&P 2018) and even competitive in size with SNARKs, despite only relying on simple assumptions.

2020

EUROCRYPT

Lower Bounds for Leakage-Resilient Secret Sharing
Abstract

Threshold secret sharing allows a dealer to split a secret into $n$ shares such that any authorized subset of cardinality at least $t$ of those shares efficiently reveals the secret, while at the same time any unauthorized subset of cardinality less than $t$ contains no information about the secret. Leakage-resilience additionally requires that the secret remains hidden even if one is given a bounded amount of additional leakage from every share.
In this work, we study leakage-resilient secret sharing schemes and prove a lower bound on the share size and the required amount of randomness of any information-theoretically secure scheme.
We prove that for any information-theoretically secure leakage-resilient secret sharing scheme either the amount of randomness across all shares or the share size has to be linear in $n$.
More concretely, for a secret sharing scheme with $p$-bit long shares, $\ell$-bit leakage per share, where $\widehat{t}$ shares uniquely define the remaining $n - \widehat{t}$ shares, it has to hold that $p \ge \frac{\ell (n - t)}{\widehat{t}}$.
We use this lower bound to gain further insights into a question that was recently posed by Benhamouda et al. (CRYPTO'18), who ask to what extent existing regular secret sharing schemes already provide protection against leakage.
The authors proved that Shamir's secret sharing is $1$-bit leakage-resilient for reconstruction thresholds $t \geq 0.85n$ and conjectured that it is also $1$-bit leakage-resilient for any other threshold that is a constant fraction of the total number of shares.
We do not disprove their conjecture, but show that it is the best one could possibly hope for.
Concretely, we show that for large enough $n$ and any constant $0< c < 1$ it holds that Shamir's secret sharing scheme is *not* leakage-resilient for $t \leq \frac{cn}{\log n}$.
In contrast to the setting with information-theoretic security, we show that our lower bound does not hold in the computational setting.
That is, we show how to construct a leakage-resilient secret sharing scheme in the random oracle model that is secure against computationally bounded adversaries and violates the lower bound stated above.
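For readers less familiar with the threshold setting, here is a minimal sketch of Shamir's secret sharing over an assumed prime field $\mathbb{F}_P$ with $P = 2^{127}-1$. The secret is the constant term of a random polynomial of degree $t-1$, share $i$ is its evaluation at $x = i$, and any $t$ shares reconstruct the secret by Lagrange interpolation at $0$. The sketch provides no leakage resilience of its own. The hypothetical helper `share_size_lower_bound` simply evaluates the bound $p \ge \ell(n - t)/\widehat{t}$ stated above. For Shamir's scheme, any $t$ shares determine all the others, so $\widehat{t} = t$.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, used here as the field modulus (an assumption)


def share(secret, t, n):
    """Shamir sharing: random f of degree t - 1 with f(0) = secret; share i is (i, f(i))."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any t (or more) distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


def share_size_lower_bound(n, t, t_hat, leakage_bits):
    """Evaluate the bound p >= leakage_bits * (n - t) / t_hat from the abstract."""
    return leakage_bits * (n - t) / t_hat
```

As a worked number, with $n = 1000$ parties, $t = \widehat{t} = 10$, and $\ell = 1$ bit of leakage per share, the bound evaluates to $990/10 = 99$. Under these parameters, an information-theoretically secure leakage-resilient scheme would therefore need shares of at least 99 bits.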

2020

CRYPTO

Non-Malleable Secret Sharing against Bounded Joint-Tampering Attacks in the Plain Model
Abstract

Secret sharing enables a dealer to split a secret into a set of shares, in such a way that certain authorized subsets of share holders can reconstruct the secret, whereas all unauthorized subsets cannot.
Non-malleable secret sharing (Goyal and Kumar, STOC 2018) additionally requires that, even if the shares have been tampered with, the reconstructed secret is either the original or a completely unrelated one.
In this work, we construct non-malleable secret sharing tolerating $p$-time *joint-tampering* attacks in the plain model (in the computational setting), where $p$-time joint tampering means that, for any $p>0$ fixed *a priori*, the attacker can tamper with the same target secret sharing up to $p$ times. In particular, assuming one-to-one one-way functions, we obtain:
- A secret sharing scheme for threshold access structures which tolerates joint $p$-time tampering with subsets of the shares of maximal size (*i.e.*, matching the privacy threshold of the scheme). This holds in a model where the attacker commits to a partition of the shares into non-overlapping subsets, and keeps tampering jointly with the shares within such a partition (so-called *selective partitioning*).
- A secret sharing scheme for general access structures which tolerates joint $p$-time tampering with subsets of the shares of size $O(\sqrt{\log n})$, where $n$ is the number of parties. This holds in a stronger model where the attacker is allowed to adaptively change the partition within each tampering query, under the restriction that once a subset of the shares has been tampered with jointly, that subset is always either tampered jointly or not modified by other tampering queries (so-called *semi-adaptive partitioning*).

At the heart of our result for selective partitioning lies a new technique showing that every one-time *statistically* non-malleable secret sharing against joint tampering is in fact *leakage-resilient* non-malleable (*i.e.*, the attacker can leak jointly from the shares prior to tampering).
We believe this may be of independent interest, and in fact we show it implies lower bounds on the share size and randomness complexity of statistically non-malleable secret sharing against *independent* tampering.

2020

CRYPTO

Black-Box Transformations from Passive to Covert Security with Public Verifiability
Abstract

In the context of secure computation, protocols with security against covert adversaries ensure that any misbehavior by malicious parties will be detected by the honest parties with some constant probability.
As such, these protocols provide better security guarantees than passively secure protocols and, moreover, are easier to construct than protocols with full security against active adversaries.
Protocols that, upon detecting a cheating attempt, allow the honest parties to compute a certificate that enables third parties to verify whether an accused party misbehaved or not are called publicly verifiable.
In this work, we present the first generic compilers for constructing two-party protocols with covert security and public verifiability from protocols with passive security.
We present two separate compilers, which are both fully black-box in the underlying protocols they use.
Both of them only incur a constant multiplicative factor in terms of bandwidth overhead and a constant additive factor in terms of round complexity on top of the passively secure protocols they use.
The first compiler applies to all two-party protocols that have no private inputs.
This class of protocols covers the important class of preprocessing protocols that are used to set up correlated randomness among parties.
We use our compiler to obtain the first secret-sharing based two-party protocol with covert security and public verifiability.
Notably, the produced protocol achieves public verifiability essentially for free when compared with the best known previous solutions based on secret sharing that did not provide public verifiability.
Our second compiler constructs protocols with covert security and public verifiability for arbitrary functionalities from passively secure protocols.
It uses our first compiler to perform a setup phase, which is independent of the parties' inputs as well as the protocol they would like to execute.
Finally, we show how to extend our techniques to obtain multiparty computation protocols with covert security and public verifiability against arbitrary constant fractions of corruptions.

2020

TCC

Lower Bounds for Multi-Server Oblivious RAMs
Abstract

In this work, we consider oblivious RAMs (ORAMs) in a setting with multiple servers, where the adversary may corrupt a subset of the servers. We present an $\Omega(\log n)$ overhead lower bound for any $k$-server ORAM that limits any PPT adversary to distinguishing advantage at most $1/(4k)$ when only one server is corrupted. In other words, if one insists on negligible distinguishing advantage, then multi-server ORAMs cannot be faster than single-server ORAMs even with polynomially many servers of which only one unknown server is corrupted. Our results apply to ORAMs that may err with probability at most $1/128$ as well as to scenarios where the adversary corrupts larger subsets of servers. We also extend our lower bounds to other important data structures, including oblivious stacks, queues, deques, priority queues, and search trees.

2019

CRYPTO

The Communication Complexity of Threshold Private Set Intersection
Abstract

Threshold private set intersection enables Alice and Bob who hold sets $S_{\mathsf{A}}$ and $S_{\mathsf{B}}$ of size $n$ to compute the intersection $S_{\mathsf{A}} \cap S_{\mathsf{B}}$ if the sets do not differ by more than some threshold parameter $t$.
In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds.
We show that any protocol has to have a communication complexity of $\Omega(t)$.
We show that an almost matching upper bound of $\tilde{\mathcal{O}}(t)$ can be obtained via fully homomorphic encryption.
We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of $\tilde{\mathcal{O}}(t^2)$.
For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings.
Prior to this work, all previous protocols had a communication complexity of $\Omega(n)$.
Our protocols are the first ones with communication complexities that mainly depend on the threshold parameter $t$ and only logarithmically on the set size $n$.

2019

CRYPTO

Stronger Leakage-Resilient and Non-Malleable Secret Sharing Schemes for General Access Structures
Abstract

In this work we present a collection of compilers that take secret sharing schemes for an arbitrary access structure as input and produce either leakage-resilient or non-malleable secret sharing schemes for the same access structure. A leakage-resilient secret sharing scheme hides the secret from an adversary who has access to an unqualified set of shares, even if the adversary additionally obtains some size-bounded leakage from all other secret shares. A non-malleable secret sharing scheme guarantees that a secret that is reconstructed from a set of tampered shares is either equal to the original secret or completely unrelated. To the best of our knowledge, we present the first generic compiler for leakage-resilient secret sharing for general access structures. In the case of non-malleable secret sharing, we strengthen previous definitions, provide separations between them, and construct a non-malleable secret sharing scheme for general access structures that fulfills the strongest definition with respect to independent share tampering functions. More precisely, our scheme is secure against concurrent tampering: the adversary is allowed to (non-adaptively) tamper with the shares multiple times, and in each tampering attempt can freely choose the qualified set of shares to be used by the reconstruction algorithm to reconstruct the tampered secret. This is a strong analogue of the multiple-tampering setting for split-state non-malleable codes and extractors.
We show how to use leakage-resilient and non-malleable secret sharing schemes to construct leakage-resilient and non-malleable threshold signatures. Classical threshold signatures allow one to distribute the secret key of a signature scheme among a set of parties, such that certain qualified subsets can sign messages. We construct threshold signature schemes that remain secure even if an adversary leaks from or tampers with all secret shares.

2019

ASIACRYPT

Perfectly Secure Oblivious RAM with Sublinear Bandwidth Overhead
Abstract

Oblivious RAM (ORAM) has established itself as a fundamental cryptographic building block. Understanding which bandwidth overheads are possible under which assumptions has been the topic of a large body of previous work. In this work, we focus on perfectly secure ORAM and we present the first construction with sublinear bandwidth overhead in the worst case. All prior constructions with perfect security require linear communication overhead in the worst case and only achieve sublinear bandwidth overheads in the amortized sense. We present a fundamentally new approach for constructing ORAM and our results significantly advance our understanding of what is possible with perfect security.
Our main construction, Lookahead ORAM, is perfectly secure, has a worst-case bandwidth overhead of , and a total storage cost of on the server-side, where $N$ is the maximum number of stored data elements. In terms of concrete server-side storage costs, our construction has the smallest storage overhead among all perfectly and statistically secure ORAMs and is only a factor 3 worse than the most storage-efficient computationally secure ORAM. Assuming a client-side position map, our construction is the first, among all ORAMs with worst-case sublinear overhead, that allows for a online bandwidth overhead without server-side computation. Along the way, we construct a conceptually extremely simple statistically secure ORAM with a worst-case bandwidth overhead of , which may be of independent interest.

2018

CRYPTO

Yet Another Compiler for Active Security or: Efficient MPC Over Arbitrary Rings
Abstract

We present a very simple yet very powerful idea for turning any passively secure MPC protocol into an actively secure one, at the price of reducing the threshold of tolerated corruptions.
Our compiler leads to very efficient MPC protocols for the important case of secure evaluation of arithmetic circuits over arbitrary rings (e.g., the natural case of $\mathbb{Z}_{2^{\ell}}$) for a small number of parties. We show this by giving a concrete protocol in the preprocessing model for the popular setting with three parties and one corruption. This is the first protocol for secure computation over rings that achieves active security with constant overhead.
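As a concrete picture of computing over $\mathbb{Z}_{2^{\ell}}$ with three parties and one corruption, below is a sketch of replicated additive secret sharing, a standard passively secure building block in this setting. It is not the compiler from the paper. The choice $\ell = 64$ and the function names are assumptions made for illustration. Party $i$ holds the pair $(x_i, x_{i+1})$ with $x = x_0 + x_1 + x_2 \bmod 2^{\ell}$. Any two parties can therefore reconstruct, while a single party sees only two uniformly random ring elements. Addition requires no interaction.

```python
import secrets

ELL = 64
MOD = 2**ELL  # the ring Z_{2^ell}; ell = 64 is an assumption for this sketch


def share3(x):
    """Replicated additive sharing: x = x0 + x1 + x2 mod 2^ell; party i holds (x_i, x_{i+1})."""
    x0, x1 = secrets.randbelow(MOD), secrets.randbelow(MOD)
    parts = [x0, x1, (x - x0 - x1) % MOD]
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]


def reconstruct3(held):
    """`held` maps party index -> its pair; any two parties together know all three parts."""
    parts = {}
    for i, (a, b) in held.items():
        parts[i] = a
        parts[(i + 1) % 3] = b
    if len(parts) < 3:
        raise ValueError("need the pairs of at least two parties")
    return sum(parts.values()) % MOD


def add3(sx, sy):
    """Addition is local: each party adds its two components, without interaction."""
    return [((a0 + b0) % MOD, (a1 + b1) % MOD) for (a0, a1), (b0, b1) in zip(sx, sy)]
```

Sharing $x = 2^{63} + 5$ and $y = 2^{63} + 10$, adding locally, and reconstructing from any two parties yields $15$, the sum reduced modulo $2^{64}$.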

2018

PKC

Compact Zero-Knowledge Proofs of Small Hamming Weight
Abstract

We introduce a new technique that allows one to give a zero-knowledge proof that a committed vector has Hamming weight bounded by a given constant. The proof has unconditional soundness and is very compact: It has size independent of the length of the committed string, and for large fields, it has size corresponding to a constant number of commitments. We show five applications of the technique that play on a common theme, namely that our proof allows us to get malicious security at small overhead compared to semi-honest security: (1) actively secure k-out-of-n OT from black-box use of 1-out-of-2 OT, (2) separable accountable ring signatures, (3) more efficient preprocessing for the TinyTable secure two-party computation protocol, (4) mixing with public verifiability, and (5) PIR with security against a malicious client.

#### Program Committees

- Crypto 2022
- Asiacrypt 2022

#### Coauthors

- Divesh Aggarwal (1)
- Giuseppe Ateniese (1)
- Gianluca Brian (2)
- Ivan Damgård (4)
- Antonio Faonio (2)
- Dario Fiore (1)
- Nils Fleischhacker (5)
- Satrajit Ghosh (2)
- Mathias Hall-Andersen (1)
- Johannes Krupp (2)
- Kasper Green Larsen (3)
- Ji Luo (1)
- Giulio Malavolta (1)
- Jesper Buus Nielsen (2)
- Stefan Nürnberger (1)
- Maciej Obremski (3)
- Sabine Oechsner (1)
- Claudio Orlandi (2)
- Erick Purwanto (1)
- Michael Raskin (1)
- João Ribeiro (2)
- Jonas Schneider (1)
- Peter Scholl (1)
- Dominique Schröder (2)
- Mark Simkin (18)
- Maciej Skórski (1)
- Daniele Venturi (2)
- Benedikt Wagner (1)
- Kevin Yeo (1)