## CryptoDB

### Ee-Chien Chang

#### Publications

Year
Venue
Title
2015
EPRINT
2015
EPRINT
2014
EPRINT
2010
EPRINT
We are interested in the integrity of the query results from an outsourced database service provider. Alice passes a set $\mathtt{D}$ of $d$-dimensional points, together with some authentication tag $\mathtt{T}$, to an untrusted service provider Bob. Later, Alice issues some query over $\mathtt{D}$ to Bob, and Bob should produce a query result and a proof based on $\mathtt{D}$ and $\mathtt{T}$. Alice wants to verify the integrity of the query result with the help of the proof, using only the private key. In this paper, we consider aggregate query conditional on multidimensional range selection. In its basic form, a query asks for the total number of data points within a $d$-dimensional range. We are concerned about the number of communication bits required and the size of the tag $\mathtt{T}$. We give a method that requires $O(d^2)$ communication bits to authenticate an aggregate query conditional on $d$-dimensional range selection. Besides counting, summing and finding of the minimum can also be supported. Furthermore, our scheme can be extended slightly to authenticate $d$-dimensional usual (non-aggregate) range selection query with $O(d^2)$ bits communication overhead, improving known results that require $O(\log^{d-1} N)$ communication overhead, where $N$ is the number of data points in the dataset.
2009
EPRINT
A redactable signature scheme for a string of objects supports verification even if multiple substrings are removed from the original string. It is important that the redacted string and its signature do not reveal anything about the content of the removed substrings. Existing schemes completely or partially leak a piece of information: the lengths of the removed substrings. Such length information could be crucial for many applications, especially when the removed substring has low entropy. We propose a scheme that can hide the length. Our scheme consists of two components. The first component $\mathcal{H}$, which is a collision resistant'' hash, maps a string to an unordered set, whereby existing schemes on unordered sets can then be applied. However, a sequence of random numbers has to be explicitly stored and thus it produces a large signature of size at least $(m k)$-bits where $m$ is the number of objects and $k$ is the size of a key sufficiently large for cryptographic operations. The second component uses RGGM tree, a variant of GGM tree, to generate the pseudo random numbers from a short seed, expected to be of size $O(k+ tk\log m)$ where $t$ is the number of removed substrings. Unlike GGM tree, the structure of the proposed RGGM tree is random. By an intriguing statistical property of the random tree, the redacted tree does not reveal the lengths of the substrings removed. The hash function $\mathcal{H}$ and the RGGM tree can be of independent interests.
2008
EPRINT
We are interested in this problem: a verifier, with a small and reliable storage, wants to periodically check whether a remote server is keeping a large file $\mathbf{x}$. A dishonest server, by adapting the challenges and responses, tries to discard partial information of $\mathbf{x}$ and yet evades detection. Besides the security requirements, there are considerations on communication, storage size and computation time. Juels et al. \cite{Pors} gave a security model for {\em Proof of Retrievability} ({\POR}) system. The model imposes a requirement that the original ${\bf x}$ can be recovered from multiple challenges-responses. Such requirement is not necessary in our problem. Hence, we propose an alternative security model for {\em Remote Integrity Check} ({\RIC}). We study a few schemes and analyze their efficiency and security. In particular, we prove the security of a proposed scheme {\simplePIR}. This scheme can be deployed as a {\POR} system and it also serves as an example of an effective {\POR} system whose extraction'' is not verifiable. We also propose a combination of the RSA-based scheme by Filho et al. \cite{DDPs} and the ECC-based authenticator by Naor et al. \cite{complex_memcheck}, which achieves good asymptotic performance. This scheme is not a {\POR} system and seems to be a secure {\RIC}. In-so-far, all schemes that have been proven secure can also be adopted as {\POR} systems. This brings out the question of whether there are fundamental differences between the two models. To highlight the differences, we introduce a notion, {\em trap-door compression}, that captures a property on compressibility.
2008
EPRINT
To reconcile the demand of information dissemination and preservation of privacy, a popular approach generalizes the attribute values in the dataset, for example by dropping the last digit of the postal code, so that the published dataset meets certain privacy requirements, like the notions of k-anonymity and l-diversity. On the other hand, the published dataset should remain useful and not over generalized. Hence it is desire to disseminate a database with high "usefulness", measured by a utility function. This leads to a generic framework whereby the optimal dataset (w.r.t. the utility function) among all the generalized datasets that meet certain privacy requirements, is chosen to be disseminated. In this paper, we observe that, the fact that a generalized dataset is optimal may leak information about the original. Thus, an adversary who is aware of how the dataset is generalized may able to derive more information than what the privacy requirements constrained. This observation challenges the widely adopted approach that treats the generalization process as an optimization problem. We illustrate the observation by giving counter-examples in the context of k-anonymity and l-diversity.
2006
EUROCRYPT
2006
EPRINT
Given the original set $X$ where $|X|=s$, a sketch $P$ is computed from $X$ and made public. From another set $Y$ where $|Y| = s$ and $P$, we can reconstruct $X$ if $|X\cap Y|\ge |s-t|$, where $t<s$ is some threshold. The sketch $P$ is secure if it does not reveal much information about $X$. A few constructions have been proposed, but they cannot handle multi-sets, that is, sets that may contain duplicate elements. We observe that the techniques in the set reconciliation protocol proposed by Minsky et al. (ISIT 2001) can be applied and give a secure sketch that supports multi-sets. If $X$ is a subset of an universe with $n$ elements, the running time of the encoding and decoding algorithms will be polynomial w.r.t. $s$ and $\log n$, and the entropy loss due to the sketch is less than $2t(1+\log n)$.
2005
EPRINT
A secure sketch is a set of published data that can help to recover the original biometric data after they are corrupted by permissible noises, and by itself does not reveal much information about the original. Several constructions have been proposed for different metrics, and in particular, set difference. We observe that in many promising applications, set difference alone is insufficient to model the noises. We propose to look into point-set difference, which measures noises that not only remove/introduce new feature points in the biometric objects, but may also perturb the points. In this paper, we first give an improvement for set difference construction that can be extended to multi-sets, where the sketch is small and there is an efficient decoding algorithm. We next give a sketch for point-set difference in both one and two-dimensional spaces. By using results in almost k-wise independence, the size of the sketch is reduced to near-optimal.