Oblivious Sampling with Applications to Two-Party k-Means Clustering
The k -means clustering problem is one of the most explored problems in data mining. With the advent of protocols that have proven to be successful in performing single database clustering, the focus has shifted in recent years to the question of how to extend the single database protocols to a multiple database setting. To date, there have been numerous attempts to create specific multiparty k -means clustering protocols that protect the privacy of each database, but according to the standard cryptographic definitions of “privacy-protection”, so far all such attempts have fallen short of providing adequate privacy. In this paper, we describe a Two-Party k -Means Clustering Protocol that guarantees privacy against an honest-but-curious adversary, and is more efficient than utilizing a general multiparty “compiler” to achieve the same task. In particular, a main contribution of our result is a way to compute efficiently multiple iterations of k -means clustering without revealing the intermediate values. To achieve this, we describe a technique for performing two-party division securely and also introduce a novel technique allowing two parties to securely sample uniformly at random from an unknown domain size. The resulting Division Protocol and Random Value Protocol are of use to any protocol that requires the secure computation of a quotient or random sampling. Our techniques can be realized based on the existence of any semantically secure homomorphic encryption scheme. For concreteness, we describe our protocol based on Paillier Homomorphic Encryption scheme (see Paillier in Advances in: cryptology EURO-CRYPT’99 proceedings, LNCS 1592, pp 223–238, 1999). We will also demonstrate that our protocol is efficient in terms of communication, remaining competitive with existing protocols (such as Jagannathan and Wright in: KDD’05, pp 593–599, 2005) that fail to protect privacy.
Throughput-Optimal Routing in Unreliable Networks
We demonstrate the feasibility of throughput-efficient routing in a highly unreliable network. Modeling a network as a graph with vertices representing nodes and edges representing the links between them, we consider two forms of unreliability: unpredictable edge-failures, and deliberate deviation from protocol specifications by corrupt nodes. The first form of unpredictability represents networks with dynamic topology, whose links may be constantly going up and down; while the second form represents malicious insiders attempting to disrupt communication by deliberately disobeying routing rules, by e.g. introducing junk messages or deleting or altering messages. We present a robust routing protocol for end-to-end communication that is simultaneously resilient to both forms of unreliability, achieving provably optimal throughput performance. Our proof proceeds in three steps: 1) We use competitive-analysis to find a lower-bound on the optimal throughput-rate of a routing protocol in networks susceptible to only edge-failures (i.e. networks with no malicious nodes); 2) We prove a matching upper bound by presenting a routing protocol that achieves this throughput rate (again in networks with no malicious nodes); and 3) We modify the protocol to provide additional protection against malicious nodes, and prove the modified protocol performs (asymptotically) as well as the original.
Authenticated Adversarial Routing
The aim of this paper is to demonstrate the feasibility of authenticated throughput-efficient routing in an unreliable and dynamically changing synchronous network in which the majority of malicious insiders try to destroy and alter messages or disrupt communication in any way. More specifically, in this paper we seek to answer the following question: Given a network in which the majority of nodes are controlled by a malicious adversary and whose topology is changing every round, is it possible to develop a protocol with polynomially-bounded memory per processor that guarantees throughput-efficient and correct end-to-end communication? We answer the question affirmatively for extremely general corruption patterns: we only request that the topology of the network and the corruption pattern of the adversary leaves at least one path each round connecting the sender and receiver through honest nodes (though this path may change at every round). Out construction works in the public-key setting and enjoys bounded memory per processor (that does not depend on the amount of traffic and is polynomial in the network size.) Our protocol achieves optimal transfer rate with negligible decoding error. We stress that our protocol assumes no knowledge of which nodes are corrupted nor which path is reliable at any round, and is also fully distributed with nodes making decisions locally, so that they need not know the topology of the network at any time. The optimality that we prove for our protocol is very strong. Given any routing protocol, we evaluate its efficiency (rate of message delivery) in the ``worst case,'' that is with respect to the worst possible graph and against the worst possible (polynomially bounded) adversarial strategy (subject to the above mentioned connectivity constraints). Using this metric, we show that there does not exist any protocol that can be asymptotically superior (in terms of throughput) to ours in this setting. We remark that the aim of our paper is to demonstrate via explicit example the feasibility of throughput-efficient authenticated adversarial routing. However, we stress that out protocol is not intended to provide a practical solution, as due to its complexity, no attempt thus far has been made to make the protocol practical by reducing constants or the large (though polynomial) memory requirements per processor. Our result is related to recent work of Barak, Goldberg and Xiao in 2008 who studied fault localization in networks assuming a private-key trusted setup setting. Our work, in contrast, assumes a public-key PKI setup and aims at not only fault localization, but also transmission optimality. Among other things, our work answers one of the open questions posed in the Barak et. al. paper regarding fault localization on multiple paths. The use of a public-key setting to achieve strong error-correction results in networks was inspired by the work of Micali, Peikert, Sudan and Wilson who showed that classical error-correction against a polynomially-bounded adversary can be achieved with surprisingly high precision. Our work is also related to an interactive coding theorem of Rajagopalan and Schulman who showed that in noisy-edge static-topology networks a constant overhead in communication can also be achieved (provided none of the processors are malicious), thus establishing an optimal-rate routing theorem for static-topology networks. Finally, our work is closely related and builds upon to the problem of End-To-End Communication in distributed networks, studied by Afek and Gafni, Awebuch, Mansour, and Shavit, and Afek, Awerbuch, Gafni, Mansour, Rosen, and Shavit, though none of these papers consider or ensure correctness in the setting of a malicious adversary controlling the majority of the network.
Secure Two-Party k-Means Clustering
The k-Means Clustering problem is one of the most-explored problems in data mining to date. With the advent of protocols that have proven to be successful in performing single database clustering, the focus has changed in recent years to the question of how to extend the single database protocols to a multiple database setting. To date there have been numerous attempts to create specific multiparty k-means clustering protocols that protect the privacy of each database, but according to the standard cryptographic definitions of ``privacy-protection,'' so far all such attempts have fallen short of providing adequate privacy. In this paper we describe a Two-Party k-Means Clustering Protocol that guarantees privacy, and is more efficient than utilizing a general multiparty ``compiler'' to achieve the same task. In particular, a main contribution of our result is a way to compute efficiently multiple iterations of k-means clustering without revealing the intermediate values. To achieve this, we use novel techniques to perform two-party division and sample uniformly at random from an unknown domain size. Our techniques are quite general and can be realized based on the existence of any semantically secure homomorphic encryption scheme. For concreteness, we describe our protocol based on Paillier Homomorphic Encryption scheme (see [Pa]). We will also demonstrate that our protocol is efficient in terms of communication, remaining competitive with existing protocols (such as [JW]) that fail to protect privacy.