International Association for Cryptologic Research

IACR News Central

You can also access the full news archive.

Further sources to find out about changes are CryptoDB, ePrint RSS, ePrint Web, Event calender (iCal).

2014-09-29
09:17 [Pub][ePrint]

Detecting hardware trojans is a difficult task in general.

In this article we study hardware trojan horses insertion and detection in cryptographic intellectual property (IP) blocks.

The context is that of a fabless design house that sells IP blocks as GDSII hard macros, and wants to check that final products have not been infected by trojans during the foundry stage.

First, we show the efficiency of a medium cost hardware trojans detection method if the placement or the routing have been redone by the foundry.

It consists in the comparison between optical microscopic pictures of the silicon product and the original view from a GDSII layout database reader.

Second, we analyze the ability of an attacker to introduce a hardware trojan horse without changing neither the placement nor the routing of the cryptographic IP logic.

On the example of an AES engine, we show that if the placement density is beyond $80$\\%, the insertion is basically impossible.

Therefore, this settles a simple design guidance to avoid trojan horses insertion in cryptographic IP blocks:

have the design be compact enough, so that any functionally discreet trojan necessarily requires a complete re-place and re-route, which is detected by mere optical imaging (and not complete chip reverse-engineering).

09:17 [Pub][ePrint]

Higher-order differential power analysis attacks are a serious threat for cryptographic hardware implementations. In particular, glitches in the circuit make it hard to protect the implementation with masking. The existing higher-order masking countermeasures that guarantee security in the presence of glitches use multi-party computation techniques and require a lot of resources in terms of circuit area and randomness. The Threshold Implementation method is also based on multi-party computation but it is more area and randomness efficient. Moreover, it typically requires less clock-cycles since all parties can operate simultaneously. However, so far it is only provable secure against 1st-order DPA. We address this gap and extend the Threshold Implementation technique to higher orders. We define generic constructions and prove their security. To illustrate the approach, we provide 1st, 2nd and 3rd-order DPA-resistant implementations of the block cipher KATAN- 32. Our analysis of 300 million power traces measured from an FPGA implementation supports the security proofs.

09:17 [Pub][ePrint]

09:17 [Pub][ePrint]

In the problem of anonymous authentication (Boneh et al. CCS 1999), a sender wishes to authenticate a message to a given recipient in a way that preserves anonymity: the recipient does not know the identity of the sender and only is assured that the sender belongs to some authorized set. Although solutions for the problem exist (for example, by using ring signatures, e.g. Naor, Crypto 2002), they provide no security when the anonymity set is a singleton. This work is motivated by the question of whether there is any type of anonymity possible in this scenario. It turns out that we can still protect the identity of all senders (authorized or not) if we shift our concern from preventing the identity information be revealed to the recipient to preventing it could be revealed to an external entity, other than the recipient. We define a natural functionality which provides such guarantees and we denote it by F_{eaa} for externally anonymous authenticated channel.

We argue that any realization of F_{eaa} must be deniable in the sense of Dodis et al. TCC 2009. To prove the deniability of similar primitives, previous work defined ad hoc notions of deniability for each task, and then each notion was showed equivalent to realizing the primitive in the Generalized Universal Composability framework (GUC, Canetti et al. TCC 2007). Instead, we put forward the question of whether deniability can be defined independently from any particular task. We answer this question in the affirmative providing a natural extension of the definition of Dodis et al. for arbitrary multiparty protocols. Furthermore, we show that a protocol satisfies this definition if an only if it realizes the ideal functionality F_{eaa} in the GUC framework. This result enables us to prove that most GUC functionalities we are aware of (and their realizations) are deniable.

We conclude by applying our results to the construction of a deniable protocol that realizes F_{eaa}.

09:17 [Pub][ePrint]

We develop a technique inspired by pseudorandom functions that allows us to increase the entropy available for proving the security of dual system encryption schemes under the Decisional Linear Assumption. We show an application of the tool to Attribute-Based Encryption by presenting a Key-Policy ABE scheme that exhibits a significant improvement over the state of the art schemes in public parameter size in terms of the number of reuses of attributes allowed in the policy while remaining fully secure under the Decisional Linear Assumption.

09:17 [Pub][ePrint]

Encryption algorithms are designed to be difficult to break without knowledge of the secrets or keys. To achieve this, the algorithms require the keys to be large, with some algorithms having a recommend size of 2048-bits or more. However most modern processors only support computation on 64-bits at a time. Therefore standard operations with large numbers are more complicated to implement. One operation that is particularly challenging to implement efficiently is modular reduction. In this paper we propose a highly-efficient algorithm for solving large modulo operations; it has several advantages over current approaches as it supports the use of a variable sized lookup table, has good spatial and temporal locality allowing data to be streamed, and only requires basic processor instructions. Our proposed algorithm is theoretically compared to widely used modular algorithms, before practically compared against the state-of-the-art GNU Multiple Precision (GMP) large number library.

09:17 [Pub][ePrint]

The well-known classical constructions of garbled circuits use four

ciphertexts per gate, although various methods have been proposed to

reduce this cost. The best previously known methods for optimizing AND

gates (two ciphertexts; Pinkas et al., ASIACRYPT 2009) and XOR gates

(zero ciphertexts; Kolesnikov & Schneider, ICALP 2008) were

incompatible, so most implementations used the best known method

compatible with free-XOR gates (three ciphertexts; Kolesnikov &

Schneider, ICALP 2008). In this work we show how to simultaneously

garble AND gates using two ciphertexts and XOR gates using zero

ciphertexts, resulting in smaller garbled circuits than any prior

scheme. The main idea behind our construction is to break an AND gate

into two half-gates --- AND gates for which one party knows one

input. Each half-gate can be garbled with a single ciphertext, so our

construction uses two ciphertexts for each AND gate while being

compatible with free-XOR gates. The price for the reduction in size is that the evaluator must perform two cryptographic operations per AND gate, rather than one as in previous schemes. We experimentally

demonstrate that our garbling scheme leads to an overall decrease in

time (up to 25%), bandwidth (up to 33%), and energy use (up to 20%)

over several benchmark applications. We also initiate a study of lower bounds for garbled gate size, and show that our construction is optimal for a large class of garbling schemes encompassing all known practical garbling techniques.

09:17 [Pub][ePrint]

We build the first public-key broadcast encryption system that simultaneously achieves adaptive security against arbitrary number of colluders, has small system parameters, and has a security proof based on non-interactive falsifiable assumptions. Our scheme is built from composite order multilinear maps and enjoys a ciphertext overhead, private key size, and public key size that are are all poly-logarithmic in the total number of users. Previous broadcast schemes with similar parameters are either proven secure in a weaker static model, or rely on powerful tools such as program obfuscation and involve non-falsifiable assumptions.

09:17 [Pub][ePrint]

Recent revelations by Edward Snowden show that a user\'s own hardware and software can be used against her in various ways (e.g., to leak her private information). And, a series of recent announcements has shown that widespread implementations of cryptographic software often contain serious bugs that cripple security. This motivates us to consider the following (seemingly absurd) question: How can we guarantee a user\'s security when she may be using a malfunctioning or arbitrarily compromised machine? To that end, we introduce the notion of a cryptographic reverse firewall (RF). Such a machine sits between the user\'s computer and the outside world, potentially modifying the messages that she sends and receives as she engages in a cryptographic protocol.

A good reverse firewall accomplishes three things: (1) it maintains functionality, so that if the user\'s computer is working correctly, the RF will not break the functionality of the underlying protocol; (2) it preserves security, so that regardless of how the user\'s machine behaves, the presence of the RF will provide the same security guarantees as the properly implemented protocol; and (3) it resists exfiltration, so that regardless of how the user\'s machine behaves, the presence of the RF will prevent the machine from leaking any information to the outside world. Importantly, we do not model the firewall as a trusted party. It does not share any secrets with the user, and the protocol should be both secure and functional without the firewall (when it is implemented correctly).

Our security definition for reverse firewalls depends on the security notion(s) of the underlying protocol. As such, our model generalizes much prior work and provides a general framework for building cryptographic schemes that remain secure when run on compromised machine. It is also a modern take on a line of work that received considerable attention in the 80s and 90s.

We show that our definition is achievable by constructing a private function evaluation protocol with a secure reverse firewall for each party. Along the way, we design an oblivious transfer protocol that also has a secure RF for each party, and a rerandomizable garbled circuit that is both more efficient and more secure than previous constructions. Finally, we show how to convert \\emph{any} protocol into a protocol with an exfiltration-resistant reverse firewall for all parties. (In other words, we provide a generic way to prevent a tampered machine from leaking information to an eavesdropper via any protocol.)

09:17 [Pub][ePrint]

Secure 2-party computation (2PC) is becoming practical for some applications. However, most approaches are limited by the fact that the desired functionality must be represented as a boolean circuit. In response, random-access machines (RAM programs) have recently been investigated as a promising alternative representation.

In this work, we present the first practical protocols for evaluating RAM programs with security against malicious adversaries. A useful efficiency measure is to divide the cost of malicious-secure evaluation of $f$ by the cost of semi-honest-secure evaluation of $f$. Our RAM protocols achieve ratios matching the state of the art for circuit-based 2PC. For statistical security $2^{-s}$, our protocol without preprocessing achieves a ratio of $s$; our online-offline protocol has a pre-processing phase and achieves online ratio $\\sim 2 s / \\log T$, where $T$ is the total execution time of the RAM program.

To summarize, our solutions show that the extra overhead\" of obtaining malicious security for RAM programs (beyond what is needed for circuits) is minimal and does not grow with the running time of the program.

09:17 [Pub][ePrint]

Montgomery modular multiplication constitutes the \"arithmetic foundation\"

of modern public-key cryptography with applications ranging from RSA, DSA

and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implementations of Montgomery modular multiplication. In this paper, we introduce the Cascade Operand Scanning (COS) method to speed up multi-precision multiplication on SIMD architectures. We developed the COS technique with the goal of reducing Read-After-Write (RAW) dependencies in the propagation of carries, which also reduces the number of pipeline stalls (i.e. bubbles). The COS method operates on 32-bit words in a row-wise fashion (similar to the operand-scanning method) and does not require a \"non-canonical\" representation of operands with a reduced radix. We show that two COS computations can be \"coarsely\" integrated into an efficient vectorized variant of Montgomery multiplication, which we call Coarsely Integrated Cascade Operand Scanning (CICOS) method. Due to our sophisticated instruction scheduling, the CICOS method reaches record-setting execution times for Montgomery modular multiplication on ARM-NEON platforms. Detailed benchmarking results obtained on an ARM Cortex-A9 and Cortex-A15 processors show that the proposed CICOS method outperforms Bos et al\'s implementation from SAC 2013 by up to 57% (A9) and 40% (A15), respectively. Furthermore, our COS multiplication is faster than lastest GMP 6.0.0 by up to 55% (A9) and 52% (A15), respectively.