Elliptic Curve Cryptography (ECC), independently proposed by Miller [Mil86] and Koblitz [Kob87] in mid 80’s, is finding momentum to consolidate its status as the public-key system of choice in a wide range of applications and to further expand this position to settings traditionally occupied by RSA and DL-based systems. The non-existence of known subexponential attacks on this cryptosystem directly translates to shorter keylengths for a given security level and,
consequently, has led to implementations with better bandwidth usage, reduced power and
memory requirements, and higher speeds. Moreover, the dramatic entry of pairing-based
cryptosystems defined on elliptic curves at the beginning of the new millennium has opened the possibility of a plethora of innovative applications, solving in some cases longstanding problems in cryptography. Nevertheless, public-key cryptography (PKC) is still relatively expensive in comparison with its symmetric-key counterpart and it remains an open challenge to reduce further the computing cost of the most time-consuming PKC primitives to guarantee their adoption for secure communication in commercial and Internet-based applications. The latter is
especially true for pairing computations. Thus, it is of paramount importance to research methods which permit the efficient realization of Elliptic Curve and Pairing-based Cryptography on the several new platforms and applications.
This thesis deals with efficient methods and explicit formulas for computing elliptic curve scalar multiplication and pairings over fields of large prime characteristic with the objective of enabling the realization of software implementations at very high speeds.

To achieve this main goal in the case of elliptic curves, we accomplish the following tasks: identify the elliptic curve settings with the fastest arithmetic; accelerate the precomputation stage in the scalar multiplication; study number representations and scalar multiplication algorithms for
speeding up the evaluation stage; identify most efficient field arithmetic algorithms and optimize them; analyze the architecture of the targeted platforms for maximizing the performance of ECC operations; identify most efficient coordinate systems and optimize explicit formulas; and realize implementations on x86-64 processors with an optimal algorithmic selection among all studied cases.

In the case of pairings, the following tasks are accomplished: accelerate tower and curve arithmetic; identify most efficient tower and field arithmetic algorithms and optimize them; identify the curve setting with the fastest arithmetic and optimize it; identify state-of-the-art techniques for the Miller loop and final exponentiation; and realize an implementation on x86-64 processors with optimal algorithmic selection.
The most outstanding contributions that have been achieved with the methodologies above in
this thesis can be summarized as follows:

• Two novel precomputation schemes are introduced and shown to achieve the lowest costs in the literature for different curve forms and scalar multiplication primitives. The
detailed cost formulas of the schemes are derived for most relevant scenarios.

• A new methodology based on the operation cost per bit to devise highly optimized and
compact multibase algorithms is proposed. Derived multibase chains using bases {2,3}
and {2,3,5} are shown to achieve the lowest theoretical costs for scalar multiplication on
certain curve forms and for scenarios with and without precomputations. In addition, the
zero and nonzero density formulas of the original (width-w) multibase NAF method are
derived by using Markov chains. The application of “fractional” windows to the
multibase method is described together with the derivation of the corresponding density
formulas.

• Incomplete reduction and branchless arithmetic techniques are optimally combined for devising high-performance field arithmetic. Efficient algorithms for “small” modular operations using suitably chosen pseudo-Mersenne primes are carefully analyzed and optimized for incomplete reduction.

• Data dependencies between contiguous field operations are discovered to be a source of performance degradation on x86-64 processors. Three techniques for reducing the number of potential pipeline stalls due to these dependencies are proposed: field arithmetic scheduling, merging of point operations and merging of field operations.

• Explicit formulas for two relevant cases, namely Weierstrass and Twisted Edwards
curves over GF(p) and GF(p^2), are carefully optimized employing incomplete reduction,
minimal number of operations and reduced number of data dependencies between contiguous field operations.

• Best algorithms for the field, point and scalar arithmetic, studied or proposed in this
thesis, are brought together to realize four high-speed implementations on x86-64
processors at the 128-bit security level. Presented results set new speed records for
elliptic curve scalar multiplication and introduce up to 34% of cost reduction in
comparison with the best previous results in the literature.

• A generalized lazy reduction technique that enables the elimination of up to 32% of
modular reductions in the pairing computation is proposed. Further, a methodology that
keeps intermediate results under Montgomery reduction boundaries maximizing operations without carry checks is introduced. Optimized formulas for the popular tower GF(p)->GF(p^2)->GF(p^6)->GF(p^12) are explicitly stated and a detailed operation count that permits to determine the theoretical cost improvement attainable with the proposed method is carried out for the case of an optimal ate pairing on a Barreto-Naehrig (BN) curve at the 128-bit security level.

• Best algorithms for the different stages of the pairing computation, including the
proposed techniques and optimizations, are brought together to realize a high-speed
implementation at the 128-bit security level. Presented results on x86-64 processors set
new speed records for pairings, introducing up to 34% of cost reduction in comparison
with the best published result.

From a general viewpoint, the proposed methods and optimized formulas have a practical
impact in the performance of cryptographic protocols based on elliptic curves and pairings in a wide range of applications. In particular, the introduced implementations represent a direct and significant improvement that may be exploited in performance-dominated applications such as high-demand Web servers in which millions of secure transactions need to be generated.