Accelerating scalar multiplication has long been a central topic in elliptic curve cryptography, and many approaches have been proposed to this end. Modern computers usually have multicore processors, which can be used to perform cryptographic computations in parallel. Inspired by this idea, we present a new efficient parallel algorithm to speed up scalar multiplication. First, we introduce a new regular halve-and-add method which is very efficient by utilizing

Elliptic curves were first introduced into cryptography independently by Neal Koblitz and Victor Miller in 1985 [

The efficiency of ECC is dominated by the speed of calculating scalar multiplication. Namely, given a rational point

In constrained environments, scalar multiplication is easily implemented by the “double-and-add” variant of Horner’s rule, given the binary expansion of the scalar
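The double-and-add method can be illustrated with a minimal sketch. To keep it short and runnable, the additive group of integers modulo a hypothetical odd number `N` stands in for the curve group, so `add` and `double` are placeholders for real point addition and doubling rather than actual curve arithmetic.

```python
# Stand-in group: integers modulo N under addition (NOT an elliptic curve).
N = 1009  # hypothetical odd group order, chosen only for illustration


def add(p, q):
    """Placeholder for point addition."""
    return (p + q) % N


def double(p):
    """Placeholder for point doubling."""
    return (2 * p) % N


def double_and_add(k, p):
    """Left-to-right binary method: Horner's rule on the bits of k."""
    q = 0  # identity element of the stand-in group
    for bit in bin(k)[2:]:  # most significant bit first
        q = double(q)
        if bit == "1":
            # The addition happens only for 1-bits, so the operation
            # sequence depends on the scalar.
            q = add(q, p)
    return q
```

Because the addition is executed only for nonzero bits, the sequence of operations reveals the scalar, which is what the regular methods discussed next are meant to avoid.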

Protection against simple side-channel attacks (SSCA) can be achieved by recoding scalars in a regular manner, meaning that scalar multiplication executes the same instructions in the same order for any input value. Coron introduced a countermeasure against SSCA named the “double-and-add always” algorithm [
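As a rough sketch of the double-and-add always idea, the loop below performs one doubling and one addition in every iteration and then keeps either the real or the dummy result. The group is again a stand-in (integers modulo a hypothetical `N`), and the fixed bit length `nbits` is an assumed parameter.

```python
# Stand-in group: integers modulo N under addition (NOT an elliptic curve).
N = 1009  # hypothetical odd group order


def add(p, q):
    return (p + q) % N


def double(p):
    return (2 * p) % N


def double_and_add_always(k, p, nbits):
    """One doubling and one addition per bit, whatever the bit value."""
    q0 = 0  # running result
    for i in reversed(range(nbits)):
        q0 = double(q0)
        q1 = add(q0, p)            # always computed, sometimes discarded
        bit = (k >> i) & 1
        q0 = (q0, q1)[bit]         # keep q1 only when the bit is 1
    return q0
```

The tuple indexing here only mimics the selection step; a real implementation would need a constant-time selection, and the discarded addition is exactly what safe-error attacks target.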

A countermeasure against safe-error fault attacks is to perform scalar multiplication in a predictable pattern. Besides the most commonly used Montgomery-ladder algorithm [
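The structure of the Montgomery ladder can be sketched as follows, again over a stand-in group (integers modulo a hypothetical `N`) with an assumed fixed bit length `nbits`. Both registers are updated in every iteration, so there is no dummy operation whose result is thrown away.

```python
# Stand-in group: integers modulo N under addition (NOT an elliptic curve).
N = 1009  # hypothetical odd group order


def add(p, q):
    return (p + q) % N


def double(p):
    return (2 * p) % N


def montgomery_ladder(k, p, nbits):
    """Ladder with the invariant r1 - r0 == p after every step."""
    r0, r1 = 0, p
    for i in reversed(range(nbits)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), double(r1)
        else:
            r0, r1 = double(r0), add(r0, r1)
    return r0
```

The `if` on the scalar bit is kept for readability; it only changes which register is doubled, and a hardened implementation would replace it with a constant-time conditional swap.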

Another area of growing interest in regular scalar multiplication is the use of efficient curve forms that admit a complete addition law. For any pair of

In [

In this paper, we provide a similar parallel implementation method based on a regular recoding technique, which should be highly efficient because doubling and halving operations are processed in parallel on two different coprocessors. Our work makes two main contributions.

The first contribution is a new regular algorithm based on the halving operation, called zero-less signed-digit (ZSD) halve-and-add, which saves around

The second contribution concerns the new mixed-parallel algorithm. After analyzing all the algorithms in Table

Complexity comparison for

Method | Point operations | Field operations ( | | |
---|---|---|---|---|
Montgomery-D | 6 | 1418 | 2474 | |
Montgomery-H | 3 | 3495 | 6135 | |
Algorithm | 10 | 2351 | 4111 | |
Algorithm | 9 | 2118 | 3702 | |
Algorithm | 10 | 2351 | 4111 | |

Montgomery-D = Montgomery double-and-add algorithm, Montgomery-H = Montgomery halve-and-add algorithm, Algorithm. 2-D (

Complexity comparison for

Algorithm | Split ( | Estimate ( | Split ( | Estimate ( |
---|---|---|---|---|
Montgomery-parallel | 67 | 1028 | 118 | 1782 |
Our mixed-parallel | 87 | 908 | 153 | 1568 |

The rest of this paper is organized as follows. In the next section, we introduce the relevant arithmetic of binary elliptic curves, especially on efficient

We focus on elliptic curves

Given two points

Similarly, given
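For reference, the affine group law can be written out under the assumption that the curve is in the standard form for ordinary binary curves, $y^2 + xy = x^3 + ax^2 + b$ with $b \neq 0$. The symbols $a$, $b$, $\lambda$, and the point names below are notation assumed for this sketch. For $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ with $P \neq \pm Q$, the sum $P + Q = (x_3, y_3)$ is

$$
\lambda = \frac{y_1 + y_2}{x_1 + x_2}, \qquad
x_3 = \lambda^2 + \lambda + x_1 + x_2 + a, \qquad
y_3 = \lambda (x_1 + x_3) + x_3 + y_1 .
$$

For doubling, with $x_1 \neq 0$ and $2P = (x_3, y_3)$,

$$
\lambda = x_1 + \frac{y_1}{x_1}, \qquad
x_3 = \lambda^2 + \lambda + a, \qquad
y_3 = x_1^2 + (\lambda + 1)\, x_3 .
$$

In both cases computing $\lambda$ requires one division in the base field, and the negative of $P$ is $-P = (x_1, x_1 + y_1)$.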

From the above formulas, it is easy to see that inversions in the base field are unavoidable, and these are time consuming. Projective coordinate systems are usually preferred because they require no field inversions. In practice, various coordinate systems are available. The work in this paper exploits the state-of-the-art coordinate systems:

Efficient point representation is of great importance for accelerating scalar multiplication. Inversions in the base field take a large amount of time; however, they are indispensable if points are represented in affine coordinates. The homogeneous projective coordinate system (also called the standard projective coordinate system) is usually used to remove this obstacle by injecting any

The

Referring to doubling operation,

As for projective conditions, the translation between affine representation

The associated group addition

Given the above formulas, a natural idea is to combine the doubling and addition formulas into a single formula evaluating

Let

Using this,

Twisted

Let

For the point

Among all the studied coordinate systems on binary curves, twisted

Cost comparison.

Homogeneous | Jacobian | LD | Twisted | | |
---|---|---|---|---|---|
Addition | 14 | 14 | 13 | 11 | 9 |
Mixed-addition | 11 | 10 | 8 | 8 | 7 |
Doubling | 7 | 4 | 3 | 3 | 2 |

The main ingredient we consider is a cyclic subgroup in

The most commonly used method is to solve the second equation for

When

This time we just need two steps, that is to say, solve the first equation for

As proved in [

From the algorithmic view, the halve-and-add method [

Inspired by the treatment in halve-and-add, if we choose an appropriate number less than

If we already have the binary expression of

The first part is easily computed with the halve-and-add method, while the second part can be computed with a double-and-add approach; the two parts run in two different threads.
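The split can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of this paper: the group is a stand-in (integers modulo a hypothetical odd `N`, where halving is multiplication by the inverse of 2), the split position `t` is a free parameter, and the scalar is first multiplied by `2**t` modulo the group order so that its low `t` bits can be consumed by halvings and the remaining bits by doublings. `pow(2, -1, N)` needs Python 3.8 or later.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in group: integers modulo N under addition (NOT an elliptic curve).
N = 1009               # hypothetical odd group order
INV2 = pow(2, -1, N)   # halving in the stand-in group: multiply by 1/2 mod N


def add(p, q):
    return (p + q) % N


def double(p):
    return (2 * p) % N


def halve(p):
    return (p * INV2) % N


def double_and_add(bits_msb_first, p):
    q = 0
    for b in bits_msb_first:
        q = double(q)
        if b:
            q = add(q, p)
    return q


def halve_and_add(bits_lsb_first, p):
    q = 0
    for b in bits_lsb_first:
        if b:
            q = add(q, p)
        q = halve(q)  # bit i ends up divided by 2**(t - i)
    return q


def parallel_scalar_mult(k, p, t):
    """Compute k*p as (high part by doublings) + (low part by halvings)."""
    m = N.bit_length()
    k2 = (k << t) % N                          # k' = 2^t * k mod N
    bits = [(k2 >> i) & 1 for i in range(m)]   # little-endian bits of k'
    low, high = bits[:t], bits[t:][::-1]
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_high = pool.submit(double_and_add, high, p)
        f_low = pool.submit(halve_and_add, low, p)
        # Final step: a single addition combines the two partial results.
        return add(f_high.result(), f_low.result())
```

For example, `parallel_scalar_mult(777, 5, 4)` agrees with `(777 * 5) % N`. Python threads only illustrate the structure here; the two loops would run on separate cores or coprocessors in a real implementation, and neither loop is regular in this sketch.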

As far as side-channel attacks are concerned, noting that double-and-add can be implemented using Montgomery-ladder point multiplication, Negre and Robert [

Protecting the implementation of scalar multiplication against SSCA can be achieved by many methods. Compared with an unprotected implementation, algorithmic countermeasures such as recoding scalars in a regular manner always sacrifice some efficiency, but this loss can be mitigated by taking advantage of the inherent parallelism of modern processors.

In general, point addition and doubling on elliptic curves are far more complicated and time consuming than ordinary arithmetic operations, and much effort, including the work in this paper, has gone into speeding them up. As is well known, negating a point is very cheap, so subtraction of points on elliptic curves is just as efficient as addition. This motivates modifying the binary method to use signed-digit representations, that is to say, the scalar

Zero-less signed-digit expansion [

Let

From a security standpoint, every bit should be nonzero. When
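A minimal sketch of one standard way to obtain a zero-less signed-digit expansion is given below, assuming an odd scalar `k` that fits in `n` bits. The function name `zsd_recode` and the rule used (digit `i` is `+1` when bit `i + 1` of `k` is set and `-1` otherwise, with the top digit fixed to `+1`) are choices made for this illustration and may differ in detail from the recoding used in this paper.

```python
def zsd_recode(k, n):
    """Return digits d[0..n-1], each +1 or -1, with sum(d[i] * 2**i) == k.

    Assumes k is odd and 0 < k < 2**n.
    """
    if k % 2 == 0 or not 0 < k < 2**n:
        raise ValueError("k must be odd and fit in n bits")
    # Digit i is determined by bit i+1 of k: 1 -> +1, 0 -> -1.
    digits = [2 * ((k >> (i + 1)) & 1) - 1 for i in range(n - 1)]
    digits.append(1)  # the most significant digit is always +1
    return digits
```

The identity behind this rule is that the `-1` digits subtract `2**(n-1) - 1` in total, which the top digit `2**(n-1)` and the low bit of the odd scalar exactly compensate.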

With the ZSD expansion in hand, we obtain regular algorithms that combine ZSD expansion with the common binary methods to compute scalar multiplication. Algorithm
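How a zero-less expansion yields a regular loop can be sketched for the double-and-add direction. The sketch assumes an odd scalar, a stand-in group (integers modulo a hypothetical `N`), and an illustrative recoding helper `zsd_recode`; none of these names come from this paper. Every iteration performs exactly one doubling and one addition of either the point or its negative.

```python
# Stand-in group: integers modulo N under addition (NOT an elliptic curve).
N = 1009  # hypothetical odd group order


def add(p, q):
    return (p + q) % N


def double(p):
    return (2 * p) % N


def neg(p):
    return (-p) % N


def zsd_recode(k, n):
    """Digits in {+1, -1}, least significant first, for odd 0 < k < 2**n."""
    if k % 2 == 0 or not 0 < k < 2**n:
        raise ValueError("k must be odd and fit in n bits")
    return [2 * ((k >> (i + 1)) & 1) - 1 for i in range(n - 1)] + [1]


def zsd_double_and_add(k, p, n):
    """Regular left-to-right loop: one doubling and one addition per digit."""
    digits = zsd_recode(k, n)
    q = p  # the top digit is always +1
    for d in reversed(digits[:-1]):
        q = double(q)
        q = add(q, p if d == 1 else neg(p))  # never skipped: no zero digits
    return q
```

The choice between `p` and `neg(p)` is written as a plain conditional for clarity; a hardened implementation would select the operand in constant time. A halve-and-add variant would replace `double` by a halving step and process the digits in the opposite order.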

Algorithms

Let

Preprocessing: select a proper

So

Implementing: point multiplication can be done by concurrently implementing

Feed parameters

Meanwhile, feed parameters

Postprocessing: a single-point addition

Parallel double-and-add/halve-and-add scalar multiplication.

Numerous standards include the NIST-recommended curves as the underlying abelian groups for cryptographic protocols. The general conclusion in Tables

The theoretical complexity analysis of the four considered scalar multiplication approaches is reported in Table

For regular implementation against SSCA, the Montgomery methods and our new methods here both need

In Algorithm

In this work, we assume

For double-and-add, the Montgomery-D algorithm is so efficient that Algorithm

One may ask why the mixed

Negre and Robert [

After analyzing each algorithm in Section

It is evident that Montgomery-D has the least cost among all the algorithms in Table

In this paper, we present a new parallel algorithm to improve the Montgomery algorithm in [

After careful analysis of these algorithms, we conclude that our regular halve-and-add approach, Algorithm

As a result, combining Montgomery-D and Algorithm

All data generated or analyzed during this study are included in this published article.

The authors declare that there are no conflicts of interest regarding the publication of this article.

This work was supported by the National Natural Science Foundation of China (Nos. 61872442, 61772515, 61502487, and U1936209); the National Cryptography Development Fund (No. MMJJ20180216); and the Beijing Municipal Science & Technology Commission (Project no. Z191100007119006).
