Reconfigurable Architecture for Elliptic Curve Cryptography Using FPGA

The high performance of an elliptic curve (EC) cryptosystem depends heavily on the efficiency of the arithmetic in the underlying finite field. We propose and compare designs over three Galois fields: GF(2^163), GF(2^193), and GF(2^256). The GF(2^163) architecture is based on the Lopez-Dahab elliptic curve point multiplication algorithm and uses a Gaussian normal basis for field arithmetic. The GF(2^193) design is based on an efficient Montgomery add-and-double algorithm and uses a Karatsuba-Ofman multiplier and the Itoh-Tsujii algorithm for the inverse component. Its hardware is built around an optimized finite state machine (FSM) with a single-cycle 193-bit multiplier, a field adder, and a field squarer. The GF(2^256) architecture targets applications in which compactness matters more than speed. It uses the FPGA's dedicated multipliers and carry-chain logic to obtain a small data path. These hardware-level optimizations accelerate ECC scalar multiplication and raise both the operating frequency and the speed of operations such as key generation, encryption, and decryption. Finally, we implement our design on a Xilinx XC4VLX200 FPGA device.


Introduction
Many hardware designs of elliptic curve cryptography have been developed to accelerate scalar multiplication, mainly on field programmable gate arrays (FPGAs). FPGAs are clearly no longer used only for prototyping, and more and more FPGA implementations run in environments that used to be ASIC-only territory. Applications implemented on an FPGA need secure data communication. In this rapidly changing environment, the reconfigurability of an FPGA is a very useful feature that an ASIC does not offer. Secure public key authentication and digital signatures are increasingly important for electronic communications and commerce. They are required not only on high-powered desktop computers but also on smart cards and wireless devices with severely constrained memory and processing capabilities. Cryptography offers a robust solution for IT security services in terms of confidentiality, data integrity, authenticity, and nonrepudiation. Security deals mainly with the ability to withstand attacks [1]. Speed and area, the eternal trade-off, concern the ability to run intensive cryptographic processes on as little hardware as possible. In other words, the goal is to embed a strong algorithm in very little hardware, that is, to find an optimal solution to a multi-objective problem (portability against power consumption, speed against area). Security nevertheless remains the main issue in cryptography.
In the last decade, hardware implementation of the elliptic curve cryptography (ECC) algorithm has been the subject of intense competition, driven essentially by security, speed, and area constraints. Cryptography has become one of the most important fields in our lives, essentially because of two factors: the growing need for secrecy on one side and the growing activity of code breakers and hackers on the other. Organizations tend to increase their benefits by keeping their information systems as transparent as possible. Meanwhile, hackers and code or key breakers are organizing themselves into unofficial groups, which forces defenders to stay a step ahead before the codes are broken. Scientists therefore try to complicate the reverse engineering of the encryption system while keeping encryption keys as short as possible. Many mathematicians are tackling this issue, mainly those working on elliptic curves [2].

Algorithm 1: Bit-level multiplication algorithm for GF(2^m).
Elliptic curve cryptosystems possess a number of degrees of freedom, such as the Galois field characteristic, the extension degree, the elliptic curve parameters, and the fixed point generating the working subgroup on the curve. The appeal of this field lies in three things: the simplicity of the operators used in the encryption process, the ability to exchange keys over nonsecure channels, and the high complexity that attackers face when trying to exploit information that leaves the organization. This paper focuses on high-performance comparisons of hardware designs of ECC over GF(2^163), GF(2^193), and GF(2^256) (Table 4).

Elliptic Curve Cryptography
In 1985, Koblitz and Miller independently introduced the use of elliptic curves in public key cryptography, known as elliptic curve cryptography (ECC). The main operation consists of multiplying a point by a scalar to obtain a second point. The difficulty comes from the fact that, given the initial point and the final point, the scalar cannot feasibly be deduced. This reversal problem, which is the target of cryptanalysis, is called the elliptic curve discrete logarithm problem [1].
Thanks to their small key sizes, ECC algorithms currently offer greater resistance to cryptanalysis per key bit than RSA. Using ECC therefore leads to smaller hardware area, lower bandwidth use, and more secure transactions.
ECC algorithms are attractive because they operate on a Galois field GF(p^m), where p is a prime and m is a positive integer, using two simple operations: field addition and field multiplication. Hardware designs usually prefer a binary field, in which the pair (p, m) fixes the field and hence the set of elliptic curves that can be defined over it. In this work, p = 2 and m = 163, 193, and 256.
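As a minimal sketch of these two operations, the Python code below represents an element of GF(2^163) as an integer whose bit i is the coefficient of x^i. This is a polynomial basis, not the Gaussian normal basis used by the hardware. The code assumes the NIST B-163 reduction polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1. Addition is a carry-free XOR. Multiplication uses the bit-serial shift-and-add method with interleaved reduction.

```python
M = 163
# Assumed reduction polynomial (NIST B-163/K-163): x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a: int, b: int) -> int:
    """Field addition in characteristic 2: coefficient-wise XOR, no carries."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Bit-serial (LSB-first) shift-and-add multiplication modulo F.

    One iteration per bit of b, as a bit-serial datapath would do it:
    conditionally accumulate a, then multiply a by x and reduce.
    """
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a            # accumulate a * x^i
        a <<= 1               # a <- a * x
        if (a >> M) & 1:
            a ^= F            # reduce: x^163 = x^7 + x^6 + x^3 + 1
    return r


# Example: x^162 * x = x^163, which reduces to x^7 + x^6 + x^3 + 1
print(hex(gf_mul(1 << 162, 0b10)))  # 0xc9
```

A bit-serial datapath needs m iterations per product. Wider digit or word-level designs trade that cycle count for more logic.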
In this paper, we propose a high-performance elliptic curve cryptographic processor over GF(2^m), namely GF(2^163), GF(2^193), and GF(2^256). The proposed architecture is based on a modified Lopez-Dahab elliptic curve point multiplication algorithm and uses a Gaussian normal basis (GNB) for GF(2^m) field arithmetic. The architecture has three main characteristics:
- fast arithmetic units based on a word-level multiplier;
- a parallelized point doubling and point addition unit;
- a uniform addressing mode that exploits the benefits of the GNB representation.
As a result, the proposed architecture considerably reduces the computational delay. It is modular and has a simple control structure, which makes it well suited to VLSI implementation (see Algorithm 1).
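The word-level multiplier mentioned above processes several bits of one operand per clock cycle. The MSB-first digit-serial loop below is a hedged software model of that idea, not the paper's Algorithm 1. It works in a polynomial basis with an assumed B-163 reduction polynomial rather than the GNB used in the paper. The digit size D is a hypothetical design parameter, and each pass of the outer loop stands for one clock cycle of a D-bit digit multiplier.

```python
M = 163
# Assumed reduction polynomial (NIST B-163): x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1

def reduce(c: int) -> int:
    """Reduce a binary polynomial of any degree modulo F, top bit first."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c

def digit_serial_mul(a: int, b: int, D: int = 8) -> int:
    """MSB-first digit-serial product a*b mod F; one loop pass = one clock cycle."""
    acc = 0
    for j in range(-(-M // D) - 1, -1, -1):        # ceil(M / D) digits of b
        digit = (b >> (j * D)) & ((1 << D) - 1)
        partial = 0
        for i in range(D):                         # a * digit: D-term carry-less product
            if (digit >> i) & 1:
                partial ^= a << i
        acc = reduce((acc << D) ^ partial)         # Horner step: acc * x^D + a * digit
    return acc

print(hex(digit_serial_mul(1 << 162, 0b10)))  # 0xc9, i.e. x^7 + x^6 + x^3 + 1
```

With D = 8 the outer loop runs 21 times instead of 163. The cost is a wider combinational digit multiplier and reducer per cycle.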
In this research, we present a hardware design of the elliptic curve cryptography scheme using Montgomery scalar multiplication based on the "add and double" algorithm. The primary goals are to increase hardware speed and to optimize the resulting inverse component.
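The "add and double" structure of the Montgomery ladder can be sketched independently of the curve. The Python function below is a generic version over any additive group, supplied as two callables with the hypothetical names `add` and `dbl`. Every bit of the scalar costs exactly one addition and one doubling, and the two registers always differ by P. That invariant is what the x-only Montgomery formulas exploit. The usage line uses integers modulo 101 as a stand-in for curve points, purely so the result is easy to check.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def montgomery_ladder(k: int, P: T,
                      add: Callable[[T, T], T],
                      dbl: Callable[[T], T]) -> T:
    """Return kP for k >= 1 using one add and one double per bit of k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    R0, R1 = P, dbl(P)                 # invariant: R1 - R0 == P
    for bit in bin(k)[3:]:             # remaining bits, most significant first
        if bit == "1":
            R0, R1 = add(R0, R1), dbl(R1)
        else:
            R0, R1 = dbl(R0), add(R0, R1)
    return R0

# Stand-in group: integers modulo 101 under addition (kP is then k*P mod 101).
n = 101
print(montgomery_ladder(57, 7, lambda a, b: (a + b) % n, lambda a: 2 * a % n))  # 96
```

Because both branches perform the same pair of operations, the sequence of operations does not depend on the key bits. This regularity is also convenient for an FSM-driven datapath.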

Hardware Design.
The hardware strategy for executing ECC algorithms relies on performing the scalar multiplication in GF(2^m) in very few clock cycles [1]. As m increases, implementations become very time- and resource-consuming. Most known architectures accelerate the multiplication process in one of the following ways:
- modifying the elliptic curve equations through a change of coordinates [3];
- multiplication scalability [4];
- using many serial and parallel arithmetic units [5];
- using a highly parallel Karatsuba multiplier [6].
Other approaches are based on Massey-Omura multipliers, hybrid multipliers, parallel designs, new word-level structures, systolic architectures, the halve-and-add method, or the parallelization of both the add and double steps of the Montgomery algorithm [7].
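The Karatsuba-Ofman multiplier is one of the approaches listed above, and the GF(2^193) design uses it. Here is a hedged sketch of the idea for binary polynomials. One multiplication of two n-bit operands is replaced by three multiplications of about half the size. Because subtraction is XOR in GF(2)[x], the middle term needs no borrows. The recursion threshold is a hypothetical tuning parameter, and reduction modulo the field polynomial would follow as a separate step.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product in GF(2)[x] (base case of the recursion)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba_gf2(a: int, b: int, n: int, threshold: int = 16) -> int:
    """Karatsuba-Ofman product of binary polynomials a, b < 2**n (no reduction)."""
    if n <= threshold:
        return clmul(a, b)
    h = n // 2
    a0, a1 = a & ((1 << h) - 1), a >> h            # a = a1 * x^h + a0
    b0, b1 = b & ((1 << h) - 1), b >> h
    z0 = karatsuba_gf2(a0, b0, h, threshold)
    z2 = karatsuba_gf2(a1, b1, n - h, threshold)
    # Middle term: (a0 + a1)(b0 + b1) - z0 - z2, where + and - are both XOR.
    z1 = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h, threshold) ^ z0 ^ z2
    return (z2 << (2 * h)) ^ (z1 << h) ^ z0

a = (1 << 192) | 0x123456789ABCDEF
b = (1 << 191) | 0xFEDCBA987654321
print(karatsuba_gf2(a, b, 193) == clmul(a, b))  # True
```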
The second problem concerns inversion, based either on Fermat's little theorem or on the almost inverse algorithm following Kaliski's research [8]. To concentrate on one of the two problems, the ECC equations are modified so that inversion is postponed to the last stage and only multiplications are performed in the main computation.
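Inversion through Fermat's little theorem computes a^(-1) = a^(2^m - 2) using only squarings and multiplications. The sketch below assumes the field GF(2^193) with the SEC 2 trinomial f(x) = x^193 + x^15 + 1 and uses a plain bit-serial multiplier. It counts both operation types for straightforward left-to-right square-and-multiply: m - 1 = 192 squarings and m - 2 = 191 multiplications. That high multiplication count is what addition-chain methods such as Itoh-Tsujii reduce.

```python
M = 193
# Assumed field polynomial (SEC 2 sect193r1): x^193 + x^15 + 1
F = (1 << 193) | (1 << 15) | 1

def gf_mul(a: int, b: int) -> int:
    """Bit-serial shift-and-add multiplication in GF(2^M) modulo F."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> M) & 1:
            a ^= F
    return r

def fermat_inverse(a: int) -> tuple[int, int, int]:
    """Return (a^-1, squarings, multiplications) via a^(2^M - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    r, n_sq, n_mul = a, 0, 0           # the leading exponent bit gives r = a
    for bit in bin((1 << M) - 2)[3:]:  # remaining exponent bits, MSB first
        r = gf_mul(r, r)
        n_sq += 1
        if bit == "1":
            r = gf_mul(r, a)
            n_mul += 1
    return r, n_sq, n_mul

a = (1 << 192) | 0xC0FFEE
inv, n_sq, n_mul = fermat_inverse(a)
print(gf_mul(a, inv) == 1, n_sq, n_mul)  # True 192 191
```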

Elliptic Curve Mathematical Background.
ECC is based on the discrete logarithm problem applied to elliptic curves over a finite field. For an elliptic curve E, it relies on the fact that it is computationally easy to compute

$$Q = kP,$$

whereas recovering k from P and Q is computationally hard. Here P and Q are points of the elliptic curve E whose coordinates belong to the underlying field GF(2^m), k is a scalar in the set {1, ..., #E − 1}, and #E is the order of the curve E.
Elliptic Curve Key Generation. Let E be an elliptic curve defined over a finite field GF(2^m). Let P be a point in E(GF(2^m)), and suppose that P has prime order n. Then the cyclic subgroup of E(GF(2^m)) generated by P is ⟨P⟩ = {∞, P, 2P, 3P, ..., (n − 1)P}. The public domain parameters are the field GF(2^m), the equation of the elliptic curve E, the point P, and its order n. A private key is an integer d selected uniformly at random from the interval [1, n − 1], and the corresponding public key is Q = dP (see Algorithm 2). Encryption is described in what follows.
Encryption.
(1) User A (Alice) first selects a generator point G = (x, y) lying on the elliptic curve.
(2) The message M to be encrypted is encoded as an elliptic curve point P_m.
(3) Alice selects a random private key n_A and computes her public key P_A = n_A·G.
(4) To encrypt her message, Alice uses her private key and the public key P_B of Bob (user B).
(5) The encrypted message, denoted by C_m, is created as C_m = {P_A, P_m + n_A·P_B}. The sender transmits these two points to the recipient. The recipient uses his private key n_B to compute n_B·P_A = n_B·(n_A·G) = n_A·(n_B·G) = n_A·P_B, where P_B = n_B·G is the public key of the recipient. The encryption algorithm is described in the following.
As can be seen from the previous algorithm, point multiplication plays a major role in the encryption process, and the same holds for decryption. The encrypted message is then communicated to the receiver, Bob, who decrypts it using the decryption mechanism [9].
Decryption at the receiver end proceeds as follows.
Decryption.
(1) When Bob receives the encrypted message, he first multiplies the first point of the ciphertext, Alice's public key P_A, by his private key n_B.
(2) The result, n_B·P_A, is then subtracted from the second point of the ciphertext.
(3) This gives him the message point P_m = (P_m + n_A·P_B) − n_B·P_A, from which the original message M is recovered.
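The encryption and decryption steps above can be checked end to end on a deliberately tiny example. The sketch below is a toy under explicit assumptions, not the paper's design:
- It builds the curve y^2 + xy = x^3 + x^2 + 1 over GF(2^5) with f(x) = x^5 + x^2 + 1.
- It uses affine point arithmetic with double-and-add instead of the hardware point multiplier.
- It skips message encoding by treating every curve point as a possible P_m.
- The helper names (`keygen`, `encrypt`, `decrypt`) are hypothetical.
Decryption works because n_B·P_A = n_B·n_A·G = n_A·P_B.

```python
import random

M, F = 5, 0b100101          # toy field GF(2^5), f(x) = x^5 + x^2 + 1 (assumed)
A, B = 1, 1                 # toy curve y^2 + xy = x^3 + A x^2 + B (assumed)

def gmul(a, b):
    """Shift-and-add multiplication in GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= F
    return r

def ginv(a):
    """Fermat inversion a^(2^M - 2), a != 0."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def add(P, Q):
    """Affine point addition/doubling; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y2 == x1 ^ y1:                    # Q == -P
            return None
        lam = x1 ^ gmul(y1, ginv(x1))        # doubling
        x3 = gmul(lam, lam) ^ lam ^ A
        return x3, gmul(x1, x1) ^ gmul(lam, x3) ^ x3
    lam = gmul(y1 ^ y2, ginv(x1 ^ x2))
    x3 = gmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return x3, gmul(lam, x1 ^ x3) ^ x3 ^ y1

def neg(P):
    return None if P is None else (P[0], P[0] ^ P[1])   # -(x, y) = (x, x + y)

def mul(k, P):
    """Left-to-right double-and-add scalar multiplication."""
    R = None
    for bit in bin(k)[2:]:
        R = add(R, R)
        if bit == "1":
            R = add(R, P)
    return R

POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M)
          if gmul(y, y) ^ gmul(x, y) == gmul(gmul(x, x), x) ^ gmul(A, gmul(x, x)) ^ B]

def order(P):
    k, R = 1, P
    while R is not None:
        R, k = add(R, P), k + 1
    return k

G = max(POINTS, key=order)           # base point of largest order on this toy curve
n = order(G)

def keygen(rng):
    d = rng.randrange(1, n)          # private key in [1, n - 1]
    return d, mul(d, G)              # public key d*G

def encrypt(Pm, PB, rng):
    nA, PA = keygen(rng)             # Alice's key pair
    return PA, add(Pm, mul(nA, PB))  # C_m = {P_A, P_m + n_A P_B}

def decrypt(Cm, nB):
    PA, C2 = Cm
    return add(C2, neg(mul(nB, PA)))  # (P_m + n_A P_B) - n_B P_A = P_m

rng = random.Random(7)
nB, PB = keygen(rng)
Pm = POINTS[3]
print(decrypt(encrypt(Pm, PB, rng), nB) == Pm)  # True
```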

Importance of Elliptic Curve Cryptography.
There are several criteria that need to be considered when selecting a family of public key schemes for a specific application. The principal ones are (1) functionality, (2) security, and (3) performance.
Measuring the Efficiency of Algorithms. The efficiency of an algorithm is measured by the scarce resources it consumes. Typically, the measure used is time, but sometimes other measures such as space and number of processors are also considered. It is reasonable to expect that an algorithm consumes greater resources for larger inputs, and the efficiency of an algorithm is therefore described as a function of the input size. Here, the size is defined to be the number of bits needed to represent the input using a reasonable encoding.
In the affine coordinate representation, a point on an elliptic curve over GF(2^m) is specified by two coordinates x and y, both belonging to GF(2^m), that satisfy (4). The point at infinity has no affine coordinates.
Most ECC hardware designs use three (projective) coordinates to avoid the repeated field divisions in (5). These divisions consume many execution cycles as well as memory and power.
The advantage here is that Bob's private key is known only to him. Therefore, only Bob can extract the original message, by subtracting the product of his private key and Alice's public key from the second point [7].
No algorithm is currently known that can compute k from P and Q in subexponential time. The equation of a nonsupersingular elliptic curve over GF(2^m) is

$$y^2 + xy = x^3 + ax^2 + b, \tag{4}$$

where the elements a and b are chosen within GF(2^m) with b ≠ 0. A point is converted from a pair of coordinates to a triple of coordinates using one of the projective transforms. With the López-Dahab transform, x = X/Z and y = Y/Z^2, so a point (x, y) is mapped to (X, Y, Z). In other words, a third, projective coordinate is introduced to "flatten" the equations and avoid division. Projective coordinates thus eliminate inversion from the point operations, leaving a single inversion for the final conversion. The startup transformation required for the design simply initializes X, Y, and Z as follows [10]:

$$X = x, \qquad Y = y, \qquad Z = 1. \tag{6}$$

Substituting the new coordinates into (4) gives

$$Y^2 + XYZ = X^3 Z + aX^2Z^2 + bZ^4. \tag{7}$$

The VHDL implementation is therefore based on (7). After the successive addition and multiplication operations are completed, the result is converted back to two affine coordinates as follows:

$$x = \frac{X}{Z}, \qquad y = \frac{Y}{Z^2}.$$

The computations use the Montgomery point doubling and Montgomery point addition algorithms. They rely on Montgomery's ingenious observation that the y coordinate does not participate in the computations. Its recovery can therefore be deferred to the last stage, so that only two projective coordinates, X and Z, are handled [11] (see Algorithm 3).
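Assume the López-Dahab mapping x = X/Z, y = Y/Z^2 (an assumption about which transform is used). Substituting it into y^2 + xy = x^3 + ax^2 + b and multiplying by Z^4 gives Y^2 + XYZ = X^3Z + aX^2Z^2 + bZ^4. The short Python check below verifies this projective equation, and the conversion back to affine coordinates, for every nonzero Z. It runs over every point of a hypothetical toy curve over GF(2^5) with f(x) = x^5 + x^2 + 1 and a = b = 1. It is a sanity check of the algebra, not a model of the VHDL.

```python
M, F = 5, 0b100101          # toy field GF(2^5), f(x) = x^5 + x^2 + 1 (assumed)
A, B = 1, 1                 # toy curve y^2 + xy = x^3 + A x^2 + B (assumed)

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= F
    return r

def ginv(a):
    r, e = 1, (1 << M) - 2    # Fermat: a^(2^M - 2)
    while e:
        if e & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        e >>= 1
    return r

def sq(a):
    return gmul(a, a)

def to_ld(x, y, Z):
    """Affine (x, y) -> Lopez-Dahab (X, Y, Z) with X = xZ, Y = yZ^2."""
    return gmul(x, Z), gmul(y, sq(Z)), Z

def on_ld_curve(X, Y, Z):
    """Check Y^2 + XYZ == X^3 Z + A X^2 Z^2 + B Z^4."""
    lhs = sq(Y) ^ gmul(gmul(X, Y), Z)
    rhs = gmul(gmul(sq(X), X), Z) ^ gmul(A, gmul(sq(X), sq(Z))) ^ gmul(B, sq(sq(Z)))
    return lhs == rhs

def to_affine(X, Y, Z):
    """x = X / Z, y = Y / Z^2: the single inversion at the end."""
    zi = ginv(Z)
    return gmul(X, zi), gmul(Y, sq(zi))

# With Z = 1 the projective equation reduces to the affine one.
points = [(x, y) for x in range(1 << M) for y in range(1 << M) if on_ld_curve(x, y, 1)]
print(len(points), all(on_ld_curve(*to_ld(x, y, Z)) for x, y in points for Z in range(1, 1 << M)))
```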
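In Algorithm 3, Madd() is the point addition operation on the elliptic curve, Mdouble() is the point doubling computation, and Mxy() is the conversion from projective to affine coordinates. The reader is referred to López and Dahab (1999) [12] for a detailed explanation. Let (X_1, Z_1) and (X_2, Z_2) denote the projective x-coordinates of two points whose difference is P. The functions Madd(), Mdouble(), and Mxy() are then defined as follows:

$$\mathrm{Madd}:\quad Z_3 = (X_1 Z_2 + X_2 Z_1)^2, \qquad X_3 = x Z_3 + (X_1 Z_2)(X_2 Z_1),$$

requiring one field squaring, four field multiplications, and two field additions;

$$\mathrm{Mdouble}:\quad X' = X^4 + b Z^4, \qquad Z' = X^2 Z^2;$$

$$\mathrm{Mxy}:\quad x_k = \frac{X_1}{Z_1}, \qquad y_k = \left(x + \frac{X_1}{Z_1}\right)\left[(X_1 + x Z_1)(X_2 + x Z_2) + (x^2 + y) Z_1 Z_2\right](x Z_1 Z_2)^{-1} + y,$$

where, at the end of the loop, (X_1, Z_1) represents kP and (X_2, Z_2) represents (k + 1)P.
In these functions, (x, y) are the coordinates of the original point P, which remain fixed during the calculation of kP, and (x_k, y_k) are the coordinates of kP. Each coordinate is held in an m-bit register. The three basic functions in turn rely on finite field operations such as addition, multiplication, and inversion.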
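To make the three functions concrete, the following hedged Python model implements the López-Dahab Montgomery ladder, with the López-Dahab functions Madd, Mdouble, and Mxy written as `madd`, `mdouble`, and `mxy`. It runs on a toy curve over GF(2^5) with f(x) = x^5 + x^2 + 1 and a = b = 1. The field, the curve, and all helper names are illustrative assumptions. Only X and Z are updated inside the loop; y is recovered once by `mxy` at the end. The result is cross-checked against plain affine double-and-add. Like the original method, the sketch assumes k ≥ 1 and x ≠ 0, so P is not of order two.

```python
M, F = 5, 0b100101          # toy field GF(2^5), f(x) = x^5 + x^2 + 1 (assumed)
A, B = 1, 1                 # toy curve y^2 + xy = x^3 + A x^2 + B (assumed)

def gmul(a, b):
    """Shift-and-add multiplication in GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= F
    return r

def sq(a):
    return gmul(a, a)

def ginv(a):
    """Fermat inversion a^(2^M - 2), a != 0."""
    r, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            r = gmul(r, a)
        a = sq(a)
        e >>= 1
    return r

def add(P, Q):
    """Affine reference addition/doubling; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y2 == x1 ^ y1:                    # Q == -P
            return None
        lam = x1 ^ gmul(y1, ginv(x1))        # doubling
        x3 = sq(lam) ^ lam ^ A
        return x3, sq(x1) ^ gmul(lam, x3) ^ x3
    lam = gmul(y1 ^ y2, ginv(x1 ^ x2))
    x3 = sq(lam) ^ lam ^ x1 ^ x2 ^ A
    return x3, gmul(lam, x1 ^ x3) ^ x3 ^ y1

def double_and_add(k, P):
    R = None
    for bit in bin(k)[2:]:
        R = add(R, R)
        if bit == "1":
            R = add(R, P)
    return R

def madd(X1, Z1, X2, Z2, x):
    """x(P1 + P2) given x = x(P2 - P1): 4 multiplications, 1 squaring, 2 additions."""
    t1, t2 = gmul(X1, Z2), gmul(X2, Z1)
    Z3 = sq(t1 ^ t2)
    return gmul(x, Z3) ^ gmul(t1, t2), Z3

def mdouble(X, Z):
    """x(2Q): X' = X^4 + B Z^4, Z' = X^2 Z^2."""
    X2, Z2 = sq(X), sq(Z)
    return sq(X2) ^ gmul(B, sq(Z2)), gmul(X2, Z2)

def mxy(X1, Z1, X2, Z2, x, y):
    """Recover affine kP from (X1, Z1) = kP and (X2, Z2) = (k + 1)P."""
    if Z1 == 0:
        return None                          # kP is the point at infinity
    if Z2 == 0:
        return x, x ^ y                      # (k + 1)P = infinity, so kP = -P
    x3 = gmul(X1, ginv(Z1))
    t = gmul(X1 ^ gmul(x, Z1), X2 ^ gmul(x, Z2)) ^ gmul(sq(x) ^ y, gmul(Z1, Z2))
    return x3, gmul(gmul(x ^ x3, t), ginv(gmul(x, gmul(Z1, Z2)))) ^ y

def ld_ladder(k, P):
    """Lopez-Dahab Montgomery ladder; requires k >= 1 and P = (x, y) with x != 0."""
    x, y = P
    X1, Z1, X2, Z2 = x, 1, sq(sq(x)) ^ B, sq(x)      # (P, 2P)
    for bit in bin(k)[3:]:                          # bits after the leading 1
        if bit == "1":
            X1, Z1 = madd(X1, Z1, X2, Z2, x)
            X2, Z2 = mdouble(X2, Z2)
        else:
            X2, Z2 = madd(X2, Z2, X1, Z1, x)
            X1, Z1 = mdouble(X1, Z1)
    return mxy(X1, Z1, X2, Z2, x, y)

POINTS = [(x, y) for x in range(1 << M) for y in range(1 << M)
          if sq(y) ^ gmul(x, y) == gmul(sq(x), x) ^ gmul(A, sq(x)) ^ B]
P = next(p for p in POINTS if p[0] != 0)
print(ld_ladder(13, P) == double_and_add(13, P))  # True
```

The single `ginv` calls inside `mxy` correspond to the one final inversion that projective coordinates leave in the design.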
The inversion in GF(2^193), required at the final stage, can be realized by one of two well-known methods: the extended Euclidean algorithm or Fermat's little theorem. By Fermat's theorem, every nonzero element a of GF(2^m) satisfies $a^{2^m - 1} = 1$, hence $a^{-1} = a^{2^m - 2}$. Thus, to compute the inverse of an element of GF(2^193), one raises it to the power 2^193 − 2. The Itoh-Tsujii algorithm computes this power with an addition chain of squarings and multiplications and realizes the inverse as presented in Table 3 [13].
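The Itoh-Tsujii idea can be sketched as follows. The field GF(2^193) is assumed to use the SEC 2 trinomial x^193 + x^15 + 1, and a plain bit-serial multiplier stands in for the paper's Karatsuba-Ofman unit. Write β_k = a^(2^k − 1). The identities β_2k = β_k^(2^k)·β_k and β_(k+1) = β_k^2·a build β_(m−1) along the binary expansion of m − 1 = 192, and then a^(−1) = β_(m−1)^2. This plain version still performs m − 1 = 192 squarings but needs only 8 multiplications for m = 193. Left-to-right square-and-multiply would need 191.

```python
M = 193
# Assumed field polynomial (SEC 2 sect193r1): x^193 + x^15 + 1
F = (1 << 193) | (1 << 15) | 1

def gf_mul(a: int, b: int) -> int:
    """Bit-serial shift-and-add multiplication in GF(2^M) modulo F."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> M) & 1:
            a ^= F
    return r

def itoh_tsujii_inverse(a: int) -> tuple[int, int, int]:
    """Return (a^-1, squarings, multiplications) using beta_k = a^(2^k - 1)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    beta, k, n_sq, n_mul = a, 1, 0, 0      # beta_1 = a
    for bit in bin(M - 1)[3:]:             # walk M - 1 = 192 = 0b11000000, MSB first
        t = beta
        for _ in range(k):                 # t = beta_k^(2^k): k squarings
            t = gf_mul(t, t)
        n_sq += k
        beta = gf_mul(t, beta)             # beta_2k = beta_k^(2^k) * beta_k
        n_mul += 1
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta), a)   # beta_(k+1) = beta_k^2 * a
            n_sq += 1
            n_mul += 1
            k += 1
    # Here k == M - 1, so a^-1 = beta_(M-1)^2 costs one final squaring.
    return gf_mul(beta, beta), n_sq + 1, n_mul

a = (1 << 192) | 0xC0FFEE
inv, n_sq, n_mul = itoh_tsujii_inverse(a)
print(gf_mul(a, inv) == 1, n_sq, n_mul)  # True 192 8
```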
C. ECC Components. The ECC processor shown in Figure 1 consists of eight main components: the host interface (HI), data memory, register file, instruction memory, control-1, control-2, AU-1, and AU-2. The HI communicates with the host processor. The host processor transmits all parameters for the point multiplication to the HI together with a start signal, and it receives the results and an end signal. In the proposed work, an Intel 32-bit processor is used as the host processor, so the HI is a 32-bit interface. The HI transmits all parameters for computing elliptic curve point doubling and point addition to the register file and receives the results from the data memory. The data memory consists of an 8 × 163-bit dual-port block RAM, and the instruction memory contains 13 microcode sequences of 11-bit words. These two block memories are also used to compute the coordinate conversion. For high-performance implementation of point doubling and addition, we add

For the 256-bit design, we describe the architecture of the most crucial component in ECDSA, namely the one that implements elliptic curve operations and modular operations over a finite field. For this purpose, we develop a flexible yet compact elliptic curve processor for applications where speed is of minor importance. The processor is optimized for FPGA families introduced to the market from 2003 on, which are still used in many new products. These FPGAs contain hard IP blocks (HIPs). The presented processor uses the Block RAM and multiplier HIPs, which are available on the majority of FPGA devices. The proposed hardware executes the ECC algorithm, which relies on modular operations over GF(2^256). The design uses the efficient Montgomery ladder algorithm for the EC point multiplication required by ECDSA, and it achieves a compact architecture. Figure 2 shows the efficient implementation of finite field arithmetic in the elliptic curve system; the implementation uses a dedicated multiplexer on the FPGA.
Figure 3 shows that the elliptic curve system achieves high throughput rates; the implementation uses a dedicated field adder on the FPGA.
Figure 4 shows the design of an efficient field multiplexer over GF(2^163) using a Gaussian normal basis.
Figure 5 shows that the elliptic curve system achieves high performance; the implementation uses a dedicated field squarer on the FPGA.
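Squaring is much cheaper than general multiplication in both common bases, which is why a dedicated squarer pays off:
- In a normal basis, such as the GNB used for GF(2^163), squaring is just a cyclic rotation of the coordinate vector.
- In a polynomial basis, it is a bit spread (bit i moves to bit 2i, pure wiring) followed by reduction.
The Python sketch below shows both. The polynomial-basis version assumes the B-163 reduction polynomial and is checked against a generic multiplier. The normal-basis version assumes that the coefficient of β^(2^i) is stored in bit i.

```python
M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # assumed B-163 polynomial
MASK = (1 << M) - 1

def reduce(c: int) -> int:
    """Reduce a binary polynomial of any degree modulo F."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c

def poly_square(a: int) -> int:
    """Polynomial basis: move bit i to bit 2i (no logic gates), then reduce."""
    spread = 0
    for i in range(M):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
    return reduce(spread)

def gf_mul(a: int, b: int) -> int:
    """Reference shift-and-add multiplication, used only to check the squarer."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> M) & 1:
            a ^= F
    return r

def nb_square(a: int) -> int:
    """Normal basis: a^2 rotates the M coordinates by one position."""
    return ((a << 1) | (a >> (M - 1))) & MASK

a = (1 << 160) | 0xBADC0FFEE
print(poly_square(a) == gf_mul(a, a))  # True
```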

Results and Performance Comparisons
We present the estimated number of cycles required for each part of the algorithm at each stage of the FSM controller (Table 1).
Working with 193-bit operands, that is, numbers of order 2^193 or more, is not straightforward, and checking the results by hand is very cumbersome. We therefore wrote several Matlab scripts with the same input/output behavior as the VHDL code, in order to compare the execution steps as well as the final results. Timing is not taken into consideration at this stage.
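The same golden-model approach can be sketched in Python instead of Matlab. A bit-accurate software multiplier produces reproducible operand/result triples that a VHDL testbench can read and compare against. The field polynomial (x^193 + x^15 + 1), the seed, and the one-triple-per-line hex format are hypothetical choices, not the format used in this work.

```python
import random

M = 193
F = (1 << 193) | (1 << 15) | 1   # assumed field polynomial x^193 + x^15 + 1

def gf_mul(a: int, b: int) -> int:
    """Bit-accurate reference multiplier (shift-and-add, modulo F)."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> M) & 1:
            a ^= F
    return r

def make_vectors(count: int, seed: int = 2024) -> list[str]:
    """Reproducible 'a b a*b' lines, each operand as fixed-width hex."""
    rng = random.Random(seed)
    width = (M + 3) // 4             # 49 hex digits for M = 193
    lines = []
    for _ in range(count):
        a, b = rng.getrandbits(M), rng.getrandbits(M)
        lines.append(f"{a:0{width}x} {b:0{width}x} {gf_mul(a, b):0{width}x}")
    return lines

for line in make_vectors(2):
    print(line)
```

Writing these lines to a file gives the testbench a fixed reference to compare the hardware output against, with no timing involved.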

Synthesis Result of Multiplexer
Speed grade: −6. Minimum period: no path found.
Figure 6 and Table 5 show the output results of the ECC scalar multiplication for an arbitrary 193-bit value of k.
The frequency of the encryption operation is 1930 MHz, and the speed of operation also increases (Figure 7). The design can therefore be used in applications that need security but lack the power, storage, and computational resources required by current cryptosystems.
The frequency of the decryption operation is also 1930 MHz, and its speed of operation increases as well (Figure 8). Like the encryption unit, it suits applications that need security but have limited power, storage, and computational resources.
The processor is optimized for compactness on an FPGA and uses a dedicated modular adder (Figure 9).
Figure 10 shows the dedicated modular subtractor, which is likewise optimized for compactness on the FPGA.
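Suppose the 256-bit datapath performs modular arithmetic modulo a prime p (an assumption, since the modulus is not stated here). A compact adder/subtractor then needs only one conditional correction per operation, which maps naturally onto FPGA carry chains: compute a + b and a + b − p in parallel and keep the one that is in range. The sketch below uses the NIST P-256 prime purely as an example modulus.

```python
# Example modulus (assumed): the NIST P-256 prime.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1

def mod_add(a: int, b: int, p: int = P256) -> int:
    """(a + b) mod p for 0 <= a, b < p: one addition, one trial subtraction, one select."""
    s = a + b
    t = s - p            # in hardware, the borrow of this subtraction drives the mux
    return t if t >= 0 else s

def mod_sub(a: int, b: int, p: int = P256) -> int:
    """(a - b) mod p for 0 <= a, b < p: subtract, then add p back on borrow."""
    d = a - b
    return d if d >= 0 else d + p

print(mod_add(P256 - 1, 5), mod_sub(3, 5) == P256 - 2)  # 4 True
```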
Figure 11 shows the dedicated multiplier used by the processor, which is also optimized for compactness on the FPGA.
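One way to map a wide multiplication onto an FPGA's dedicated multipliers is the schoolbook method over small limbs. As an assumption about the target, the sketch below uses 17-bit limbs, since an 18 × 18 signed hardware multiplier handles 17 × 17 unsigned products. A 256-bit product then takes 16 × 16 = 256 limb multiplications, and the final carry propagation corresponds to the carry-chain logic. Modular reduction would follow as a separate step.

```python
LIMB = 17                       # assumed limb width for an 18x18 signed multiplier
LMASK = (1 << LIMB) - 1

def limb_mul(a: int, b: int, bits: int = 256) -> tuple[int, int]:
    """Return (a * b, number of LIMB x LIMB multiplications) for a, b < 2**bits."""
    n = -(-bits // LIMB)                        # limbs per operand (16 for 256 bits)
    A = [(a >> (LIMB * i)) & LMASK for i in range(n)]
    B = [(b >> (LIMB * i)) & LMASK for i in range(n)]
    acc = [0] * (2 * n)                         # column sums of partial products
    count = 0
    for i in range(n):
        for j in range(n):
            acc[i + j] += A[i] * B[j]           # one dedicated-multiplier operation
            count += 1
    result, carry = 0, 0
    for k, col in enumerate(acc):               # carry propagation (carry chain)
        col += carry
        result |= (col & LMASK) << (LIMB * k)
        carry = col >> LIMB
    return result | (carry << (LIMB * 2 * n)), count

x = 2**256 - 1
print(limb_mul(x, x) == (x * x, 256))  # True
```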
Figure 12 shows the result of the efficient Montgomery ladder algorithm used for EC point multiplication in ECDSA; this architecture achieves a smaller area.

Discussion
The main contributions of the present research concern three major points:
- an optimal finite state machine (FSM) controlling all components, which minimizes empty cycles;
- an optimized inversion process, in which the number of different squarings is reduced from 192 to 21;
- the separation of the data path routing from the control part, so that only the multiplier, squarer, adder, and modulo components need to be modified.
The results we have obtained are very encouraging and will guide our decisions on embedding larger encryption schemes, mainly the extension to the NIST-proposed curves (233, 283, 409, and 571 bits). This extension will take into account the following options:
- the use of two or more multipliers (tuned parallel design);
- the use of internal memories such as Block RAMs (optimized memory access timing);
- a faster FSM;
- different ECC hardware algorithms.
These optimizations are constrained to minimize the parallel inputs of the design and to reduce the routing circuitry, which otherwise severely decreases efficiency, lowers speed, and increases power consumption.

Results and Conclusion
We have presented the design of a fast version of EC cryptohardware based on a finite state machine. First, a new ECC processor for GF(2^163) is proposed in this paper. It consists of eight main components: the host interface (HI), data memory, register file, instruction memory, control-1, control-2, AU-1, and AU-2. Second, the GF(2^193) design introduces better optimization of the multiplier and squaring components, on which the modular inversion circuit relies. The main characteristics of this design are the elimination of delays between the different internal components, the minimization of global clocking resources, and a strategic separation of the data path from the control part.
The GF(2^193) results indicate that applying different optimizations at the hardware design level improves efficiency (Table 2) and accelerates the ECC scalar multiplication (615,384 scalar multiplications per second). It also raises the operating frequency: the scalar multiplication, encryption, and decryption operations run at 1930 MHz, and the speed of operations such as key generation, encryption, and decryption increases. The design can therefore be used in applications that need security but lack the power, storage, and computational resources required by current cryptosystems.
Third, for GF(2^256) we presented an elliptic curve processor that performs the finite field operations in an elliptic curve digital signature unit. The processor is optimized for compactness on an FPGA and uses the dedicated multipliers and block RAMs available on FPGAs of the same generation. With it we achieved the most compact solution, which makes the processor suitable for applications where a small overhead for security is desirable. Security issues will play an important role in the majority of future communication and computer networks. As the Internet becomes more and more accessible to the public, security measures have to be strengthened. Elliptic curve cryptosystems allow for shorter operand lengths than other public key schemes based on the discrete logarithm in finite fields. We implemented our design on a Xilinx XC4VLX200 FPGA device, where it uses 16,209 slices and reaches a maximum frequency of 143 MHz. This design is roughly 4.8 times faster than existing methods, at the cost of about twice their hardware complexity.

Table 1: Field operations required for the ECC operation.
Field operations | #cycles | Occurrences in one cycle of the FSM

Synthesis report: input arrival time before clock: 11.217 ns; maximum output required time after clock: 6.514 ns; maximum combinational path delay: no path found. Logic utilization: number of slices: 466 out of 6144 (7%); number of 4-input LUTs: 932 out of 12288 (7%); number of bonded IOBs: 210 out of 240 (87.5%); number used as route-thru: 254.

Figure 6: Final result of the scalar multiplication.

Figure 7: Simulation result of encryption operation.

Figure 9: Simulation result of modular addition.

Table 3: Estimation of the FSM stages and their respective numbers of execution cycles.
* The symbol # stands for "Number of."

Table 5: The x and y coordinates of the result Q = kP.