Hardware Implementation of Secure Lightweight Cryptographic Designs for IoT Applications

The recent expansion of the Internet of Things is creating a new world of smart devices in which security implications are very significant. Besides the required security level, IoT devices usually have constrained resources, such as low computation capability, low memory, and limited battery. Lightweight cryptographic primitives are proposed in the context of IoT while considering the trade-off between security guarantee and good performance. In this paper, we present optimized lightweight cryptographic hardware designs, with a 32-bit datapath, of the LED 64/128, SIMON 64/128, and SIMECK 64/128 algorithms for constrained devices. Our proposed designs are investigated on Spartan-3, Spartan-6, and Zynq-7000 FPGA platforms in terms of area, speed, efficiency, and power consumption. The proposed designs achieve a throughput of up to 891.99 Mbps, 838.95 Mbps, and 210.13 Mbps for SIMECK 64/128, SIMON 64/128, and LED 64/128 on Zynq-7000, respectively. A detailed comparison between our three proposed designs is carried out on different FPGA families to guide FPGA-based application deployment. Test results and security analysis show that our proposed designs not only achieve good encryption results with high performance and low cost but are also secure enough to resist statistical attacks.


Introduction
The devices we use every day are becoming connected entities across the planet. The so-called IoT includes technologies combining autonomous embedded sensory objects with communication intelligence. Consequently, most IoT applications have strong real-time requirements and energy limitations [1][2][3]. Moreover, the IoT can be affected by different classes of security threats: access to intellectual property, sabotage, espionage, and cyber terrorism in critical infrastructures such as traffic monitoring, smart cities, and industrial automation [4,5].
This calls for high-performance cryptographic designs that are efficient in terms of security, computational capability, resource occupation, and power consumption. Indeed, designing cryptographic systems requires dealing with the trade-offs between security, performance, and cost [6,7].
It is generally easy to optimize any two of the three design goals: security and cost, security and performance, or cost and performance; however, it is really difficult to optimize all three design goals at once.
Traditional secure encryption methods are usually computationally intensive and use large key sizes, which strains the computation capacity of IoT devices. In this context, lightweight cryptographic primitives are better alternatives, since they balance security guarantees against performance while remaining suited to resource-limited devices. Hence, there is a substantial need for new lightweight encryption solutions adapted to IoT-constrained environments [8,9]. The main focus of this work is to propose optimized hardware implementations of lightweight cryptographic designs and to examine the effect of the hardware architecture on the area, power, efficiency, and performance of implementations on low-cost Xilinx FPGA platforms. Three different hardware architectures, for the LED 64/128, SIMON 64/128, and SIMECK 64/128 algorithms, are proposed in this study. The security level is evaluated by applying our designs to diverse types of images. Then, test results and security analysis of the suggested designs are presented as evidence of attack resistance.
To the best of our knowledge and based on our literature review, this work sets the best performance of hardware lightweight cryptographic cipher architectures. The architectures we propose are implemented with a 32-bit datapath on different platforms to support an adequate device choice wherever FPGAs are deployed. Furthermore, we quantify the cost of our proposed 32-bit datapath architectures and show the trade-off between area, throughput, efficiency, and power consumption. The robustness of the introduced lightweight cryptographic designs is shown by applying them to several types of images. A detailed security analysis is provided using visual testing, information entropy, and correlation coefficient analysis. The remainder of this paper is organized as follows: Section 2 discusses previous works related to lightweight cryptographic designs. Section 5 presents the results of hardware implementation on different FPGA platforms. The obtained results are compared to the state of the art as well as against each other. Security analysis of the designs is carried out to demonstrate robustness against possible attacks. Section 6 concludes this study.

Related Works
In recent years, there has been rapid advancement in the research and development of lightweight cryptography for devices with limited resources in IoT environments. The principal objective is to design and employ ultralightweight cryptographic algorithms that can be used in such applications while providing the desired security levels.
Generally, cipher implementations targeted at low-resource applications are classified into software and hardware implementations. For software implementations, the required memory size of the embedded software is considered; for hardware implementations, area, speed, and power consumption are taken into account. These constraints must be respected when choosing the appropriate security algorithm for resource-limited devices.
Various works dealing with both kinds of implementation have been carried out for lightweight cryptography on constrained devices. In [10], Benadjila et al. explored general software implementations of lightweight ciphers on x86 architectures, with a specific focus on LED, PICCOLO, and PRESENT. They propose new trade-offs, with a theoretical cache model to better predict which trade-off will be suitable depending on the target processor. Park et al. [11] proposed efficient parallel implementation methods of the SIMECK family block cipher using Intel AVX2 (Advanced Vector Extensions 2) SIMD instructions, together with an efficient adaptive encryption method to enhance human care service availability based on the AVX2-optimized SIMECK implementations, which support different data block sizes. In another work, a high-performance software implementation of SIMON and SPECK is achieved on the AVR family of 8-bit microcontrollers [12]. Kim et al. [13] investigated lightweight features of the HIGHT block cipher and presented optimized software and hardware implementations for low-end IoT platforms, including resource-constrained devices (8-bit AVR and 32-bit ARM Cortex-M3) and an application-specific integrated circuit (ASIC).
Other existing research has focused on hardware implementations of lightweight cryptography. Diehl et al. [14] implement six ciphers, AES, SIMON, SPECK, PRESENT, LED, and TWINE, in hardware using register transfer level (RTL) design and in software using a custom reconfigurable processor.
These implementations are instantiated in identical Xilinx Kintex-7 FPGAs, enabling direct comparison of throughput, area, throughput-to-area (TP/A) ratio, power, and energy.
Another study, presented by Abed et al. [15], proposes implementing, optimizing, and modeling a SIMON cipher design for low-resource devices, with an emphasis on energy and power, which are critical metrics for such devices. Several pipelined FPGA implementations of the SIMON 32/64 lightweight cipher, with different numbers of hardware rounds per cycle, were designed and tested.
Ahir et al. [16] proposed reliable and efficient error detection architectures for the block ciphers SIMON and SPECK with acceptable area and power consumption overheads. Fault injection simulations are performed to determine the error detection capabilities of the proposed architectures implemented on the Zynq-7000 FPGA platform. Beaulieu et al. [17] discussed FPGA performance comparisons of SIMON, SPECK, and PRESENT on low-cost Xilinx Spartan-3 FPGAs. In that article, the authors presented the performance achieved by SIMON and SPECK on a broad range of existing software and hardware platforms compared to AES and PRESENT.
In another work [18], Dahiphale et al. proposed, implemented, and evaluated the five most efficient datapaths of different data bus sizes of RECTANGLE cipher. All proposed solutions are implemented on different FPGA platforms with the same implementation conditions and the results are compared on every performance metric.
Almost all cited works aim to optimize the software or hardware implementation for low area occupation, high-speed calculation, high throughput, or other metrics, but none of them addresses all performance metrics at the same time while also guaranteeing a reasonable security level.

Proposed Lightweight Cryptographic Architecture
Table 1 presents the cipher specifications of the lightweight cryptography algorithms: the block/key size (bits), datapath, and the number of rounds.

LED-128.
The Light Encryption Device (LED) is a 64-bit block cipher based on a substitution-permutation network (SPN) that can handle key sizes from 64 bits up to 128 bits. We denote by LED-x the LED block cipher version that handles x-bit keys [19].
The key schedule of LED is extremely simple, as it is almost nonexistent, which presents obvious advantages in hardware. This simplicity also helps security proofs, since some can be derived even in the related-key model. The idea is to reuse the original key material as is, several times during the computation.
For a 128-bit key, the secret material is divided into two keys K1 and K2 that are repeatedly and alternately XORed into the internal state every four rounds of the internal permutation, as shown in Figure 1.
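As a rough illustration of this key usage, the following Python sketch alternates K1 and K2 around four-round steps. It is a minimal sketch, not the VHDL design of this work: the function name `led128_skeleton`, the integer state representation, and the caller-supplied `round_fn` are assumptions made for illustration, and the final key addition after the last step follows the public LED specification.

```python
def led128_skeleton(state, k1, k2, round_fn, rounds=48):
    """Apply the LED-128 key-addition pattern around a caller-supplied round.

    state, k1, k2 are 64-bit integers; round_fn(state, round_index) is a
    hypothetical placeholder for one LED round.
    """
    keys = (k1, k2)
    steps = rounds // 4                  # one key addition every four rounds
    for step in range(steps):
        state ^= keys[step % 2]          # K1, K2, K1, ... alternately
        for r in range(4):
            state = round_fn(state, 4 * step + r)
    state ^= keys[steps % 2]             # final key addition after the last step
    return state
```

With 48 rounds there are 12 steps and 13 key additions, so K1 is used seven times and K2 six times.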
The keyed permutation of the LED algorithm is largely inspired by the Advanced Encryption Standard (AES) structure. Namely, the internal state can be viewed as a 4 × 4 matrix of 4-bit cells. One round is described by four functions (see Figure 2):
(i) AddConstants. This function applies round-dependent constants to each cell of the first two columns.
(ii) SubCells. This function applies a 4-bit S-box to every cell of the internal state. LED uses the very small 4-bit S-box of the PRESENT cipher.
(iii) ShiftRows. This function simply rotates each cell located at row i by i positions to the left.
(iv) MixColumnsSerial. This function updates all columns linearly and independently. The matrix underlying the MixColumnsSerial layer is Maximum Distance Separable (MDS) to provide maximal diffusion.
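Three of the four round functions above can be sketched in a few lines of Python. This is an illustrative software model under stated assumptions, not the hardware datapath of this work: the state is a list of four rows of 4-bit cells, the S-box values and the serial matrix row (4, 1, 2, 2) over GF(2^4) with polynomial x^4 + x + 1 are taken from the public PRESENT and LED specifications, and AddConstants is omitted because its constants are not listed in the text.

```python
# PRESENT 4-bit S-box (from the public specification, not from this paper).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def gf16_mul(a, b):
    """Multiply two 4-bit values in GF(2^4) modulo x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13          # reduce by x^4 + x + 1
    return result


def sub_cells(state):
    """Apply the S-box to every cell of the 4 x 4 state."""
    return [[SBOX[cell] for cell in row] for row in state]


def shift_rows(state):
    """Rotate row i by i cell positions to the left."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def mix_columns_serial(state):
    """Update each column by applying the serial matrix four times."""
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        col = [state[r][c] for r in range(4)]
        for _ in range(4):
            new = (gf16_mul(4, col[0]) ^ col[1]
                   ^ gf16_mul(2, col[2]) ^ gf16_mul(2, col[3]))
            col = col[1:] + [new]   # shift up, append the new cell
        for r in range(4):
            out[r][c] = col[r]
    return out
```

Applying the same small matrix four times is what makes the layer "serial": only one GF(2^4) row computation is needed per step, which suits compact hardware.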
In this work, LED-128 uses an internal permutation of 48 rounds. The serialized architecture of LED-128 is described in Figure 3. It contains two registers reserved for the 128-bit key and the 64-bit message, multiplexers (a MUX 4/1 and a MUX 2/1 on 32 bits and another MUX 2/1 on 64 bits), five 32-bit XOR operations, a 32-bit S-box substitution function, and a ShiftRows function applied to a 64-bit data block only. The 32-bit serial architecture allows all data (message + key) to be loaded in parallel in 32-bit blocks through the two inputs "DATA_In" and "Key_in". This task requires four clock cycles to load all initialization data (128/32 = 4).
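The loading arithmetic can be made concrete with a small Python sketch. The helper names `split_words` and `load_cycles` are hypothetical, and the most-significant-word-first ordering is an assumption, since the text does not state the order in which words enter "DATA_In" and "Key_in".

```python
def split_words(value, total_bits, word_bits=32):
    """Split an integer into word_bits-wide words, most significant first (assumed order)."""
    count = total_bits // word_bits
    mask = (1 << word_bits) - 1
    return [(value >> (word_bits * (count - 1 - i))) & mask for i in range(count)]


def load_cycles(key_bits=128, block_bits=64, word_bits=32):
    """Cycles needed when key and message are loaded in parallel, one word per cycle."""
    return max(key_bits // word_bits, block_bits // word_bits)
```

Because both inputs are fed in parallel, the 128-bit key dominates: the two message words arrive during the first two of the four key-loading cycles.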

SIMON 64/128.
SIMON is a family of lightweight block ciphers published by the National Security Agency (NSA) in June 2013 [20]. The SIMON family is defined for word sizes n = 16, 24, 32, 48, and 64 bits. The key is composed of m n-bit words for m = 2, 3, and 4 (i.e., the key size mn varies between 64 and 256 bits), depending on the word size n. The block cipher instance corresponding to a fixed word size n (block size 2n) and key size mn is denoted by SIMON 2n/mn. In this work, a 32-bit word and a 128-bit key are used as the cipher configuration. The SIMON round function relies on bitwise AND, word rotation, denoted as $S^{y}(x)$ where y is the rotation count, and XOR; it resembles an ARX construction but uses AND gates instead of additions. The round function of SIMON is shown in Figure 4. For encryption, the SIMON round function can be expressed as

$$R_k(l, r) = \left(r \oplus f(l) \oplus k,\; l\right), \qquad f(x) = \left(S^{1}(x) \,\&\, S^{8}(x)\right) \oplus S^{2}(x). \tag{1}$$

For decryption, its inverse is

$$R_k^{-1}(l, r) = \left(r,\; l \oplus f(r) \oplus k\right),$$

where l is the left-most word of a given block, r is the right-most word, and k is the appropriate round key. The SIMON key schedule function takes the master key and generates a sequence of T key words $(k_0, k_1, k_2, \ldots, k_{T-1})$, where T represents the number of rounds. There are three different versions of the key schedule function, depending on the block size and master key size. In our case, from the initial 128-bit master key, the key schedule generates 44 round keys of 32 bits each.
The key schedule function performs two circular shift operations to the right (shift right by one and shift right by three). The result is XORed with a fixed constant, c, and one bit of a constant sequence, $z_j$. There are five sequences for the constant $z_j$, which are version-dependent (i.e., $z_0$, $z_1$, $z_2$, $z_3$, and $z_4$). For the four-word key used here, the key expansion function can be expressed as

$$k_{i+4} = c \oplus (z_j)_i \oplus k_i \oplus \left(I \oplus S^{-1}\right)\left(S^{-3}(k_{i+3}) \oplus k_{i+1}\right),$$

where I is the identity and $S^{-y}$ denotes a right rotation by y bits. The key schedule employs the constant $c = 2^{n} - 4 = \mathtt{0xFF \ldots FC}$ (where n = 32 represents the word size parameter).
The round function architecture is composed of two 32-bit data registers, a 2-input, single-output 32-bit multiplexer, and a combinational circuit containing three cyclic shifts (by 1, 8, and 2 bits), one AND logic gate, and three XOR logic gates. The output of this circuit is one of the inputs of the MUX 2/1, as presented in Figure 6(a). The 128-bit key generation architecture (Figure 6(b)) is composed of four 32-bit subkey blocks (key a, key b, key c, and key d), a MUX 2/1 reserved for inputs, and a combinational circuit with (2n + 1) XOR + (n − 1) XNOR.
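The SIMON 64/128 round function and key schedule described above can be modeled in software as follows. This is a minimal Python sketch for checking the data flow, not the hardware architecture of Figure 6. The function names are illustrative, the order of the key words (k0 first) is an assumption, and the constant sequence `Z3` is taken from the public SIMON specification rather than from this paper.

```python
MASK = 0xFFFFFFFF      # word size n = 32
ROUNDS = 44            # T for SIMON 64/128
# Constant sequence z3 (62 bits), assumed from the public SIMON specification.
Z3 = "11011011101011000110010111100000010010001010011100110100001111"


def rol(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def ror(x, r):
    """Rotate a 32-bit word right by r bits."""
    return ((x >> r) | (x << (32 - r))) & MASK


def f(x):
    """SIMON nonlinear function: (S^1 x AND S^8 x) XOR S^2 x."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def expand_key(k0, k1, k2, k3):
    """Generate the 44 round keys from four 32-bit master key words."""
    c = MASK ^ 3                           # c = 2^32 - 4 = 0xFFFFFFFC
    keys = [k0, k1, k2, k3]
    for i in range(ROUNDS - 4):
        tmp = ror(keys[i + 3], 3) ^ keys[i + 1]
        tmp ^= ror(tmp, 1)                 # (I XOR S^-1) applied to tmp
        keys.append(c ^ int(Z3[i % 62]) ^ keys[i] ^ tmp)
    return keys


def encrypt(l, r, keys):
    """Apply all rounds: (l, r) -> (r XOR f(l) XOR k, l)."""
    for k in keys:
        l, r = r ^ f(l) ^ k, l
    return l, r


def decrypt(l, r, keys):
    """Apply the inverse rounds with the round keys in reverse order."""
    for k in reversed(keys):
        l, r = r, l ^ f(r) ^ k
    return l, r
```

Decryption reuses the same function f, which is why a Feistel-style design needs no inverse of the nonlinear step in hardware.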
Each instance of SIMON uses the familiar Feistel rule of motion. The algorithm is engineered to be extremely small in hardware and easy to serialize at various levels. It is intended to be more hardware-oriented.

Figure 4: Feistel stepping of the SIMON round function.

SIMECK 64/128.
SIMECK is a family of lightweight block ciphers [21]. The round function and the key schedule algorithm follow the Feistel structure. The round function of SIMECK is given in Figure 7, where $r_i$ and $l_i$ are, respectively, the right word and the left word, and $k_i$ denotes the ith round key. The ciphertext is the internal state after T rounds. The encryption and decryption round functions of the SIMECK family are built from three operations: bitwise AND ($\odot$), left rotation by r bits ($\mathrm{ROL}_r$), and exclusive-OR ($\oplus$). The round function (of the ith round) is defined as follows:

$$R_{k_i}(l_i, r_i) = \left(r_i \oplus f(l_i) \oplus k_i,\; l_i\right).$$

The function f is defined as

$$f(x) = \left(x \odot \mathrm{ROL}_5(x)\right) \oplus \mathrm{ROL}_1(x).$$

Figure 8 shows the SIMECK family block cipher key schedule as a block diagram. To generate the round key $k_i$ from a given master key K, the master key K is first segmented into four words and loaded as the initial state $(t_2, t_1, t_0, k_0)$ of the feedback shift register. The least significant n bits of K are loaded into $k_0$, while the most significant n bits are put into $t_2$.
The key schedule reuses the SIMECK round function, $R_{C \oplus (z_j)_i}$, in which the round constant $C \oplus (z_j)_i$ acts as the round key during each round.
The SIMECK lightweight block cipher family is denoted by SIMECK 2n/mn, where n is the word size and is required to be 16, 24, or 32, while 2n is the block size and mn is the key size. SIMECK has three instances; in this work, we focus on SIMECK 64/128 (see Figures 9 and 10). The combinational circuit (shown as a dashed box in the figure) in the key schedule of SIMECK in the parallel architecture is composed of (n + 1) XOR + (n − 1) XNOR + n AND.
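The way the SIMECK key schedule reuses the round function can be seen in the following Python sketch of SIMECK 64/128. It is a software model under stated assumptions, not the parallel architecture of Figures 9 and 10: the function names are illustrative, the constant C = 2^n − 4 and the update rule follow the public SIMECK specification, and the bit sequence `z_bits` is passed in by the caller because the text does not list it.

```python
MASK = 0xFFFFFFFF      # word size n = 32


def rol(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def f(x):
    """SIMECK nonlinear function: (x AND ROL5(x)) XOR ROL1(x)."""
    return (x & rol(x, 5)) ^ rol(x, 1)


def round_fn(l, r, k):
    """One Feistel round: (l, r) -> (r XOR f(l) XOR k, l)."""
    return r ^ f(l) ^ k, l


def expand_key(t2, t1, t0, k0, z_bits, rounds=44):
    """Generate round keys from the initial state (t2, t1, t0, k0).

    z_bits is a caller-supplied list of 0/1 constants (hypothetical input).
    """
    c = MASK ^ 3                           # C = 2^32 - 4
    k, ta, tb, tc = k0, t0, t1, t2         # k_i, t_i, t_{i+1}, t_{i+2}
    keys = []
    for i in range(rounds):
        keys.append(k)
        # Same round function, with C XOR z_i playing the role of the round key.
        new_t, new_k = round_fn(ta, k, c ^ z_bits[i % len(z_bits)])
        k, ta, tb, tc = new_k, tb, tc, new_t
    return keys


def encrypt(l, r, keys):
    """Apply the round function once per round key."""
    for k in keys:
        l, r = round_fn(l, r, k)
    return l, r


def decrypt(l, r, keys):
    """Undo the rounds with the round keys in reverse order."""
    for k in reversed(keys):
        l, r = r, l ^ f(r) ^ k
    return l, r
```

Sharing `round_fn` between the data path and the key schedule mirrors the hardware saving described above, where both update in the same clock cycle.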
SIMECK is designed to perform exceptionally well in both hardware and software. The changes in the rotations and the key schedule allow an improved hardware implementation. Table 2 shows the complexity of our proposed lightweight cryptographic designs.

Hardware Implementation.
In this section, we provide an overview of our hardware implementation results. The area, speed, efficiency, and power consumption of the proposed designs are obtained from the implementation of our VHDL code using Xilinx ISE Design Suite 14.7. The areas of the block cipher implementations on FPGA are compared using slices, flip-flops (FFs), and lookup tables (LUTs), which are the basic logic blocks of Xilinx FPGAs. Latency, maximum frequency, and throughput together determine the speed of execution. Efficiency is the throughput-to-area ratio, which indicates how well a design meets lightweight application requirements.
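The relation between these metrics can be sketched numerically. The throughput formula below (block size × maximum frequency / cycles per block) is a common convention and an assumption here, since the text does not state the exact expression used; the function names are hypothetical, and any figures passed to them in an example are not results from Table 3.

```python
def throughput_mbps(block_bits, fmax_mhz, cycles_per_block):
    """Throughput in Mbps, assuming one block is produced every cycles_per_block cycles."""
    return block_bits * fmax_mhz / cycles_per_block


def efficiency_mbps_per_slice(throughput, slices):
    """Throughput-to-area ratio in Mbps per slice."""
    return throughput / slices
```

For instance, a hypothetical 64-bit design clocked at 100 MHz that needs 44 cycles per block would reach about 145 Mbps under this convention.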
To get good insight into efficiency and performance, our designs are implemented on three different Xilinx FPGA families used as target platforms: Spartan-3 (XC3S50-5), Spartan-6 (XC6S16-3), and Zynq-7000 (xc7z010-3). The proposed designs have been tested after place and route using simulation to ensure correct functionality.
Table 3 lists the results of our three proposed 32-bit datapath designs. When implemented on the Spartan-3 FPGA, SIMECK is the smallest block cipher, with only 399 LUTs plus FFs. This is due to the number of rounds and the changes in the rotations and the key schedule. Furthermore, the parallel architecture processes one round of the message and one round of the key schedule in the same clock cycle, which allows for an improved hardware implementation. SIMON is the second smallest block cipher, with 416 LUTs plus FFs. Our proposed SIMON and SIMECK designs are very close in terms of throughput and efficiency. The estimated power consumption is very similar across the three proposed architectures.

Security and Communication Networks
When using the Spartan-6 FPGA platform, LED occupies the least area, with 452 LUTs plus FFs. The main area cost for SIMON and SIMECK comes from the registers storing the message block and the key. However, SIMON and SIMECK provide better throughput, efficiency, and power consumption.
Unfortunately, few works report results on these two FPGAs for the three algorithms described in this study; only a small number of works implement LED and SIMON on FPGA platforms, and not all metrics are reported, which makes the comparison complex. As shown in Table 3, our proposed 32-bit datapath designs provide more throughput and require less area on both Spartan-3 and Spartan-6 FPGA platforms compared to the state of the art.
In [22][23][24], the authors report only generic metrics such as FFs, LUTs, maximum frequency, and throughput. In fact, other design parameters also have to be considered, namely efficiency, which captures the trade-off between area and throughput, and power consumption.
To the best of our knowledge, no implementations of the three studied algorithms with a block size of 64 bits and a key of 128 bits have been reported to date on Zynq-7000. Only in [25] did the authors propose a reliable hardware architecture for the SIMON 48/96 block cipher using time-redundancy concurrent error detection techniques. They claim that their proposed design has acceptable overheads with very high error coverage. However, the reported results are weak and do not consider the constraints of the devices targeted by lightweight ciphers, where cost, power consumption, energy, and available resources are limited. For this reason, a direct comparison is not possible.
To assess the overheads, we compare the implementation results obtained from our proposed lightweight cipher architectures on different FPGA families. Depending on the design metrics, we can choose the lightweight architecture best suited to the needs of the application, such as FPGA-based RFID tags [26] or FPGA-based wireless sensor nodes [27].
From Figure 11, LED-128 architecture requires less area when implemented on Spartan-6 and Zynq-7000 platforms. SIMECK 64/128 provides better area occupation with 399 LUTs and FFs when implemented on Spartan-3 FPGA.
As shown in Figure 12, the Zynq-7000 platform is well suited for resource-constrained environments with high throughput requirements. It provides throughput of up to 891.99 Mbps, 838.95 Mbps, and 210.13 Mbps for SIMECK 64/128, SIMON 64/128, and LED 64/128, respectively. The SIMECK 64/128 architecture provides the best throughput among the proposed architectures on all three FPGAs.
From Figure 13, we can conclude that the Zynq-7000 platform offers a good throughput-to-area ratio. We also notice that SIMECK 64/128 is best suited to lightweight application needs when efficiency is considered, with 1.73 Mbps/slice on Spartan-3 and 4.45 Mbps/slice on Zynq-7000. SIMON 64/128 presents the highest efficiency, with 1.81 Mbps/slice, when implemented on the Spartan-6 FPGA.
For battery-operated devices, the Spartan-3 FPGA is preferable over the Spartan-6 and Zynq-7000 platforms, as its power consumption is the lowest, with 2 mW for LED-128 and 3 mW for both SIMON 64/128 and SIMECK 64/128. The SIMON 64/128 and SIMECK 64/128 architectures provide a far lower power consumption than LED-128 when implemented on Spartan-6, with 31 mW and 27 mW, respectively, as depicted in Figure 14.

Security Analysis
In this work, statistical analysis has been performed to demonstrate the confusion and diffusion properties of the proposed lightweight cryptographic designs against statistical attacks. This is done by performing a series of tests: histogram analysis of the encrypted images, correlation computation of the adjacent pixels in encrypted images, and information entropy calculation [28].

Histograms of Encrypted Images.
In this section, we apply our lightweight cryptographic designs to several types of images to test their robustness. Three well-known 8-bit greyscale images, Baboon, Barbara, and Lena, with a resolution of 256 × 256, are tested as plain images. The plain images, the images encrypted using the three proposed cryptographic designs of the LED, SIMON, and SIMECK algorithms, and their corresponding histograms are presented in Figures 15-17. As can be seen, there is no perceptual similarity between the original images and their encrypted equivalents.
A uniform distribution of intensities after encryption is an indication of the desired security. We can see that the histograms of the encrypted images are almost uniform and are significantly different from those of the three original images. Thus, the encrypted images exhibit good diffusion properties, and an attacker cannot acquire information about the original images from histogram analysis of the encrypted images. Furthermore, the histograms of all images encrypted with the SIMON and SIMECK lightweight algorithms are more uniform than those obtained with the LED algorithm.
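The histogram test can be reproduced in software along the following lines. This is a minimal sketch under stated assumptions: the image is given as a flat list of 8-bit pixel values, and the flatness measure `max_relative_deviation` is a hypothetical helper introduced only for illustration; the paper itself compares the histograms visually.

```python
def histogram(pixels):
    """Count how often each of the 256 grey levels occurs."""
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    return counts


def max_relative_deviation(counts):
    """Largest deviation of any bin from the ideal uniform count, as a fraction."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts) / expected
```

A perfectly uniform histogram gives a deviation of 0, while a plain image with dominant grey levels gives a much larger value.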

Correlation Coefficient Analysis.
The other statistical test consists of computing the correlation between adjacent pixels [29]. The coefficient of correlation for each pair is obtained using

$$r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\, V(Y)}},$$

where X and Y denote the gray values of any two neighboring pixels of an image, V(.) is the variance, and Cov(.) is the covariance. This method consists of randomly selecting pairs of adjacent pixels (vertical, horizontal, and diagonal) and calculating their correlation coefficient for the original and the encrypted images separately. In the best case, the correlation coefficient of the original image is equal to one, and the correlation coefficient of the encrypted image is equal to zero. Table 4 shows the results of the horizontal, vertical, and diagonal neighboring pixel correlation coefficient computations for the plain images and the corresponding encrypted images. These results show that the correlation coefficients between adjacent pixels for our proposed lightweight cryptographic designs are very close to zero. No linear dependencies remain between observed pixels in any of the three directions, which makes our designs secure against correlation attacks.

Information Entropy Analysis.
Entropy in the information-theoretic sense is a statistical measure of randomness or uncertainty in communication theory [30]; it is defined as follows:

$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i),$$

where X is a discrete random variable and $p(x_i)$ is the probability of occurrence of the symbol $x_i$.
An 8-bit greyscale image can achieve a maximum entropy of 8 bits. From the results in Table 4, it can be seen that the entropy of all encrypted images is close to maximum, depicting an attribute of the algorithm (see Table 5).
In conclusion, the obtained information entropy values of the ciphered images are close to the theoretical value of 8.
Therefore, it is difficult to conduct a successful attack against our proposed cryptographic designs.
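The two numerical tests of this section, adjacent-pixel correlation and information entropy, can be sketched in Python as follows. This is an illustrative model, not the evaluation code used for Tables 4 and 5: the image is assumed to be a list of rows of 8-bit values, the function names and the sampling helper are hypothetical, and the number of sampled pairs is left to the caller because the text does not state it.

```python
import math
import random


def correlation(xs, ys):
    """Correlation coefficient Cov(X, Y) / sqrt(V(X) V(Y)); inputs must not be constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


def adjacent_pairs(image, direction, samples, rng):
    """Randomly sample pairs of neighboring pixels in the given direction."""
    dy, dx = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    height, width = len(image), len(image[0])
    xs, ys = [], []
    for _ in range(samples):
        r = rng.randrange(height - dy)
        c = rng.randrange(width - dx)
        xs.append(image[r][c])
        ys.append(image[r + dy][c + dx])
    return xs, ys


def entropy(pixels):
    """Shannon entropy in bits of a flat list of pixel values."""
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A smooth gradient image yields a correlation close to one between neighbors, whereas a well-encrypted image should yield values near zero and an entropy near 8 bits.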

Conclusion and Future Work
The Internet of Things (IoT) has become pervasive, with many tiny, resource-constrained devices deployed on a large scale and communicating wirelessly with each other and with the Internet at large. Given these security needs and limited resources, lightweight cryptography is applied to solve this problem.
This article presents hardware implementations and a comparison of three 32-bit datapath lightweight cryptographic designs for the LED 64/128, SIMON 64/128, and SIMECK 64/128 algorithms. All implementation results were compared fairly with previously published works on different FPGA platforms. A detailed study of the hardware performance and optimization of lightweight cryptography is presented. Better outcomes than the state of the art were observed, with low area occupation, high throughput, good efficiency, and low power consumption.
Besides, experimental tests have been carried out with detailed numerical analysis, which shows the robustness of our proposed designs against statistical attack (visual testing). Performance evaluation tests demonstrate that the proposed encryption designs are sufficiently secure against attacks.
As future work, it will be very interesting to harden our proposed cryptographic designs against possible side-channel attacks such as power analysis and fault injection. In addition, instruction set extensions for lightweight cryptography can be an attractive design option for embedded systems that need security.

Data Availability
The obtained results used to support the findings of this study are available from the corresponding author upon request.