Design Space Exploration for High-Speed Implementation of the MISTY1 Block Cipher

This paper proposes two unrolled high-speed architectures of the MISTY1 block cipher for wireless applications including sensor networks and image encryption. Design space exploration is carried out for 8-round MISTY1 utilizing dual-edge trigger (DET) and single-edge trigger (SET) pipelines to analyze the speed/area tradeoff. The design is primarily based on the optimized implementation of lookup tables (LUTs) for MISTY1 and its core transformation functions. The LUTs are designed by logically formulating the S9/S7 s-boxes and the FI and {FO + 32-bit XOR} functions with fine placement of pipelines. Highly efficient, high-speed MISTY1 architectures are thus obtained and implemented on a Virtex-7 XC7VX690T field-programmable gate array (FPGA). The high-speed/very high-speed MISTY1 architectures achieve throughput values of 25.2/43 Gbps while occupying 1331/1509 CLB slices, respectively. The proposed MISTY1 architectures outperform all previous MISTY1 implementations, combining high speed with low area and therefore a high efficiency value. They also have higher efficiency values than the existing AES and Camellia architectures, which signifies the optimizations made for the proposed high-speed MISTY1 architectures.


Introduction
With the advances in high-speed wireless applications, the secure transfer of data has become a major concern [1,2]. Efforts are underway to provide a real-time encryption solution for high data transmissions with minimum overhead in terms of power [3][4][5].
This study primarily focuses on high-speed implementations of the 64 bit MISTY1 block cipher for a wide range of applications, i.e., wireless networks, Ethernet devices, image encryption, and radio network controllers (RNCs) [6].
The 64 bit block cipher MISTY1 is an ISO standardized algorithm designed by Mitsubishi Electric Corporation. It is used to handle a 64 bit block of data or less, e.g., 8 byte personal identification numbers (PINs), and offers provable security against linear/differential cryptanalysis with a probability bound of 2^-56 [7][8][9][10].
The differential/integral attacks on MISTY1 require large data as well as computational complexities, making it practically infeasible to break the MISTY1 block cipher. The hardware architecture of MISTY1 and its major subfunctions FO and FI constitute a repetitive loop structure [11]. Therefore, the MISTY1 algorithm is suitable for both resource-constrained and high-speed implementations.
Finally, two area-efficient MISTY1 design schemes are proposed in [17] based on the combined substitution unit and threshold throughput requirements. The architectures occupy a very low area of 1853/1546 NAND gates and are the most compact implementations to date. In addition, we analyzed the throughput values of the aforementioned studies and found that the compact MISTY1 architectures attained low throughput values, i.e., ≤500 Mbps, and are therefore unsuitable for high-speed applications [12-14, 17, 20].
In contrast to low-area cryptographic hardware architectures, high-speed encryption algorithms utilize LUTs/RAMs or optimized combinational logic for s-boxes using pipelined schemes [20][21][22][23][24][25]. Recently, the focus of studies has also shifted toward efficient implementations, measured in the form of the throughput-to-area ratio. Owing to high-speed and efficient implementation requirements, the architecture presented in [20] utilizes FPGA RAM blocks for the implementation of the S7/S9 s-boxes. However, the straightforward implementation of LUTs for the S9/S7 s-boxes (given in the MISTY1 specifications) and a longer path delay, where four XOR operations are executed in a single clock cycle followed by RAM, resulted in a large circuit area and reduced throughput values. The architecture presented in [21] utilizes the double-edge trigger methodology for MISTY1 high-speed pipeline implementation but has a longer path delay. Moreover, no architectural modifications/structural optimizations are made for high-speed MISTY1 implementation. On the other hand, although the MISTY1 architecture proposed in [22] achieves high speed, it costs a large area because it implements a large number of pipelines. In this study, an effort has been made toward a high-speed and efficient MISTY1 implementation. In the last couple of years, multiple studies have been published regarding different block ciphers. In [26], researchers proposed a block cipher based on a chaotic generator and implemented it on a Xilinx FPGA to prove its effectiveness. Similarly, in [27], Muthalagu and Jain took an existing block cipher algorithm and enhanced its performance to reduce the encryption time.
The unique contributions of the proposed MISTY1 n = 8-round pipelined architectures are as follows:
(i) Optimized implementation of the MISTY1 S9/S7 s-boxes and transformation functions, i.e., FL, FI, FO, and 32 bit XOR, by logic formulation of 4, 5, and 6 bit input LUTs for area reduction
(ii) Design of MISTY1 and its transformation functions to distribute the parallel processing evenly in order to obtain a highly efficient pipelined architecture
(iii) High-speed exploration of 8-round MISTY1 architectures by employing SET and DET techniques
This paper is organized into five sections. The introduction, i.e., Section 1, is followed by the optimizations/designing of LUTs for the implementation of the MISTY1 transformation functions described in Section 2. Section 3 proposes two high-speed MISTY1 architectures based on SET and DET pipeline schemes. FPGA implementation results/analysis are described in Section 4. Lastly, a brief conclusion is given in Section 5.

FI Function.
The optimizations made in the design/implementation of the proposed FI function and its constituent S9 and S7 substitution functions are elaborated in Figures 1(a)-1(e). Figures 1(a) and 1(b) depict the FI function and the equivalent FI with modified S9/S7 paths, respectively. The modifications in Figure 1(b) indicate simultaneous execution of the leftmost 9 bits and rightmost 7 bits, where the subscripts 'L' and 'R' represent the leftmost and rightmost bits, respectively. T stands for the TRUNCATE function, and the plus sign denotes the XOR gate. The XOR gate with KI_R is added on the LSB side to reduce the path delay. The LSB bits are dependent on the MSB bits, and the addition of KI_R eliminates the dependency of the MSB on the LSB bits. We have optimized the LUTs of the LSB bits by combining S7 and the XOR gate. The hardware cost is reduced by the optimization of LUTs for both the MSB and LSB sides. In the next step shown as Figure 1(c), the dotted lines of Figure 1 Table 1 as per the modified logic expressions (i.e., S9 is used in conjunction with the zero-extended XOR operation), whereas the lower-left LUTs (S9-5 to S9-7) can be obtained by eliminating the (x_10, x_11, ..., x_16) bits from the given expressions.
The LUTs for (S7-1 to S7-3) are employed as 4 bit and 5 bit input LUTs as described in [21]. In the steps shown in Figures 1(d) and 1(e), the XOR gates of Figure 1(c) are reordered to configure the S9-4, S9-8, S7-4, and S7-5 LUTs. The proposed FI function has the primary advantage of a reduced number of LUTs and can be executed in a maximum of 4 clock cycles. Table 2 summarizes the area reduction of 66.7% and 41.3% with the proposed FI function compared to [20,22], respectively.
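The regrouping of XOR gates described above can be illustrated in software. The following is a minimal sketch, not the LUT netlist of Figure 1: it assumes the FI data path of the published MISTY1 specification (S9, S7, key XOR, S9), and it uses placeholder S7/S9 permutations instead of the real tables. The function names `fi_reference` and `fi_reordered` are hypothetical. The point is only that the 9 bit and 7 bit lanes can both be written as functions of the original inputs without changing the result.

```python
# ASSUMPTION: stand-in S-boxes. The real S7/S9 tables are listed in the
# MISTY1 specification; these affine permutations only keep the sketch short.
S7 = [(5 * x + 3) % 128 for x in range(128)]    # placeholder 7 bit permutation
S9 = [(7 * x + 11) % 512 for x in range(512)]   # placeholder 9 bit permutation


def fi_reference(x, ki):
    """Sequential FI: each step waits for the previous one."""
    d9, d7 = x >> 7, x & 0x7F            # leftmost 9 bits, rightmost 7 bits
    ki_l, ki_r = ki >> 9, ki & 0x1FF     # 7 bit and 9 bit subkey halves
    d9 = S9[d9] ^ d7                     # XOR with zero-extended d7
    d7 = S7[d7] ^ (d9 & 0x7F)            # XOR with truncated d9
    d7 ^= ki_l
    d9 ^= ki_r
    d9 = S9[d9] ^ d7
    return (d7 << 9) | d9


def fi_reordered(x, ki):
    """Same function with the XORs regrouped per lane.

    Both lanes depend only on S9[l9], S7[r7], r7 and the subkey, so they
    can be evaluated side by side before the final S9 lookup.
    """
    l9, r7 = x >> 7, x & 0x7F
    ki_l, ki_r = ki >> 9, ki & 0x1FF
    s9 = S9[l9]
    lane9 = s9 ^ r7 ^ ki_r                       # 9 bit lane
    lane7 = S7[r7] ^ (s9 & 0x7F) ^ r7 ^ ki_l     # 7 bit lane
    return (lane7 << 9) | (S9[lane9] ^ lane7)


if __name__ == "__main__":
    key = 0x1234  # arbitrary 16 bit subkey for the check
    assert all(fi_reference(x, key) == fi_reordered(x, key)
               for x in range(1 << 16))
    print("sequential and regrouped FI agree on all 16 bit inputs")
```

With the real S7/S9 tables substituted for the placeholders, the same equivalence check applies unchanged, since the regrouping relies only on the associativity of XOR.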

FO Function and 32-Bit XOR.
The MISTY1 FO transformation function is followed by the 32 bit XOR operation in odd and even rounds (except for the last round) as depicted in Figure 2(a). Therefore, the proposed LUT-based architecture of the FO function comprises {FO + 32 bit XOR}. Figure 2 The LUTs of the first and third sections include the XOR operations, whereas the second and fourth sections comprise FI functions and XOR operations. However, the left-hand side of the second section, symbolized by FI_1, is composed of (FI + XOR), whereas the right-hand side of the second section includes only the FI function. Similarly, the left-hand side of the fourth section, shown as FI_3, comprises (FI + (2 × XORs)) as compared to the right-hand side XOR operation.
Thus, the FI function described in Section 2.1 is modified as per the design requirements of FI_1 and FI_3 as shown in Figures 3 and 4, respectively. It is evident from Figures 3(a)-3(c) and 4(a)-4(c) that the changes required to incorporate XORs into the FI function mainly affect the last part of the aforementioned FI function. Therefore, new LUTs are added in the lower right part, shown as S7-6 and S7-7 for FI_1 and FI_3, respectively. In addition, S9-8 of Figure 1(e) is replaced by the newly formed LUTs S9-9 and S9-10 in the lower left section of the FI_1 and FI_3 functions, respectively.
A uniformly distributed LUT-based FO function and the inclusion of the 32 bit XOR reduce the (initial) latency as well as the pipeline requirements of the proposed MISTY1 architectures. Although the reduction in pipelines and latency is not evident from the figures, the proposed implementation significantly reduces the area. Table 3 summarizes the area of (FO + 32 bit XOR), showing 53.3% and 44.4% reduction compared to [20,22]. The proposed FO function is based on the clock cycle operation required to execute the FI_1/FI_2/FI_3 functions and will be explained in detail in Section 3. Figure 5 Table 4 summarizes the area for proposed MISTY1 architectures.
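As a software illustration of the {FO + 32 bit XOR} unit, the sketch below follows the FO structure of the published MISTY1 specification (three FI evaluations interleaved with KO subkey XORs) and appends the round XOR with the other 32 bit half. It is a behavioral sketch under stated assumptions, not the proposed LUT partitioning: the name `fo_xor`, the explicit `ko`/`ki` subkey lists, and the stand-in `toy_fi` are all hypothetical, and the real FI function would be passed in its place.

```python
def toy_fi(x, k):
    """ASSUMPTION: trivial stand-in for FI, used only to exercise the data flow."""
    return (x ^ k) & 0xFFFF


def fo_xor(x32, other32, ko, ki, fi=toy_fi):
    """Behavioral {FO + 32 bit XOR}.

    x32     : 32 bit FO input
    other32 : 32 bit half that the FO output is XORed into
    ko      : four 16 bit subkeys (KO_i1..KO_i4)
    ki      : three 16 bit subkeys (KI_i1..KI_i3)
    fi      : 16 bit FI function taking (data, subkey)
    """
    t0, t1 = x32 >> 16, x32 & 0xFFFF
    t0 = fi(t0 ^ ko[0], ki[0]) ^ t1      # FI_1 followed by one XOR
    t1 = fi(t1 ^ ko[1], ki[1]) ^ t0      # FI_2 followed by one XOR
    t0 = fi(t0 ^ ko[2], ki[2]) ^ t1      # FI_3 followed by one XOR
    t1 ^= ko[3]                          # final subkey XOR
    fo_out = (t1 << 16) | t0
    return fo_out ^ other32              # appended 32 bit round XOR


if __name__ == "__main__":
    ko, ki = [0x0011, 0x2233, 0x4455, 0x6677], [0x8899, 0xAABB, 0xCCDD]
    x, other = 0x01234567, 0x89ABCDEF
    # The appended XOR is linear in `other32`, so it can be merged into
    # the last stage of FO without changing the function.
    assert fo_xor(x, other, ko, ki) == fo_xor(x, 0, ko, ki) ^ other
    print(hex(fo_xor(x, other, ko, ki)))
```

Because the final XOR only touches the outputs of the last FO stage, folding it into that stage costs no extra pipeline level in this model, which is consistent with the latency argument made above.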

Design Space Exploration for High-Speed MISTY1 Architectures
Architecture 1: DET Pipeline Architecture for High-Speed MISTY1.
A high-speed MISTY1 pipelined architecture is shown in Figure 6, whereas the respective FO and FI functions (only the FI_2 function is shown for reference) are depicted in Figures 7(a) and 7(b). High-speed MISTY1 comprises an 8-round architecture with 5-stage and 10-stage pipelines in odd and even rounds, respectively. The number of pipelines in odd and even rounds of MISTY1 is based on the number of clock cycles required to execute the FO/FI functions. A double-edge-triggered pipeline is employed with each LUT triggering on alternate clock cycles. This reduces the pipeline requirements of the MISTY1 architecture; however, it has a path delay of two LUTs as mentioned in [11]. The proposed MISTY1 architecture can process 41 plaintexts and outputs the

Architecture 2: MISTY1 SET Pipeline Architecture for Very High-Speed MISTY1.
Very high-speed MISTY1 and its respective FO and FI functions (the FI_1 and FI_3 functions are presented here for reference) employing single-edge-triggered pipelines are depicted in Figures 8 and 9. It is evident that the FI_1 function requires 4 clock cycles, whereas the corresponding FO function is executed in 9 clock cycles. The pipeline registers are inserted in the FO function as well as the MISTY1 architecture to synchronize the LSB and MSB bits. The path delay of the SET-based pipelined architecture is one LUT, and therefore, the architecture achieves very high speed. Increasing the number of pipeline stages increases the latency, i.e., the time to generate the first ciphertext, which is found to be 77 clock cycles. The proposed architecture is highly suitable for high-speed applications of the order of 40 Gbps.

Table fragment (area of the FO components; column headings and leading rows are missing):
...              16  -    -    -    -
KO_i3            16  -    -    -    -
KO_i4 and I_5L   -   16   -    -    -
FI_2 XOR         16  -    -    -    -
FI_1             14  1    56   43   -
FI_2             7   1    65   34   -
FI_3             7   8    56   36   7
{FO + XOR}       92  26   177  113  7
[19] FO          -   244
[21] FO          -   194  -

Mathematical Problems in Engineering

Hardware Implementation Results and Comparison
The proposed MISTY1 high-speed architectures are implemented on FPGA Xilinx Virtex-7, XC7VX690T. The performance comparison/analysis is carried out with existing high-speed Camellia, AES, and MISTY1 architectures. Table 5
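The efficiency metric used for comparison is the throughput-to-area ratio. As a worked example, the short sketch below derives it from the throughput and CLB slice figures reported in this paper (25.2 Gbps/1331 slices and 43 Gbps/1509 slices). The clock frequency and latency-in-time values are not reported figures: they are only implied values under the assumption, labeled in the code, that a fully pipelined design emits one 64 bit block per clock cycle. All function names are hypothetical.

```python
BLOCK_BITS = 64  # MISTY1 block size


def efficiency_mbps_per_slice(throughput_gbps, slices):
    """Throughput-to-area ratio in Mbps per CLB slice."""
    return throughput_gbps * 1000.0 / slices


def implied_clock_mhz(throughput_gbps, blocks_per_cycle=1.0):
    """Clock implied by a throughput figure.

    ASSUMPTION: the pipeline emits `blocks_per_cycle` 64 bit blocks per
    clock cycle (one by default); the true rate depends on the design.
    """
    return throughput_gbps * 1000.0 / (BLOCK_BITS * blocks_per_cycle)


def latency_ns(cycles, clock_mhz):
    """Initial latency in nanoseconds for a given cycle count and clock."""
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    for name, gbps, slices in [("high-speed (DET)", 25.2, 1331),
                               ("very high-speed (SET)", 43.0, 1509)]:
        eff = efficiency_mbps_per_slice(gbps, slices)
        print(f"{name}: {eff:.2f} Mbps/slice")

    # SET design: 77-cycle latency at the clock implied by 43 Gbps.
    f_set = implied_clock_mhz(43.0)
    print(f"implied SET clock: {f_set:.1f} MHz, "
          f"first ciphertext after about {latency_ns(77, f_set):.1f} ns")
```

Under these figures, the SET architecture spends roughly 13% more slices than the DET architecture for roughly 70% more throughput, which is why its throughput-to-area ratio is the higher of the two.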

Conclusion
In this paper, we proposed MISTY1 8-round pipelined architectures characterized by high-speed and efficient implementations. The structural optimizations and logic modifications in the MISTY1 transformation functions reduced the LUT and pipeline requirements. The proposed high-speed MISTY1 architectures using the SET and DET pipelines explore the speed/area tradeoffs for FPGA implementations. The design/optimization schemes can be extended to the high-speed implementation of the KASUMI algorithm. The high-speed designs have applications in wireless sensor networks, image encryption, and network controllers.

Future Work.
This paper deals only with a high-speed MISTY1 block cipher. In the future, we shall make an en-

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.