We have presented a memory-less design of the advanced encryption standard (AES) with an 8-bit data path for wireless communication applications. The design uses the minimum of 160 clock cycles to process a 128-bit data block. To meet the requirements of low area cost and high performance, new design methods are used to optimize the MixColumns (MC), Inverse MixColumns (IMC), ShiftRows (SR), and Inverse ShiftRows (ISR) transformations. Compared with previous designs, our methods efficiently reduce the required clock cycles, critical path delays, and area costs of these transformations. In the chip realization, our design with both encryption and decryption capabilities requires 29% more area than the best previously reported 8-bit AES design, but achieves 4.85 times its throughput and about 3.7 times its throughput/area. For encryption only, our AES occupies 3.5 k gates with a critical path delay of 12.5 ns and achieves a throughput of 64 Mbps, the highest among the previous encryption-only designs compared.

The AES algorithm has been widely used for data transmission in wireless communications [

For portability across different platforms and CMOS technologies, our AES uses purely combinational logic for the overall circuit, without any memory blocks. The newly proposed design methods for the major transformations reduce the area cost of the AES while keeping the high throughput required by wireless communications. The experimental results show that our AES design has a better performance/area ratio than previous designs. The remainder of this paper is organized as follows. Section

An AES implementation with an 8-bit data path that processes a 128-bit data block takes at least 160 clock cycles (16 bytes in each of the 10 rounds). The encryption process performs the ShiftRows (SR), SubBytes (SB), MixColumns (MC), and AddRoundKey (ARK) transformations. A separate KeyExpansion (KE) unit is required to generate the

The four transformations and the key generation unit of the AES algorithm are described as follows.

The SB transformation computes the multiplicative inverse (MI) in $\mathrm{GF}(2^{8})$, followed by an affine transformation (AF) over $\mathrm{GF}(2)$. Similarly, the ISB transformation performs the inverse affine transformation (IAF) followed by the MI in $\mathrm{GF}(2^{8})$.
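As a behavioural reference for the SB/ISB description, the following Python sketch computes the MI as $a^{254}$ directly in $\mathrm{GF}(2^{8})$, assuming the standard AES reduction polynomial $x^{8}+x^{4}+x^{3}+x+1$ and the standard affine constant 0x63 (inverse-affine constant 0x05). It is a software model under these assumptions, not the proposed circuit: a hardware implementation may compute the MI in a composite field instead, and all function names here are hypothetical. Because `sub_byte` and `inv_sub_byte` call the same `gf_inv`, the sketch also mirrors why the MI logic can be shared between SB and ISB.

```python
# Illustrative reference model of SB/ISB; function names are hypothetical.
AES_MODULUS = 0x11B  # Assumption: AES polynomial x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce once the degree reaches 8
            a ^= AES_MODULUS
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """MI as a^254 by square-and-multiply; 0 maps to 0 by convention."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate a byte left by k bits (1 <= k <= 7)."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def affine(x: int) -> int:
    """AF: x ^ rotl(x,1) ^ rotl(x,2) ^ rotl(x,3) ^ rotl(x,4) ^ 0x63."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


def inv_affine(x: int) -> int:
    """IAF: rotl(x,1) ^ rotl(x,3) ^ rotl(x,6) ^ 0x05."""
    return rotl8(x, 1) ^ rotl8(x, 3) ^ rotl8(x, 6) ^ 0x05


def sub_byte(x: int) -> int:
    """SB = MI followed by AF."""
    return affine(gf_inv(x))


def inv_sub_byte(x: int) -> int:
    """ISB = IAF followed by the same MI used by SB."""
    return gf_inv(inv_affine(x))


if __name__ == "__main__":
    print(hex(sub_byte(0x53)))  # 0xed, the FIPS 197 example
```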

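In the MC transformation, each column of the state is treated as a four-term polynomial with coefficients in $\mathrm{GF}(2^{8})$. The MC transforms each column into a new one by multiplying it by the constant polynomial $a(x) = \{03\}x^{3} + \{01\}x^{2} + \{01\}x + \{02\}$ modulo $x^{4} + 1$.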
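A minimal Python sketch of MC and IMC on a single column, assuming the standard AES coefficients: MC multiplies by $\{03\}x^{3} + \{01\}x^{2} + \{01\}x + \{02\}$ and IMC by its inverse $\{0b\}x^{3} + \{0d\}x^{2} + \{09\}x + \{0e\}$, both modulo $x^{4}+1$. The helper names (`xtime`, `mix_column`, `inv_mix_column`) are hypothetical, and the code is a behavioural model rather than the proposed MC architecture.

```python
# Illustrative MC/IMC reference model; names are hypothetical.


def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8) built on xtime."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col: list[int]) -> list[int]:
    """MC: multiply the column by {03}x^3 + {01}x^2 + {01}x + {02} mod x^4 + 1.

    Uses {02}*a = xtime(a) and {03}*a = xtime(a) ^ a, so only shifts and XORs.
    """
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


# Circulant matrix of the IMC polynomial {0b}x^3 + {0d}x^2 + {09}x + {0e}.
INV_MC_MATRIX = (
    (0x0E, 0x0B, 0x0D, 0x09),
    (0x09, 0x0E, 0x0B, 0x0D),
    (0x0D, 0x09, 0x0E, 0x0B),
    (0x0B, 0x0D, 0x09, 0x0E),
)


def inv_mix_column(col: list[int]) -> list[int]:
    """IMC: matrix-vector product over GF(2^8) with the inverse coefficients."""
    out = []
    for row in INV_MC_MATRIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


if __name__ == "__main__":
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
    # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Since $\{03\}\cdot a = \mathrm{xtime}(a) \oplus a$, each MC output byte needs only shifts and XORs, whereas the IMC coefficients {09}, {0b}, {0d}, and {0e} require several chained xtime stages; this is one reason IMC is usually the more expensive of the two in hardware.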

Our optimization of the individual transformations focuses on the two major transformations, SR and MC, and their inverses, ISR and IMC. The designs of these transformations are described as follows.
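To make the SR/ISR data movement concrete for an 8-bit data path, the following Python sketch lists, for each output byte position, which input byte SR or ISR reads, assuming the 16 state bytes stream in the column-major order of FIPS 197 (byte index = row + 4 × column). It models only the permutation, not the registers and multiplexers of the proposed combined SR/ISR architecture; the function names are hypothetical.

```python
# Illustrative SR/ISR byte permutation; names are hypothetical.


def shift_rows_order(inverse: bool = False) -> list[int]:
    """For each output byte position i, return the input position SR (or ISR) reads.

    Assumed layout: FIPS 197 column-major order, i = row + 4 * column.
    SR rotates row r left by r positions; ISR rotates it right by r positions.
    """
    order = []
    for i in range(16):
        row, col = i % 4, i // 4
        src_col = (col - row) % 4 if inverse else (col + row) % 4
        order.append(row + 4 * src_col)
    return order


def shift_rows(state: list[int], inverse: bool = False) -> list[int]:
    """Apply SR or ISR to a 16-byte state given in byte-stream order."""
    return [state[j] for j in shift_rows_order(inverse)]


if __name__ == "__main__":
    print(shift_rows_order())
    # [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
    print(shift_rows_order(inverse=True))
    # [0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3]
```

Row 0 is unchanged and row 2 is rotated by two positions under both SR and ISR; only rows 1 and 3 differ between the two orders, which is consistent with a combined SR/ISR unit being able to share most of its routing.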

Proposed combined SR/ISR architecture.

To complete the rotation sequences, several multiplexers are added in Figure

The states

$\mathrm{GF}(2^{8})$ for generating output states

As shown in Figure

Proposed MC architecture.

We realized the iterative AES architecture using the TSMC 0.18 µm cell library. Figure

The SB can be realized by the MI calculation in the composite field $\mathrm{GF}((2^{4})^{2})$ and Affine Transformation (AF) units. The ISB can be realized by the same MI calculation as the SB together with an inverse affine transformation (IAF) unit. To reduce the area cost of a combined SB/ISB implementation, the MI logic is usually shared. The key expansion unit generates the required round key and outputs it 8 bits at a time to the ARK. Since decryption uses the round keys in reverse order, the inverse cipher process can start only after the last round key has been generated. Afterward, the key expansion can be executed concurrently with the decryption process to produce the same round keys.
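The ordering constraint on decryption can be illustrated with a word-level Python sketch of the AES-128 key schedule, assuming a 128-bit cipher key. `expand_forward` produces round keys 0 to 10, and `previous_round_key` inverts one step of the recurrence so that, once round key 10 is known, the decryption round keys can be regenerated on the fly in reverse order. This backward recurrence is an illustrative assumption of how concurrent key expansion can work; the source does not detail the KE unit (which operates on bytes, not 32-bit words), and all names are hypothetical. The S-box is rebuilt by brute-force inversion to keep the block self-contained.

```python
# Illustrative AES-128 key schedule, forward and backward; names are hypothetical.


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox_entry(x: int) -> int:
    """MI by brute-force search (0 maps to 0), then the AES affine transformation."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def xor_words(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


def g(word: list[int], rnd: int) -> list[int]:
    """RotWord, SubWord, then XOR Rcon into the first byte (rnd = 1..10)."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= RCON[rnd - 1]
    return out


def expand_forward(key: bytes) -> list[list[int]]:
    """AES-128 key expansion: returns round keys 0..10 as 16-byte lists."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = g(w[i - 1], i // 4) if i % 4 == 0 else w[i - 1]
        w.append(xor_words(w[i - 4], temp))
    return [[b for word in w[4 * r:4 * r + 4] for b in word] for r in range(11)]


def previous_round_key(round_key: list[int], rnd: int) -> list[int]:
    """Invert one expansion step: from round key rnd (1..10) recover round key rnd - 1."""
    w = [round_key[4 * j:4 * j + 4] for j in range(4)]
    p3 = xor_words(w[3], w[2])
    p2 = xor_words(w[2], w[1])
    p1 = xor_words(w[1], w[0])
    p0 = xor_words(w[0], g(p3, rnd))
    return p0 + p1 + p2 + p3


def expand_backward(last_round_key: list[int]) -> list[list[int]]:
    """Regenerate round keys 10, 9, ..., 0 in decryption order."""
    keys = [list(last_round_key)]
    for rnd in range(10, 0, -1):
        keys.append(previous_round_key(keys[-1], rnd))
    return keys


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS 197 example key
    forward = expand_forward(key)
    print(bytes(forward[10]).hex())  # d014f9a8c9ee2589e13f0cc8b6630ca6
```

With the FIPS 197 example key, expanding forward once and then stepping backward from round key 10 reproduces every round key in reverse order, ending at the original cipher key.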

Proposed AES architecture.

In Table

Performance comparison of different 8-bit AES designs.

Design | Tech. (µm) | Mode | Max. clock freq. (MHz) | Clock cycles | Area (k gates) | Max. throughput (Mbps) | Max. throughput/area (Mbps/k gates) | Power consumption |
---|---|---|---|---|---|---|---|---|
Feldhofer et al. [ | 0.35 | Enc only | 0.1 | 992 | 3.628 | 0.013 | 0.0036 | 26.9 µW at 100 kHz |
Kaps and Sunar [ | 0.13 | Enc only | 0.5 | 534 | 4.07 | 0.12 | 0.0295 | 23.85 µW at |
Kim et al. [ | 0.25 | Enc only | 0.1 | 870 | 3.9 | 0.015 | 0.0038 | 4.85 µW at |
Feldhofer and Wolkerstorfer [ | 0.35 | Both | 80 | Enc: 1,032; Dec: 1,165 | 3.4 | 9.9 | 2.91 | 4.5 µW at |
Good and Benaissa [ | 0.13 | Enc only | 12 | 356 | 5.5 | 4.31 | 0.78 | 99 µW at |
Ours (chip) | 0.18 | Enc only | 80 | 160 | 3.5 | 64 | 18.3 | 65 µW at |
Ours (chip) | 0.18 | Both | 60 | 160 | 4.4 | 48 | 10.9 | 93 µW at |

In Table

We observe that most realizations support encryption only, owing to the faster verification of such designs. Most previous 8-bit data path AES realizations require many more clock cycles (356 to 1,165) to process a 128-bit data block, resulting in a lower throughput. Therefore, these realizations are suitable for applications with low clock frequency and throughput requirements, such as RFID. In contrast, our design, with its higher throughput, can be used in applications such as IEEE 802.11 wireless networks. Our encryption-only AES design occupies 3.5 k gates with a critical path delay of 12.5 ns. The major improvement of our AES comes from the proposed architecture designs, which minimize the number of clock cycles and the critical path delay required for the MC and SR operations.
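The throughput figures in the comparison table follow from a simple relation, assuming throughput = f_clk × 128 bits / (clock cycles per block); the Python sketch below recomputes them. It further assumes that the 160-cycle minimum comes from processing one byte per cycle for each of the 16 state bytes in each of the 10 rounds, and that the maximum clock frequency is the reciprocal of the 12.5 ns critical path. For Feldhofer and Wolkerstorfer, the encryption cycle count (1,032) is used, which reproduces the listed 9.9 Mbps. All names are hypothetical.

```python
# Illustrative throughput arithmetic; names are hypothetical.
BLOCK_BITS = 128
ROUNDS = 10            # Assumption: AES-128
BYTES_PER_BLOCK = 16   # 128-bit block on an 8-bit data path

# Assumption: one byte per clock cycle in every round.
MIN_CYCLES = ROUNDS * BYTES_PER_BLOCK  # 160


def throughput_mbps(clock_mhz: float, cycles: int) -> float:
    """Throughput in Mbps: MHz * bits per block / cycles per block."""
    return clock_mhz * BLOCK_BITS / cycles


def fmax_mhz(critical_path_ns: float) -> float:
    """Maximum clock frequency (MHz) implied by a critical path delay (ns)."""
    return 1e3 / critical_path_ns


# (clock MHz, clock cycles per block, area in k gates), copied from the table.
DESIGNS = {
    "Feldhofer et al.": (0.1, 992, 3.628),
    "Kaps and Sunar": (0.5, 534, 4.07),
    "Kim et al.": (0.1, 870, 3.9),
    "Feldhofer and Wolkerstorfer (enc)": (80, 1032, 3.4),
    "Good and Benaissa": (12, 356, 5.5),
    "Ours (enc only)": (80, 160, 3.5),
    "Ours (both)": (60, 160, 4.4),
}

if __name__ == "__main__":
    print(f"minimum cycles: {MIN_CYCLES}, fmax at 12.5 ns: {fmax_mhz(12.5):.1f} MHz")
    for name, (clock_mhz, cycles, area_kgates) in DESIGNS.items():
        tp = throughput_mbps(clock_mhz, cycles)
        print(f"{name:34s} {tp:9.4f} Mbps  {tp / area_kgates:8.4f} Mbps per k gates")
```

For the two designs supporting both encryption and decryption, the ratios of our design to Feldhofer and Wolkerstorfer are 48/9.9 ≈ 4.85 in throughput and 10.9/2.91 ≈ 3.7 in throughput/area.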

The area cost and critical path delay of our AES are similar to those of the best design in [

In this paper, we have presented new design methods for AES transformations and their architecture. The major transformations, SR/ISR and MC/IMC, dominate the clock cycles and path delays required for data encryption and decryption. We presented two design methods that efficiently optimize these transformations, and the proposed architecture improves the throughput while keeping the area cost low compared with previous designs. The design is suitable for area-limited applications that require high throughput, such as wireless communications. The implementation results demonstrate that the proposed design achieves the highest throughput among the compared 8-bit AES designs with a low area cost.

This work is supported in part by Taiwan’s Ministry of Education under Project no. 101B-09-027. One of the authors would like to thank Professor Shen-Fu Hsiao for providing comments.


… Advanced Encryption Standard (AES) S-box with algebraic normal form representation in the subfield inversion