The wave dynamic differential logic (WDDL) has been identified as a promising countermeasure to increase the robustness of cryptographic devices against differential power analysis (DPA) attacks. However, to guarantee the effectiveness of the WDDL technique, the routing of the direct and complementary paths must be balanced. This paper tackles the problem of unbalanced dual-rail signals in WDDL designs. We describe placement techniques suitable for tree-based and mesh-based FPGAs and quantify the gain they provide. Then, we introduce a timing-balance-driven routing algorithm that is architecture independent. Our placement and routing techniques achieve gains in delay balance of 95%, 93%, and 85% in tree-based, simple mesh, and cluster-based mesh architectures, respectively. To further reduce the switch and delay unbalance in the mesh architecture, we propose a differential pair routing algorithm specific to the cluster-based mesh architecture. It produces routed signal pairs that are perfectly balanced in terms of wire length and switch number.
FPGAs are an attractive platform for cryptographic applications because of their low cost compared to full-custom ASIC design and their short time to market. In addition, their reprogrammability makes it easy to upgrade the cryptographic algorithm. However, unprotected hardware implementations are vulnerable to side-channel attacks (SCA). It has been shown that differential power analysis (DPA) attack [
In recent years, many countermeasures have been proposed to protect cryptographic devices against SCA. They fall into two main categories: masking logic and hiding logic.
The principle of masking logic is to randomize the power consumption with a random mask, thereby decorrelating the intermediate data from the circuit power consumption. This technique was first introduced at the algorithmic level [
On the other hand, the principle of hiding logic is to consume the same amount of power regardless of the input data. This is achieved by using differential logic (each signal is encoded on two complementary wires) and by precharging the differential signals in every clock cycle. This style is also called dual-rail precharge logic (DPL). Several implementations of secure dual-rail cells have been proposed, specifically for ASICs, such as SABL [
The wave dynamic differential logic (WDDL) technique, developed by Tiri and Verbauwhede [
To address the routing problem, Yu and Schaumont [
Other logic styles derived from the WDDL technique have also been proposed, such as IMDPL [
In this paper, we address the delay unbalance of dual signals in WDDL designs without adding logic to balance the dual networks, so as to avoid any area increase. We study the impact of different placement techniques on differential timing, and we propose a timing-balance-driven routing algorithm that balances the propagation delays of dual signals. The FPGAs we target are a tree-based FPGA (MFPGA) [
The rest of the paper is organized as follows. In Section
We consider an island style FPGA [
A unidirectional disjoint switch box connects different tracks (or wires) of vertical and horizontal channels together. Each input pin of a CLB is connected to the adjacent routing channel, and the output pin connects with the routing channel on its top and right through the diagonal connections of the switch box (highlighted in the bottom-left switch box shown in Figure
Mesh FPGA.
In this work, we assume wires are of logical length
A first multilevel hierarchical FPGA architecture (MFPGA) was designed and evaluated in [ The downward network, based on the butterfly fat tree (BFT) topology and involving a logarithmic population growth of unidirectional downward switch boxes (DS), connects these switch boxes to the LB inputs. The upward network comprises upward switch boxes (US), which allow LB outputs to reach all DSs at the different levels of the architecture.
Figure
MFPGA architecture with 2 levels of hierarchy (
This section presents the method used for creating the MFPGA layout. In Figure
Floorplan of MFPGA architecture
Figure
MFPGA Chip (2048 LBs).
In this paper, we are concerned with the WDDL strategy as a countermeasure against DPA. WDDL is characterized by the following features. The netlist is duplicated into a true part and a false part. For every logic gate, a complementary gate exists, which computes the complementary (false) output of the direct (true) gate from the complementary inputs of the direct gate. Figure The computation is composed of two phases. When the clock is high, the precharge phase is performed, in which all signals are reset. When the clock goes low, the evaluation phase takes place, and exactly one of the two dual outputs evaluates to “1.” If a direct gate switches high, its dual gate does not, and vice versa. Therefore, the activity of the circuit is constant regardless of the input data. Only positive logic gates (AND and OR) are used, and inverters are implemented by cross-coupling the complementary outputs. This allows the precharge wave to propagate through the combinational gates.
WDDL components.
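The dual-rail behaviour described above can be checked with a small functional model. The sketch below is a minimal, zero-delay Python model written for illustration only (the `DualRail` type and the gate helper names are hypothetical, not part of the paper's flow): the false gate of a WDDL AND is an OR of the false rails (De Morgan), an inverter is a plain rail swap, and, starting from the all-zero precharge state, every gate pair makes exactly one rising transition per evaluation, whatever the inputs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DualRail:
    """A WDDL signal carried on two wires: t (true rail) and f (false rail)."""
    t: int
    f: int


PRECHARGE = DualRail(0, 0)  # both rails are reset while the clock is high


def encode(bit: int) -> DualRail:
    """Valid evaluation-phase encoding: exactly one rail is at 1."""
    return DualRail(bit, 1 - bit)


def wddl_and(a: DualRail, b: DualRail) -> DualRail:
    # True gate: AND of the true rails; complementary gate: OR of the false rails.
    return DualRail(a.t & b.t, a.f | b.f)


def wddl_or(a: DualRail, b: DualRail) -> DualRail:
    # True gate: OR of the true rails; complementary gate: AND of the false rails.
    return DualRail(a.t | b.t, a.f & b.f)


def wddl_not(a: DualRail) -> DualRail:
    # No inverter cell: the two rails are simply cross-coupled (swapped).
    return DualRail(a.f, a.t)


def circuit(a: DualRail, b: DualRail, c: DualRail) -> DualRail:
    """Example function y = (a AND b) OR (NOT c), built from positive gates only."""
    return wddl_or(wddl_and(a, b), wddl_not(c))


def evaluation_activity(bits) -> int:
    """Number of 0->1 transitions on gate outputs in one evaluation phase.

    Every wire starts at 0 after precharge, so each wire that ends at 1
    made exactly one rising transition.
    """
    a, b, c = (encode(x) for x in bits)
    n1 = wddl_and(a, b)            # AND gate pair
    y = wddl_or(n1, wddl_not(c))   # OR gate pair (the NOT is only a swap)
    return sum(node.t + node.f for node in (n1, y))


if __name__ == "__main__":
    # The precharge wave propagates: all-zero inputs give an all-zero output.
    print(circuit(PRECHARGE, PRECHARGE, PRECHARGE))
    for bits in product((0, 1), repeat=3):
        print(bits, circuit(*map(encode, bits)), evaluation_activity(bits))
```

The model is purely functional: it confirms the constant-activity property but ignores timing, so it cannot capture the delay and capacitance imbalance discussed next.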
Constant activity is a necessary condition for constant power consumption, but it is not sufficient. To guarantee the robustness of the WDDL technique, dual signals must be balanced, which means the following:
(1) match source-sink delay: the delay from
(2) match load capacitances:
So, special constraints on placement and routing tools must be applied in order to balance the direct and complementary networks.
Differential load capacitance.
To estimate the balance of the complementary networks, we use the following metrics. We compute the number of mismatched (unbalanced) signals (or connections) in terms of the number of switches used for routing, called here We compute for all dual
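The balance metrics reported in the placement and routing tables (maximum, mean and standard deviation of the delay unbalance, plus the count of switch-mismatched connections) can be computed as in the following sketch. It rests on stated assumptions: the `DualConnection` record is hypothetical, the unbalance of a pair is taken as the absolute difference of its two source-to-sink delays, and the population standard deviation is used because the text does not say which estimator the tables report.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class DualConnection:
    """One source-to-sink connection routed on both rails (hypothetical record)."""
    delay_true: float   # delay of the connection in the true network
    delay_false: float  # delay of the same connection in the false network
    sw_true: int        # routing switches used on the true rail
    sw_false: int       # routing switches used on the false rail


def balance_metrics(conns):
    """Delay-unbalance statistics and switch-mismatch count over all dual connections."""
    unbalance = [abs(c.delay_true - c.delay_false) for c in conns]
    return {
        "max": max(unbalance),
        "mean": statistics.fmean(unbalance),
        "std": statistics.pstdev(unbalance),  # population std (assumption)
        "switch_mismatch": sum(1 for c in conns if c.sw_true != c.sw_false),
    }


if __name__ == "__main__":
    conns = [
        DualConnection(10.0, 12.0, 3, 3),
        DualConnection(20.0, 20.0, 4, 5),
        DualConnection(5.0, 9.0, 2, 2),
    ]
    print(balance_metrics(conns))
```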
Propagation delay is computed using the Elmore delay model [
Figure
RC model for FPGA routing elements.
The Elmore delay of a (
After routing is completed, our router builds an equivalent
RC tree network.
To determine the resistance and capacitance of routing wires, we need to know their lengths. Before starting the routing process, our router builds the routing graph of the FPGA architecture and calculates the lengths of all wires from the regularity of the layout. For example, in the multilevel FPGA, the wire length depends on its level in the architecture, its direction (downward or upward), its source, its destination, and the arity of the architecture. After routing and building the RC tree of each net, the router computes the propagation delay between the source node and each destination node using the Elmore delay.
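The per-net delay computation can be sketched as a two-pass traversal of the RC tree: a post-order pass accumulates the capacitance downstream of every node, and a pre-order pass adds, for each edge on the path, its resistance times that downstream capacitance. This is the standard Elmore formula T(i) = sum over the edges k on the path from the source to node i of R_k * C_down(k). The dictionaries used to describe the tree and the R/C values in the example are hypothetical; in the flow described above they would come from the computed wire lengths and the switch models, and a driver resistance can be modeled as the first edge.

```python
from collections import defaultdict


def elmore_delays(parent, r_edge, cap, root):
    """Elmore delay from `root` to every node of an RC tree.

    parent[n]: parent of node n (every node except the root)
    r_edge[n]: resistance of the edge parent[n] -> n
    cap[n]:    capacitance lumped at node n (missing nodes count as 0)
    """
    children = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)

    # Pre-order traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Post-order pass (reversed pre-order): total capacitance of each subtree.
    c_down = {}
    for node in reversed(order):
        c_down[node] = cap.get(node, 0.0) + sum(c_down[ch] for ch in children[node])

    # Pre-order pass: delay(child) = delay(parent) + R(parent->child) * C_down(child).
    delay = {root: 0.0}
    for node in order:
        for ch in children[node]:
            delay[ch] = delay[node] + r_edge[ch] * c_down[ch]
    return delay


if __name__ == "__main__":
    # Source s drives wire a, which fans out to sinks b and c.
    parent = {"a": "s", "b": "a", "c": "a"}
    r_edge = {"a": 1.0, "b": 2.0, "c": 3.0}
    cap = {"a": 2.0, "b": 1.0, "c": 1.0}
    print(elmore_delays(parent, r_edge, cap, "s"))
```

The iterative traversal avoids Python's recursion limit on long nets; the cost is linear in the number of tree nodes.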
In the rest of the paper, we present a case study on the DES [
Characteristics of WDDL designs.
WDDL benchmarks | 4-LUTs | IN | OUT | Functionality |
---|---|---|---|---|
DES | 2152 | 10 | 19 | Cryptographic design |
Barrel16 | 586 | 42 | 34 | Shift register |
Barrel32 | 1406 | 76 | 66 | Shift register |
Barrel64 | 3112 | 143 | 130 | Shift register |
Mux8 | 2966 | 22 | 130 | Bus of muxes |
Mux32 | 2888 | 74 | 34 | Bus of muxes |
xbar_ | 546 | 160 | 34 | Crossbar |
Our placement tool uses the simulated annealing algorithm [
To reduce the dual-rail unbalance in WDDL designs, we investigate two placement techniques.
The mesh architecture has a homogeneous interconnect, so it can be divided into two separate, symmetrical domains with identical routing resources. We exploit this symmetry to balance the routing of the differential netlist by performing a symmetrical placement.
For this purpose, we first place the direct network on one half of the mesh architecture. Then we place the complementary network symmetrically on the other half of the architecture, as shown in Figure
Symmetrical placement in mesh FPGA.
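A minimal sketch of the coordinate mapping behind symmetrical placement, assuming a grid of even width whose left half holds the direct network. The text does not specify whether "symmetrical" means a translation of the left half onto the right half or a mirror image about the vertical axis, so both variants are shown; the function name and the `dual_of` mapping are hypothetical.

```python
def symmetric_placement(direct_pos, dual_of, width, mode="translate"):
    """Place each complementary instance in the right half of a mesh.

    direct_pos: {instance: (x, y)} for the direct network, all with x < width // 2
    dual_of:    {direct instance: complementary instance}
    mode:       "translate" keeps the same relative position in the right half,
                "mirror" reflects about the vertical mid-line
    """
    if width % 2:
        raise ValueError("an even grid width is assumed")
    half = width // 2
    comp_pos = {}
    for inst, (x, y) in direct_pos.items():
        if not 0 <= x < half:
            raise ValueError(f"{inst} is not placed in the left half")
        if mode == "translate":
            comp_pos[dual_of[inst]] = (x + half, y)
        elif mode == "mirror":
            comp_pos[dual_of[inst]] = (width - 1 - x, y)
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return comp_pos


if __name__ == "__main__":
    direct = {"g1": (1, 2), "g2": (3, 0)}
    duals = {"g1": "g1_f", "g2": "g2_f"}
    print(symmetric_placement(direct, duals, width=8))
    print(symmetric_placement(direct, duals, width=8, mode="mirror"))
```

With unidirectional switch boxes whose outputs drive the top and right channels, translation keeps each net's orientation relative to the routing resources, whereas mirroring reverses horizontal directions; translation is therefore likely the more natural choice for this architecture.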
Table
Placement results (
WDDL | Nets | Unconstrained Max | Unconstrained Mean | Unconstrained Std. Dev. | Unconstrained Switch | Symmetrical Max | Symmetrical Mean | Symmetrical Std. Dev. | Symmetrical Switch | Adjacent Max | Adjacent Mean | Adjacent Std. Dev. | Adjacent Switch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DES | 2162 | 5317 | 607 | 746 | 2375 | 4623 | 289 | 551 | 1431 | 1989 | 121 | 176 | 1476 |
Barrel16 | 628 | 2323 | 458 | 466 | 815 | 1480 | 164 | 229 | 532 | 2448 | 165 | 299 | 479 |
Barrel32 | 1482 | 6825 | 829 | 908 | 2072 | 5504 | 333 | 648 | 1266 | 5140 | 222 | 440 | 1247 |
Barrel64 | 3254 | 8351 | 1114 | 1229 | 4760 | 10330 | 674 | 1269 | 3158 | 7090 | 252 | 545 | 2841 |
Mux8_64 bit | 2988 | 7023 | 579 | 939 | 2242 | 6263 | 230 | 645 | 676 | 2960 | 76 | 255 | 553 |
Mux32_16 bit | 3072 | 3683 | 447 | 536 | 2301 | 4271 | 108 | 353 | 630 | 1702 | 58 | 147 | 620 |
xbar_ | 706 | 5745 | 634 | 850 | 732 | 3706 | 258 | 375 | 485 | 3013 | 92 | 195 | 384 |
First, as already explained in Section
Placing the direct and dual gates in adjacent positions in the mesh architecture helps obtain nets that are as symmetrical as possible. Moreover, current techniques for controlling routing in ASICs also keep dual gates close to each other.
We place each true instance directly above its complementary instance. Whenever an instance is moved to a new block position during placement, its dual instance is moved to the adjacent block position; this holds for both true and false instances. Figure
Adjacent placement in mesh FPGA.
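In a simulated-annealing placer, the adjacency constraint can be enforced in the move generator itself: a true/false pair is treated as one movable object occupying two vertically adjacent blocks. The sketch below shows only that move, assuming pairs are tracked by slot (the cost evaluation and the acceptance rule are unchanged and omitted); the slot representation and function names are hypothetical.

```python
import random


def pair_slots(width, height):
    """Slots for dual pairs: column x and row pair (2k, 2k + 1)."""
    return [(x, k) for x in range(width) for k in range(height // 2)]


def positions(slot):
    """(true position, false position): the true instance sits directly above."""
    x, k = slot
    return (x, 2 * k + 1), (x, 2 * k)


def propose_move(slot_of, all_slots, rng):
    """Move one randomly chosen pair to a random slot, swapping with any occupant.

    slot_of maps a pair id to its slot; a new mapping is returned so that the
    annealer can accept or reject the move.
    """
    new = dict(slot_of)
    pair = rng.choice(sorted(new))
    target = rng.choice([s for s in all_slots if s != new[pair]])
    occupant = next((p for p, s in new.items() if s == target), None)
    if occupant is not None:
        new[occupant] = new[pair]  # the displaced pair takes the vacated slot
    new[pair] = target
    return new


if __name__ == "__main__":
    rng = random.Random(0)
    slots = pair_slots(4, 4)
    state = {"p0": slots[0], "p1": slots[1], "p2": slots[2]}
    for _ in range(5):
        state = propose_move(state, slots, rng)
        print({p: positions(s) for p, s in state.items()})
```

Because a pair can only ever move as a unit, the adjacency invariant holds after every accepted or rejected move without any repair step.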
With this constraint, the balance between dual nets greatly improves. The last four columns of Table
The MFPGA configuration flow begins with a top-down recursive multilevel partitioning phase. The multilevel partitioning algorithm [
In the MFPGA partitioning, the top-level clusters of the tree-based architecture are constructed first; then each cluster is partitioned into subclusters until the lowest level of the architecture is reached. For example, we run a recursive netlist partitioning for a
After partitioning, we run the placement phase. Each cluster is assigned a random position inside its owner cluster, since all positions inside the same owner cluster are equivalent. After placement, the router assigns the nets that connect placed instances to routing resources of the interconnect structure, using the PathFinder routing algorithm [
As for the mesh-based FPGA, we investigate symmetrical and adjacent placement techniques in the MFPGA.
The MFPGA architecture has an interesting symmetry: all clusters of the same level are identical, such as
Symmetrical placement in MFPGA architecture
Figure
Symmetrical partitioning steps in multilevel architecture
To obtain a symmetrical placement in the MFPGA architecture, the two complementary networks must be partitioned identically. To this end, we start by partitioning the real network, considering only the hypergraph that represents the real gates. We apply a recursive partitioning based on a multilevel approach to this hypergraph: in each partitioning phase, a multilevel coarsening step is followed by a multilevel refinement step.
In this partitioning phase, only half of the MFPGA architecture is considered. In this example, half of the MFPGA corresponds to a
Once real gates contained in the
After partitioning, we place the direct network and then the dual network. Each real cluster is randomly assigned a position inside its owner cluster, and each dual cluster is then assigned the same position as its real counterpart. This process is performed at all levels
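The position-copying step can be sketched as follows, assuming the partitioner returns, for every real gate, the path of cluster indices from the top of the tree down to its logic block, with top-level indices restricted to the half reserved for the real network. Each real cluster receives a random position inside its owner, and every dual gate receives the same path shifted into the other half. The path representation and the function name are hypothetical simplifications of the flow described above.

```python
import random


def place_symmetric(real_paths, dual_of, arity, rng):
    """Symmetrical MFPGA placement from a partition of the real network.

    real_paths: {real gate: (i0, i1, ..., iL)} cluster indices per level,
                with i0 < arity[0] // 2 (the half used by the real network)
    dual_of:    {real gate: dual gate}
    arity:      number of children of a cluster at each level, top first
    """
    half = arity[0] // 2
    perms = {}  # one random permutation of child positions per owner cluster

    def position(owner, index, depth):
        n = half if depth == 0 else arity[depth]
        if owner not in perms:
            perms[owner] = rng.sample(range(n), n)
        return perms[owner][index]

    placed = {}
    for gate, path in real_paths.items():
        placed[gate] = tuple(position(path[:d], i, d) for d, i in enumerate(path))
    # Each dual gate takes the real gate's position, shifted into the other half.
    for gate in real_paths:
        pos = placed[gate]
        placed[dual_of[gate]] = (pos[0] + half,) + pos[1:]
    return placed


if __name__ == "__main__":
    real = {"a": (0, 0), "b": (0, 1), "c": (1, 0)}
    dual = {"a": "a_f", "b": "b_f", "c": "c_f"}
    print(place_symmetric(real, dual, arity=[4, 2], rng=random.Random(1)))
```

Keying the permutation on the owner cluster moves whole subclusters together, so gates that share a cluster after partitioning still share it after placement.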
At level
Table
Placement results (
WDDL | Unconstrained Max | Unconstrained Mean | Unconstrained Std. Dev. | Unconstrained Switch | Symmetrical Max | Symmetrical Mean | Symmetrical Std. Dev. | Symmetrical Switch | Adjacent Max | Adjacent Mean | Adjacent Std. Dev. | Adjacent Switch |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DES | 7006 | 1420 | 1438 | 1131 | 6840 | 732 | 870 | 110 | 4106 | 365 | 429 | 34 |
Barrel16 | 3828 | 648 | 660 | 242 | 2505 | 375 | 312 | 21 | 1946 | 231 | 235 | 11 |
Barrel32 | 5317 | 1165 | 1094 | 961 | 3631 | 420 | 468 | 104 | 3054 | 330 | 426 | 90 |
Barrel64 | 9157 | 1664 | 1752 | 1368 | 7000 | 1128 | 933 | 145 | 6902 | 542 | 636 | 96 |
Mux8_64 bit | 8940 | 2004 | 2205 | 1088 | 6303 | 813 | 843 | 79 | 5385 | 386 | 539 | 25 |
Mux32_16 bit | 8834 | 1882 | 2056 | 956 | 6742 | 849 | 869 | 82 | 6458 | 401 | 560 | 34 |
xbar_ | 3281 | 724 | 672 | 272 | 2258 | 347 | 249 | 8 | 1344 | 137 | 139 | 2 |
Instead of separating the direct and dual networks into two domains of the architecture, we keep real and dual gates in adjacent positions. In other words, we place two complementary gates in the same cluster of the MFPGA, as shown in Figure
Adjacent placement in MFPGA architecture
Figure
Let:
Let:
for (int lev = arch_levels-1; lev >= 0; lev--) {
    for (each Cluster at level lev) {
        if (lev == arch_levels-1) {
            partitioning(sub_hypergraph, TopLevelCluster, Arity[lev]); // partition instances inside TopLevelCluster into Arity[lev] subclusters
        } else {
            partitioning(sub_hypergraph, Cluster, Arity[lev]); // partition instances inside Cluster into Arity[lev] subclusters
        }
    }
}
uncoarsening sub_hypergraph;
Adjacent partitioning steps in multilevel architecture
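The adjacent partitioning flow (coarsen each complementary pair into one vertex, partition top-down level by level, then uncoarsen) can be sketched as below. The k-way partitioner here is a deliberately naive placeholder that splits the vertex list into equal consecutive chunks; the actual flow uses a multilevel partitioner, and all names are hypothetical. The sketch illustrates the key invariant: because a pair is a single vertex during partitioning, both gates always end up in the same lowest-level cluster.

```python
def coarsen_pairs(gates, dual_of):
    """Merge each gate with its complementary gate into one super-vertex."""
    mate = {**dual_of, **{f: t for t, f in dual_of.items()}}  # symmetric lookup
    merged, seen = {}, set()
    for g in gates:
        if g in seen:
            continue
        members = (g, mate[g]) if g in mate else (g,)
        seen.update(members)
        merged[g] = members
    return merged


def chunk_partition(vertices, k):
    """Placeholder k-way partitioner: k nearly equal consecutive chunks."""
    n = len(vertices)
    return [vertices[i * n // k:(i + 1) * n // k] for i in range(k)]


def adjacent_partition(gates, dual_of, arity):
    """Top-down partitioning; arity[l] = number of subclusters at level l (top first).

    Returns {gate: cluster path}. Complementary gates share the same path.
    """
    merged = coarsen_pairs(gates, dual_of)
    assignment = {}

    def recurse(vertices, level, path):
        if level == len(arity):
            for v in vertices:            # uncoarsening
                for g in merged[v]:
                    assignment[g] = path
            return
        for i, part in enumerate(chunk_partition(vertices, arity[level])):
            recurse(part, level + 1, path + (i,))

    recurse(list(merged), 0, ())
    return assignment


if __name__ == "__main__":
    gates = ["a", "a_f", "b", "b_f", "c", "c_f", "d", "d_f"]
    dual_of = {"a": "a_f", "b": "b_f", "c": "c_f", "d": "d_f"}
    print(adjacent_partition(gates, dual_of, arity=[2, 2]))
```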
The last four columns of Table
In this section, we present the enhancements made to the PathFinder router to improve the differential timing balance. We call the new router a timing-balance-driven router. Routing is performed on netlists placed with the adjacent placement technique.
The new router is based on the negotiation of congestion and delay balance. Interconnect resources are represented by a routing graph whose nodes correspond to wires and CLB pins and whose edges represent switches.
Consider the connection to sink
The first term in ( To explain The The unbalance criticality of a sink The connection unbalance criticality is a fractional number between 0 and 1. A high connection unbalance criticality means that the real and dual routed connections have a large delay unbalance. The connection unbalance criticality depends on the routing results of the previous iteration. In the first iteration, the The To determine the differential number of routing switches used, at a resource node The estimated total number of switches that will be used to reach the sink We note that If The
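Because the cost expressions of this section were lost in extraction, the following is only a hypothetical illustration of how a congestion term and a balance term can be blended by a criticality in [0, 1], in the spirit of timing-driven PathFinder. The congestion term uses the standard PathFinder node cost (b_n + h_n) * p_n; the form of the criticality, the normalization of the delay gap and the switch-gap weight are assumptions, not the paper's equations.

```python
def congestion_cost(base, history, present):
    """PathFinder node cost: (b_n + h_n) * p_n."""
    return (base + history) * present


def unbalance_criticality(delay_conn, delay_dual, max_unbalance):
    """Assumed form: unbalance of the pair in the previous iteration,
    normalized by the largest unbalance of that iteration and clipped to [0, 1]."""
    if max_unbalance <= 0:
        return 0.0  # e.g. first iteration, when the dual net is not routed yet
    return min(1.0, abs(delay_conn - delay_dual) / max_unbalance)


def balance_cost(delay_to_node, est_delay_to_sink, dual_delay,
                 switches_to_node, est_switches_to_sink, dual_switches,
                 delay_norm, switch_weight=1.0):
    """Expected mismatch with the dual connection if the route continues through
    this node: estimated total delay and total switch count versus the dual's."""
    delay_gap = abs(delay_to_node + est_delay_to_sink - dual_delay) / delay_norm
    switch_gap = abs(switches_to_node + est_switches_to_sink - dual_switches)
    return delay_gap + switch_weight * switch_gap


def node_cost(criticality, congestion, balance):
    """Unbalanced (critical) connections weight balance; others weight congestion."""
    return (1.0 - criticality) * congestion + criticality * balance


if __name__ == "__main__":
    crit = unbalance_criticality(120.0, 80.0, max_unbalance=200.0)
    cong = congestion_cost(base=1.0, history=0.5, present=2.0)
    bal = balance_cost(50.0, 30.0, 80.0, 3, 1, 4, delay_norm=100.0)
    print(crit, cong, bal, node_cost(crit, cong, bal))
```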
Let index(src) = 0; count(src) = 0; index( count( if ( equal to the dual routing path*/ if ( index( count( } else { /* } } else { /*the current routing path is longer than the dual routing path*/ if (equivalent_index index( } else { /* index( count( } }
Figure
Routing true and dual connections.
In this expansion, three scenarios can be distinguished.
In the case of the node
Using the above notation, we can now describe the flow of our routing algorithm.
In the first iteration, one of the two complementary nets is routed based on the congestion cost only. The routing does not take the differential delay into account, since the dual net has not yet been routed. Once the original net is routed, the router stores information about its routing: the used resources, the delay of each resource, and the number of switches used.
Then, during the routing of the dual net, the router computes the delay of each used resource
In the next iteration, the original net is ripped up and rerouted, taking into account the routing of the dual net obtained in the previous iteration, and vice versa. Note that dual connections are routed in the same order.
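The alternating rip-up-and-reroute flow can be illustrated on a toy problem in which each rail of a connection has a few precomputed candidate paths. This is a simplification for illustration only: a real router explores the routing graph with a cost-driven search and shares congestion among all nets, whereas here the congestion term is a per-wire count, the balance term is a plain sum of delay and switch gaps, and `Path`, `choose` and `route_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    delay: float
    switches: int
    wires: frozenset  # routing-graph nodes used by the path


def choose(candidates, occupied, dual=None, balance_weight=1.0):
    """Cheapest candidate: congestion only if the dual is not routed yet,
    otherwise congestion plus delay and switch mismatch with the dual path."""
    def cost(p):
        congestion = sum(1 + occupied.get(w, 0) for w in p.wires)
        if dual is None:
            return congestion
        mismatch = abs(p.delay - dual.delay) + abs(p.switches - dual.switches)
        return congestion + balance_weight * mismatch
    return min(candidates, key=cost)


def route_pair(cands_true, cands_false, iterations=3):
    routes = {"t": None, "f": None}
    for _ in range(iterations):
        # Same order every iteration; each net is ripped up and rerouted
        # against the latest route of its dual.
        for net, other, cands in (("t", "f", cands_true), ("f", "t", cands_false)):
            occupied = {}
            if routes[other] is not None:
                for w in routes[other].wires:
                    occupied[w] = occupied.get(w, 0) + 1
            routes[net] = choose(cands, occupied, dual=routes[other])
    return routes


if __name__ == "__main__":
    A = Path(10.0, 3, frozenset({"w1", "w2", "w3"}))
    B = Path(14.0, 4, frozenset({"w4", "w5", "w6", "w7"}))
    C = Path(18.0, 5, frozenset({"w8", "w9", "w10", "w11", "w12"}))
    D = Path(14.0, 4, frozenset({"w13", "w14", "w15", "w16"}))
    r = route_pair([A, B], [C, D])
    print(r["t"].delay, r["f"].delay)
```

In the first iteration the true rail takes the cheapest path A; once the false rail has been routed, rerouting the true rail accepts slightly more congestion (path B) so that both rails end with equal delay and switch count.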
It can be seen from (
The PathFinder routing algorithm performs many iterations until all resource conflicts are resolved. This routing algorithm is not architecture specific and can be applied to all the target platforms.
From Table
Timing-balance-driven routing results (
WDDL | Max | Mean | Std. Dev. | Sw. Mis. | Total | Routability_driven | Timing_balanced_driven |
---|---|---|---|---|---|---|---|
DES | 571 | 85 | 92 | 0 | 0 | 56.62 | 51.55 |
Barrel16 | 395 | 50 | 55 | 0 | 0 | 24.04 | 21.14 |
Barrel32 | 921 | 70 | 75 | 0 | 0 | 21.23 | 18.52 |
Barrel64 | 1333 | 133 | 148 | 0 | 0 | 61.72 | 59.9 |
Mux8_64 bit | 881 | 44 | 71 | 0 | 0 | 28.75 | 25.84 |
Mux32_16 bit | 801 | 61 | 88 | 0 | 0 | 40.11 | 37.82 |
xbar_ | 366 | 32 | 55 | 0 | 0 | 13.39 | 13.84 |
The last two columns of Table
Routing results obtained with the simple mesh FPGA are summarized in Table
Timing-balance-driven routing results (
WDDL | Max | Mean | Std. Dev. | Sw. Mis. | Total | Routability_driven | Timing_balanced_driven |
---|---|---|---|---|---|---|---|
DES | 739 | 64 | 75 | 1419 | 2191 | 19.84 | 19.55 |
Barrel16 | 540 | 45 | 60 | 371 | 508 | 9.4 | 8.7 |
Barrel32 | 975 | 56 | 83 | 875 | 1432 | 8.32 | 8.59 |
Barrel64 | 1620 | 71 | 107 | 2198 | 3831 | 25.29 | 20.77 |
Mux8_64 bit | 798 | 25 | 56 | 371 | 568 | 12.09 | 10.62 |
Mux32_16 bit | 494 | 20 | 39 | 268 | 383 | 17.47 | 16.53 |
xbar_ | 539 | 30 | 49 | 232 | 330 | 8.59 | 6.49 |
Table
Timing-balance-driven routing results (
WDDL (adjacent placement) | Routability-driven Max | Routability-driven Mean | Routability-driven Std. Dev. | Routability-driven Sw. Mis. | Routability-driven Total | Routability-driven Critical | Timing-balance-driven Max | Timing-balance-driven Mean | Timing-balance-driven Std. Dev. | Timing-balance-driven Sw. Mis. | Timing-balance-driven Total | Timing-balance-driven Critical |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DES | 2383 | 107 | 218 | 1099 | 3209 | 16.23 | 813 | 34 | 70 | 631 | 990 | 15.66 |
Barrel16 | 2400 | 260 | 364 | 604 | 2757 | 9.43 | 620 | 91 | 102 | 536 | 977 | 9.2 |
Barrel32 | 4810 | 463 | 770 | 1455 | 10654 | 18.32 | 1616 | 117 | 158 | 1351 | 2824 | 16.2 |
Barrel64 | 8926 | 646 | 1093 | 3391 | 32299 | 26.64 | 2131 | 149 | 197 | 3190 | 7759 | 23.96 |
Mux8_64 bit | 5751 | 464 | 781 | 1706 | 14421 | 11.58 | 1899 | 61 | 132 | 894 | 1940 | 9.94 |
Mux32_16 bit | 4505 | 159 | 451 | 852 | 4936 | 14.02 | 1376 | 46 | 95 | 846 | 1445 | 14.07 |
xbar_ | 3245 | 182 | 375 | 423 | 1863 | 8.25 | 621 | 58 | 82 | 355 | 581 | 6.32 |
To evaluate the overhead of the new router, we compared the critical path delay obtained with timing-balance-driven and routability-driven routing algorithms. The last column of Table
Comparing the three FPGA architectures explored in this paper, the multilevel architecture is the best in terms of switch balance. In this architecture, the number of switches used to route a connection is limited, so it is easier to balance routed dual connections in terms of switch number. The remaining delay unbalance comes from the length differences between architecture wires at all levels. In the mesh architecture, we take advantage of the homogeneous interconnect, but on the other hand, the number of switches used to route a connection is far from limited. The simple mesh architecture achieves better delay balance than both the multilevel architecture and the cluster-based architecture with cluster size 2. In terms of area, however, Table
Characteristics of simple mesh, cluster-based mesh and MFPGA architectures with timing-balance-driven routing.
WDDL | MFPGA SW | MFPGA Area | Simple mesh Channel | Simple mesh SW | Simple mesh Area | Cluster-based mesh Channel | Cluster-based mesh SW | Cluster-based mesh Area |
---|---|---|---|---|---|---|---|---|
DES | 506 | 1499 | 20 | 625 | 1797 | 16 | 474 | 1354 |
Barrel16 | 177 | 526 | 18 | 160 | 465 | 16 | 141 | 405 |
Barrel32 | 380 | 1089 | 24 | 492 | 1387 | 22 | 415 | 1156 |
Barrel64 | 786 | 2268 | 32 | 1418 | 3914 | 24 | 980 | 2705 |
Mux8 | 729 | 2132 | 32 | 1368 | 3776 | 12 | 526 | 1559 |
Mux32 | 729 | 2139 | 24 | 989 | 2794 | 14 | 564 | 1638 |
xbar | 203 | 587 | 12 | 102 | 304 | 12 | 102 | 301 |
The previous results show that the cluster-based mesh architecture has the smallest area. However, this architecture exhibits a large switch unbalance between dual connections, which causes delay unbalance and is likely to cause an unbalance in power consumption. Optimal balance is obtained when both the wire lengths and the number of routing switches are balanced. For this purpose, we propose a differential pair routing algorithm specific to the cluster-based mesh FPGA. The goal is to always route two differential signals through identical tracks and the same number of switches. Thus, neglecting parasitic and crosstalk effects, the two routes have the same capacitances and the same delays.
The routing technique we propose is inspired by the differential routing technique for ASICs proposed in [
To achieve differential routing, we first build a double-wire routing graph. We group equivalent architecture wires into pairs to form “double” wires.
Two wires are considered equivalent if the following conditions hold: they belong to the same routing channel; they have the same logical length; they have the same direction (North, South, East, or West); wires wires
Wires that are inputs (or outputs) of a CLB are considered equivalent if they are inputs (or outputs) on the same CLB side (top, bottom, left, or right).
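Grouping wires into double wires amounts to bucketing them by an equivalence key and pairing successive members of each bucket. The sketch assumes each wire record carries its channel, logical length, direction and track index, and that "successive" means consecutive in track order within a bucket; the `Wire` record and function name are hypothetical. CLB pins could be handled the same way with a key made of the CLB, its side, and whether the pin is an input or an output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Wire:
    name: str
    channel: tuple   # identifier of the routing channel
    length: int      # logical length
    direction: str   # "N", "S", "E" or "W"
    track: int       # index of the track inside the channel


def build_double_wires(wires):
    """Pair successive equivalent wires; unpaired wires stay single."""
    buckets = defaultdict(list)
    for w in wires:
        buckets[(w.channel, w.length, w.direction)].append(w)
    doubles, singles = [], []
    for members in buckets.values():
        members.sort(key=lambda w: w.track)
        for i in range(0, len(members) - 1, 2):
            doubles.append((members[i], members[i + 1]))
        if len(members) % 2:
            singles.append(members[-1])  # no equivalent partner left
    return doubles, singles


if __name__ == "__main__":
    ch = ("h", 0, 0)
    wires = [Wire("e0", ch, 1, "E", 0), Wire("w1", ch, 1, "W", 1),
             Wire("e2", ch, 1, "E", 2), Wire("e3", ch, 2, "E", 3),
             Wire("e4", ch, 1, "E", 4)]
    doubles, singles = build_double_wires(wires)
    print([(a.name, b.name) for a, b in doubles], [w.name for w in singles])
```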
Double-wire routing graph.
To obtain a double-wire routing graph, each pair of successive equivalent wires is transformed into a single double wire. Figure
Once the double-wire routing graph and the double-signal netlist are obtained, each pair of differential signals is routed as one double signal using double wires of the routing graph. Single-rail signals can be routed using either double wires or single wires that have no equivalent wire. We use the shortest-path PathFinder algorithm [
At the end of routing, the double wires of the routing graph are split into two single wires, and each double signal is decomposed into its two differential signals. Each wire is then assigned to the appropriate signal. Thus, we ensure that the two differential signals are routed through wires of the same length and with the same number of switches.
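The final splitting step, and the property it guarantees, can be checked with the sketch below. It assumes a routed double signal is a sequence of double wires, each a pair of (name, logical length) single wires of equal length, and it counts one programmable switch between consecutive wires; the representation and function names are hypothetical.

```python
def split_double_route(double_route):
    """Split a route made of double wires into the two single-wire routes."""
    route_a, route_b = [], []
    for (name_a, len_a), (name_b, len_b) in double_route:
        if len_a != len_b:
            raise ValueError(f"{name_a}/{name_b}: paired wires must be equivalent")
        route_a.append((name_a, len_a))
        route_b.append((name_b, len_b))
    return route_a, route_b


def route_metrics(route):
    """(total logical wire length, switches), assuming one switch between wires."""
    return sum(length for _, length in route), len(route) - 1


if __name__ == "__main__":
    double_route = [(("t0", 1), ("t1", 1)),
                    (("t6", 2), ("t7", 2)),
                    (("t2", 1), ("t3", 1))]
    true_route, false_route = split_double_route(double_route)
    print(route_metrics(true_route), route_metrics(false_route))
```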
This methodology can be applied to mesh architectures with different cluster sizes, wire lengths, and interconnect flexibilities. However, an architecture containing more equivalent wires is naturally better suited to differential pair routing of WDDL designs.
Adjacent placement and differential routing techniques are applied to a cluster-based mesh architecture with 2 LBs per cluster and wires of logical length
Differential pair routing results (
WDDL | Nets | Unconstrained Max | Unconstrained Mean | Unconstrained Sw. Mis. | Adjacent + differential Max | Adjacent + differential Mean | Adjacent + differential Sw. Mis. |
---|---|---|---|---|---|---|---|
DES | 2162 | 5450 | 319 | 2206 | 13 | 1.2 | 0 |
Barrel16 | 586 | 2400 | 407 | 752 | 11.3 | 1.6 | 0 |
Barrel32 | 1406 | 4810 | 623 | 1816 | 12 | 2.4 | 0 |
Barrel64 | 3112 | 11318 | 1131 | 4275 | 16 | 3.2 | 0 |
Mux8 | 2966 | 7468 | 624 | 1707 | 14.2 | 1.09 | 0 |
Mux32 | 2988 | 4642 | 423 | 1773 | 13.9 | 0.9 | 0 |
xbar_ | 546 | 1790 | 336 | 673 | 14 | 1.14 | 0 |
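The delay-balance gains quoted in the abstract and conclusion (95%, 93%, and 85%) are given without an explicit formula. One reading that reproduces all three values is to compute, for each benchmark, the relative reduction of the mean delay unbalance and average it over the seven benchmarks. The sketch below does this with the Mean columns copied from the tables: unconstrained values from the MFPGA and mesh placement tables and from the unconstrained columns of the differential pair table, final values from the three timing-balance-driven routing tables. Treat it as a consistency check under that assumption, not as the authors' definition.

```python
from statistics import fmean

# Order: DES, Barrel16, Barrel32, Barrel64, Mux8, Mux32, xbar.
# Mean delay unbalance (table units): (unconstrained, proposed techniques).
MEAN_UNBALANCE = {
    "tree-based (MFPGA)": ([1420, 648, 1165, 1664, 2004, 1882, 724],
                           [85, 50, 70, 133, 44, 61, 32]),
    "simple mesh": ([607, 458, 829, 1114, 579, 447, 634],
                    [64, 45, 56, 71, 25, 20, 30]),
    "cluster-based mesh": ([319, 407, 623, 1131, 624, 423, 336],
                           [34, 91, 117, 149, 61, 46, 58]),
}


def average_gain(before, after):
    """Mean over benchmarks of the relative reduction in mean delay unbalance."""
    return 1.0 - fmean(a / b for b, a in zip(before, after))


if __name__ == "__main__":
    for arch, (before, after) in MEAN_UNBALANCE.items():
        print(f"{arch}: {100 * average_gain(before, after):.1f}%")
```

This yields about 94.6%, 93.3%, and 85.3%, which round to the published figures.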
Figure
Ratio between interconnect capacitances at true and false nets in WDDL DES design.
The first columns of Table
Cluster-based mesh architecture characteristics for different cluster sizes.
WDDL (differential pair routing) | Cluster size 2 Channel | Cluster size 2 SW | Cluster size 2 Area | Cluster size 4 Channel | Cluster size 4 SW | Cluster size 4 Area | Cluster size 8 Channel | Cluster size 8 SW | Cluster size 8 Area |
---|---|---|---|---|---|---|---|---|---|
DES | 18 | 522 | 1486 | 20 | 625 | 1742 | 28 | 938 | 2460 |
Barrel16 | 16 | 141 | 405 | 28 | 236 | 638 | 28 | 325 | 851 |
Barrel32 | 24 | 447 | 1236 | 26 | 475 | 1296 | 40 | 802 | 2076 |
Barrel64 | 26 | 1051 | 2879 | 30 | 1150 | 3104 | 44 | 1749 | 4505 |
Mux8 | 12 | 526 | 1559 | 16 | 730 | 2058 | 28 | 1299 | 3405 |
Mux32 | 14 | 564 | 1638 | 20 | 791 | 2204 | 32 | 1274 | 3318 |
xbar_ | 12 | 102 | 301 | 18 | 145 | 412 | 32 | 286 | 744 |
To evaluate the impact of the cluster size on the differential pair routing results, we apply adjacent placement and differential pair routing with cluster sizes of 2, 4, and 8. Tables
Impact of cluster size on critical path delay.
WDDL | Differential pair (size 2) Critical | Differential pair (size 2) Critical | Differential pair (size 4) Critical | Differential pair (size 4) Critical | Differential pair (size 8) Critical | Differential pair (size 8) Critical | Routability-driven (size 2) Critical | Routability-driven (size 2) Critical | Routability-driven (size 4) Critical | Routability-driven (size 4) Critical | Routability-driven (size 8) Critical | Routability-driven (size 8) Critical |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DES | 147 | 16.52 | 154 | 16.78 | 106 | 18.18 | 149 | 16.23 | 139 | 14.58 | 92 | 16.86 |
Barrel16 | 102 | 10.8 | 98 | 13.67 | 82 | 14.06 | 85 | 9.43 | 118 | 14 | 72 | 12.42 |
Barrel32 | 175 | 20.32 | 167 | 22.37 | 103 | 20.47 | 176 | 18.32 | 176 | 21.52 | 106 | 20.7 |
Barrel64 | 211 | 23.7 | 233 | 32.0 | 121 | 24.8 | 232 | 26.64 | 228 | 27.8 | 153 | 29.39 |
Mux8 | 96 | 9.07 | 79 | 9.45 | 74 | 12.54 | 194 | 18 | 107 | 11.18 | 90 | 15.64 |
Mux32 | 133 | 13.11 | 116 | 14.0 | 94 | 16.17 | 150 | 14 | 131 | 14.15 | 112 | 18.62 |
xbar_ | 73 | 7.04 | 52 | 6.79 | 51 | 9.56 | 84 | 8.25 | 53 | 6.53 | 58 | 9.95 |
The delay unbalance between differential nets is expected to increase with the cluster size. Indeed, we assume that mesh routing wires have the same length, whereas the intracluster local interconnect is unbalanced. As the cluster size increases, the cluster layout becomes larger and the unbalance between local routing wires grows. Obtaining exact delay-unbalance values would require the layout of each cluster configuration.
The wave dynamic differential logic (WDDL) is a promising countermeasure to improve the robustness of secure devices. Nevertheless, to be effective, it requires balanced dual signals with equal propagation delays. To improve the delay balance of the design, we first applied an adjacent placement technique, which places real and dual gates close to each other. Routing results were then improved by an adaptation of the PathFinder routing algorithm. The proposed router is architecture independent. It is based on congestion-delay negotiation and takes into account the differential delay, the differential number of used routing resources, and the congestion cost. The placement and routing techniques were applied to tree-based, simple mesh, and cluster-based mesh architectures. Compared to unconstrained placement and routing, the new placement and routing techniques achieve an average delay-balance improvement of 95%, 93%, and 85% in tree-based, simple mesh, and cluster-based mesh architectures, respectively. In addition, in the tree-based FPGA, the router succeeded in routing all pairs of dual connections of the WDDL designs with the same number of resources; there, the remaining delay unbalance is dominated by wire delays (wires belonging to the same level do not have the same length). In the mesh architecture, on the other hand, the remaining delay unbalance is dominated by switch delays.
The cluster-based mesh architecture with cluster size 2, however, has the smallest area. So, to further reduce the switch unbalance while taking advantage of the homogeneous interconnect of the mesh architecture, we proposed a differential pair routing technique specific to the cluster-based mesh FPGA. This routing algorithm offers the best tradeoff: it routes differential signals through paths that are balanced in both wire length and switch number. However, this routing technique can be applied only to the cluster-based mesh architecture.