Testing and Diagnosing Dynamic Reconfigurable FPGA

Dynamic reconfigurable field-programmable gate arrays (FPGAs) are receiving notable attention because their reconfiguration time is much shorter than that of traditional FPGAs. The short reconfiguration time is vital to applications such as reconfigurable computing and emulation. We show in this paper that testing and diagnosis of the FPGA can also take advantage of its dynamic reconfigurability. We first propose an efficient methodology for testing the interconnects of the FPGA, then present several universal test and diagnosis approaches which cover all functional units of the FPGA. Experimental results show that our approach significantly reduces the testing time, without additional cost for diagnosis.


INTRODUCTION
With the advent of deep-submicron VLSI technology, system-on-a-chip is no longer a dream.
However, as the integration density and design complexity of system chips keep increasing, design verification is more and more difficult. Emulation and rapid prototyping by field programmable gate arrays (FPGAs) are now widely used to speed up the verification process. They also are used in some first-generation products that need to get into the market soon. In addition to prototyping and emulation, the in-system reprogrammable feature of dynamic reconfigurable FPGAs has made them a natural platform for reconfigurable computing or custom computing.
A typical RAM-based FPGA consists of an array of function units and interconnect channels/matrices, as shown in Figure 1. The function units and interconnect switches are programmable, i.e., they can be configured to perform different logic functions. Configuration data generated by software tools must be downloaded to the control memory of the FPGA before the device can be used. FPGAs can be one-time programmable, boot-up configurable, or dynamic reconfigurable. A one-time programmable FPGA normally stores the configuration data in a built-in non-volatile memory such as an EEPROM, and a boot-up configurable FPGA normally stores the configuration data in a RAM (configuration bit-stream loading is required for each system boot-up). A dynamic reconfigurable FPGA is actually a special type of boot-up configurable FPGA: its configuration memory can be partially reconfigured, so a section of the device can be reconfigured without disturbing circuits already configured in other sections. One-time programmable and boot-up configurable FPGAs have been widely used for hardware prototyping. The enhanced programmability of the dynamic reconfigurable FPGA makes it even more suitable for emulation and reconfigurable computing.
FPGA testing is not trivial, and the FPGA manufacturer is not the only one concerned about it. Often, the user needs to do an incoming test to reduce the overall cost of system test. FPGA testing can be done in two ways: testing the unprogrammed FPGA and testing the programmed FPGA. The latter is normally done by a user with test patterns generated for the target circuit configured into the FPGA; however, such user patterns are not efficient even for faults in the configured circuit because of the technology mapping problem [2]. An unprogrammed FPGA can realize a huge number of different functions, so testing all possible configurations to verify the correctness of the FPGA is not feasible.
However, by proper fault modeling and careful selection of configurations, the FPGA can be tested efficiently. A test sequence that fully tests the FPGA for the target faults without exercising all possible configurations (i.e., with only a small number of test configurations) is called a universal test [3]. It is universal because it is independent of the target circuit. Note that a universal test still requires a small number of different test configurations (TCs) and their corresponding test patterns (TPs). TC generation is a very time-consuming process; moreover, TC downloading occupies most of the testing time, i.e., time(TC) >> time(TP) for each TC. To speed up the universal testing process, we must reduce the total number of TCs while still covering all target faults in the programmable resources of the FPGA, i.e., function units and interconnects.
So far the reported works in FPGA testing are all for boot-up configurable FPGAs, including testing and diagnosis for LUTs [3][4][5], interconnect testing [6][7][8][9], array approaches for testing CLBs in FPGAs [10][11][12], and BIST-based approaches [13-15]. These approaches can be applied to dynamic reconfigurable FPGAs if their architectures are similar, especially for interconnect testing. However, approaches for testing LUTs or CLBs in RAM-based FPGAs are not suitable for dynamic reconfigurable FPGAs because the architectures of their function units are different. Moreover, previous approaches do not take advantage of the dynamic reconfiguration capability during the testing process.
In this paper we focus on testing and diagnosis of dynamic reconfigurable FPGAs. The basic idea is to configure the FPGA into an easily testable array and apply test patterns via appropriate interconnect configurations. The goal is to cover all resources with a minimal number of test configurations. Using one part of the FPGA to help test other parts is also usually helpful. We take advantage of the enhanced programmability of the dynamic reconfigurable FPGA and propose universal test approaches that take only a few milliseconds to test a typical FPGA. We use a commercial dynamic reconfigurable FPGA, the Xilinx XC6200 [16], as an example for discussing our test methodology. We first introduce the architecture of the XC6200, then define the fault models and test patterns for its function units and interconnects.
We present in detail how a small number of test configurations and test patterns can be derived for the interconnects and function units. We also propose universal test and diagnosis approaches for dynamic reconfigurable FPGAs. Our approaches significantly reduce the testing time, and concurrently provide diagnosis capability for faulty function units.

XC6200 ARCHITECTURE
The function unit of the XC6200 is multiplexer-based, as depicted in Figure 2. The multiplexers are controlled by the configuration memory, which is not shown. The function unit can be configured as any two-input logic gate, buffer, inverter, 2-to-1 multiplexer, or any of these in addition to a D-type flip-flop (DFF). There are several ways to configure the DFF. Figure 3 shows three sequential modes of the function unit of the XC6200 using different configurations, where the DFF in the rightmost one is said to be in protected mode, which makes the DFF accessible only by the programming interface.
The function unit together with its surrounding interconnect switches (multiplexers) is called the basic cell (BC), as shown in  wire can be used to connect to the boundary switches of blocks, to neighboring 16 × 16 arrays, or even to the length-64 wires if it is on the boundary of a 64 × 64 array. Higher level wires provide efficient long-distance or global routing. For example, the XC6216 FPGA chip is composed of a 64 × 64 array of BCs. The configuration memories of the XC6200 are SRAMs, which allows fast dynamic reconfiguration of function units and interconnect switches. Its full and partial context switching capability is ideal for reconfigurable computing.

FAULT MODELS AND TEST PATTERNS
The multiplexer is the elementary component of the XC6200, so we first propose fault models and test patterns for the multiplexer. A multiplexer is a group of switches which forwards exactly one of the inputs directly to the output according to the configuration of the switches: the switch for the selected input is on and all others are off. In an FPGA, the multiplexer control inputs come from the configuration memory. Functionally, the multiplexer can be viewed as a set of configurable switches as shown in Figure 6. In this work, we assume that (1) if all switches are off (open), the multiplexer output is either stuck-at-1 or stuck-at-0, and (2) if two switches are on (closed) simultaneously, the multiplexer output is equivalent to the wired-OR of the two selected inputs. The second assumption is only for ease of discussion. It does not affect the result of our test methodology if the behavior is wired-AND instead of wired-OR.
We consider switch stuck-on and stuck-off faults, line bridging faults, and line stuck-at faults as our basic fault models. In our case, however, line stuck-at faults are covered by switch stuck-on/off faults. For example, to detect a switch stuck-off or stuck-on fault (assuming CMOS circuits), we must trigger a transition on the corresponding data line, which also detects stuck-at faults on the input and output data lines. A stuck-at fault on a control input line (also equivalent to a stuck-at fault in the configuration memory) results in multiple stuck-on/off faults. For example, in Figure 6, when c0 has a stuck-at-0 fault, it is equivalent to a stuck-off fault at s1 and a stuck-on fault at s0 if c1 is 0, or a stuck-off fault at s3 and a stuck-on fault at s2 if c1 is 1. All such cases are detectable by testing all switch stuck-on and stuck-off faults. We obtain the following theorem.
THEOREM 1. A test which detects all switch stuck-on and stuck-off faults of a multiplexer also detects stuck-at faults on its I/O nets.
We now propose a test called MP for detecting switch stuck-on and stuck-off faults as well as line bridging faults in the multiplexer. By Theorem 1, all target faults will be covered. Here MP0 denotes the pattern that applies 0 to the selected input and 1 to all other inputs, and MP1 denotes the pattern that applies 1 to the selected input and 0 to all other inputs. It is obvious that MP activates all switch stuck-on and stuck-off faults, and any fault effect can be observed from the output Y. For example, with c = 0, MP1 and MP0 together activate the stuck-off fault of s0, since if the switch is always off then it will fail to transmit either 0 or 1. Also, MP0 activates the stuck-on faults of all switches except s0, because each of these faults results in a faulty output value (i.e., 1) according to the second assumption of the multiplexer model mentioned above. Note that when wired-AND logic is assumed instead of wired-OR, MP still activates all stuck-on and stuck-off faults, though stuck-on faults will be activated by MP1 instead of MP0.
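The control-line example used to motivate Theorem 1 can be checked mechanically. The following Python sketch assumes a 4-to-1 multiplexer whose switch s_i closes exactly when the two control bits (c1, c0) encode i in binary with c1 as the high bit; that decoding, and the function names, are illustrative assumptions rather than details taken from Figure 6.

```python
def switch_states(c1, c0):
    """Fault-free decoder: switch s_i is on exactly when (c1, c0) encodes i."""
    selected = 2 * c1 + c0
    return [int(i == selected) for i in range(4)]


def equivalent_switch_faults(c1, stuck_c0=0):
    """List the switch faults that mimic a stuck-at fault on control line c0.

    For a fixed c1, compare the intended switch states (for both values of c0)
    with the states actually produced when c0 is stuck at `stuck_c0`.
    """
    faults = {}
    for c0 in (0, 1):
        intended = switch_states(c1, c0)
        actual = switch_states(c1, stuck_c0)
        for i, (want, got) in enumerate(zip(intended, actual)):
            if want and not got:
                faults[f"s{i}"] = "stuck-off"
            elif got and not want:
                faults[f"s{i}"] = "stuck-on"
    return faults


if __name__ == "__main__":
    print(equivalent_switch_faults(c1=0))  # faults seen when c1 = 0
    print(equivalent_switch_faults(c1=1))  # faults seen when c1 = 1
```

Under these assumptions the sketch reproduces the two cases stated in the text: s1 stuck-off with s0 stuck-on when c1 is 0, and s3 stuck-off with s2 stuck-on when c1 is 1.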
Bridging (short) faults on input nets of a multiplexer are covered by MP0 (selected input at 0, other inputs at 1) if the fault behavior is equivalent to wired-OR logic, or by MP1 (selected input at 1, other inputs at 0) if wired-AND is assumed. The detection of bridging faults on multiplexer inputs is an important feature of MP because all interconnect switches in the XC6200 series FPGAs are implemented by multiplexers. Detecting bridging faults of the multiplexer inputs implies detecting bridging faults of the interconnect wires.
In summary, to test a multiplexer, we turn on the switches one by one and apply the corresponding MP (see Fig. 7). This covers the stuck-on and stuck-off faults of the switches, stuck-at faults of the I/O nets, and bridging faults of the data input nets. Note that although we assume single faults, most multiple faults can also be detected. We will discuss this later.
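The procedure summarized above can be illustrated with a small switch-level simulation. This is a minimal sketch, not the authors' implementation: it assumes the wired-OR model from the two assumptions stated earlier, and it assumes that MP consists of two vectors per selected switch (one transmitting 0 with all other inputs at 1, one transmitting 1 with all other inputs at 0). The function names and the `floating` parameter (the fixed output value when every switch is open) are hypothetical.

```python
def mux_output(inputs, select, stuck_on=None, stuck_off=None, floating=0):
    """Switch-level multiplexer: wired-OR of all inputs whose switch is closed."""
    closed = {select}
    if stuck_on is not None:
        closed.add(stuck_on)        # faulty switch is closed regardless of control
    if stuck_off is not None:
        closed.discard(stuck_off)   # faulty switch never closes
    if not closed:
        return floating             # all switches open: output stuck at a fixed value
    return int(any(inputs[i] for i in closed))


def mp_patterns(n, select):
    """Assumed MP vectors for one selected switch: transmit 0, then transmit 1."""
    mp0 = [0 if i == select else 1 for i in range(n)]
    mp1 = [1 if i == select else 0 for i in range(n)]
    return [mp0, mp1]


def detects(n, stuck_on=None, stuck_off=None, floating=0):
    """True if selecting each switch in turn and applying MP exposes the fault."""
    for select in range(n):
        for vec in mp_patterns(n, select):
            good = mux_output(vec, select)
            bad = mux_output(vec, select, stuck_on, stuck_off, floating)
            if good != bad:
                return True
    return False


if __name__ == "__main__":
    for k in range(4):
        print(f"s{k} stuck-on detected:", detects(4, stuck_on=k))
        print(f"s{k} stuck-off detected:", detects(4, stuck_off=k))
```

With a 4-input multiplexer, every single stuck-on and stuck-off fault is reported as detected in this model, for either floating value. A wired-AND variant would replace `any` with `all`, in which case the transmit-1 vector is the one that exposes stuck-on faults.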

TESTING THE BASIC INTERCONNECTS
Basic interconnects are implemented by four 4-input multiplexers in the BC, as shown in Figure 8.
Parallel testing of these multiplexers is achieved by three TCs, as shown in Figure 9 [17]. In the figure, we show only a 2 × 2 array for clarity. It can be directly extended to any N × N array and tested with the same approach.
Multiplexers whose outputs are Nout, Wout, Sout, and Eout are denoted as MN, MW, MS, and ME, respectively. In the test configuration TC = (c1, c2, c3, c4), c1 defines the switch control inputs for MN, c2 for MW, c3 for MS, and c4 for ME. For example, TC = (E, S, W, N) means that switches E, S, W, and N are turned on in MN, MW, MS, and ME, respectively.
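As a small illustration of this notation, a test configuration can be held as a 4-tuple and expanded into a per-multiplexer selection. The sketch below is only a bookkeeping aid; the names `MUXES` and `decode_tc` are hypothetical, and the ordering (MN, MW, MS, ME) follows the definition given in the text.

```python
# Output multiplexers of a basic cell, in the order used by TC = (c1, c2, c3, c4).
MUXES = ("MN", "MW", "MS", "ME")


def decode_tc(tc):
    """Map a test configuration tuple to {multiplexer: switch turned on}."""
    if len(tc) != len(MUXES):
        raise ValueError("a test configuration needs one entry per multiplexer")
    return dict(zip(MUXES, tc))


if __name__ == "__main__":
    print(decode_tc(("E", "S", "W", "N")))
```

For TC = (E, S, W, N) this yields E for MN, S for MW, W for MS, and N for ME, matching the example in the text.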
Also, the orthogonal test configuration as shown in Figure 9 is TCo = (N, W, S, E).
To test s0 of M1, an additional pattern is required to invert the DFF value, as shown in Table III, where the control inputs are CMlC2C4C5 = 0111. The P2 pattern resets X1 to 0 to transmit the Q value. Upon the application of the next pattern, the DFF value is inverted. As a result, the expanded MP is successfully applied to M1. Testing s2 and s3 of M1 is similar   Table IV. Before applying P1, we configure M4 to leave the protected mode so that we can invert the DFF value. As a result, MP can successfully be applied to M4. This test also covers s0 of M5. In summary, the function unit can be fully tested by 11 TCs instead of 96.

TESTING AND DIAGNOSIS OF THE FPGA
Although the number of TCs to test a single function unit is only 11 for the XC6200, testing all function units in the FPGA one by one is not acceptable because that would require tens of thousands of TCs. We will show how they can be tested in parallel, requiring only a small number of TCs. We first define the notation. Let the array size be N × N; the number of TCs required to test a single function unit be $f_c$; the average number of test patterns associated with a TC for the function unit be $f_{ap}$; the time for dynamically reconfiguring a function unit be $t_{fr}$; the time for programming the whole FPGA (i.e., downloading a complete TC to the FPGA) be $t_{TC}$, where $t_{TC} = t_{int} + N^2 t_{fr}$; the wire delay of the nearest-neighbor interconnect be $t_{wd}$; the function unit delay be $t_{fd}$; and the cell delay be $t_{st}$, where $t_{st} = t_{fd} + t_{wd}$.
The simplest parallel test approach is to test a row or a column of function units at a time, as shown in Figure 13; we call this the brute-force parallel approach. The two vertical wires in the figure are meant to be global interconnects which deliver test patterns to all BCs under test. These interconnects actually involve wires from the highest level to the lowest level; we use them to represent global nets for simplicity. The brute-force approach is not good enough because it still requires N·fc TCs. The TC count grows with the array size, and is not acceptable for large arrays. We propose better approaches below.
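The TC counts quoted above follow from simple arithmetic, sketched here for the XC6216 (N = 64, fc = 11, both taken from the text). The function names are illustrative only.

```python
def tc_count_one_by_one(n, fc):
    """TCs needed when each of the n*n function units is tested separately."""
    return n * n * fc


def tc_count_brute_force(n, fc):
    """TCs needed when one row (or column) of function units is tested at a time."""
    return n * fc


if __name__ == "__main__":
    n, fc = 64, 11
    print("one by one :", tc_count_one_by_one(n, fc))
    print("brute force:", tc_count_brute_force(n, fc))
```

For these values the one-by-one count is 45056, which is the "tens of thousands" mentioned earlier, while the brute-force parallel approach still needs 704 TCs and grows linearly with N.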

Two-phase Parallel (TPP) Approach
The Reed-Muller propagation chain (RMPC), which is also called the collector row in the Reed-Muller canonic network [20], can be used for parallel testing of the XC6200 function units [19]. As shown in Figure 14, the RMPC receives many identical inputs (f) and generates the output Y in the fault-free case. Any single fault at the inputs can automatically be propagated to the output Y, i.e., the value of Y changes given any single input fault. With RMPCs, multiple function units can be tested simultaneously by the configurations as shown in Figure 15. When the function units in the odd rows are under test, function units in the even rows are configured as RMPCs to propagate possible fault effects. Likewise, when the even rows are under test, the odd rows are configured as RMPCs.
Fault location (diagnosis) of the function units can be done if, in addition to the row-wise configurations, similar column-wise configurations are included to form a 2D addressing of the faulty unit. Apparently any single faulty function unit can be located by using only four TCs. This complete test and diagnosis approach requires 2(fc + k) TCs, where k is the number of detected faults. The weakness of this approach is that we are unable to detect an even number of faulty units in the same row.
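The propagation and diagnosis behavior described above can be mimicked with a parity model. The sketch below treats an RMPC as a plain XOR chain over the responses of the units it observes, which is a simplifying assumption; it also ignores the odd/even row alternation and simply evaluates one chain per row and one per column. The names `rmpc` and `failing_lines` are hypothetical.

```python
from functools import reduce
from operator import xor


def rmpc(values, seed=0):
    """XOR propagation chain: the output flips when an odd number of inputs flip."""
    return reduce(xor, values, seed)


def failing_lines(responses, expected):
    """Return (rows, columns) whose chain output differs from the fault-free value.

    `responses[r][c]` is the observed response of the unit at row r, column c;
    `expected` is the identical response f of every fault-free unit.
    """
    n = len(responses)
    good = rmpc([expected] * n)
    rows = [r for r in range(n) if rmpc(responses[r]) != good]
    cols = [c for c in range(n)
            if rmpc([responses[r][c] for r in range(n)]) != good]
    return rows, cols


if __name__ == "__main__":
    n = 4
    single = [[1] * n for _ in range(n)]
    single[2][1] = 0                      # one faulty unit
    print(failing_lines(single, 1))       # row and column of the faulty unit

    double = [[1] * n for _ in range(n)]
    double[2][1] = 0
    double[2][3] = 0                      # two faulty units in the same row
    print(failing_lines(double, 1))       # the row-wise chain shows no error
```

In this model a single faulty unit is addressed by the intersection of the failing row and failing column, while two faulty units in the same row cancel in the row-wise chain, which is the weakness noted in the text.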

Dynamic Serial (DS) Approach
The dynamic reconfiguration feature of the FPGA increases not only its programmability but also its testability and diagnosability. Here we propose a new testing and diagnosis approach called the dynamic serial (DS) approach. We first link all function units into a chain, as shown in Figure 16, where all function units are configured to be in the bypass mode (i.e., as buffers). After testing the integrity of the chain in the bypass mode, we test each function unit by its fc TCs and the corresponding patterns, then configure it back to the bypass mode. We repeat the procedure for the subsequent function units until all function units have been tested, as shown in Figure 17. When we test a specific function unit, its configuration data is downloaded to the FPGA dynamically, i.e., the configuration of the other function units remains unchanged. Therefore, the total configuration time is $t_{TC} + N^2(f_c + 1)t_{fr}$, which is much shorter than that for the two-phase parallel approach, especially for a large N. Although this approach takes advantage of the dynamic reconfiguration feature of the FPGA and reduces the configuration time, the delay of the serial path (i.e., the application time for a test pattern) still increases with the array size $N^2$. To solve the problem, we propose an improved approach called the dynamic serial-parallel (DSP) approach, which is discussed next. Note that fault diagnosis is automatic in both approaches.
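The bookkeeping of the DS procedure can be sketched as a loop over the chain. This is an abstract model under stated assumptions: the chain is assumed to have passed the bypass-mode integrity test, a fault is represented only as the set of TC indices under which a unit produces a wrong response at the chain output, and each unit costs fc partial reconfigurations plus one to restore bypass mode. The function name and the `failing_tcs` argument are hypothetical.

```python
def dynamic_serial_test(num_units, fc, failing_tcs=None):
    """Test the units of a bypass chain one at a time.

    `failing_tcs` maps a unit index to the set of TC indices under which that
    unit misbehaves (an abstract stand-in for real fault simulation).
    Returns (number of partial reconfigurations, list of faulty units).
    """
    failing_tcs = failing_tcs or {}
    reconfigurations = 0
    faulty_units = []
    for unit in range(num_units):
        failed = False
        for tc in range(fc):
            reconfigurations += 1          # reconfigure only the unit under test
            if tc in failing_tcs.get(unit, ()):
                failed = True              # mismatch observed at the chain output
        reconfigurations += 1              # put the unit back into bypass mode
        if failed:
            faulty_units.append(unit)      # diagnosis: only this unit was not a buffer
    return reconfigurations, faulty_units


if __name__ == "__main__":
    count, faulty = dynamic_serial_test(64 * 64, 11, {100: {3}})
    print(count, faulty)
```

The reconfiguration count equals N^2 (fc + 1), the factor multiplying the per-unit reconfiguration time in the total configuration time, and the faulty unit is identified directly because every other unit in the chain is acting as a buffer when the mismatch is seen.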

Dynamic Serial-parallel (DSP) Approach
The idea is simple. We partition the original single serial path (the chain of all function units) in the DS approach into multiple paths (still covering all function units) to reduce the path delay in large arrays, as shown in Figures 18 and 19. For the shortest path delay, we can configure the paths so that each of them consists only of a single row or column of function units. This approach maintains the short test configuration time of the DS approach.
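One way to picture the partitioning is to cut the array into bands of s rows and route a serpentine path through each band, giving N/s paths of length sN. The layout below is an assumption for illustration (the actual routing in Figures 18 and 19 may differ), it requires s to divide N, and `partition_paths` is a hypothetical name.

```python
def partition_paths(n, s):
    """Split an n x n array into n/s serpentine paths, each covering s rows."""
    if s < 1 or n % s != 0:
        raise ValueError("s must be a positive divisor of n")
    paths = []
    for first_row in range(0, n, s):
        path = []
        for r in range(first_row, first_row + s):
            # Alternate direction on each row so consecutive cells stay adjacent.
            left_to_right = (r - first_row) % 2 == 0
            cols = range(n) if left_to_right else range(n - 1, -1, -1)
            path.extend((r, c) for c in cols)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in partition_paths(4, 2):
        print(path)
```

With s = 1 each path is a single row, the shortest case mentioned in the text; with s = N the partition degenerates to the single chain of the DS approach.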

TIME COMPLEXITY AND ANALYSIS
Normally the testing time is dominated by the configuration time. As we have mentioned, our primary objective is to minimize the number of required TCs. However, the path delay should also be taken into consideration when we use the dynamic approaches, since the test pattern application time depends on the path delay. The test times of the TPP, DS, and DSP approaches are given, respectively, by the following equations:
$$T_{TPP} = 2(f_c + k)\,t_{TC} + 2(f_c + k)\,f_{ap}\,N\,t_{st};$$
$$T_{DS} = t_{TC} + N^2 (f_c + 1)\,t_{fr} + f_c\,f_{ap}\,N^4\,t_{st};$$
$$T_{DSP} = t_{TC} + N^2 (f_c + 1)\,t_{fr} + f_c\,f_{ap}\,s^2 N^2\,t_{st};$$
where sN is the length of a path (i.e., $1 \le s \le N$).
In each equation, the last term on the right-hand side represents the time to apply the test patterns, and the preceding terms represent the test configuration time. Take the XC6216 as an example, where N = 64 and fc = 11. Clearly DSP is the fastest approach in this case. For larger chips, DSP will remain the best, and the improvement over TPP will be even more significant, as can be seen from Figure 20. The reason is that the equation for $T_{TPP}$ has a larger coefficient for $t_{TC}$, and $t_{TC}$ itself grows linearly with the array size ($N^2$). The curve for DSP in the figure also shows that the serial path delay does increase the testing time as expected, though with a much lower weight as compared with $t_{TC}$. However, in DS the test pattern application time is even longer than the test configuration time, so it becomes the worst of the three.
From our time complexity analysis, DSP is faster than TPP, and the gap grows with the array size. In practice, DSP is also more flexible than TPP when we take the number of I/O pins into consideration. With DSP, we can trade I/O pins for pattern application time, e.g., we can double the length of the serial path to halve the number of I/O pins. The test configuration time remains the same. With TPP, however, we have to double the test configuration time in order to save the same number of I/O pins.
TPP and DSP are both general approaches for dynamic reconfigurable FPGAs, but DSP is more suitable for those with fine-grain reconfiguration capability, while TPP can also be applied to boot-up configurable FPGAs.

CONCLUSIONS
FPGAs have been widely used in hardware prototyping and emulation, and are considered the key hardware component in custom and reconfigurable computing. Testing FPGAs is therefore an important issue for manufacturers as well as end users. We have shown that the testing time is dominated by the time to download the test configurations, and have proposed approaches whose primary objective is to minimize the number of test configurations. The experimental results justify this objective. We have also proposed universal test and diagnosis approaches for dynamic reconfigurable FPGAs, including two dynamic approaches which take advantage of the enhanced programmability (i.e., dynamic partial reconfigurability) of dynamic reconfigurable FPGAs. Our dynamic serial-parallel approach significantly reduces the testing time, and concurrently provides diagnosis capability for faulty function units. Finally, we have implemented several test configurations with the Xilinx XACT6000 design kit and have done some experiments on a PCI board. Correct results have been obtained. However, the speed was limited by the interface and the PCI board, an issue that industry should be able to solve easily.