Partitioning Techniques for Built-In Self-Test Design

An efficient, unified algorithm, Advanced Two-Phase Cluster Partitioning, is proposed for the automated synthesis of pseudo-exhaustive test generators for Built-In Self-Test (BIST) design. A prototype of the algorithm, Two-Phase Cluster Partitioning, has been proposed previously; its hierarchical design procedure is computationally efficient and produces test generation circuitry with low hardware overhead. However, in certain worst cases, that algorithm may generate a sub-optimal design which requires more test patterns and/or hardware overhead. In order to generate a globally optimal design, the two-phase algorithm is further improved by expanding the design space for the formation of linear sums, so that the number of test signals required for pseudo-exhaustive testing can be reduced. We demonstrate the effectiveness of our approach by presenting detailed comparisons of our results against those obtained by existing techniques.


INTRODUCTION
Testing a circuit consists of applying an input sequence and observing the resulting output sequence. With respect to test generation, the central problem is that, in today's IC technology, a highly complex chip offers low accessibility to its internal circuit nodes (controllability and observability). This weak accessibility makes traditional testing techniques costly and ineffective. In particular, modern RISC processors incorporate hundreds of thousands of transistors to perform the bulk of the task of a complete computer system on a single chip. This level of integration has substantially increased the complexity of testing modern RISC processors, to the point where classical functional testing methods are simply inadequate. Thus, design-for-testability techniques are in demand.
Built-In Self-Test (BIST) has been proposed as a powerful technique for addressing the highly complex problems of VLSI testing [1][2][3][4][5][6][7][8][9]. The basic idea is to include the test generator and evaluator in the design and to perform the testing internally to the chip. A widely adopted design approach is one in which BIST is combined with some type of scan design methodology. In this situation, the internal state of the system under test is completely accessible, so that the BIST hardware need only create and analyze tests for purely combinational circuits. Exhaustively checking all possible input patterns, however, is neither feasible nor required in many situations. For a general multi-input, multi-output combinational circuit in which no output is a function of all input variables, pseudo-exhaustive testing may be applied: for circuits of this type, it is possible to obtain the same amount of information as obtained in exhaustive testing, but with a reduced number of test patterns. One needs to find sets of input pins which can share common test signals. This problem is completely analogous to resource sharing in data-path synthesis, and certain modifications of efficient algorithms developed for the latter can be applied to the test generation problem.
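As a rough illustration of the saving that pseudo-exhaustive testing can offer, the following Python sketch compares the exhaustive test length 2^n with the smallest possible pseudo-exhaustive length 2^w, where w is the largest number of inputs on which any single output depends. The function names and the 6-input circuit are hypothetical and are not taken from the paper.

```python
def exhaustive_length(num_inputs):
    """Number of patterns needed to apply all 2^n input combinations."""
    return 2 ** num_inputs


def pseudo_exhaustive_lower_bound(dependency_sets):
    """Smallest possible test length: the widest output must see all 2^w patterns."""
    w = max(len(dep) for dep in dependency_sets)
    return 2 ** w


# Hypothetical 6-input, 3-output circuit (an assumption for illustration).
# Each set lists the inputs on which one output depends.
example_dependencies = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]

full = exhaustive_length(6)
bound = pseudo_exhaustive_lower_bound(example_dependencies)
print(full, bound)
```

Whether the lower bound can actually be reached depends on how many inputs can share test signals, which is the subject of the partitioning procedure.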
There have been many variations on the basic idea of BIST design. The first method for pseudo-exhaustive test pattern generation was proposed in [10], in which a syndrome drive counter (SDC), designed by a minimum covering procedure, generates the test patterns. That technique checks whether different inputs of the CUT (circuit under test) can be tested by using the same test signal. However, the test time may still be too long when only a few inputs share the same test signal. Also, no attempt is made to find the minimum number of test signals for non-MTC circuits. If the number of required test signals is equal to the maximum number of inputs upon which any output depends, the circuit is called an MTC (Maximum Test Concurrency) circuit [11]. Using the technique proposed here, fewer test signals are needed for non-MTC circuits.
186 CHIEN-IN HENRY CHEN
Two universal testing techniques have been proposed to reduce the number of test signals for pseudo-exhaustive testing. The first technique, verification testing [11], uses constant weight counters (CWC) to implement pseudo-exhaustive generators.
A major problem with this approach is that input stimulus generation must be computed, and that is an NP-complete problem. Therefore, for circuits having a higher order m-out-of-n code [11], a CWC is very costly to implement. Next, a combination of LFSRs and exclusive-or gates was proposed in [12][13]. However, the techniques proposed in these papers did not attempt to allocate the minimum number of required test signals, nor did they attempt to minimize the extra hardware overhead (i.e., XOR gates). Because these problems are NP-complete, a universal procedure (LFSRs/XORs) was proposed in [12], in which an upper bound on the number of test signals required for the CUT is derived. Neither of these universal testing techniques keeps track of the specific dependencies of each output; both simply focus on w, the maximum number of inputs upon which any output depends. The techniques derive tests in which any output having a dependency subset of size w (or less) can be pseudo-exhaustively tested. However, in general, not all outputs depend on the same number, w, of inputs. Therefore, the number of test patterns found by both techniques can be more than is necessary for pseudo-exhaustive testing.
Condensed LFSR testing based on linear codes and structural partitioning was proposed for self-testing in [14]. This technique is most effective when w > n/2. However, when w < n/2, more test patterns than necessary will be generated by this technique.
Another design technique using an LFSR to generate pseudo-exhaustive test patterns was proposed in [15]. The technique is based on cyclic codes, which are easier to implement and have less hardware overhead than an LFSR designed using general linear codes. However, more test patterns than necessary may be generated by this technique in some cases. For example, to design an (n, k) LFSR for an (n, w) CUT using Appendix D of [16], we need to find a cyclic code that has code length n and minimum distance w + 1. However, such a cyclic code may not exist. If a cyclic code of length n does not exist, then we need to find a cyclic code with the next higher code length (> n). If the minimum distance w + 1 is not available, then we need to find a cyclic code with the next higher distance (> w + 1). In these worst cases, the cyclic code generator may produce more test patterns than are required for pseudo-exhaustive testing.
An efficient solution to the problem of test generation for BIST has been proposed in [1]. The polynomial-time algorithm, two-phase cluster partitioning, executes quickly and produces test generation circuitry of low complexity. However, in certain worst cases, the algorithm may generate a sub-optimal design which requires more test patterns and/or hardware overhead. In this paper, an efficient algorithm, advanced two-phase cluster partitioning, is presented; it improves on the original two-phase algorithm [1] for automatically designing a test generator for BIST. The advanced two-phase algorithm generates fewer test signals than previous techniques and is suitable for both MTC and non-MTC circuits. The test generation procedure operates in two phases. The first phase consists of a cluster partitioning technique that by itself is sufficient for circuits having the MTC property. The second phase further reduces the required number of test vectors for non-MTC circuits by forming linear sums of the test signals obtained at the end of the first phase. We present detailed examples which show that the results are superior to those of previous techniques and which also show how the new techniques improve on the original two-phase algorithm.

PHASE ONE: CLUSTER PARTITIONING
It is well known that the same test signal can be applied to two inputs of a circuit under test (CUT) if none of the outputs is functionally dependent on both of the inputs. This fact can be used to reduce the required number of test signals. The problem of systematically reducing the number of test signals in this fashion, while still exhaustively testing the CUT, has been formulated as a covering problem on the cliques of a graph [10]. However, the problem of finding the cliques of a graph, i.e., the maximal complete sub-graphs of the original graph, is NP-complete. Thus, an efficient heuristic algorithm is needed to obtain a practical implementation for large problems. An efficient "clique partitioning" algorithm has been proposed to solve an analogous problem of resource sharing which occurs in the area of high-level data-path synthesis [17]. (We prefer the term "cluster partitioning" since the partitions that are obtained are not necessarily cliques.) Further investigation of the heuristic algorithm in the context of data-path synthesis has shown that a smaller number of disjoint sets can be obtained if certain priorities in the algorithm are reversed. These algorithms have polynomial time complexity and, as we will show, yield excellent results when applied to the test generation problem.
Before we discuss the phase one algorithm in detail, we review some terms used in [10]. A graph is formed with n nodes, x_1, ..., x_n, corresponding to the n input lines. There is an edge between node x_i and node x_j if and only if the pair (x_i, x_j) is nonadjacent, i.e., the two inputs x_i and x_j can be tested by the same test signal.
Definition 4: The replacement of two circuit inputs by a single common test signal is represented in the graph by a composite node. In other words, we replace the two nodes x_i and x_j and the edge joining them with a single composite node labeled (x_i, x_j).
As in Ref [10], a non-adjacency (NA) graph is used to determine which sets of circuit inputs can be connected to common test signals. Specifically, a set of input lines can all be connected to the same test signal if they form a clique in the NA graph. Therefore, the objective is to find the minimum number of disjoint cliques in the NA graph. However, rather than using a covering procedure, we exploit the "neighborhood property" of Ref [17] to obtain a polynomial-time procedure.
Definition 5: A node x_k in the NA graph is said to be a common neighbor of an edge (x_i, x_j) if edges exist from x_k to both x_i and x_j.
Definition 6: A deleted edge of a composite node (x_i, x_j) can arise in either of the following ways:
(1) A node x_k which is not a common neighbor of x_i and x_j will no longer be connected to the composite node (x_i, x_j). Thus, the edge (x_i, x_k) or (x_j, x_k) has to be deleted, as appropriate.
(2) A node x_k which is a common neighbor of x_i and x_j will still be connected to the composite node (x_i, x_j). However, only one of the original two edges needs to be retained. Thus, one of the edges (x_i, x_k) or (x_j, x_k) has to be deleted. In this algorithm, the edge (x_j, x_k) will be deleted if j > i. Otherwise, the edge (x_i, x_k) will be deleted.
Definition 7: A candidate pair (x_p, x_q) in the NA graph is determined by the following criteria (listed in order of priority):
(1) (x_p, x_q) has the minimum number of deleted edges.
(2) (x_p, x_q) has the maximum number of common neighbors.
The above order of priority for pairing nodes is the reverse of that recommended in the "clique partitioning" approach of Ref [17]. It is found that while the above order leads to better designs in data-path synthesis problems [18], either priority scheme yields excellent results for the test generation problem.
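The graph definitions above can be made concrete with a small Python sketch. It builds the NA edge set from functional dependency sets and evaluates the two quantities used for ranking pairs: common neighbors and deleted edges. The function names and the 6-input example are assumptions for illustration. The deleted-edge count follows the reading of Definition 6 that merging x_i and x_j removes exactly one edge for every other node connected to either of them.

```python
from itertools import combinations


def build_na_graph(n, dependency_sets):
    """Return the NA edges (i, j), i < j: input pairs that never feed the same output."""
    adjacent = set()
    for dep in dependency_sets:
        adjacent.update(combinations(sorted(dep), 2))
    return {pair for pair in combinations(range(1, n + 1), 2) if pair not in adjacent}


def neighbors(edges, node):
    """Nodes joined to `node` by an NA edge."""
    return {b if a == node else a for a, b in edges if node in (a, b)}


def common_neighbors(edges, i, j):
    """Nodes that have NA edges to both i and j (Definition 5)."""
    return neighbors(edges, i) & neighbors(edges, j)


def deleted_edge_count(edges, i, j):
    """Edges removed when i and j merge: one per node connected to i or j (Definition 6)."""
    return len((neighbors(edges, i) - {j}) | (neighbors(edges, j) - {i}))


# Hypothetical 6-input circuit with three outputs (not from the paper).
deps = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
na_edges = build_na_graph(6, deps)
print(sorted(na_edges))
```

For this example, inputs 1 and 4 never appear in the same dependency set, so (1, 4) is an NA edge and the two inputs may share a test signal.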
The cluster partitioning algorithm can now be stated in the following way:
Step 1: Establish the NA graph using the functional dependency sets F of the m outputs of the CUT.
Step 2: Traverse the list of edges in the NA graph.
Step 3: Find the candidate pair (x_p, x_q). Choose the smaller of p and q as the head of the cluster. Remove the deleted edges of the composite node (x_p, x_q). Update the list of edges accordingly and recompute the numbers of common neighbors and deleted edges. If the list of edges is empty, then cluster partitioning is complete.
Step 4: Assume x_p is the head of the current cluster. Find a candidate pair which joins node x_p and another node, x_r. Choose the smaller of p and r as the head of the resulting cluster. Remove the deleted edges of the composite node (x_p, x_r). Update the list of edges accordingly and recompute the numbers of common neighbors and deleted edges. If the list of edges is empty, then cluster partitioning is complete. Otherwise, if node x_p (or x_r if r < p) no longer appears in the updated list of edges, go to step 3 and start to form another cluster. Otherwise, repeat step 4 and continue to find other nodes to add to the current cluster.
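The steps above can be sketched in Python as follows. This is a simplified illustration, not the BISTSYN implementation. It assumes that pairs are ranked first by fewest deleted edges and then by most common neighbors, that remaining ties are broken by the smaller node indices, and that a composite node keeps edges only to the common neighbors of the two merged nodes. All names are hypothetical.

```python
from itertools import combinations


def build_na_adjacency(n, dependency_sets):
    """NA graph as an adjacency map over inputs 1..n."""
    adj = {i: set(range(1, n + 1)) - {i} for i in range(1, n + 1)}
    for dep in dependency_sets:
        for i, j in combinations(dep, 2):
            adj[i].discard(j)
            adj[j].discard(i)
    return adj


def cluster_partition(n, dependency_sets):
    """Greedy cluster partitioning; returns a sorted list of sorted clusters."""
    adj = build_na_adjacency(n, dependency_sets)
    clusters = {i: [i] for i in adj}  # head of cluster -> members
    head = None
    while any(adj.values()):
        if head is not None and adj[head]:
            # Step 4: keep growing the current cluster from its head.
            pairs = [(head, r) for r in adj[head]]
        else:
            # Step 3: start a new cluster from the best remaining edge.
            pairs = [(a, b) for a in adj for b in adj[a] if a < b]

        def priority(pair):
            a, b = pair
            na, nb = adj[a] - {b}, adj[b] - {a}
            # Fewest deleted edges, then most common neighbors (assumed tie-break).
            return (len(na | nb), -len(na & nb), min(pair), max(pair))

        a, b = min(pairs, key=priority)
        head, other = min(a, b), max(a, b)
        keep = adj[a] & adj[b]  # only common neighbors stay connected
        for k in adj[a] | adj[b]:
            adj[k] -= {a, b}  # remove every edge touching the merged nodes
        del adj[other]
        adj[head] = set(keep)
        for k in keep:
            adj[k].add(head)
        clusters[head] = sorted(clusters[head] + clusters.pop(other))
    return sorted(clusters.values())


# Hypothetical 6-input circuit (not from the paper).
deps = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
result = cluster_partition(6, deps)
print(result)
```

Because a composite node retains only common neighbors, every cluster produced by this sketch is a clique of the original NA graph, so no output depends on two inputs that share a test signal.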
The p disjoint clusters obtained through the above cluster partitioning algorithm correspond to the required test signals, i.e., the test set T = {s_1, s_2, ..., s_p}. Each element of T is composed of the CUT inputs which have been formed into a cluster. All inputs in a cluster s_i are represented by the same symbol, namely that of the input in the cluster having the smallest index, and the functional dependency sets are updated accordingly. If w, the maximum number of inputs upon which any output depends, is equal to p (i.e., the MTC situation), then there is no need to proceed with phase two. Otherwise, the phase two algorithm described in Section 3 must subsequently be applied to obtain the optimized test generation hardware.
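The relabeling of dependency sets and the MTC check described above can be illustrated with a short Python sketch. The helper names and both example circuits are hypothetical and are used only to show the p = w test.

```python
def relabel(dependency_sets, clusters):
    """Replace every input by the smallest-index input of its cluster."""
    head_of = {x: min(cluster) for cluster in clusters for x in cluster}
    return [{head_of[x] for x in dep} for dep in dependency_sets]


def is_mtc(dependency_sets, clusters):
    """MTC situation: the number of clusters p equals the maximum dependency w."""
    w = max(len(dep) for dep in dependency_sets)
    return len(clusters) == w


# Hypothetical circuit whose six inputs fall into three clusters (p = w = 3).
deps_a = [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}]
clusters_a = [[1, 4], [2, 5], [3, 6]]
relabeled_a = relabel(deps_a, clusters_a)

# Hypothetical circuit in which no two inputs can share a signal (p = 3, w = 2).
deps_b = [{1, 2}, {2, 3}, {1, 3}]
clusters_b = [[1], [2], [3]]

print(relabeled_a, is_mtc(deps_a, clusters_a), is_mtc(deps_b, clusters_b))
```

The second circuit is the kind of non-MTC case for which phase two is intended: three signals are needed after phase one even though no output depends on more than two inputs.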
In order to demonstrate the power and computational efficiency of the above cluster partitioning algorithm, we have tested it on several very large problems. In Ref. [19], several large circuits have been proposed as benchmarks for test generation algorithms. Of these, the circuits C880, C2670, C5315 and C7552 have the property that no primary output depends on all the primary inputs, so that pseudo-exhaustive testing techniques may be used. After applying the phase one algorithm to these four circuits, we find that all of them are MTC circuits. The number of test signals required for pseudo-exhaustively testing each of these circuits is shown in Table I. The computation times (on a Sun 3/160 workstation) required to obtain these results are also listed in the table. These circuits contain up to several thousand gates and several hundred input/output lines, yet the computation time in all cases is quite reasonable. Comparable results using the opposite priority ordering (i.e., using "clique partitioning") are also shown in Table I. As can be seen, application of either of these "data-path synthesis" procedures yields excellent results for these large examples.

PHASE TWO: FORMATION OF LINEAR SUM
For non-MTC circuits, the number of required test signals can be further reduced by using linear combinations of a smaller number of signals. Akers has proposed a linear sum approach for the test generation problem [12]. However, our approach differs from his in two respects. First, we do not begin to form linear sums until reductions in the number of test signals have been obtained by the phase one procedure. Second, this procedure seeks the minimal number of exclusive-or (XOR) gates which must be added. These two factors usually result in significant savings in both hardware and the number of test patterns required to test non-MTC circuits.
The original phase two procedure appears in [1]. The new phase two algorithm is listed as follows.
(In the following discussion, we use p to represent the number of disjoint clusters partitioned by the phase one algorithm; it is different from the index p of the node x_p in the phase one algorithm.)
Step 3: (i) (One-step look-ahead) If i < p - w, choose a cluster C_i of size two in the NA_i graph which has the largest number of elements that are also members of the previous clusters C_k, k < i, from the graphs NA_k. (Note that these clusters are not necessarily disjoint since they are obtained from different NA graphs.) If there is more than one such cluster, then choose the cluster which will result in the maximum number of edges in the NA_{i+1} graph. (ii) If i = p - w, then arbitrarily choose any cluster C_i of size two in the NA_i graph.
Step 4: Replace the test signal for et by the XOR of the two signals in the cluster found in the previous step. Update the functional dependency sets accordingly.
Step 5: If i = p - w, then the phase two algorithm is complete and the size of Fd is the number of required test signals. Otherwise, set i = i + 1 and go to step 2.
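To illustrate why replacing a test signal by an XOR of two others can preserve pseudo-exhaustive coverage, the following Python sketch checks a candidate signal assignment by brute force. It is a verification aid under stated assumptions, not the phase two selection procedure itself. It assumes the base signals cycle through all of their combinations, and the names `signal_of` and `is_pseudo_exhaustive` are hypothetical.

```python
from itertools import product


def is_pseudo_exhaustive(num_base, signal_of, dependency_sets):
    """Check that every output sees all patterns on the inputs it depends on.

    signal_of maps each input to the tuple of base-signal indices whose XOR drives it.
    """
    for dep in dependency_sets:
        inputs = sorted(dep)
        seen = set()
        for base in product((0, 1), repeat=num_base):
            # XOR of the selected base signals, computed as a sum modulo 2.
            pattern = tuple(sum(base[b] for b in signal_of[x]) % 2 for x in inputs)
            seen.add(pattern)
        if len(seen) != 2 ** len(inputs):
            return False
    return True


# Hypothetical non-MTC circuit: three inputs, every output depends on two of them.
deps = [{1, 2}, {2, 3}, {1, 3}]

# Input 3 is driven by the XOR of the two base signals, so 2 signals replace 3.
with_xor = {1: (0,), 2: (1,), 3: (0, 1)}
# Naive sharing of one signal between inputs 1 and 3 breaks the third output.
naive_share = {1: (0,), 2: (1,), 3: (0,)}

ok_xor = is_pseudo_exhaustive(2, with_xor, deps)
ok_naive = is_pseudo_exhaustive(2, naive_share, deps)
print(ok_xor, ok_naive)
```

In this example one XOR gate reduces the test length from 2^3 = 8 to 2^2 = 4 patterns, while simply sharing a signal between two inputs of the same output would leave that output only partially tested.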
The results of applying the combined two-phase algorithm to four non-MTC circuit examples are shown in Figures 1-3 and 4(a)-(d). Note that the number of required XOR gates is quite small in all cases. Table II compares the number of required test patterns for these four examples using this approach ("ADVANCED TWO PHASE") with the number obtained using six previously proposed test generation methods ("TWO PHASE" [1], "SDC" [10], "LFSRs/XORs" [12], "CWC" [11], "Condensed LFSR" [14] and "LFSR with Cyclic Code" [15]). As can be seen from Table II, the advanced two-phase algorithm requires fewer test patterns in all cases.
The following is an application of the advanced two-phase algorithm to the non-MTC circuit of example 4. It demonstrates the steps of the algorithm and shows how the new techniques reduce the number of test signals further than the original two-phase algorithm does.
Example: Consider the circuit given in Figure 4(a).
In step 1 of the phase one algorithm, the NA graph is established as shown in Figure 5. The NA_2 graph is established by the following sets. This is shown in Figure 6.
Based on the Advanced Two-Phase Cluster Partitioning algorithm, a design generator named BISTSYN has been developed and implemented to facilitate BIST design. The input to the design generator can be either a gate-level circuit description, viewed as a netlist, or the circuit output functional dependency sets. BISTSYN provides the BIST mechanisms as its output.