A Low-Cost BIST Scheme for Test Vector Embedding in Accumulator-Generated Sequences

Test set embedding built-in self test (BIST) schemes are a class of pseudorandom BIST techniques in which the test set is embedded into the sequence generated by the BIST pattern generator; they displace common pseudorandom schemes in cases where reverse-order simulation cannot be applied. Single-seed embedding schemes embed the test set into a single sequence and demand extremely small hardware overhead, since no additional control or memory is required to reconfigure the test pattern generator. The challenge in this class of schemes is to choose the best pattern generator among various candidate configurations. This, in turn, requires evaluating the location of each test pattern in the sequence as fast as possible, so that as many candidate configurations as possible can be tried for the test pattern generator. This problem is known as the test vector-embedding problem. In this paper we present a novel solution to the test vector-embedding problem for sequences generated by accumulators. The time overhead of the solution is of the order O(1). The applicability of the presented method for embedding test sets for the testing of real-world circuits is investigated through experimental results on some well-known benchmarks; comparisons with previously proposed schemes indicate that comparable test lengths are achieved, while the time required for the calculations is reduced by more than 30 times.


INTRODUCTION
The problem of testing VLSI chips is becoming more and more time- and memory-consuming. For the testing of the chips fabricated today, complicated testing scenarios are applied, which incorporate both external testers and on-chip resources. The latter fall into the category of built-in self-test (BIST) techniques that provide for test pattern generation and response verification operations on chip [1].
BIST pattern generators apply the test vectors to the inputs of the circuit under test. The effectiveness of a BIST pattern generator is judged by the hardware overhead imposed on the circuit, the length of the applied test sequence, and the impact on the timing parameters of the circuit. In pseudorandom BIST schemes [2], either easily synthesizable modules (i.e., modules that can be easily implemented by altering existing registers, such as linear feedback shift registers or cellular automata [3]) or modules that already exist in VLSI chips (e.g., counters or accumulators [4]) are utilized for the generation of test patterns.
With pseudorandom BIST, both the hardware overhead and the impact on the circuit timing parameters are kept low. In order to reduce the length of the pseudorandom sequence, test vector embedding [5,6] has been proposed. With test vector embedding, a precomputed (deterministic) test set is embedded into a sequence generated by a pseudorandom generator. In this way, the number of the applied pseudorandom patterns is decreased, without affecting the hardware or the impact on the timing parameters; furthermore, such schemes apply when reverse-order simulation [7] cannot be applied. In test vector embedding schemes, an embedding algorithm [5] is used. An embedding algorithm calculates the location (L) of a vector (V) in a sequence generated by the generator that starts from a specific value. Thus, L is the number of cycles that the generator module needs to operate until V appears at its outputs.
Embedding test vectors into sequences generated by hardware modules has been the goal of various researchers. For example, Lempel et al. [5] presented an algorithm for embedding test vectors into sequences generated by LFSRs in time proportional to O(2^n), where n is the number of stages of the LFSR, utilizing results of the theory of discrete logarithms based on the results of [8,9]. Kagaris and Tragoudas [10,18] have presented results on embedding test vectors into counter-based sequences by using permutation and complementation operations on the counter outputs.
Current VLSI chips (implementing data paths or digital signal processors) commonly contain accumulators (see Figure 1); thereby, the utilization of such modules for the generation of test patterns or verification of responses of a circuit under test has no impact on the circuit timing parameters. For example, in [12] a response compaction scheme based on accumulators is presented for the testing of RAM modules. In [13], a scheme was presented that generates weighted patterns, that is, patterns where the probability of an output is different from 0.5 (purely pseudorandom vectors), based on a properly modified accumulator module. The pseudorandom nature of accumulator-generated sequences has been studied in [4]. Dorsch and Wunderlich [6] presented a test vector embedding approach utilizing accumulators and results of the theory on reduced-order binary decision diagrams [9]. Independently, Stroele and Meyer [7] explored methods to reduce test application time for accumulator-based self-test by skipping test patterns and utilizing reverse-order simulation. Recently, Manich et al. [14,15] further advanced the field by presenting a scheme that minimizes memory requirements for storing the seeds and addends that feed the accumulator inputs. Their scheme is based on the observation that by using as addends the test patterns extracted from an automatic test pattern generator tool, the fault coverage is increased.
The above-mentioned schemes [6,7,14,15] are based on the utilization of multiple seeds, where the accumulator is loaded with different values during the test phases and different addends feed the accumulator inputs. Therefore, they share the common need to store the accumulator addends and seeds, as well as additional control to handle the BIST operations. However, the requirement for additional memory and control for BIST purposes cannot always be met. In certain low-budget applications, the BIST hardware overhead needs to be as simple as possible. In these cases, single-seed solutions, where the test pattern generator is initialized and left to operate for a predetermined number of cycles until all faults under question are detected, may be a preferable solution. The cornerstone of such schemes is their test set embedding algorithm. However, the problem of embedding test patterns into hardware-generated sequences has been typically considered to be of exponential complexity (see, e.g., [5]).
In a recent work [11], a solution to the problem of embedding a test pattern into an accumulator-generated sequence was presented, which depends on the number of the stages of the accumulator; that is, it is of the order O(n).However, when a test set of T vectors has to be embedded, the complexity becomes O(n × T).
Nikolos et al. [16] exploited ideas from number theory [17] in order to speed up the calculation of the locations of the test patterns of the test set. More precisely, they found a way to reduce the complexity of the calculation from O(n × T) to O(n + T); as a result, the time required to calculate the locations of all the patterns of the test set is reduced by a factor that ranges from 16 to 29 compared to [11].
The hardware overhead of single-seed accumulator-based BIST schemes is extremely low, since the need for storage is eliminated; for example, the module presented in Figure 1 can be easily configured in such a way that the inputs of the accumulator are driven by the outputs of a register of the register file. Hence, the hardware overhead is minimal, compared even to LFSR-based schemes, where the hardware overhead is n two-way multiplexers (n is the width of the LFSR). In Figure 2, the configuration of a 4-stage accumulator that accumulates the pattern 1001 is presented. The register A (one of the registers of the register file of Figure 1) is set to the 1001 value.
Another advantage of the schemes in [11,16] is that no additional reordering of the inputs is required, as has been proposed by other schemes (see, e.g., [10,18]); therefore, the data path does not have to be reconfigured during the BIST operations.
In this paper, we present a novel solution to the problem of embedding test patterns into accumulator-generated sequences; more precisely, we prove that the location of a pattern V in the sequence generated by an n-stage accumulator containing a one's complement adder that accumulates the pattern B = 2^b, where b is an integer, can be calculated by a simple formula; hence the embedding algorithm is of the order O(1). To the best of our knowledge, this is the only result on embedding test vectors of the order O(1) presented in the literature. Comparisons with previous single-seed accumulator-based schemes [11,16] indicate that significant reduction is achieved in the calculation time to embed the test set, while the length of the resulting sequence is comparable.
The proposed scheme may well be incorporated into a generic scheme for the testing of processor cores, since it can be effectively utilized to test combinational parts of the core. For example, as shown in Section 3.3, the testing of a specific benchmark (c6288), which is a 16 × 16 array multiplier, can be performed in realistically low time.
The paper is organized as follows. In Section 2, we present the theory underlying the proposed scheme. In Section 3, comparisons with previously proposed schemes [11,16] are performed. Finally, in Section 4, we conclude the paper.
It has been proved (see, e.g., [4]) that an accumulator with a one's complement adder starting from a nonzero value and accumulating a constant pattern B generates all nonzero vectors if and only if B is mutually prime with N − 1. Therefore, an (N − 1, B)-sequence as described in Definition 1 generates all N − 1 n-bit vectors, since N − 1 and 2^b are always mutually prime. Indeed, the only divisors of 2^b other than 1 are the powers of two 2, 2^2, 2^3, ..., 2^b, all of which are even. On the other hand, N − 1 is an odd number since N = 2^n is even.
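The full-period property can be checked numerically for small n. The following C sketch models the one's complement (end-around carry) adder as an n-bit addition whose carry-out is added back into the least significant position, and counts how many cycles the accumulator needs to return to its starting value. The names `eac_add` and `sequence_period` are illustrative and do not come from the paper; n is assumed to be at most 31.

```c
#include <assert.h>
#include <stdint.h>

/* n-bit addition with end-around carry: the carry-out of the most
   significant stage is fed back as a carry-in (assumes 1 <= n <= 31). */
static uint32_t eac_add(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (1u << n) - 1u;
    uint32_t sum = a + b;                 /* at most n+1 significant bits */
    return ((sum & mask) + (sum >> n)) & mask;
}

/* Number of cycles after which an accumulator that starts from the
   nonzero value B and keeps adding B returns to its starting value. */
uint32_t sequence_period(unsigned n, uint32_t B)
{
    assert(n >= 1 && n <= 31);
    uint32_t mask = (1u << n) - 1u;
    assert(B >= 1 && B <= mask);

    uint32_t acc = B;
    uint32_t steps = 0;
    do {
        acc = eac_add(acc, B, n);
        steps++;
    } while (acc != B);
    return steps;
}
```

For n = 4, every power-of-two addend gives a period of N − 1 = 15, whereas B = 3, which shares the factor 3 with 15, gives a period of only 5.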
From Definition 1, an accumulator containing a carry-rotate adder that accumulates a constant pattern B generates an (N − 1, B)-sequence.
From the definition of the (N − 1, B)-sequence, it is evident that for every value of n there exist exactly n (N − 1, B)-sequences, one for each number B = 2^b, for 0 ≤ b < n. For example, for n = 4, the four (15, B)-sequences are presented in Table 1.
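As an illustration, the n sequences can be enumerated with a few lines of C. Definition 1 is not reproduced in this text, so the sketch assumes that an (N − 1, B)-sequence consists of the N − 1 successive accumulator contents obtained by starting from 0 and repeatedly adding B = 2^b with end-around carry. The name `generate_sequence` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* n-bit addition with end-around carry (assumes 1 <= n <= 31). */
static uint32_t eac_add(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (1u << n) - 1u;
    uint32_t sum = a + b;
    return ((sum & mask) + (sum >> n)) & mask;
}

/* Writes positions 0 .. N-2 of the assumed (N-1, 2^b)-sequence into out.
   Returns the number of elements written, or 0 if out is too small. */
size_t generate_sequence(unsigned n, unsigned b, uint32_t *out, size_t cap)
{
    assert(n >= 1 && n <= 31 && b < n);
    size_t len = (size_t)((1u << n) - 1u);   /* N - 1 elements */
    if (cap < len)
        return 0;

    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        out[i] = acc;                        /* content at position i */
        acc = eac_add(acc, 1u << b, n);
    }
    return len;
}
```

Calling `generate_sequence(4, b, buf, 15)` for b = 0, 1, 2, 3 produces the four sequences for n = 4 under this assumption.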
Definition 2. The location of a vector V in an (N − 1, B)-sequence, denoted by L(N − 1, B, V), is the position of the vector V in the (N − 1, B)-sequence starting from 0.
Following Definitions 1 and 2, the problem of embedding a vector V in a sequence generated by an n-stage accumulator fed by a constant pattern B is transformed into calculating L(N − 1, B, V). The presented scheme is based on Theorem 1. In the sequel, "mod" will be used to indicate the remainder of the division of two integer numbers.

Algorithm 1: C-language routine implementing the presented scheme.
Proof. It is enough to prove that [...]. Indeed, since from the definition of the mod and div operators [...]. From this, we have [...].
In Algorithm 1, the C function implementing formula (1) is presented.
Example 1. Suppose we want to utilize the results of Theorem 1 in order to calculate the location of V = 01001 = 9 in the sequence generated by a 5-stage accumulator with a one's complement adder accumulating the constant value B = 8. It is easy to see that (5 × 8) mod 31 = 9; therefore, L(31, 8, 9) = 5. Thus, V = 9 is expected at the 5th position in the (31, 8)-sequence. From Table 5 of the appendix, we can see that, indeed, this is the case.
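Theorem 1 and formula (1) are not reproduced in this text, so the following C sketch should be read as an illustration rather than as the authors' Algorithm 1. It assumes, as Example 1 does, that the vector at position L of the (N − 1, B)-sequence equals (L × B) mod (N − 1). Under that assumption and for B = 2^b, the value L = (V mod 2^b) × 2^{n−b} + (V div 2^b), that is, V rotated right by b positions within n bits, satisfies L × 2^b = (V mod 2^b) × 2^n + (V div 2^b) × 2^b ≡ V (mod 2^n − 1), because 2^n ≡ 1 (mod 2^n − 1). The function name `location_pow2` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Constant-time location of V in the assumed (N-1, 2^b)-sequence:
   an n-bit right rotation of V by b positions (1 <= n <= 31, b < n). */
uint32_t location_pow2(unsigned n, unsigned b, uint32_t v)
{
    assert(n >= 1 && n <= 31 && b < n);
    uint32_t mask = (1u << n) - 1u;
    assert(v <= mask);

    uint32_t low  = v & ((1u << b) - 1u);   /* V mod 2^b */
    uint32_t high = v >> b;                 /* V div 2^b */
    return ((low << (n - b)) | high) & mask;
}
```

For n = 5, b = 3, and V = 9, the routine returns 5, in agreement with Example 1. The cost is a fixed number of shift and mask operations, independent of n and V.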

COMPARISONS
In this section, we will evaluate the proposed scheme in three directions. First, we will compare the proposed scheme to serial- and linear-search algorithms with respect to the time required to calculate the locations of all 2^n patterns. This first comparison is indicative of the speed of the algorithm. Then, we will perform comparisons with the scheme proposed in [11] for randomly generated test sets. The purpose of this comparison is to investigate the effect of narrowing the search space from 2^n (in [11,16]) to n (in the proposed scheme). Finally, we will compare the proposed scheme for test sets of real benchmarks, from the ISCAS'85 suite [19], in order to investigate the applicability of the proposed scheme in real-world circuits.

Comparisons with serial- and linear-search algorithms
We implemented the serial-search algorithm, which examines the 2^n − 1 test vectors until the target vector is found (this algorithm operates in O(2^n) time and is, therefore, representative of the exponential-time algorithms), as well as the linear-search algorithm [11], which is representative of the O(n) time algorithms. The C-language routine we utilized for the implementation of the serial-search algorithm is shown in Algorithm 2. For the linear search, we utilized the algorithm given in [11]. We ran C programs in order to find the locations of all nonzero N − 1 = 2^n − 1 vectors for a single seed (B) of the accumulator for various values of n (the computer used was a Pentium III at 933 MHz with 256 MB of RAM). The execution times of the programs are presented in Table 2. For the calculation of the time required by the serial search for values of n ≥ 20, we simulated the time required to calculate the locations of 2^i vectors, for i < n, and then projected these values to the total number of vectors. From Table 2, it is evident that as the number of bits increases, the time required by the exponential as well as the linear-search algorithms may become impractical, whilst the time required by the presented method remains low.
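Algorithm 2 is not reproduced in this text. A minimal serial search consistent with the description above might look as follows; it steps an end-around carry accumulator from 0 and counts the cycles until the target vector appears. The names `eac_add` and `serial_location` are illustrative, not the authors'.

```c
#include <assert.h>
#include <stdint.h>

/* n-bit addition with end-around carry (assumes 1 <= n <= 31). */
static uint32_t eac_add(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (1u << n) - 1u;
    uint32_t sum = a + b;
    return ((sum & mask) + (sum >> n)) & mask;
}

/* Serial search: examines positions 0 .. N-2 one by one, so the running
   time grows with 2^n. Returns the position of V, or -1 if V does not
   appear within those N-1 positions. */
int64_t serial_location(unsigned n, uint32_t B, uint32_t V)
{
    assert(n >= 1 && n <= 31);
    uint32_t mask = (1u << n) - 1u;
    uint32_t acc = 0;

    for (uint32_t pos = 0; pos < mask; pos++) {
        if (acc == V)
            return (int64_t)pos;
        acc = eac_add(acc, B, n);
    }
    return -1;
}
```

For the data of Example 1, `serial_location(5, 8, 9)` visits 0, 8, 16, 24, 1, 9 and returns 5.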

Comparisons for randomly generated patterns
In order to validate the applicability of the presented algorithm and the quality of the applied test sequence, we performed simulations to embed sets of test patterns into accumulator-generated sequences. Our aim was to choose a "good" pair of numbers (B, S), where B is the constant value accumulated and S is the initial value of the accumulator, such that the whole test set is generated in as few cycles as possible. We performed simulations utilizing random vectors generated by the random function of C for various values of the CUT inputs.
We experimented with all n candidate values of the input vector B. For every value of B, we kept L_min and L_max, that is, the locations where the first and last vectors of the test set were generated. We also calculated the distance as d = L_max − L_min. Every time a value of B was found that generated the test set within fewer cycles, that is, d < d_min, d was assigned to d_min and the new values of B and d_min were stored. For the calculation of the initial seed of the accumulator, denoted by S, it is easy to see that if the accumulator is initialized to the starting value S = (B × L_min) mod (N − 1) and operates for d_min clock cycles, then the whole test set is generated. Therefore, in the sequel, S is not stated explicitly since it can be directly calculated from B and L_min.
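The selection procedure described above can be sketched in C as follows. The sketch is an assumption-laden illustration, not the authors' program: locations are computed with a rotation-based closed form that is consistent with Example 1 (position L holds (L × B) mod (N − 1)), and the names `embed_result`, `location_pow2`, and `choose_addend` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned b;      /* exponent of the chosen addend B = 2^b */
    uint32_t l_min;  /* location of the first test vector generated */
    uint32_t d_min;  /* L_max - L_min for the chosen addend */
    uint32_t seed;   /* S = (B * L_min) mod (N - 1) */
} embed_result;

/* Assumed closed form: n-bit right rotation of v by b positions. */
static uint32_t location_pow2(unsigned n, unsigned b, uint32_t v)
{
    uint32_t mask = (1u << n) - 1u;
    return (((v & ((1u << b) - 1u)) << (n - b)) | (v >> b)) & mask;
}

/* Tries all n addends B = 2^b and keeps the one with the smallest d. */
embed_result choose_addend(unsigned n, const uint32_t *tests, size_t t)
{
    assert(n >= 1 && n <= 31 && t >= 1);
    uint32_t modulus = (1u << n) - 1u;          /* N - 1 */
    embed_result best = {0, 0, UINT32_MAX, 0};

    for (unsigned b = 0; b < n; b++) {
        uint32_t lmin = UINT32_MAX, lmax = 0;
        for (size_t i = 0; i < t; i++) {
            uint32_t loc = location_pow2(n, b, tests[i]);
            if (loc < lmin) lmin = loc;
            if (loc > lmax) lmax = loc;
        }
        uint32_t d = lmax - lmin;
        if (d < best.d_min) {                   /* strictly better addend */
            best.b = b;
            best.l_min = lmin;
            best.d_min = d;
            best.seed = (uint32_t)((((uint64_t)1 << b) * lmin) % modulus);
        }
    }
    return best;
}
```

For the 5-bit test set {9, 18, 5}, the sketch selects B = 2 with L_min = 9 and d_min = 11, so S = 18; starting from 18 and adding 2 for 11 cycles produces 18, 20, ..., 30, 1, 3, 5, 7, 9, which contains all three vectors.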
For each value of n (the number of CUT inputs), we performed experiments for four values of T (the number of test vectors in the test set), namely, 10, 20, 50, and 100. For each value, we performed three experiments. The average value of these three experiments is presented in Table 3. In Table 3, the first column presents the number of inputs of the CUT. The second column presents T, the number of (randomly generated) vectors of the test set. The third column presents the minimum distance d_min given by the scheme in [11]. The fourth column presents the minimum distance for the proposed scheme. The fifth column presents the % difference in d_min of the proposed scheme over the one in [11]. The value of d_min is, in general, expected to be higher than the one in [11], since in the proposed work we have a smaller solution space (n instead of 2^{n−1} in [11]). In the sixth column, the expected mean value of d, E[d], is presented. The last cell of the table (rightmost cell of the last line) indicates the average increase of d_min of the proposed scheme over the scheme of [11]. This cell indicates an average increase of 0.39%, which is negligible. Therefore, we can conclude that the quality of the test sequence of the proposed scheme (measured by the length of the sequence) is comparable to that of [11,16]. Furthermore, by comparing the values of the fourth and sixth columns, we can see that the values of d_min given by the proposed scheme are smaller than the expected mean value of d, E[d].
Next, we investigated the relationship of d_min with d_max (the maximum value of d for each experiment). We present the value of the ratio d_min/d_max for various values of n and T in Figure 3. From Figure 3, it can be seen that, as expected, the smaller the value of T, the better the performance of the embedding task, since this gives lower values for the quantity d_min/d_max. Furthermore, for small numbers of patterns (i.e., T = 10 and T = 20), we can see a decreasing trend as the number of inputs increases.

Comparisons for benchmark circuits
In order to illustrate the applicability of the presented scheme in real-world circuits, we applied our embedding algorithm to test sets extracted by COMPACTEST [20] for the ISCAS'85 circuits [19]; the fault coverage achieved by the utilized test patterns exceeds 99% of the detectable single stuck-at faults. The BIST community generally considers the ISCAS'85 circuits a good platform for evaluating testing methodologies. Following the rationale of [10,18], we have considered that the test set, once given, is not altered. This approach is most favorable when embedded modules such as intellectual property (IP) cores are utilized, whose inner structure is not available to the test designer; in such cases, the test designer utilizes test sets given by the designers of the modules and cannot exploit techniques such as reverse-order simulation (see, e.g., [7]). The scheme in [11] introduced a linear-time algorithm to calculate the location of a test pattern in a sequence generated by an accumulator that accumulates a constant pattern. Nikolos et al. [16] proposed the use of a Diophantine equation in order to calculate the location of one of the test patterns of the test set. For the remaining patterns, they utilized a formula given in [17], eliminating the need to solve the Diophantine equation again. Therefore, they achieved a reduction of 16 to 29 times; that is, the time is reduced to 3.45% (in the best case) of the time reported in [11].
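For a general addend B (not only a power of two), and under the position model used in Example 1, the location L satisfies the linear Diophantine equation L × B − k × (N − 1) = V for some integer k. The C sketch below solves it with the extended Euclidean algorithm. It is a textbook method shown only for illustration and is not claimed to be the procedure of [11] or [16]; the names `ext_gcd` and `location_general` are hypothetical, and the modulus is assumed to be below 2^31 so that all products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Extended Euclid: returns g = gcd(a, b) and x, y with a*x + b*y = g. */
static int64_t ext_gcd(int64_t a, int64_t b, int64_t *x, int64_t *y)
{
    if (b == 0) {
        *x = 1;
        *y = 0;
        return a;
    }
    int64_t x1, y1;
    int64_t g = ext_gcd(b, a % b, &x1, &y1);
    *x = y1;
    *y = x1 - (a / b) * y1;
    return g;
}

/* Solves L * B = V (mod m) with m = N - 1.
   Returns L in [0, m), or -1 when B and m are not mutually prime. */
int64_t location_general(int64_t m, int64_t B, int64_t V)
{
    assert(m > 0 && m < ((int64_t)1 << 31) && B >= 0 && V >= 0);
    int64_t x, y;
    if (ext_gcd(B % m, m, &x, &y) != 1)
        return -1;
    int64_t inv = ((x % m) + m) % m;   /* inverse of B modulo m */
    return ((V % m) * inv) % m;
}
```

With m = 31, B = 8, and V = 9, the inverse of 8 modulo 31 is 4, so the routine returns (9 × 4) mod 31 = 5, matching Example 1.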
In Table 4, we present comparison data for the three schemes. In the first three columns, the circuit name, the number of its inputs, and the number of vectors extracted by COMPACTEST are presented. The fault coverage of these patterns (for single stuck-at faults) is over 99%. In the fourth column, the test length reported by [11,16] is shown (the test lengths of [11,16] are similar, having a difference from 0% to 2.8%, i.e., negligible), and in the fifth column, the test length of the proposed scheme is shown. In the sixth column, the % difference of the test length is presented (the ≈ symbol is used since the test lengths of [11,16] are not exactly equal); in the last row of the table, the calculation times of the three schemes are presented. It is noted that the complexity of the scheme presented in [11] is O(n × T), where T is the number of patterns in the test set, the complexity of [16] is O(n + T), and the complexity of the proposed scheme is O(1 × T).
The results reveal that for almost all benchmarks, the proposed scheme results in test sequence lengths that are comparable to those reported in [11,16]; in one case (c1908), it even outperforms previous schemes. This result is somewhat "strange", since the solution space of [11], which includes a much larger set of candidate values for the addend, should give much better results with respect to the value of d_min. However, this paradox may be explained as follows.
In the experiments conducted in [11], since the times were prohibitively large, simulation was stopped after a predetermined time limit, for example, 20 minutes. Therefore, although better solutions might exist, they were not found. The scheme proposed in this work, due to the extremely low time required for the calculations, exhausts the solution space very quickly.
As for the calculation time, the proposed scheme always requires less than 0.7 seconds; that is, it is reduced by ≈850 times compared to the time of [11] (i.e., less than 0.12%) and ≈30 times compared to the scheme in [16].
It should be further noted that the proposed scheme could be utilized in combination with the schemes proposed in [11,16] as follows. First, a very quick test (on the order of milliseconds) can be performed using the scheme proposed here in order to investigate whether an acceptable test length can be achieved. If the test length given by the proposed scheme is acceptable, then the procedure stops; if a shorter test sequence is required, then the schemes in [11,16] can be utilized and left to run for a longer time in order to try to find such a shorter sequence. Furthermore, if the achieved test length is not acceptable, then we may resort to alternative solutions like reseeding and multiple addends; such solutions are the subject of ongoing research.

CONCLUSIONS
A novel solution has been presented to the problem of embedding test vectors into sequences generated by accumulators containing one's complement adders. The presented solution calculates the location of a pattern in the sequence generated by a carry-rotate accumulator accumulating a constant pattern B of the form B = 2^b. The time complexity of the presented algorithm is of the order O(1). To the best of our knowledge, this is the first solution to the problem of embedding patterns into hardware-generated sequences of the order O(1) presented in the literature. Comparisons with previous schemes based on exponential O(2^n) and linear O(n) time algorithms reveal that the proposed scheme results in comparable test lengths in noticeably lower time.
For the examined ISCAS'85 benchmark circuits, the time required to embed the test set is about 30 times lower than that of [16] and about 850 times lower than that of [11].

Figure 3: d_min/d_max for various values of n and T.

Table 2: Execution times of serial and linear searches versus the proposed O(1) time algorithm.

Table 3: Test set embedding into randomly generated test sets (average of three experiments).

Table 4: Simulation results for the ISCAS'85 circuits.