BIST Hardware Synthesis for RTL Data Paths Based on Test Compatibility Classes Nicola Nicolici, Bashir M. Al-Hashimi, Andrew D. Brown, and Alan C. Williams

Mr. Nicola Nicolici Dr. Bashir M. Al-Hashimi Professor Andrew D. Brown Dr. Alan C. Williams Department of Electronics and Computer Science University of Southampton Southampton SO17 1BJ U.K.

Contact address: Dr. Bashir M. Al-Hashimi, MIEE, CEng Electronic System Design Group Department of Electronics and Computer Science University of Southampton Southampton SO17 1BJ U.K. Tel: +44-1703-593374 Fax: +44-1703-592901 Email: [email protected] s.ac.uk

A short and preliminary version of this work is published in: Proceedings of the Design, Automation and Test in Europe (DATE), 1999, pp. 289-295

Abstract

A new BIST methodology for RTL data paths is presented. The proposed BIST methodology takes advantage of the structural information of the RTL data path and reduces the test application time by grouping same-type modules into test compatibility classes (TCCs). During testing, compatible modules share a small number of test pattern generators at the same test time, leading to significant reductions in BIST area overhead, performance degradation and test application time. Module output responses from each TCC are checked by comparators, leading to a substantial reduction in fault-escape probability. Only a single signature analysis register is required to compress the responses of each TCC, which leads to large reductions in volume of output data and overall test application time (the sum of test application time and the shifting time required to shift out test responses). This paper shows how the proposed TCC grouping methodology is a general case of the traditional BIST embedding methodology for RTL data paths with both uniform and variable bit width. Due to the complexity of the testable design space, a new BIST hardware synthesis algorithm is given. The proposed BIST hardware synthesis algorithm employs efficient tabu search-based testable design space exploration which combines the accuracy of incremental test scheduling algorithms and the exploration speed of test scheduling algorithms based on fixed test resource allocation. The huge size of the testable design space is initially reduced during the local neighborhood search and it is further shrunk by an incremental TCC scheduling algorithm using simultaneous test scheduling and signature analysis registers allocation. To illustrate the efficiency of the TCC grouping methodology, various benchmark and complex hypothetical data paths have been evaluated.
When compared with the BIST embedding methodology, it has been shown that TCC grouping reduces test application time by up to 50%, produces savings in performance degradation of up to 67%, and reduces the volume of output data by up to 94%. Moreover, when the shifting time necessary to shift out test responses is considered, the reduction in overall test application time is up to 61%. An exponential reduction in fault-escape probability with comparable or even lower BIST area overhead is achieved. Furthermore, the computational time is low, allowing BIST hardware to be synthesized in an acceptable time scale ( ...

$C_{x_1} > C_{x_2}$ if $(T_{x_1} > T_{x_2})$ or $(T_{x_1} = T_{x_2}$ and $A_{x_1} > A_{x_2})$

The main objective of the cost function is test application time, with BIST area overhead used as a tie-breaking mechanism among the many possible solutions with the same test application time. It should be noted that the minimization of the other parameters outlined in section 2 (performance degradation, volume of output data, overall test application time and fault-escape probability) is a by-product of the proposed optimization using the previously defined cost function. Based on the value of the cost function and on the tabu status of a move, a new solution is accepted or rejected as described from lines 14 to 19 in Figure 3. The tabu list contains the registers involved in a move, as described in section 3.2. A move is classified as tabu if a register involved in the move is present in the tabu list. The tabu tenure (length of the tabu list) varies from 5 (small designs) to 10 (complex designs). A move is aspirated, as shown in line 14, if it has produced a solution which is better than the best solution reached so far. The testable design space exploration continues until the number of iterations since the previous best solution exceeds a predefined $N_{iter}$.

ALGORITHM: Testable Design Space Exploration
INPUT: Data Path DP
OUTPUT: Fully Testable Data Path FT-DPbest

 1  for every module Ma from DP with a = 1, ..., nmod do
 2    for every input port IPk of Ma with k = 1, 2 do
 3      choose randomly Rx from IRS(Ma, IPk) and assign it to perform TPGF
        (this results into PT-DPinit)
 4  PT-DPcurrent ← PT-DPinit
 5  repeat
 6    for each register Rx from PT-DPcurrent with x = 1, ..., nreg do {
 7      generate the new solution PT-DPx (section 3.2)
 8      generate a global test incompatibility graph T using PT-DPx (section 3.3.1)
 9      generate test schedule Sx and fully testable data path FT-DPx using T and PT-DPx
        by simultaneous test scheduling and signature analysis registers allocation (section 3.3.2)
10      compute test application time Tx using test schedule Sx
11      compute BIST area overhead Ax using FT-DPx
12    }
13    for each FT-DPx ordered using Tx and Ax do {
14      if not tabu(FT-DPx) or aspirated(FT-DPx) then {
15        PT-DPcurrent ← PT-DPx
16        if best solution so far then
17          FT-DPbest ← FT-DPx
18        break
19      }
20    }
21  until iterations since previous best solution > Niter
22  return FT-DPbest

Figure 3: Tabu search-based testable design space exploration
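The acceptance rule of lines 13 to 19 in Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` record, the function names and the sample numbers are hypothetical, and the tabu list is modeled as a bounded `deque` of register names whose length plays the role of the tabu tenure.

```python
from collections import deque
from typing import Deque, Iterable, NamedTuple, Optional


class Candidate(NamedTuple):
    """One neighbor solution FT-DPx (hypothetical record for illustration)."""
    register: str  # register Rx whose move produced this neighbor
    tat: int       # test application time Tx
    area: int      # BIST area overhead Ax


def cost_key(c: Candidate):
    # Lexicographic cost: test application time first, area as tie-breaker.
    return (c.tat, c.area)


def select_move(neighbors: Iterable[Candidate],
                tabu: Deque[str],
                best: Optional[Candidate]) -> Optional[Candidate]:
    """Return the first neighbor, in cost order, that is not tabu or is aspirated."""
    for c in sorted(neighbors, key=cost_key):
        # Aspiration: the move beats the best solution reached so far.
        aspirated = best is None or cost_key(c) < cost_key(best)
        if c.register not in tabu or aspirated:
            return c
    return None


# Assumed sample data: a tabu tenure of 5 and three neighbors.
tabu_list: Deque[str] = deque(["R2"], maxlen=5)
neighbors = [Candidate("R1", 4, 100), Candidate("R2", 4, 90), Candidate("R3", 5, 10)]
chosen = select_move(neighbors, tabu_list, best=Candidate("R0", 4, 95))
print(chosen.register)  # R2 is tabu but aspirated, because (4, 90) < (4, 95)
```

Comparing `(tat, area)` tuples gives the ordering of the cost function directly, so no weighting constant between time and area is needed.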

3.2 Generation of new solutions and speed up techniques for local neighborhood search

The neighborhood of the current solution in the testable design space, PT-DPcurrent, is defined by $n_{reg}$ feasible neighbor solutions. For each data path register there is a single neighbor solution. Each of the $n_{reg}$ solutions is provided by an independent subroutine designed to identify a better configuration of test registers based on two new metrics. Due to the huge size and complexity of the testable design space, speed up techniques for efficient exploration are required. Before defining the neighbor solution for each register, two new metrics and a theorem used for reducing the testable design space are presented.

Definition 6 The current spatial sharing degree $CSSD(R_x, j, IP_k)$ of register $R_x$ for input port $k$ ($IP_k$) of module-type $j$ is the number of modules of $j$ for which $R_x$ performs test pattern generation function (TPGF) for $IP_k$ in the current partially testable data path.

Definition 7 The maximum spatial sharing degree $MSSD(R_x, j, IP_k)$ of register $R_x$ for input port $k$ ($IP_k$) of module-type $j$ is the number of modules of $j$ for which $R_x$ can perform TPGF for $IP_k$. The value of $MSSD(R_x, j, IP_k)$ is the cardinality of the set of modules of module-type $j$ whose $IP_k$ is connected to $R_x$ through only multiplexers.
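Definitions 6 and 7 amount to two counting functions over the data path. The sketch below assumes a hypothetical representation: `tpg` maps each (module, input port) pair to the register currently assigned to it, and `irs` maps the same pair to the set of registers connected to that port through only multiplexers. The three-module connectivity is an assumed example in the spirit of the Figure 1 discussion, not data taken from the paper.

```python
from typing import Dict, Set, Tuple

PortKey = Tuple[str, int]  # (module name, input port index)


def cssd(reg: str, mtype: str, port: int,
         module_type: Dict[str, str], tpg: Dict[PortKey, str]) -> int:
    """Current spatial sharing degree: modules of `mtype` whose port is driven by `reg`."""
    return sum(1 for (m, p), r in tpg.items()
               if r == reg and p == port and module_type[m] == mtype)


def mssd(reg: str, mtype: str, port: int,
         module_type: Dict[str, str], irs: Dict[PortKey, Set[str]]) -> int:
    """Maximum spatial sharing degree: modules of `mtype` whose port `reg` could drive."""
    return sum(1 for (m, p), regs in irs.items()
               if p == port and module_type[m] == mtype and reg in regs)


# Assumed connectivity: R1 reaches input port 1 of all three A-type modules.
module_type = {"A1": "A", "A2": "A", "A3": "A"}
irs = {("A1", 1): {"R1"}, ("A2", 1): {"R1", "R3"}, ("A3", 1): {"R1", "R5"}}
tpg = {("A1", 1): "R1", ("A2", 1): "R3", ("A3", 1): "R5"}

print(cssd("R1", "A", 1, module_type, tpg))  # 1
print(mssd("R1", "A", 1, module_type, irs))  # 3
```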

Theorem 1 Consider two current solutions, PT-DP$^{1}_{current}$ and PT-DP$^{2}_{current}$, with different $CSSD(R_x, j, IP_k)$ for given $R_x$, $j$ and $IP_k$. In PT-DP$^{1}_{current}$ the current spatial sharing degree is $0 < CSSD(R_x, j, IP_k) < MSSD(R_x, j, IP_k)$, whilst in PT-DP$^{2}_{current}$ the current spatial sharing degree is $CSSD(R_x, j, IP_k) = MSSD(R_x, j, IP_k)$. Then PT-DP$^{2}_{current}$ has at most as many TCCs as PT-DP$^{1}_{current}$.

Proof: Let $\{M_1, \ldots, M_n\}$ be the set of modules of module-type $j$ whose $IP_k$ is connected to $R_x$ through only multiplexers. In PT-DP$^{1}_{current}$, $R_x$ performs TPGF for $\{M_1, \ldots, M_t\}$, whilst $\{R_{y_1}, R_{y_2}, \ldots, R_{y_m}\}$ perform TPGF for $\{M_{t+1}, \ldots, M_n\}$. Because $CSSD(R_x, j, IP_k) > 0$, all the incompatibilities with both same-type modules and different-type modules are already created. In PT-DP$^{2}_{current}$, by increasing $CSSD(R_x, j, IP_k)$ to $MSSD(R_x, j, IP_k)$ no more incompatibilities will be created, which implies that the number of TCCs is not increased. Furthermore, according to Definition 1 of section 2.2, each of $\{M_{t+1}, \ldots, M_n\}$ has one and only one test register that performs TPGF for $IP_k$ in PT-DP$^{2}_{current}$. Hence test registers $\{R_{y_1}, R_{y_2}, \ldots, R_{y_m}\}$ will not perform TPGF for $IP_k$ of $\{M_{t+1}, \ldots, M_n\}$ any more. This can lead to a decrease in $CSSD(R_{y_i}, j, IP_k)$, where $i = 1 \ldots m$. If any of the $CSSD(R_{y_i}, j, IP_k)$ becomes 0, the incompatibilities are reduced and the number of TCCs is decreased in PT-DP$^{2}_{current}$.

The above theorem presents a very important theoretical result which has the following two implications.

The first implication of Theorem 1 reduces the total testable design space to the representative testable design space. The total testable design space consists of partially testable data paths with all the possible values $0 \le CSSD(R_x, j, IP_k) \le MSSD(R_x, j, IP_k)$ such that all the modules are assigned one and only one test pattern generator. The representative testable design space consists of partially testable data paths for which $CSSD(R_x, j, IP_k)$ is considered only equal to $MSSD(R_x, j, IP_k)$ such that all the modules are assigned one and only one test pattern generator. Consider the simple data path of Figure 1. In the first case, when the current spatial sharing degree for $R_1$ is $CSSD(R_1, A_{type}, IP_1) = 1$, two more test registers $LFSR_3$ and $LFSR_5$ are necessary to generate test patterns for $IP_1$ of modules $A_2$ and $A_3$, as shown in Figure 1(a). On the other hand, when $CSSD(R_1, A_{type}, IP_1) = MSSD(R_1, A_{type}, IP_1) = 3$ only one test pattern generator is necessary to generate test patterns for $IP_1$ of all the three modules, as shown in Figure 1(b). The case when $CSSD(R_1, A_{type}, IP_1) = 1$ has greater BIST area overhead and performance degradation due to $LFSR_3$ and $LFSR_5$. Furthermore, if the simple data path of Figure 1 is a small part of a more complex data path, where $LFSR_3$ and $LFSR_5$ are already allocated to perform TPGF for different module-types, assigning $LFSR_3$ and $LFSR_5$ to perform TPGF for $IP_1$ of $A_2$ and $A_3$ respectively will introduce conflicts between test resources, leading to incompatible modules and hence an increase in test application time. Theorem 1 justifies the reduction of the total testable design space, where all the $CSSD(R_1, A_{type}, IP_1) \in \{0, 1, 2, 3\}$ are examined in the search of feasible partially testable data paths, to the representative testable design space, where only $CSSD(R_1, A_{type}, IP_1) = 3$ is considered.

The second implication of Theorem 1 is concerned with efficient generation of moves in the representative testable design space. Generation of a move in the testable design space for register $R_x$ consists of two phases:

i. The first phase computes $\Delta_x(j, IP_k) = MSSD(R_x, j, IP_k) - CSSD(R_x, j, IP_k)$. $\Delta_x$ is a metric that measures the difference between the potential and actual use of $R_x$ as a test pattern generator for $IP_k$ of $j$ modules. Note that there are $2 n_{res}$ values of $\Delta_x$ for each register $R_x$.

ii. In the second phase the move for $R_x$ that has the maximum value $\Delta_{max}$ is chosen. If there are two or more $j_m$ and/or $IP_{k_n}$ for which $\Delta_x(j_m, IP_{k_n}) = \Delta_{max}$, the move for the $j_m$ and $IP_{k_n}$ with the maximum value of $MSSD(R_x, j_m, IP_{k_n})$ is chosen. Let $j_{max}$ be the index of the module-type and $k_{max}$ be the index of the input port for which $\Delta_{max}$ is obtained. Let $\{M_1, \ldots, M_n\}$ be the set of modules of module-type $j_{max}$ whose $IP_{k_{max}}$ is connected to $R_x$ through only multiplexers. Before the move, $R_x$ performs TPGF for $\{M_1, \ldots, M_t\}$, whilst $\{R_{y_1}, R_{y_2}, \ldots, R_{y_m}\}$ perform TPGF for $\{M_{t+1}, \ldots, M_n\}$. After the move, $R_x$ performs TPGF for $\{M_1, \ldots, M_n\}$, whilst $CSSD(R_{y_i}, j_{max}, IP_{k_{max}})$ are decreased, with $i = 1 \ldots m$.

The previously described two phases are repeated for each data path register and hence a neighborhood of $n_{reg}$ feasible solutions is generated. Increasing the current spatial sharing degree of the selected test registers leads to a smaller number of test pattern generators and hence reductions in BIST area overhead and performance degradation. Furthermore, the number of incompatibilities between TCCs is decreased, which leads to lower test application time. Moreover, the most important feature of the local neighborhood search is the speed up technique for efficient exploration, caused by the reduction in the size of the testable design space to be explored. To illustrate generation of new solutions consider the following example.

[Figure 4: Example of a partially testable data path to illustrate generation of new solutions and speed up techniques for local neighborhood search. (a) Before the move. (b) After the move.]

Example 3 The move for register $R_1$ of the data path shown in Figure 4(a) is explained in detail. $R_1$ is configured as an LFSR and generates test patterns for input port 2 ($IP_2$) of $B_4$. The following metrics $\Delta_1(A_{type}, IP_1)$, $\Delta_1(A_{type}, IP_2)$, $\Delta_1(B_{type}, IP_1)$ and $\Delta_1(B_{type}, IP_2)$ are computed. Because $MSSD(R_1, A_{type}, IP_1) = 2$ and $CSSD(R_1, A_{type}, IP_1) = 0$, the following $\Delta_1(A_{type}, IP_1) = 2 - 0 = 2$ is obtained. Similarly $\Delta_1(A_{type}, IP_2) = 2 - 0 = 2$ and $\Delta_1(B_{type}, IP_1) = 0 - 0 = 0$. Since $R_1$ already performs TPGF for $IP_2$ of $B_{type}$, then $\Delta_1(B_{type}, IP_2) = 4 - 1 = 3$. Because $\Delta_1(B_{type}, IP_2)$ is maximum, the TPGF for $IP_2$ of $B_{type}$ is moved to $R_1$ and area overhead is reduced because $R_4$ and $R_5$ do not perform TPGF any more (Figure 4(b)).
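The two-phase move selection can be checked against the numbers of Example 3 with a few lines of code. This is only a sketch: the dictionary layout and the function name `choose_move` are assumptions made for illustration, while the MSSD and CSSD values are the ones quoted in the example for register R1.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (module-type, input port index)


def choose_move(mssd: Dict[Key, int], cssd: Dict[Key, int]):
    """Phase i: compute delta = MSSD - CSSD. Phase ii: pick the maximum delta,
    breaking ties in favor of the larger MSSD."""
    delta = {k: mssd[k] - cssd[k] for k in mssd}
    best = max(delta, key=lambda k: (delta[k], mssd[k]))
    return best, delta


# Values for register R1 as stated in Example 3.
mssd_r1 = {("A", 1): 2, ("A", 2): 2, ("B", 1): 0, ("B", 2): 4}
cssd_r1 = {("A", 1): 0, ("A", 2): 0, ("B", 1): 0, ("B", 2): 1}

best, delta = choose_move(mssd_r1, cssd_r1)
print(delta)  # {('A', 1): 2, ('A', 2): 2, ('B', 1): 0, ('B', 2): 3}
print(best)   # ('B', 2): TPGF for IP2 of Btype moves to R1
```

Since there are two input ports per module-type, the dictionary holds the $2 n_{res}$ values of the metric for one register, here with $n_{res} = 2$.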

3.3 Incremental TCC scheduling algorithm

So far the testable design space to be explored was reduced with respect to the number of test registers required for test pattern generation, using the speed up techniques for local neighborhood search. The algorithms outlined in this section further shrink the size of the testable design space by considering simultaneous TCC scheduling and signature analysis registers allocation for each partially testable data path generated by local neighborhood search. The next two subsections introduce the steps which lead to a fully testable data path with low test application time and reduced BIST area overhead. Firstly, the assignment of every data path module to test compatibility classes to maximize test concurrency is presented. Secondly, the algorithm for simultaneous TCC scheduling and signature analysis registers allocation is described.

3.3.1 Generation of the global test incompatibility graph

To achieve maximum test concurrency it is required that a large number of different-type test compatibility classes are compatible. Following the second property of TCCs (Definition 3-(ii)), a high number of incompatible modules are sought to be merged in a small number of incompatible TCCs. This will reduce the number of edges in the global test incompatibility graph, which is defined as follows. A global test incompatibility graph (G-TIG) is a graph where a node appears for every TCC and an edge exists between nodes $TCC_{i,j}$ and $TCC_{k,l}$ if test compatibility classes $TCC_{i,j}$ and $TCC_{k,l}$ are incompatible. All the edges from G-TIG belong to the edge set $E$. The generation of G-TIG is outlined in Figure 5. For each partially testable data path PT-DP generated by the previously described local neighborhood search a G-TIG is generated, which is used for simultaneous test scheduling and signature analysis registers allocation. The generation of G-TIG is carried out in three steps.

Step 1: The first step assigns incompatible same-type modules into test compatibility classes and generates the initial G-TIG. Local test incompatibility graphs L-TIG($j$) are created for each module-type $j$, with $j = 1 \ldots n_{res}$, of the current partially testable data path according to the module incompatibility described in Definition 2. Each L-TIG is partitioned in TCCs using an efficient graph partitioning algorithm [20].

Step 2: Data path modules that are incompatible with different-type modules are considered in the second step. A module is called an assigned module if it belongs to a TCC. A module is called an unassigned module if it is not assigned to any TCC during steps 1 and 2. If for two assigned different-type incompatible modules $M_a$ and

ALGORITHM: Generate Global Test Incompatibility Graph (G-TIG)
INPUT: Partially Testable Data Path PT-DP
OUTPUT: Global Test Incompatibility Graph T

 1  Step1: for every module-type j from PT-DP with j = 1, ..., nres do
 2           Partition same-type incompatible modules in TCCs and generate initial G-TIG T with edge set E
 3  Step2: for every pair of different-type incompatible modules (Ma, Mb) do {
 4           j ← module-type(Ma); l ← module-type(Mb);
 5           if ∃i such that Ma ∈ TCCi,j and ∃k such that Mb ∈ TCCk,l then
 6             if (TCCi,j, TCCk,l) ∉ E then
 7               Add Edge (TCCi,j, TCCk,l) to E
 8           if ∄ i ∈ {1...nclasses(j)} such that Ma ∈ TCCi,j
 9              and ∃k such that Mb ∈ TCCk,l then {
10             if ∃i such that (TCCi,j, TCCk,l) ∈ E then
11               Add Ma to TCCi,j such that δ(TCCi,j, Ma) is maximum
12             else {
13               Add Ma to TCCi,j such that δ(TCCi,j, Ma) is maximum
14               Add Edge (TCCi,j, TCCk,l) to E
15             }
16           }
17           if ∄ i ∈ {1...nclasses(j)} such that Ma ∈ TCCi,j and ∄ k ∈ {1...nclasses(l)} such that Mb ∈ TCCk,l then {
18             if ∃i and ∃k such that (TCCi,j, TCCk,l) ∈ E then {
19               Add Ma to TCCi,j such that δ(TCCi,j, Ma) is maximum
20               Add Mb to TCCk,l such that δ(TCCk,l, Mb) is maximum
21             }
22             else {
23               Add Ma to TCCi,j such that δ(TCCi,j, Ma) is maximum
24               Add Mb to TCCk,l such that δ(TCCk,l, Mb) is maximum
25               Add Edge (TCCi,j, TCCk,l) to E
26             }
27           }
28         }
29  Step3: for every module Mc that is compatible with all modules from PT-DP do {
30           j ← module-type(Mc)
31           Add Mc to TCCi,j such that δ(TCCi,j, Mc) is maximum
32         }
33  return T

Figure 5: Generation of global test incompatibility graph for maximum test concurrency

$M_b$, with $M_a \in TCC_{i,j}$ and $M_b \in TCC_{k,l}$, test compatibility classes $TCC_{i,j}$ and $TCC_{k,l}$ are compatible, then an edge between $TCC_{i,j}$ and $TCC_{k,l}$ is added to G-TIG, as shown from lines 5 to 7 in Figure 5. Unassigned modules should be assigned to the already existing TCCs such that the number of incompatibilities between different-type TCCs is decreased, leading to maximum test concurrency. If there is an unassigned module $M_a$ which is incompatible with an assigned module $M_b$, with $M_b \in TCC_{k,l}$, then an edge is sought in the edge set $E$ between $TCC_{k,l}$ and a TCC of the same type as $M_a$. If at least one edge is found, $M_a$ is added to the $TCC_{i,j}$ for which

$\delta(TCC_{i,j}, M_a) = |ORS(TCC_{i,j} \cup M_a)| - |ORS(TCC_{i,j})|$

is maximum. If no edge is found then $M_a$ is assigned to the $TCC_{i,j}$ for which $\delta(TCC_{i,j}, M_a)$ is maximum and a new edge between $TCC_{i,j}$ and $TCC_{k,l}$ is added to G-TIG, as shown from lines 12 to 15. If $\delta$ has the same value for all the classes of a module-type then the class with the lowest index is considered. The newly introduced measure $\delta$ increases the number of potential signature analysis registers, which has the following implications during the test scheduling process described in section 3.3.2:

• reduction of conflicts between signature analysis registers that are allocated during the test scheduling process, leading to lower test application time;

• reuse of the same signature analysis registers at different test times for different TCCs, which has a direct impact on BIST area overhead, performance degradation and the shifting time required to shift out signatures when the test process is accomplished;

• minimization of the number of highly expensive CBILBOs required for testing the self-loops in the data path, since the difference between the output register set of a TCC and the input register sets of every module from a TCC is maximized.

Finally, if two unassigned different-type modules $M_a$ and $M_b$ are incompatible then the same assignment reasoning outlined previously is applied, as shown from lines 17 to 27.

Step 3: In the third step unassigned modules $M_c$ of module-type $j$ which are compatible with all TCCs are added to a class $i$ such that $\delta(TCC_{i,j}, M_c)$ is maximum.

The proposed algorithm for generation of G-TIG guarantees by construction that every module is assigned to a TCC and that the number of nodes and edges in G-TIG is minimum. This implies maximum test concurrency of the partially testable data path, which is a good starting point for the incremental test scheduling algorithm.

Example 4 To illustrate the generation of G-TIG consider the data path shown in Figure 6. The module-type indexes of $A_{type}$, $B_{type}$ and $C_{type}$ are 0, 1 and 2 respectively. In the first step, local test incompatibility graphs L-TIG($A_{type}$), L-TIG($B_{type}$), and L-TIG($C_{type}$) are created. Since modules that belong to $C_{type}$ do not share any test registers with same-type modules, L-TIG($C_{type}$) is void. The initial test compatibility classes are: $TCC_{0,0} = \{A_1\}$, $TCC_{1,0} = \{A_2\}$, $TCC_{2,0} = \{A_3\}$, $TCC_{0,1} = \{B_1, B_2\}$, and $TCC_{1,1} = \{B_3, B_4\}$. In the second step edges between TCCs representing different module-types are added to G-TIG. So far the only edges in the edge set are the ones between the TCCs of identical module-types: $(TCC_{0,0}, TCC_{1,0})$, $(TCC_{0,0}, TCC_{2,0})$, $(TCC_{1,0}, TCC_{2,0})$ and $(TCC_{0,1}, TCC_{1,1})$. Due to sharing of $LFSR_5$ between $A_4$ and $B_1$, an edge between $TCC_{0,1}$ and a TCC of $A_{type}$ must be added to G-TIG. First of all, $A_4$ is assigned to one of the TCCs of $A_{type}$. The computed output register set increase measures are $\delta(TCC_{0,0}, A_4) = 1$, $\delta(TCC_{1,0}, A_4) = 1$, and $\delta(TCC_{2,0}, A_4) = 1$. All three of $TCC_{0,0}$, $TCC_{1,0}$ and $TCC_{2,0}$ are candidates. Due to the smallest class index of $TCC_{0,0}$, module $A_4$ is assigned to $TCC_{0,0}$ and the incompatibility edge $(TCC_{0,0}, TCC_{0,1})$ is added to G-TIG. In the third step $B_5$ is assigned to one of the TCCs of $B_{type}$. The computed output register set increase measures are $\delta(TCC_{0,1}, B_5) = 0$ and $\delta(TCC_{1,1}, B_5) = 1$. $B_5$ is assigned to $TCC_{1,1}$ due to the greater increase in the number of potential signature analysis registers. The test compatibility classes are $TCC_{0,0} = \{A_1, A_4\}$, $TCC_{1,0} = \{A_2\}$, $TCC_{2,0} = \{A_3\}$, $TCC_{0,1} = \{B_1, B_2\}$, $TCC_{1,1} = \{B_3, B_4, B_5\}$ and $TCC_{0,2} = \{C_1, C_2, C_3\}$, and the G-TIG is shown in Figure 8.

[Figure 6: Data path example to illustrate the incremental TCC scheduling algorithm.]
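The class selection used in Example 4 can be sketched as below. The function name `pick_class` and the list-of-sets layout are assumptions for illustration. The register names are inferred from the output register sets shown in Figure 8 (for instance, R15 as the output register of A4 and R17 as that of B5) and should be read as an assumed reading of the figure, not as data stated in the text. The sketch covers only the comparison of the measure across classes, not the edge lookup in E that precedes it.

```python
from typing import List, Set, Tuple


def pick_class(class_ors: List[Set[str]], module_ors: Set[str]) -> Tuple[int, List[int]]:
    """Return the index of the class with the largest output register set increase,
    preferring the lowest class index on ties, together with all computed gains."""
    gains = [len(ors | module_ors) - len(ors) for ors in class_ors]
    best = max(range(len(class_ors)), key=lambda i: (gains[i], -i))
    return best, gains


# Atype classes TCC(0,0), TCC(1,0), TCC(2,0) before A4 is assigned (assumed ORS).
a_classes = [{"R12"}, {"R13"}, {"R14"}]
print(pick_class(a_classes, {"R15"}))  # (0, [1, 1, 1]): tie, lowest index wins

# Btype classes TCC(0,1), TCC(1,1) before B5 is assigned (assumed ORS).
b_classes = [{"R12", "R17"}, {"R13", "R16"}]
print(pick_class(b_classes, {"R17"}))  # (1, [0, 1]): B5 joins TCC(1,1)
```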

3.3.2 Simultaneous TCC scheduling and signature analysis registers allocation

Test scheduling is performed using the global test incompatibility graph outlined in section 3.3.1. Fault sets of different-type modules have different detection probability profiles, as outlined in section 2.2. Hence, TCCs of different module-types need different test application times to satisfy the required fault coverage. Thus, the TCC scheduling algorithm deals with unequal test lengths. The test scheduling algorithm for partitioned testing with run to completion from [20] needs to be modified such that test scheduling and signature analysis registers allocation are done simultaneously, as shown in Figure 7. For the formal presentation of the algorithm, the sets and notations from [20] are preserved. Two more notations are introduced: $U$ is the set of used test registers that have compressed output responses at a previous test time, and $B$ is the set of busy test registers that are compressing output responses at the current test time. The algorithm 2M from [20] schedules tests for a fixed test resource allocation.

ALGORITHM: Simultaneous TCC Scheduling and Signature Analysis Registers Allocation
INPUT: Global Test Incompatibility Graph T
       Partially Testable Data Path PT-DP
OUTPUT: Test Schedule S
        Fully Testable Data Path FT-DP

 1  AN ← T; S ← ∅; Li ← ai; Ii ← Li − i + Σ_{j | tj ∈ N({ti})} Lj;
 2  while AN ≠ ∅ do {
 3    M ← N(S); C ← AN ∩ M; C' ← C ∩ N(M);
 4    while C ≠ ∅ do {
 5      if C' ≠ ∅ then choose ti ∈ C' where Ii is maximum
 6      else choose ti ∈ C where Ii is maximum
 7      if ∃ Rk ∈ ORS(ti) and Rk ∉ B then {
 8        if ∃ Rk ∈ U then
 9          choose Rk ∈ U where fanin(Rk) is maximum
10        else
11          choose Rk ∈ ORS(ti) where fanin(Rk) is maximum
12        AN ← AN − {ti}; B ← B ∪ {Rk}; S ← S ∪ {ti};
13        Ii ← Ii − 1; Ij ← Ij − 1 where tj ∈ N({ti});
14        M ← M ∪ (N{ti} ∩ AN);
15      }
16      C ← C ∪ ({ti} ∪ N{ti}); C' ← C ∩ N{M};
17    }
18    time ← min_{i | ti ∈ S} {Li};
19    for all ti ∈ S do {
20      Li ← Li − time;
21      Ii ← Ii − (time − 1);
22      for all tj ∈ N({ti}) do
23        Ij ← Ij − (time − 1);
24      if Li = 0 then {
25        S ← S − {ti};
26        B ← B − {Rk}; U ← U ∪ {Rk};
27      }
28    }
29  }
30  for all Rk ∈ U do
31    modify Rk from PT-DP into a test register in FT-DP
32  return (S, FT-DP)

Figure 7: Algorithm for simultaneous TCC scheduling and signature analysis registers allocation

The following three modifications are necessary to perform simultaneous TCC scheduling and signature analysis register allocation.

i. If all the registers in $ORS(t_i)$ are busy at the current test time then test $t_i$ is removed from the candidate node set, being postponed for a later test time, as shown from lines 7 to 16; otherwise, for every available register $R_k$ in $ORS(t_i)$, it is checked whether $R_k$ belongs to the used test register set, and the $R_k$ with the maximum fanin is chosen; this choice will allow $R_k$ to be reused at a later test time.

ii. When the shortest currently active test $t_i$ is completed, the test register $R_k$ that has served as signature analysis register is removed from the busy register set $B$ and added to the used register set $U$, as shown from lines 24 to 28.

iii. After the completion of test scheduling all the registers from the used register set $U$ are modified to signature analysis registers, as shown in lines 30 and 31; the algorithm returns a test schedule $S$ and a fully testable data path FT-DP, which are used to compute test application time and BIST area overhead in the tabu search testable design space exploration (Figure 3).
The first modification solves the conflicts between signature analysis registers during the test scheduling process, reducing both the size of the testable design space to be explored and the test application time. Thus the efficiency of testable design space exploration is improved by combining the accuracy of incremental test scheduling algorithms with the exploration speed of test scheduling algorithms based on fixed test resource allocation. The second and third modifications reduce the number of signature analysis registers by reusing them at different test times, leading to further reductions in BIST area overhead, performance degradation, and overall test application time.

Example 5 To illustrate the above three modifications consider Figure 8, which shows the G-TIG of the data path shown in Figure 6. It is assumed that the number of test patterns to test $A_{type}$ modules is $T_0 = T$, the number of test patterns to test $B_{type}$ modules is $T_1 = 2T$, and the number of test patterns to test $C_{type}$ modules is $T_2 = 3T$, with $T$ a reasonably large integer. The first scheduled test is $TCC_{0,0}$ at test time 0. The signature analysis (SA) register for $TCC_{0,0}$ is $R_{12}$, which is chosen from $ORS(TCC_{0,0})$ because its fanin is maximum. Two more tests, $TCC_{1,1}$ and $TCC_{0,2}$, are scheduled at test time 0. The signature analysis registers for $TCC_{1,1}$ and $TCC_{0,2}$ are $R_{13}$ and $R_{14}$ respectively. Registers $R_{12}$, $R_{13}$ and $R_{14}$ are added to the busy register set $B$. At test time $T$ test $TCC_{0,0}$ is completed and $R_{12}$ is removed from $B$ and added to the used register set $U$. The attempt to schedule $TCC_{1,0}$ at test time $T$ fails because $R_{13}$ belongs to $B$. Similarly the attempt to schedule $TCC_{2,0}$ at test time $T$ fails because $R_{14}$ also belongs to $B$. At test time $2T$ test $TCC_{1,1}$ is completed and $R_{13}$ is removed from the busy register set $B$ and added to the used register set $U$. A new attempt to schedule $TCC_{1,0}$ succeeds and the allocated signature analysis register is $R_{13}$. In addition, at test time $2T$ test $TCC_{0,1}$ is scheduled and the chosen register from $U$ is $R_{12}$. Finally, at test time $3T$ test $TCC_{0,2}$ is completed and test $TCC_{2,0}$ is scheduled reusing the signature analysis register $R_{14}$. Note that only three test registers are used to analyze test responses. A graphical representation of the test schedule is presented in Figure 9.

[Figure 8: Global test incompatibility graph for data path of Figure 6. Node annotations: $ORS(TCC_{0,0}) = \{R_{12}, R_{15}\}$, $T_0 = T$; $ORS(TCC_{1,0}) = \{R_{13}\}$, $T_0 = T$; $ORS(TCC_{2,0}) = \{R_{14}\}$, $T_0 = T$; $ORS(TCC_{0,1}) = \{R_{12}, R_{17}\}$, $T_1 = 2 \times T$; $ORS(TCC_{1,1}) = \{R_{13}, R_{16}, R_{17}\}$, $T_1 = 2 \times T$; $ORS(TCC_{0,2}) = \{R_{14}, R_{18}, R_{19}\}$, $T_2 = 3 \times T$.]

[Figure 9: TCC schedule for data path of Figure 6 using the proposed algorithm for simultaneous TCC scheduling and signature analysis registers allocation. Bars: $TCC_{0,0}$ (SA is $R_{12}$), $TCC_{1,0}$ (SA is $R_{13}$), $TCC_{2,0}$ (SA is $R_{14}$), $TCC_{0,1}$ (SA is $R_{12}$), $TCC_{1,1}$ (SA is $R_{13}$), $TCC_{0,2}$ (SA is $R_{14}$). Time axis: 0, $T$, $2 \times T$, $3 \times T$, $4 \times T$.]
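The interplay of the busy set B and the used set U in Example 5 can be replayed with a simplified event-driven scheduler. This is a sketch under stated assumptions and not the algorithm of Figure 7: the priority indices $I_i$ are replaced by a fixed candidate order, the fanin criterion is replaced by the order of each ORS list, and the function name `schedule` is hypothetical. The ORS lists follow the annotations of Figure 8 and the edges are the ones named in Example 4; test lengths are in units of T.

```python
from typing import Dict, List, Set, Tuple


def schedule(order: List[str], length: Dict[str, int],
             ors: Dict[str, List[str]], edges: List[Tuple[str, str]]):
    """Greedy run-to-completion scheduling with on-the-fly SA register allocation."""
    incompatible = {frozenset(e) for e in edges}
    pending = list(order)
    running: Dict[str, Tuple[int, str]] = {}  # test -> (finish time, SA register)
    busy: Set[str] = set()                    # B: registers compressing responses now
    used: Set[str] = set()                    # U: registers that served as SA earlier
    start: Dict[str, Tuple[int, str]] = {}
    now = 0
    while pending or running:
        # Retire tests finishing now: their SA register moves from B to U.
        for t in [t for t, (end, _) in running.items() if end == now]:
            _, reg = running.pop(t)
            busy.discard(reg)
            used.add(reg)
        for t in list(pending):
            if any(frozenset((t, r)) in incompatible for r in running):
                continue                      # conflicts with an active test
            free = [r for r in ors[t] if r not in busy]
            if not free:
                continue                      # all of ORS(t) busy: postpone the test
            reused = [r for r in free if r in used]
            reg = (reused or free)[0]         # prefer a register already in U
            running[t] = (now + length[t], reg)
            busy.add(reg)
            start[t] = (now, reg)
            pending.remove(t)
        if running:
            now = min(end for end, _ in running.values())
        elif pending:
            raise ValueError("no pending test can be scheduled")
    return start, now, used


length = {"TCC00": 1, "TCC10": 1, "TCC20": 1, "TCC01": 2, "TCC11": 2, "TCC02": 3}
ors = {"TCC00": ["R12", "R15"], "TCC10": ["R13"], "TCC20": ["R14"],
       "TCC01": ["R12", "R17"], "TCC11": ["R13", "R16", "R17"],
       "TCC02": ["R14", "R18", "R19"]}
edges = [("TCC00", "TCC10"), ("TCC00", "TCC20"), ("TCC10", "TCC20"),
         ("TCC01", "TCC11"), ("TCC00", "TCC01")]
order = ["TCC00", "TCC11", "TCC02", "TCC10", "TCC20", "TCC01"]  # assumed priorities

start, total, used = schedule(order, length, ors, edges)
print(start)          # start time and SA register per TCC
print(total, used)    # total length in units of T and the SA registers used
```

With the assumed candidate order the sketch starts the tests at the same times and with the same signature analysis registers as Example 5, finishing at 4T with three registers. A different order or register preference could yield a different schedule, which is why the paper's priority and fanin rules matter.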

4 Experimental results

The BIST hardware synthesis for the TCC grouping methodology has been implemented on a SUN SPARC 20 workstation using 6000 lines of C++ code. To give insight into the efficiency of testability achieved using the presented approach, Table 1 shows a comparison of BIST resources and test application time (TAT) using the BIST embedding methodology and the TCC grouping methodology. The results for the BIST embedding methodology were obtained using the same BIST hardware synthesis algorithm, assuming that every pair of modules in the data path are different ($n_{res} = n_{mod}$) as described in section 2.2. The comparison is carried out for a number of benchmark examples including the elliptic wave digital filter (EWF) and the 8 and 32 point discrete cosine transform (DCT). The benchmarks were synthesized using the ARGEN high-level synthesis system [31, 32] for different execution time constraints ranging from 10 to 40. Data paths consisting of different numbers of modules and registers have been generated by ARGEN. For example, in the case of EWF-17 we have 6 modules (MOD), 3 multipliers (*) and 3 adders (+), and 12 registers (REG). The test application time lengths of adders and multipliers are assumed to be $T_{+} = T$ and $T_{*} = 4T$ respectively, where $T$ is a reasonably large integer. In general the TCC grouping methodology produces fewer test registers than the BIST embedding methodology. For example, in the case of EWF-20 the number of LFSRs is reduced from 6 to 4, and the number of MISRs is reduced from 4 to 2. There is further reduction as the design complexity increases. For example, in the case of 32DCT-33 the number of LFSRs is reduced from 30 to 14, and the number of MISRs is reduced from 19 to 2. The reduction in test registers in the case of TCC grouping is achieved at the expense of comparators. In the case of 32DCT-33 there are one 5 input comparator (C5), one 6 input comparator (C6) and one 8 input comparator (C8).

However, the TCC grouping methodology requires reduced BIST area overhead when compared with the BIST embedding methodology, as shown in Table 2 for data path widths varying from 4-bit up to 16-bit. Note that the proposed BIST methodology is capable of dealing with variable bit width data paths, as outlined in section 2.2. For benchmark circuit 32DCT-33 the reductions in BIST area overhead in terms of equivalent gates are 45.63% in the case of a 4-bit data path, 43.72% in the case of an 8-bit data path and 42.65% in the case of a 16-bit data path. But there are cases such as EWF-17 and 8DCT-13 where the BIST embedding methodology produces better BIST area overheads. This has been achieved without reaching the minimal TAT. For example, for circuits EWF-17 and 8DCT-13 reductions of 20% in TAT are achieved by the TCC grouping methodology. This result is derived using the TAT from Table 1, where the minimum reached TAT for the TCC grouping methodology is 4T, and for the BIST embedding methodology is 5T. So far the reductions in TAT and BIST area overhead achieved by the TCC grouping methodology when compared to the BIST embedding methodology were outlined.

Table 1: Comparison of BIST resources and test application time using the TCC grouping and the BIST embedding methodologies for benchmark examples

                          BIST embedding            TCC grouping                                      CPU
Design    MOD     REG     Test registers      TAT   Test registers     Comparators          TAT   time (s)
EWF-17    3*,3+   12      6 LFSR, 3 MISR      5T    5 LFSR, 2 MISR     2 C3                 4T      1.05
EWF-18    2*,3+   12      5 LFSR, 4 MISR      4T    5 LFSR, 1 MISR     1 C3, 1 C2           4T      2.17
EWF-19    2*,2+   12      6 LFSR, 4 MISR      4T    4 LFSR, 2 MISR     2 C2                 4T      0.73
EWF-20    2*,2+   12      6 LFSR, 4 MISR      4T    4 LFSR, 2 MISR     2 C2                 4T      0.75
EWF-21    2*,3+   13      5 LFSR, 4 MISR      5T    5 LFSR, 2 MISR     2 C2                 4T      1.58
EWF-23    1*,2+   11      5 LFSR, 3 MISR      4T    4 LFSR, 2 MISR     1 C2                 4T      0.69
8DCT-10   4*,4+   15      7 LFSR, 6 MISR      5T    8 LFSR, 3 MISR     2 C2, 1 C4           4T      2.56
8DCT-11   4*,3+   15      8 LFSR, 7 MISR      5T    6 LFSR, 2 MISR     1 C3, 1 C4           4T      2.57
8DCT-12   4*,3+   16      8 LFSR, 6 MISR      5T    8 LFSR, 2 MISR     1 C3, 1 C4           4T      1.86
8DCT-13   4*,4+   16      9 LFSR, 5 MISR      5T    8 LFSR, 2 MISR     2 C4                 4T      2.67
8DCT-14   3*,3+   16      9 LFSR, 6 MISR      4T    5 LFSR, 2 MISR     2 C3                 4T      1.20
8DCT-16   3*,2+   16      7 LFSR, 5 MISR      4T    5 LFSR, 2 MISR     1 C2, 1 C3           4T      1.15
32DCT-30  9*,12+  60      33 LFSR, 21 MISR    4T    18 LFSR, 2 MISR    1 C5, 1 C7, 1 C9     4T    129.70
32DCT-31  9*,12+  62      33 LFSR, 21 MISR    4T    19 LFSR, 2 MISR    1 C5, 1 C7, 1 C9     4T    124.40
32DCT-32  8*,12+  62      32 LFSR, 20 MISR    4T    16 LFSR, 2 MISR    1 C4, 2 C8           4T    103.70
32DCT-33  8*,11+  62      30 LFSR, 19 MISR    4T    14 LFSR, 2 MISR    1 C5, 1 C6, 1 C8     4T     55.00
32DCT-37  8*,9+   63      26 LFSR, 17 MISR    4T    16 LFSR, 2 MISR    1 C3, 1 C6, 1 C8     4T     86.64
32DCT-38  9*,9+   59      27 LFSR, 18 MISR    4T    16 LFSR, 2 MISR    2 C9                 4T     38.47
32DCT-39  8*,9+   60      26 LFSR, 17 MISR    4T    17 LFSR, 2 MISR    1 C8, 1 C9           4T     45.46
32DCT-40  7*,10+  61      27 LFSR, 17 MISR    4T    16 LFSR, 2 MISR    1 C7, 1 C10          4T     45.24
Table 2 also shows the reductions in performance degradation (PD), volume of output data (VOD), and overall test application time (overall-TAT). The reduction in PD represents the reduction in the number of data path registers modified into test registers. For example, the reduction in PD for EWF-17 is 22.22%, and it rises to 67.35% in the case of 32DCT-33. Similarly, the reduction in VOD varies from 33.33% in the case of EWF-17 up to 90.48% in the case of 32DCT-30 and 32DCT-31. The volume of output data is considered directly proportional to the number of signature analysis registers. This number is very small because a large number of modules are grouped in TCCs and signature analysis registers are reused at different test times. The volume of output data affects not only the storage required for test data but also the overall test application time, which consists of the test application time (TAT) and the shifting time required to shift out the test responses at the end of the testing process. The shifting time requires n_SA · k clock cycles, where n_SA is the number of signature analysis registers and k is the data path width. The last column of Table 2 shows the reduction in overall-TAT for a data path width of 8 bits and T = 64. For all benchmark circuits where both the BIST embedding and the TCC grouping methodologies achieved the minimal test application time (4T), the overall-TAT is lower for the TCC grouping methodology because of its smaller number of signature analysis registers. For example, in the case of 32DCT-30 the overall-TAT reduction achieved by the TCC grouping methodology when compared to the BIST embedding methodology is 35.85%.

Table 2: Reduction in test application time, BIST area, performance degradation, volume of output data and overall test application time for benchmark examples using the TCC grouping methodology

| Design | TAT reduction (%) | BIST area overhead reduction (%), 4-bit data path | BIST area overhead reduction (%), 8-bit data path | BIST area overhead reduction (%), 16-bit data path | PD reduction (%) | VOD reduction (%) | Overall-TAT reduction (%) |
|---|---|---|---|---|---|---|---|
| EWF-17 | 20 | -14.29 | -17.95 | -20.00 | 22.22 | 33.33 | 20.93 |
| EWF-18 | 0 | 3.17 | 0.00 | -1.78 | 33.33 | 75.00 | 8.33 |
| EWF-19 | 0 | 18.57 | 16.15 | 14.80 | 40.00 | 50.00 | 5.56 |
| EWF-20 | 0 | 18.57 | 16.15 | 14.80 | 40.00 | 50.00 | 5.56 |
| EWF-21 | 20 | -1.59 | -4.27 | -5.78 | 22.22 | 50.00 | 22.73 |
| EWF-23 | 0 | 11.61 | 10.10 | 9.25 | 25.00 | 33.33 | 2.86 |
| 8DCT-10 | 20 | -18.13 | -21.60 | -23.54 | 15.38 | 50.00 | 23.91 |
| 8DCT-11 | 20 | 20.95 | 18.46 | 17.07 | 46.67 | 71.43 | 27.66 |
| 8DCT-12 | 20 | 1.02 | -1.65 | -3.14 | 28.57 | 66.67 | 26.09 |
| 8DCT-13 | 20 | -3.06 | -6.04 | -7.71 | 28.57 | 60.00 | 24.44 |
| 8DCT-14 | 0 | 31.43 | 29.23 | 28.00 | 53.33 | 66.67 | 10.53 |
| 8DCT-16 | 0 | 19.05 | 16.67 | 15.33 | 41.67 | 60.00 | 8.11 |
| 32DCT-30 | 0 | 41.14 | 39.25 | 38.19 | 62.96 | 90.48 | 35.85 |
| 32DCT-31 | 0 | 39.29 | 37.39 | 36.33 | 61.11 | 90.48 | 35.85 |
| 32DCT-32 | 0 | 43.82 | 41.94 | 40.88 | 65.38 | 90.00 | 34.62 |
| 32DCT-33 | 0 | 45.63 | 43.72 | 42.65 | 67.35 | 89.47 | 33.33 |
| 32DCT-37 | 0 | 33.72 | 31.75 | 30.65 | 55.81 | 82.35 | 28.57 |
| 32DCT-38 | 0 | 37.46 | 35.56 | 34.49 | 60.00 | 88.89 | 32.00 |
| 32DCT-39 | 0 | 33.55 | 31.66 | 30.60 | 55.81 | 88.24 | 30.61 |
| 32DCT-40 | 0 | 37.34 | 35.49 | 34.45 | 59.09 | 88.24 | 30.61 |

The BIST hardware synthesis algorithm has low computational time. The CPU time required to achieve the lowest TAT for the benchmark circuits is shown in the last column of Table 1. For example, in the case of the EWF and 8 point DCT designs, the computational time varies from 0.7 s to 3 s. In the case of designs with a huge testable design space, like the 32 point DCT, high quality solutions are achieved in computational times ranging from 38 s to 130 s. A high quality solution is a fully testable data path with a test application time equal (or almost equal) to the longest test application time required to test the most random pattern resistant module (4T in the case of the benchmark circuits of Table 1). The BIST hardware synthesis for the TCC grouping methodology allows the huge testable design space to be explored efficiently by combining the accuracy of incremental test scheduling algorithms with the exploration speed of test scheduling algorithms based on fixed test resource allocation, as outlined in section 3. This means it can be used with extremely complex hypothetical designs of dimensions not often reported in the literature.

Complex hypothetical data paths have been generated as follows. The number of modules n_mod varies from 35 to 45, and the number of registers n_reg varies from 90 to 115. The number of module types is n_res = 5. The maximum fan-in for every register or input port of a module is M_fanin = 8. The input register set of each input port of every module contains a random number n_r, with 1 ≤ n_r ≤ M_fanin, of randomly chosen registers. Similarly, the number of modules multiplexed at the input of each register is a random number n_m, with 1 ≤ n_m ≤ M_fanin, of randomly chosen modules. The TAT of three module types is assumed to be T, and the TAT of the other two module types is assumed to be 4T.

Table 3 shows a comparison of BIST resources and test application time using the TCC grouping and the BIST embedding methodologies for complex hypothetical data paths. For most of the complex hypothetical data paths, the BIST embedding methodology needed BILBO registers to achieve a low TAT. For example, in the case of EX-16 with 45 modules and 105 registers, the BIST embedding methodology required 60 LFSRs, 31 MISRs and 3 BILBOs to achieve a TAT of 12T. The TCC grouping methodology reduces the TAT to 6T and the number of test registers from 94 to only 49, without the use of BILBO registers.
Furthermore, for EX-16 the TCC grouping methodology requires only 2 signature analysis registers, which are reused in different test sessions. This demonstrates the efficiency of simultaneous TCC scheduling and signature analysis register allocation. Table 4 shows that the TCC grouping methodology overcomes the problems of the BIST embedding methodology in dealing with complex hypothetical data paths. Furthermore, the computational time for obtaining high quality solutions is still low relative to the size of the testable design space. For example, it took less than 600 s to find high quality solutions for data paths with 45 modules and up to 115 registers.

Footnote 1: The complex hypothetical data paths are available on request from the authors.

Table 3: Comparison of BIST resources and test application time using the TCC grouping and the BIST embedding methodologies for complex hypothetical data paths with 35 to 45 modules and 90 to 115 registers

| Design | MOD | REG | BIST embedding: test registers | BIST embedding: TAT | TCC grouping: test registers | TCC grouping: comparators | TCC grouping: TAT | CPU time (s) |
|---|---|---|---|---|---|---|---|---|
| EX-01 | 35 | 90 | 54 LFSR, 30 MISR | 8T | 42 LFSR, 4 MISR | C8, 2 C7, 2 C6 | 5T | 242.20 |
| EX-02 | 35 | 95 | 55 LFSR, 28 MISR, 1 BILBO | 9T | 35 LFSR, 3 MISR | C9, 2 C7, C5, 2 C3 | 8T | 383.00 |
| EX-03 | 35 | 100 | 56 LFSR, 29 MISR, 2 BILBO | 8T | 38 LFSR, 3 MISR | C9, C7, C6, C5, 2 C4 | 5T | 491.10 |
| EX-04 | 35 | 105 | 67 LFSR, 32 MISR, 1 BILBO | 5T | 43 LFSR, 4 MISR | 2 C8, 3 C6 | 4T | 317.10 |
| EX-05 | 35 | 110 | 65 LFSR, 32 MISR, 1 BILBO | 9T | 44 LFSR, 3 MISR | C8, C7, 2 C6, C4, 2 C2 | 9T | 559.50 |
| EX-06 | 35 | 115 | 67 LFSR, 34 MISR, 1 BILBO | 5T | 44 LFSR, 4 MISR | C9, 2 C7, C6, C5 | 5T | 515.20 |
| EX-07 | 40 | 90 | 58 LFSR, 25 MISR | 8T | 39 LFSR, 4 MISR | C12, C6, C5, 2 C4, 2 C3 | 6T | 480.70 |
| EX-08 | 40 | 95 | 58 LFSR, 26 MISR, 4 BILBO | 8T | 41 LFSR, 3 MISR | C9, C8, 2 C6, 2 C4, C3 | 8T | 416.60 |
| EX-09 | 40 | 100 | 59 LFSR, 29 MISR, 1 BILBO | 8T | 47 LFSR, 3 MISR | C9, C8, 2 C7, C5, C3 | 8T | 352.70 |
| EX-10 | 40 | 105 | 61 LFSR, 30 MISR, 3 BILBO | 8T | 40 LFSR, 3 MISR | 2 C10, 2 C6, 2 C4 | 6T | 532.50 |
| EX-11 | 40 | 110 | 65 LFSR, 32 MISR, 1 BILBO | 8T | 45 LFSR, 4 MISR | C9, 2 C8, C7, C6 | 8T | 588.90 |
| EX-12 | 40 | 115 | 63 LFSR, 35 MISR, 2 BILBO | 8T | 48 LFSR, 4 MISR | C9, C8, 2 C7, C6 | 5T | 554.00 |
| EX-13 | 45 | 90 | 56 LFSR, 24 MISR, 4 BILBO | 8T | 48 LFSR, 3 MISR | C9, C8, 2 C7, C5, C4 | 6T | 543.70 |
| EX-14 | 45 | 95 | 57 LFSR, 26 MISR, 3 BILBO | 9T | 46 LFSR, 3 MISR | C12, C9, C7, C6, 2 C4, C3 | 8T | 561.30 |
| EX-15 | 45 | 100 | 57 LFSR, 32 MISR, 4 BILBO | 8T | 52 LFSR, 4 MISR | C13, C7, 3 C6, C5, C2 | 5T | 417.10 |
| EX-16 | 45 | 105 | 60 LFSR, 31 MISR, 3 BILBO | 12T | 47 LFSR, 2 MISR | C12, C10, 3 C7, C2 | 6T | 594.20 |
| EX-17 | 45 | 110 | 63 LFSR, 30 MISR, 9 BILBO | 8T | 52 LFSR, 5 MISR | C13, C7, C6, 2 C5, C4, C2 | 8T | 502.60 |
| EX-18 | 45 | 115 | 67 LFSR, 32 MISR, 2 BILBO | 8T | 53 LFSR, 4 MISR | C12, C11, C9, C7, C5 | 5T | 593.50 |

Table 4: Reduction in test application time, BIST area, performance degradation, volume of output data and overall test application time for complex hypothetical data paths using the TCC grouping methodology

| Design | TAT reduction (%) | BIST area overhead reduction (%), 4-bit data path | BIST area overhead reduction (%), 8-bit data path | BIST area overhead reduction (%), 16-bit data path | PD reduction (%) | VOD reduction (%) | Overall-TAT reduction (%) |
|---|---|---|---|---|---|---|---|
| EX-01 | 37 | 30.10 | 25.79 | 22.62 | 45.24 | 86.67 | 53.19 |
| EX-02 | 11 | 39.68 | 35.45 | 32.33 | 54.76 | 89.66 | 33.66 |
| EX-03 | 37 | 37.88 | 33.76 | 30.74 | 52.87 | 90.32 | 54.74 |
| EX-04 | 20 | 40.29 | 36.74 | 34.14 | 53.00 | 87.88 | 50.68 |
| EX-05 | 0 | 38.78 | 35.02 | 32.26 | 52.04 | 90.91 | 28.57 |
| EX-06 | 0 | 40.48 | 37.00 | 34.45 | 52.94 | 88.57 | 41.33 |
| EX-07 | 25 | 30.29 | 25.12 | 21.31 | 48.19 | 84.00 | 41.57 |
| EX-08 | 0 | 33.06 | 28.60 | 25.34 | 50.00 | 90.00 | 28.72 |
| EX-09 | 0 | 27.45 | 22.88 | 19.53 | 43.82 | 90.00 | 28.72 |
| EX-10 | 25 | 38.35 | 34.07 | 30.94 | 54.26 | 90.91 | 47.42 |
| EX-11 | 0 | 35.47 | 31.44 | 28.47 | 50.00 | 87.88 | 29.90 |
| EX-12 | 37 | 34.14 | 30.38 | 27.62 | 48.00 | 89.19 | 56.44 |
| EX-13 | 25 | 21.49 | 16.89 | 13.54 | 39.29 | 89.29 | 44.57 |
| EX-14 | 11 | 22.20 | 16.60 | 12.49 | 43.02 | 89.66 | 33.66 |
| EX-15 | 37 | 20.12 | 14.94 | 11.16 | 39.78 | 88.89 | 56.00 |
| EX-16 | 50 | 28.77 | 23.64 | 19.87 | 47.87 | 94.12 | 61.54 |
| EX-17 | 0 | 25.44 | 20.96 | 17.71 | 44.12 | 87.18 | 33.01 |
| EX-18 | 37 | 27.20 | 22.78 | 19.53 | 43.56 | 88.24 | 55.10 |

Finally, Figure 10 shows how the proposed TCC grouping methodology decreases the fault-escape probability when compared to the BIST embedding methodology. The experiments were done for a data path module with 10^6 possible error sequences, where the aliasing error sequences, for a given characteristic polynomial of the signature analysis register, vary from 10 to 90. The fault-escape probability of a module varies from P_m = 0.01% to P_m = 0.09%. As can be seen from Figure 10(a), in the case of the BIST embedding methodology the fault-escape probability for a group of modules (P_g) increases as the number of modules tested simultaneously increases. On the other hand, in the case of TCC grouping, the fault-escape probability decreases exponentially with the number of modules tested simultaneously, as shown in Figure 10(b). This is because a fault goes undetected in the TCC grouping methodology only when the n-input k-bit comparator first fails to detect the fault and the signature of the TCC subsequently also fails to detect it. Previous work on reducing the fault-escape probability at the expense of increased area overhead, performance degradation, and volume of output data was presented in [33]. Note that the proposed methodology does not introduce any additional area overhead or performance degradation, whilst the reduction in fault-escape probability is exponential.
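The qualitative behaviour described above can be illustrated with a toy probability model in Python. It is not the model behind Figure 10, whose underlying expressions are not reproduced in this section. The sketch assumes independent events, assumes that under BIST embedding a group escapes detection if at least one of its n modules aliases in its own signature register, and treats the comparator miss probability `p_comparator_miss` as a free, hypothetical parameter. All names are illustrative.

```python
def embedding_group_escape(p_m, n):
    """Assumed model: each of n modules aliases independently with
    probability p_m, and the group escapes if at least one module does."""
    return 1.0 - (1.0 - p_m) ** n

def tcc_group_escape(p_comparator_miss, p_signature_alias):
    """Per the text, a TCC fault escapes only if the comparator misses it AND
    the TCC signature also aliases; independence of the two is an assumption."""
    return p_comparator_miss * p_signature_alias

p_m = 0.01
embedding = [embedding_group_escape(p_m, n) for n in range(1, 9)]
for n, p_g in enumerate(embedding, start=1):
    print(n, round(p_g / p_m, 3))   # ratio P_g / P_m grows with n

# Hypothetical comparator miss probability, for illustration only.
print(tcc_group_escape(p_comparator_miss=1e-3, p_signature_alias=p_m))
```

In this toy model the embedding ratio P_g / P_m grows roughly linearly with n for small P_m, whereas any comparator miss probability below 1 pushes the TCC escape probability below P_m. How quickly the comparator miss probability falls with the number of grouped modules depends on the fault model and is left open here.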

[Figure 10: Comparison in fault-escape probability when 1 to 8 same-type modules are tested simultaneously in BIST embedding and TCC grouping methodologies. Panel (a), "Increase in fault-escape probability for BIST embedding": log(P_g/P_m), on a scale from 0 to 1, versus the number of same-type modules tested simultaneously (1 to 8). Panel (b), "Decrease in fault-escape probability for TCC grouping": log(P_g/P_m), on a scale from 0 down to -16, versus the number of same-type modules grouped in a TCC (1 to 8). Both panels show curves for P_m = 0.01, 0.03, 0.05, 0.07 and 0.09.]

5 Conclusion

This paper has addressed the testability of RTL data paths. It has been shown that improvements in test application time, BIST area overhead, performance degradation, volume of output data, overall test application time (the sum of the test application time and the shifting time required to shift out test responses) and fault-escape probability are achieved using the newly introduced methodology based on test compatibility classes. The new BIST methodology groups modules with identical physical information into TCCs and tests the compatible modules at the same test time by sharing a small number of test pattern generators. An n-input k-bit comparator checks the module output responses from each TCC, reducing the fault-escape probability and the number of signatures that have to be shifted out. The proposed TCC grouping methodology is suitable for RTL data paths with both uniform and variable bit width. A new BIST hardware synthesis algorithm uses efficient tabu search-based testable design space exploration, which combines the accuracy of incremental test scheduling algorithms with the exploration speed of test scheduling algorithms based on fixed test resource allocation. The huge size of the testable design space is reduced by considering only the representative partially testable data paths during the local neighborhood search. An incremental TCC scheduling algorithm further shrinks the testable design space by generating a fully testable data path using simultaneous test scheduling and signature analysis register allocation. The BIST hardware synthesis algorithm for the proposed TCC grouping methodology has been tested exhaustively on benchmark and complex hypothetical data paths.
When compared to the traditional BIST embedding methodology, the TCC grouping methodology is capable of reducing the test application time with comparable or even lower BIST area overhead and high reductions in performance degradation, volume of output data, fault-escape probability and overall test application time. Furthermore, the proposed BIST hardware synthesis algorithm achieves a high quality final solution in low computational time. The proposed methodology and the BIST hardware synthesis algorithm have been successfully integrated into a high-level synthesis design flow [32], leading to a shorter design cycle by considering testability at higher levels of abstraction than the gate level. This reinforces the conclusion reached recently by other researchers [6-8] that the testability of digital circuits is best explored and optimized at the register transfer level.

Acknowledgement

The authors would like to thank Professor Melvin Breuer of the University of Southern California for providing a copy of reference [33].

References

[1] G. de Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill International Editions, 1994.
[2] M.C. McFarland, A.C. Parker, and R. Camposano, "The high-level synthesis of digital systems," Proceedings of the IEEE, vol. 78, pp. 301-318, Feb 1990.
[3] V. Chickermane, J. Lee, and J.K. Patel, "Addressing design for testability at the architectural level," IEEE Transactions on CAD, vol. 13, pp. 920-934, Jul 1994.
[4] S. Narayanan and M.A. Breuer, "Reconfiguration techniques for a single scan chain," IEEE Transactions on CAD, vol. 14, pp. 750-765, Jun 1995.
[5] R. Gupta and M.A. Breuer, "Partial scan design of register-transfer level circuits," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 7, pp. 25-46, Aug 1995.
[6] S. Dey and M. Potkonjak, "Nonscan design-for-testability techniques using RT-level design information," IEEE Transactions on CAD, vol. 16, pp. 1488-1506, Dec 1997.
[7] I. Ghosh, A. Raghunathan, and N.K. Jha, "Design for hierarchical testability of RTL circuits obtained by behavioral synthesis," IEEE Transactions on CAD, vol. 16, pp. 1001-1014, Sep 1997.
[8] I. Ghosh, A. Raghunathan, and N.K. Jha, "A design for testability technique for RTL circuits using control/data flow extraction," IEEE Transactions on CAD, vol. 17, pp. 706-723, Aug 1998.
[9] Y. Makris and A. Orailoglu, "RTL test justification and propagation analysis for modular designs," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 13, pp. 105-120, Oct 1998.
[10] V.D. Agrawal, C.R. Kime, and K.K. Saluja, "A tutorial on built-in self test - part 1: Principles," IEEE Design and Test of Computers, pp. 73-82, Mar 1993.
[11] D. Gizopoulos, A. Paschalis, and Y. Zorian, "An effective BIST scheme for datapaths," in Proc. International Test Conference, pp. 76-85, 1996.
[12] D. Berthelot, M.L. Flottes, and B. Rouzeyre, "BISTing datapaths under heterogenous test schemes," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 14, pp. 115-123, Jan 1999.
[13] I. Ghosh, N.K. Jha, and S. Bhawmik, "A BIST scheme for RT level controller-data paths based on symbolic testability analysis," in Proc. 35th Design Automation Conference, pp. 554-559, 1998.
[14] S. Ravi, N.K. Jha, and G. Lakshminarayana, "TAO-BIST: A framework for testability analysis and optimization of RTL circuits for BIST," in Proc. 17th VLSI Test Symposium, 1999.
[15] L. Goodby and A. Orailoglu, "Redundancy and testability in digital filter datapaths," IEEE Transactions on CAD, vol. 18, pp. 631-644, May 1999.
[16] D.D. Gajski, Principles of Digital Design. Prentice-Hall International, 1997.
[17] P.R. Chalsani, S. Bhawmik, A. Acharya, and P. Palchaudhuri, "Design of testable VLSI circuits with minimum area overhead," IEEE Transactions on Computers, vol. 38, pp. 1460-1462, Sep 1989.
[18] A. Basu, T.C. Wilson, D.K. Banerji, and J.C. Majithia, "An approach to minimize testability for BILBO based built-in self-test," in Proc. 5th International Conference on VLSI Design, pp. 354-355, 1992.
[19] S.P. Lin, C.A. Njinda, and M.A. Breuer, "Generating a family of testable designs using the BILBO methodology," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 4, no. 2, pp. 71-89, 1994.
[20] G.L. Craig, C.R. Kime, and K.K. Saluja, "Test scheduling and control for VLSI built-in self-test," IEEE Transactions on Computers, vol. 37, pp. 1099-1109, Sep 1988.
[21] W.B. Jone, C.A. Papachristou, and M. Pereira, "A scheme for overlaying concurrent testing of VLSI circuits," in Proc. 26th Design Automation Conference, pp. 531-536, 1989.
[22] C.I.H. Chen, "Graph partitioning for concurrent test scheduling in VLSI circuits," in Proc. 28th Design Automation Conference, pp. 287-290, 1991.
[23] A. Orailoglu and I.G. Harris, "Test path generation and test scheduling for self-testable designs," in Proc. International Conference on Computer Design, pp. 528-531, 1993.
[24] I.G. Harris and A. Orailoglu, "Microarchitectural synthesis of VLSI designs with high test concurrency," in Proc. 31st Design Automation Conference, pp. 206-211, 1994.
[25] S. Chiu and C.A. Papachristou, "A design for testability scheme with applications to data path synthesis," in Proc. 28th Design Automation Conference, pp. 271-277, 1991.
[26] E.J. McCluskey, "Design for testability," in Logic Design Principles With Emphasis On Testable Semicustom Circuits, pp. 424-488, New Jersey: Prentice Hall, 1986.
[27] S.K. Gupta and D.K. Pradhan, "Utilization of on-line (concurrent) checkers during built-in self-test and vice versa," IEEE Transactions on Computers, vol. 45, pp. 63-73, Jan 1996.
[28] M.F. Abdulla, C.P. Ravikumar, and A. Kumar, "Optimization of mutual and signature testing schemes for highly concurrent systems," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 12, pp. 199-216, June 1998.
[29] P.H. Bardell, W.H. McAnney, and J. Savir, Built-In Self Test - Pseudorandom Techniques. John Wiley & Sons, 1986.
[30] F. Glover and M. Laguna, "Tabu search," in Modern Heuristic Techniques for Combinatorial Problems (C.R. Reeves, ed.), pp. 70-150, McGraw-Hill Book Company, 1995.
[31] P. Kollig and B.M. Al-Hashimi, "A new approach to simultaneous scheduling, allocation and binding in high level synthesis," IEE Electronics Letters, vol. 33, pp. 1516-1518, Aug 1997.
[32] P. Kollig, Algorithms for Scheduling, Allocation and Binding in High Level Synthesis. PhD thesis, Staffordshire University, UK, Apr 1998.
[33] S.P. Lin, A Design System to Support Built-In Self-Test of VLSI Circuits Using BILBO-Oriented Test Methodologies. PhD thesis, University of Southern California, May 1994.