An efficient algorithm for modelling progressive

0 downloads 0 Views 610KB Size Report
Feb 15, 2005 - Numerical results using random resistor networks for modelling ... Random thresholds fuse network with triangular lattice topology. Each of the ...
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 2005; 62:1982–2008 Published online 15 February 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nme.1257

An efficient algorithm for modelling progressive damage accumulation in disordered materials ‡ Phani Kumar V. V. Nukala1, ∗, † , SrYan Šimunovi´c1 and Murthy N. Guddati2 1 Computer

Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6164, U.S.A. 2 Department of Civil Engineering, North Carolina State University, Raleigh, NC 27695-7908, U.S.A.

SUMMARY This paper presents an efficient algorithm for the simulation of progressive fracture in disordered quasi-brittle materials using discrete lattice networks. The main computational bottleneck involved in modelling the fracture simulations using large discrete lattice networks stems from the fact that a new large set of linear equations needs to be solved every time a lattice bond is broken. Using the present algorithm, the computational complexity of solving the new set of linear equations after breaking a bond reduces to a simple triangular solves (forward elimination and backward substitution) using the already Cholesky factored matrix. This algorithm using the direct sparse solver is faster than the Fourier accelerated iterative solvers such as the preconditioned conjugate gradient (PCG) solvers, and eliminates the critical slowing down associated with the iterative solvers that is especially severe close to the percolation critical points. Numerical results using random resistor networks for modelling the fracture and damage evolution in disordered materials substantiate the efficiency of the present algorithm. In particular, the proposed algorithm is especially advantageous for fracture simulations wherein ensemble averaging of numerical results is necessary to obtain a realistic lattice system response. Copyright 䉷 2005 John Wiley & Sons, Ltd. KEY WORDS:

damage evolution; brittle materials; random thresholds model; lattice network; statistical physics

∗ Correspondence

to: P. K. V. V. Nukala, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6164, U.S.A. † E-mail: [email protected] ‡ The submitted manuscript has been authored by a contractor of the U.S. Government under Contract No. DE-AC05-00OR22725. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes. Contract/grant sponsor: U.S. Department of Energy; contract/grant number: DE-AC05-00OR22725

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Received 9 June 2004 Revised 20 September 2004 Accepted 18 October 2004

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1983

1. INTRODUCTION Progressive damage evolution leading to failure of disordered quasi-brittle materials has been studied extensively using various types of discrete lattice models [1–4]. Figure 1 presents a discrete lattice network with a triangular lattice topology. The essential features of discrete lattice models used for modelling progressive damage evolution are: disorder, elastic response characteristics of the lattice bonds, and a breaking rule for each of the bonds in the lattice. The disorder in the system is introduced by either random dilution of bonds or by randomly prescribing different elastic stiffness constants or breaking thresholds to each of the bonds in the lattice. The elastic response of the individual bonds is typically described by random fuse models, central-force (spring) models, bond-bending spring models, and beam-type models. Depending on the elastic response characteristics of individual bonds, various types of breaking rules are prescribed. The elastic and breaking characteristics of each bond in the lattice are supposed to represent the mesoscopic response of the material. The mechanical breakdown of disordered materials is often modelled using its electrical analogue. Based on this analogy, the mechanical bonds in the lattice are modelled by the electrical fuses, and the mechanical strength of a bond is modelled by the fuse threshold current. The principle advantage of using the electrical fuse network is that it reduces the

Figure 1. Random thresholds fuse network with triangular lattice topology. Each of the bonds in the network is a fuse with unit electrical conductance and a breaking threshold randomly assigned based on a probability distribution (uniform). The behaviour of the fuse is linear upto the breaking threshold. Periodic boundary conditions are applied in the horizontal direction and a unit voltage difference, V = 1, is applied between the top and bottom of lattice system bus bars. As the current I flowing through the lattice network is increased, the fuses will burn out one by one. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1984

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

number of degrees of freedom by replacing the vector problem with the scalar one [5, 6]. Numerical simulations based on electrical and mechanical idealizations have shown excellent agreement between the scaling properties of the two models [5], provided an equivalence is established between electrical current, voltage and conductance in the electrical model and the mechanical stress, strain and Young’s modulus in the mechanical model, respectively. Within this framework of the random thresholds fuse model, the lattice is initially fully intact with bonds having the same conductance, but the bond breaking thresholds, t, are randomly distributed based on a threshold probability distribution, p(t). The burning of a fuse occurs irreversibly, whenever the electrical current in the fuse exceeds the breaking threshold current value, t of the fuse. Numerically, the damage evolution in materials is simulated by setting a unit voltage difference, V = 1, between the top and bottom of lattice system bus bars, and the Kirchhoff equations are solved to determine the current flowing in each of the fuses. Subsequently, for each fuse j , the ratio between the current ij and the breaking threshold tj is evaluated, and the bond jc having the largest value, maxj ij /tj , is irreversibly removed (burnt). The current is redistributed instantaneously after a fuse is burnt, implying that the current relaxation in the lattice system is much faster than the breaking of a fuse. We assume that the loading process is quasi-static in the sense that the current is redistributed instantaneously after a fuse is burnt, and only one fuse is burnt in a given load step. Each time a fuse is burnt, it is necessary to re-calculate the current distribution in the lattice to determine the subsequent breaking of fuses. This process of breaking fuses irreversibly one after another is repeated until the lattice network finally becomes disconnected. It is assumed that successive fuse failures leading ultimately to the failure of lattice system is similar to the breakdown of quasi-brittle materials. Numerical simulation of fracture using large discrete lattice networks is often hampered by the high computational cost associated with solving a new large set of linear equations every time a new lattice bond is broken. This becomes especially severe with increasing lattice system size, L, as the number of broken bonds at failure, nf , follows a power-law distribution  given by nf ∼ O L1.8 . In addition, since the response of the lattice system corresponds to a specific realization of the random breaking thresholds, an ensemble averaging of numerical results over Nconfig configurations is necessary to obtain a realistic representation of the lattice system response. This further increases the computational time required to perform simulations on large lattice systems. Fourier accelerated PCG iterative solvers [7–9] have been used in the past for simulation of breakdowns of large lattices. However, in fracture simulations using discrete lattice networks, these methods exhibit critical slowing down, particularly in the vicinity of percolation critical thresholds. As the lattice system gets closer to the macroscopic fracture, the condition number of the system of linear equations increases, thereby increasing the number of iterations required to attain a fixed accuracy. This becomes particularly significant for large lattices. Furthermore, the Fourier acceleration technique is not effective when fracture simulation is performed using central-force and bond-bending lattice models [8]. This study presents two classes of algorithms to speed up lattice simulations for modelling of fracture in disordered quasi-brittle materials. The first class of algorithms are based on Davis and Hager’s [10, 11] multiple-rank sparse Cholesky factorization downdate using direct solvers, and the second class of algorithms are based on PCG iterative techniques using circulant and block-circulant preconditioners. The aim of the paper is to develop an Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1985

algorithm that significantly reduces the average computational time required to break one lattice configuration. The paper is organized as follows. Section 2 presents the algebraic problem of modelling progressive damage evolution using discrete lattice models. Section 3 describes the computational algorithms based on multiple-rank sparse Cholesky factorization update using direct solvers for random fuse networks. In Section 4, we present alternate block-circulant PCG iterative algorithms for solving the Kirchhoff equations. Section 5 presents the numerical simulations on 2D triangular lattice systems and compares the computational efficiency of the algorithms based on direct solvers with the iterative solvers. Concluding remarks are summarized in Section 6.

2. ALGEBRAIC PROBLEM Consider a lattice system with a total number of bonds, Nel . Let N = {1, 2, 3, . . . , Nel } denote the set of individual bonds in the lattice system. After n bonds are broken, let Snb denote an ordered set of broken bonds such that the element number of the j th broken bond, where j = 1, 2, . . . , n, is given by ej = Snb (j ). The set of unbroken bonds after n bonds are broken is defined as Snu = N \ Snb . The cardinality of sets Snb and Snu are n and M = (Nel − n), b u respectively. Based on the above definitions, it is clear that Snb ⊂ Sn+1 and Snu ⊃ Sn+1 for b u each n = 0, 1, 2, . . ., and S0 = ∅ and S0 = N. Algebraically, fracture of discrete lattices by breaking one bond at a time is equivalent to solving a new set of linear equations (Kirchhoff equations in the case of an electrical analogue) An xn = bn ,

n = 0, 1, 2, . . .

(1)

every time a new lattice bond is broken. In Equation (1), each matrix An is an N ×N symmetric and positive definite matrix (the lattice conductance matrix in the case of fuse models and the lattice stiffness matrix in the case of spring and beam models), bn is the N × 1 (given) applied nodal current or force vector, xn is the N × 1 (unknown) nodal potential or displacement vector, and N is the number of degrees of freedom (unknowns) in the lattice. The subscript n in Equation (1) indicates that An and bn are evaluated after the nth bond is broken. The solution xn , obtained after the nth bond is broken, is used in determining the subsequent ((n + 1)th) bond to be broken. The matrices An and bn are obtained from the element (bond) matrices using the standard finite element assembly procedure (for example, see p. 42 of Reference [12]), denoted as   An = ke , bn = fe (2) e∈Snu

e∈Snu

where ke and fe are the element conductivity matrix and the current vector, respectively, and  denotes the standard finite assembly operator. For the case of the random fuse model, the assembly procedure is equivalent to  An = Gn n Gnt = ke ge get n = 0, 1, 2, . . . (3) e∈Snu

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1986

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

where ge is the column vector (with entries −1, 0, or 1) of the N × M incidence matrix, Gn , associated with the element e. Similarly, ke is the conductivity (positive) of the element e, and n is the M × M positive definite diagonal conductivity matrix with diagonal entries ke corresponding to each of the elements of the set Snu . In the case of central-force (spring) models, Equation (3) is still valid except that ke now denotes the spring constant of the  element e, and the entries of ge are either nx , ny , nz , −nx , −ny , −nz or 0, where nx , ny , nz denote the direction cosines of the element (see Equation (A1) in Appendix A). Mathematically, in the case of the fuse and spring models, the breaking of a bond is equivalent to a rank-one downdate of the matrix An . In the case of beam models, the breaking of a bond is equivalent to multiple-rank (rank-6) downdate of the matrix An . In the following, we present the process of breaking in the context of a fuse model. The methodology for central-force (spring) and beam models is similar and is presented in Appendix A. As discussed before, let N (the dimension of An , where n = 0, 1, 2, . . . ,) denote the total number of degrees of freedom in the lattice system and let An represent the stiffness matrix of the random fuse network system in which n fuses are either missing (random dilution) or have been burnt during the analysis. Let us also assume that a fuse ij (the (n + 1)th fuse) is burnt when the externally applied voltage is increased gradually. In the above description, i and j refer to the global degrees of freedom connected by the fuse before it is broken. For the scalar random fuse model, the degrees of freedom i and j are also equivalent to the node i and node j connected by the fuse before it is broken. The new stiffness matrix An+1 of the lattice system after the fuse ij is burnt is given by t An+1 = Gn+1 n+1 Gn+1

= Gn n Gnt − kn vn vnt = An − kn vn vnt ,

∀n = 0, 1, 2, . . .

where vn is a sparse vector with only a few (at most two) non-zero entries,   j vnt = 0 · · · 1i · · · −1 · · · 0

(4)

(5)

and kn is a positive constant denoting the conductance of the fuse ij ((n + 1)th broken bond) before it is broken. It is to be noted that for each n = 0, 1, 2, . . ., we have vn = gen+1 and kn = ken+1 , where en+1 denotes the element number of the (n + 1)th broken bond with conb necting nodes i and j , and Sn+1 = Snb ∪ {en+1 }. For the random fuse and spring models, Equation (4) follows directly from Equation (3) by noting that breaking the (n + 1)th fuse (removing an element from the lattice) is equivalent to deleting the corresponding column vn associated with the (n + 1)th burnt fuse from the incident matrix Gn . In particular, the relation An = Gn n Gnt (Equation (3)) holds for each n, where Gn is the result of deleting columns v0 , v1 , . . . , vn−1 from G0 . The matrix G0 is an N × Nel rectangular incidence matrix with entries −1, 0, or 1 corresponding to each of the bonds (elements) of the original intact lattice. Similarly, the matrix n is obtained by successively deleting the rows and columns corresponding to k0 , k1 , . . . , kn−1 diagonal entries of 0 . Furthermore, it is noted that Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1987

if the Cholesky factorizations are PAn Pt = Ln Lnt

(6)

for each n = 0, 1, 2, . . ., where P is a permutation matrix chosen to preserve the sparsity of Ln , then the sparsity pattern of Ln+1 is contained in that of Ln . Hence, for all n, the sparsity pattern of Ln is contained in that of L0 . Based on the above description of successive An for n = 0, 1, 2, . . ., an updating scheme of some kind is therefore likely to be more efficient than solving the new set of equations formed by Equation (1) for each n. In particular, since the successive matrices An , for each n = 0, 1, 2, . . ., differ by a rank-one matrix, we successively downdate the Cholesky factorizations Ln of An , using the sparse Cholesky factorization downdate algorithm of Davis and Hager [10, 11]. For numerical simulations, we explore two variants of this approach. The successive rank-one downdates [10] of Ln → Ln+1 for n = 0, 1, 2, . . ., leads to Solver Type 1 algorithm, and the multiple-rank ((n + 1 − m) rank) downdate [11] of Lm → Ln+1 , where Lm is the Cholesky factor of Am for 0  m  n, leads to Solver Type 2 algorithm. The multiple-rank update of the sparse Cholesky factorization is computationally superior to an equivalent series of rank-one updates since the multiple-rank update makes one pass through L in computing the new entries, while a series of rank-one updates require multiple passes through L [11]. In addition, given the Cholesky factorization Lm of Am for m = 0, 1, 2, . . ., it is possible that a direct updating of the solution xn+1 for n = m, m + 1, m + 2, . . . based on p = (n − m) saxpy vector updates (see Section 3.2 for details), may sometimes be cheaper than successive sparse Cholesky updates of Solver Type 1. Hence, in this work, we compare the efficiency of both solver types 1 and 2 for solving the random thresholds fuse model.

3. ALGORITHMS BASED ON SPARSE DIRECT SOLVERS 3.1. Sparse Cholesky factorization update: Lm → Lm+p for any p The algorithm presented in References [10, 11] is based on the analysis and manipulation of the underlying graph structure of the stiffness matrix A and on the methodology presented in References [13, 14] for modifying a dense Cholesky factorization. This algorithm incorporates the change in the sparsity pattern of L and is optimal in the sense that the computational time required is proportional to the number of changing non-zero entries in L. In particular, since the breaking of fuses is equivalent to removing the edges in the underlying graph structure of the stiffness matrix A, the new sparsity pattern of the modified L must be a subset of the sparsity pattern of the original L. Denoting the sparsity pattern of L by L, we have Lm ⊇ Ln

∀m < n

(7)

Therefore, we can use the modified dense Cholesky factorization update [10] and work only on the non-zero entries in L. Furthermore, since the changing non-zero entries in L depend on the ith and j th degrees of freedom of the fuse ij that is broken (at most two non-zeros entries in vn , ∀n), it is only necessary to modify the non-zero elements of a submatrix of L. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1988

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

The multiple-rank sparse Cholesky downdate algorithm downdates the Cholesky factorization Lm of the matrix Am to Ln+1 of the new matrix An+1 , where An+1 = Am + YYt ,  = −1, and Y represents a N × p rank-p matrix. The pseudo-code in Algorithm 1 follows the matlab syntax [15] and its sparse matrix functionalities very closely. The following notation is used: • zeros(m, n): an m × n matrix of zero entries. • L (i1 : i2 , j ) = L(i1 : i2 ,j ) : refers to entries corresponding to i1 th to i2 th rows of column j of the matrix L. • [ilist, jlist, val] = find (L (i1 : i2 , j )): extracts the sparsity pattern of L (i1 : i2 , j ). That is, the non-zeros of L (i1 : i2 , j ) are stored in val, and the corresponding row and column indices are stored in ilist and jlist, respectively. • j + ilist: increment each of the entries of ilist by j . • length (ilist): length of the vector ilist. • Sparse(ilist, jlist, val, m, n): create an m × n sparse matrix with non-zero entries val located at the row and column indices given by ilist and jlist, respectively. Using the above notation, the multiple-rank sparse Cholesky factor update ( = + 1) or downt t + YYt , and Y date ( = −1) algorithm, Ln+1 = SpChol (Lm , Y, ), where Ln+1 Ln+1 = Lm Lm represents an N × p rank-p matrix, is given by Algorithm 1 (SpChol(L, Y, ): Rank-p sparse Cholesky update/downdate algorithm [11]) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10:

˜ L ˜ t [15] Convert LLt → LD     ˜ , 1 ; ifree = 1 Initialize: ilist = jlist = vlist = zeros nnz L Set i = 1 for all i = 1, 2, . . . , p for j = 1 to N do Algorithm 5 from Davis et al. [10] (insert Algorithm 2 from below) ˜ (insert pseudo-code 3 from below) Update the non-zeros of column j of L end for nz = ifree − 1 ˜ = I + Sparse(ilist(1 : nz), jlist(1 : nz), vlist, N, N ) Update L ˜ L ˜ t → LLt [15] Convert LD

Algorithm 2 (Algorithm 5 from Davis and Hager’s [10]) 1: for i = 1 to p do 2: if Yj i = 0 then 3: ¯ = i + Yj2i /Djj 4: Djj = ¯ Djj 5: i =  Yj i /Djj 6: Djj = Djj /i 7: i = ¯ 8: end if 9: end for

{( = + 1 for update or − 1 for downdate)}

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1989

˜ Algorithm 3 (Pseudo-code for updating the non-zeros of column j of L) 1: if j < N then   2: [iylist, jylist, yval] = find L(j +1):N,j 3: 4: for i = 1 to p do 5: if Yj i = 0 and yval = ∅ then 6: Yj +iylist,i = Yj +iylist,i − Yj i yval  7: [mm, nn, ww] = find Y(j +1):N,i 8: vv = zeros(N, 1); 9: vvj +iylist = yval; vvj +mm = vvj +mm + i ww; 10: [iylist, jylist, yval] = find(vv((j + 1) : N) 11: end if 12: end for 13: 14: nz = length(iylist) 15: ilist(ifree : (ifree + nz − 1)) = iylist + j 16: jlist(ifree : (ifree + nz − 1)) = j 17: vlist(ifree : (ifree + nz − 1)) = yval 18: ifree = ifree + nz 19: end if When the factorization of the matrix A is available as LDLt instead of LLt , the conversion from LLt to LDLt and vice versa is not performed in Algorithm 1. The reader is referred to References [10, 11] for a comprehensive presentation of the sparse Cholesky factorization update/downdate algorithm. • Solver Type 1: Given the factorization Lm of Am , the rank-1 sparse Cholesky modification (algorithm 1 with p = 1) is used to update the factorization Ln+1 for all subsequent values of n = m, m + 1, . . . . Once the factorization Ln+1 of An+1 is obtained, the solution vector t xn+1 is obtained from Ln+1 Ln+1 xn+1 = bn+1 by two triangular solves. • Solver Type 2: Given the factorization Lm of Am for m  n, the solution vector xn+1 after the (n + 1)th fuse is burnt, for n = m, m + 1, m + 2, . . ., is obtained by two triangular solves using the factor Lm on a sparse load vector with only few non-zeros, and (n + 2 − m) saxpy vector updates (see Section 3.2). The triangular solves are further simplified by the fact that it is performed on a trivial load vector (at most two non-zeros) and hence the triangular solves can be performed much more efficiently than O (nnz (Lm )), where nnz (Lm ) denotes the number of non-zeros of the Cholesky factorization Lm of Am . Since the storage and computational cost associated with p = (n − m) saxpy vector updates can become expensive, we use multiple-rank (rankp) sparse Cholesky update (Algorithm 1) Lm → Lm+maxupd+1 for every maxupd (defined later) interval. In the following, we present an algorithm for updating the solution xn → xn+1 , given the factor Lm of Am for m  n. This algorithm is used in conjunction with the multiple-rank sparse Cholesky update Lm → Lm+maxupd+1 of Solver Type 2. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1990

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

t for any m  n 3.2. Update xn → xn+1 , given Am = Lm Lm

Since the successive matrices An and An+1 , for each n, differ by a rank-one matrix, it is possible to use Shermon–Morrison–Woodbury formula [15] to update the solution vector xn+1 directly knowing the sparse Cholesky factor Lm of Am , for any m  n. In the following, we assume that the factor Lm of the stiffness matrix Am is available (after the mth bond is broken) for any 0  m  n, and we update the solution vector xn → xn+1 for all n = m, m + 1, . . . . Let bn and xn represent the load vector and solution vector after the nth bond is broken. Similarly, let us assume that a fuse ij ((n + 1)th fuse) is burnt when the externally applied voltage is increased gradually such that bn+1 and xn+1 represent the new load and solution vectors, and kn is the conductance of the fuse ij ((n + 1)th broken bond) before it is broken. In addition, let v(n−m) denote the sparse vector corresponding to the (n + 1)th broken bond for any n  m (similar to Equation (5)) with only a few (at most two) non-zero entries. The implementation details involved in the updating of xn+1 from xn using the factor Lm are described below: t for any m  n) Algorithm 4 (Solution vector update: xn → xn+1 , given Am = Lm Lm t u 1: Solve Lm Lm (n−m) = v(n−m) 2: if n = m then

3:

u(n−m) ← u(n−m) + u˜ (n−m) , where u˜ (n−m) =

4: end if 5: Evaluate (n−m) =  6: 7: 8: 9: 10: 11: 12:

  l ult v(n−m) ul

(n−m−1)  l=0

kn

 t 1 − kn v(n−m) u(n−m) Store u(n−m) and (n−m) (used in the future evaluation of u˜ (n−m+s) , where s = 1, 2, . . .) Update the load vector:  bn → bn+1  (see Appendix B, Remark 5) t Compute  = (n−m) u(n−m) bn+1 if (i or j is prescribed) then  =  − kn end if Update xn+1 = xn +  u(n−m)

Algorithm 4 is repeated for all values of n = m, m + 1, . . . until either the lattice system has fractured or the number of updates (n + 1 − m) exceeds maxupd, defined as the maximum number of updates between successive Cholesky factor updates. It is noted that the storage and computational requirements can become prohibitively expensive especially for large lattices as the number of updates, p = (n − m), increases. Hence, it is necessary to limit the maximum number of updates between two successive Cholesky factor updates to a certain maxupd. That is, it is necessary to update Lm → Lm+maxupd+1 using the multiple-rank (rank-p) sparse Cholesky update algorithm at every maxupd intervals. Once Lm has been updated, the counter m in algorithm 4 is reset, i.e. m ← (m + maxupd + 1), and the algorithm 4 is repeated for all n = m, m + 1, . . . , until the lattice system fractures. Notes on algorithm 4: Implementation of the algorithm 4 takes advantage of the fact that v(n−m) is a sparse vector with at most 2 non-zeros (see Equations (5) and (B1)). Hence, the Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1991

computational cost of Step 1 is much less than O(nnz(Lm )). For the same reason, the scalar t t products v(n−m) ul and v(n−m) u(n−m) can be evaluated trivially as described below. Depending on whether the degrees of freedom i and j of the broken bond ij ((n + 1)th broken bond) are constrained/prescribed (see Appendix B, Remarks 3 and 4), we have the following expressions for evaluating the scalar products in Steps 3 and 5 of Algorithm 4. 1: if i and j are free degrees of freedom then

j i t 2: v(n−m) = 0 · · · 1 · · · −1 · · · 0 3:

    t t v(n−m) ul = (ul )i − (ul )j and v(n−m) u(n−m) = u(n−m) i − u(n−m) j

4: else if j is a constrained/prescribed degree of freedom then

i t 5: v(n−m) = 0···1···0   t t 6: v(n−m) ul = (ul )i and v(n−m) u(n−m) = u(n−m) i

7: else if i is a constrained/prescribed degree of freedom then

j t 8: v(n−m) = 0 · · · 1 · · · 0   t t 9: v(n−m) ul = (ul )j and v(n−m) u(n−m) = u(n−m) j 10: end if

Similarly, the updating of the load vector bn → bn+1 in Step 10 of the algorithm 4 is computed trivially by updating a single entry of the vector bn as shown below (see Appendix B, Remark 5): 1: if j is a prescribed degree of freedom then 2: (bn+1 )i = (bn )i − kn 3: else if i is a prescribed degree of freedom then 4: (bn+1 )j = (bn )j − kn 5: end if Based on the above implementation of Algorithm 4, the computational cost involved in breaking the (n + 1)th fuse ij (say i and j are free degrees of freedom as in Section 2) is (n + 2 − m) t u saxpy vector updates, one vector inner product, and two triangular solves Lm Lm (n−m) = v(n−m) on a sparse vector v(n−m) with at most two non-zeros. The computational cost of the associated triangular solves is much less than (O(nnz (Lm ))) because of the sparsity of the vector v(n−m) . The optimum number of steps between successive factorization updates of the matrix A is determined by minimizing the cpu time required for the entire analysis. Let tfac denote the average cpu time required for performing or multiple-rank sparse updating of the Cholesky factorization t , and t t Am = Lm Lm back denote the average cpu time required for solving Lm Lm u(n−m) = v(n−m) with a sparse RHS vector v(n−m) . Let tupd denote the average cpu time required for a single rank-1 update of the solution u˜ n+1−m . Note that the evaluation of u˜ n+1−m requires (n − m) saxpy vector updates. Let the estimated number of steps for the lattice system failure be nsteps . Then, the total cpu time required for solving the linear system of equations until the lattice Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1992

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

system failure is given by  = nfac tfac + nsteps tback +

n rank1

nrank1 tupd

1

  1 nsteps − nfac nsteps = nfac tfac + nsteps tback + tupd 2 nfac nfac

(8)

where nfac denotes the number of factorization updates until lattice system failure and nrank1 = (nsteps − nfac )/nfac denotes the average number of rank-1 updates per factorization. The optimum number of factorizations, noptfac , for the entire analysis is obtained by minimizing the function  via */*nfac = 0. The maximum number of vector updates, maxupd, between successive factorization updates is then estimated as 

nsteps − noptfac maxupd = noptfac

 (9)

It should be noted that the above estimate of maxupd is based only on minimizing the computational cost. However, for large lattice systems, in core storage of the Cholesky factor Lm , and the ul vectors, where l = 0, 1, 2, . . . , p and p = (n − m), may also become a limiting factor. In such cases, maxupd is chosen based on storage constraints. 3.3. Alternative algorithms based on sparse direct solvers In general, the multiple-rank update of Lm → Ln+1 as in Solver Type 2, after fuses n = m, m + 1, . . . , m + maxupd are broken, is expected to be computationally cheaper than performing the direct factorization of the new stiffness matrix Am+maxupd+1 [10, 11]. However, as the number of updates p = (n − m) increases, especially close to macroscopic fracture, where An+1 is much sparser than Am , a direct factorization of An+1 may become cheaper than multiplerank updating of Lm → Ln+1 . In order to take advantage of the much sparser An+1 and for comparison of different solver types based on sparse direct solvers, we use the following solver types 3 and 4 in the numerical simulation of random threshold fuse model. • Solver Type 3: Given the factor Lm of Am for m  n, the solution vector xn+1 after the (n + 1)th fuse is burnt, for n = m, m + 1, m + 2, . . . , m + maxupd, is obtained by two t on a sparse load vector with at most two non-zeros, and triangular solves using Lm Lm (n + 2 − m) saxpy vector updates (see Section 3.2). The only difference between Solver Types 2 and 3 is that instead of multiple-rank updating of Lm → Lm+maxupd+1 after every maxupd steps, we refactorize Am+maxupd+1 directly to obtain Lm+maxupd+1 . • Solver Type 4: Along with the Solver Type 3, we have also investigated an alternative modified direct method using dense matrix updates. The details of one such algorithm are given in Appendix C. However, based on the operation counts and the actual cpu times measured during numerical simulations, the algorithm presented in Appendix C was found to be computationally inefficient compared with the other algorithms presented in this section. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1993

4. CIRCULANT PRECONDITIONERS FOR CG ITERATIVE SOLVERS This study also investigates the use of PCG algorithms for re-solving the new set of Kirchhoff equations (stiffness matrix) every time a fuse is burnt. In particular, we investigate the choice of using circulant and block-circulant matrices as preconditioners to the Laplacian operator on a fractal network. The main advantage is that the preconditioners can be diagonalized by discrete Fourier matrices, and hence the inversion of an N × N circulant matrix can be done in O(N log N ) operations by using FFTs of size N . In addition, since the convergence rate of the PCG method depends on the condition number and clustering of the eigenvalues of the preconditioned system, it is possible to choose a circulant preconditioner that minimizes the condition number of the preconditioned system [16, 17]. Furthermore, these circulant preconditioned systems exhibit favourable clustering of eigenvalues. In general, the more clustered the eigenvalues are, the faster the convergence rate is. As noted earlier, since the initial lattice grid is uniform, the Laplacian operator (Kirchhoff equations) on the initial uniform grid results in a Toeplitz matrix A0 . Hence, a fast Poisson type solver with a circulant preconditioner can be used to obtain the solution in O(N log N ) operations using FFTs of size N. However, as the lattice bonds are broken successively, the initial uniform lattice grid becomes a fractal network. Consequently, although the matrix A0 is Toeplitz (also block Toeplitz with Toeplitz blocks) initially, the subsequent matrices An , for each n, are not Toeplitz matrices. However, An may still possess block structure with many of the blocks being Toeplitz blocks depending on the pattern of broken bonds. Even with the above observation that An for n > 0 is not Toeplitz, for the purposes of direct comparison with the previously published algorithms for solving random thresholds fuse models, this study explores the choice of optimal [17–20] and superoptimal [16, 17] circulant preconditioners along with incomplete Cholesky preconditioners [15] for the Laplacian operator (Kirchhoff equations) on a fractal network. In addition, since the matrices An in general have block structure with many blocks being Toeplitz blocks, we also use block-circulant matrices [17, 21] as preconditioners to the stiffness matrix. In the literature [7–9], Fourieraccelerated PCG has been used to accelerate the iterative solution of random resistor networks near the percolation critical threshold. However, the type of ensemble-averaged circulant preconditioner used in these studies is not optimal in the sense described in References [17–20], and hence is expected to take more CG iterations than the optimal and superoptimal circulant preconditioners. Consider the N × N stiffness matrix A. The optimal circulant preconditioner c(A) [18] is defined as the minimizer of C − A F over all N × N circulant matrices C. In the above description, · F denotes the Frobenius norm [15]. The optimal circulant preconditioner c(A) is uniquely determined by A, and is given by c(A) = F∗ (FAF∗ )F

(10)

where F denotes the discrete Fourier matrix, (A) denotes the diagonal matrix whose diagonal is equal to the diagonal of the matrix A, and ∗ denotes the adjoint (i.e. conjugate transpose). It should be noted that the diagonals of FAF∗ represent the eigenvalues of the matrix c(A) and can be obtained in O(N log N ) operations by taking the FFT of the first column of c(A). The first column vector of T. Chan’s optimal circulant preconditioner matrix that minimizes Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1994

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

the norm C − A F is given by ci =

N 1  aj,(j −i+1) mod N N j =1

(11)

The above formula can be interpreted simply as follows: the element ci is simply the arithmetic average of those diagonal elements of A extended to length N by wrapping around and containing the element ai,1 . If the matrix A is Hermitian, the eigenvalues of c(A) are bounded below and above by min (A)  min (c(A))  max (c(A))  max (A)

(12)

where min (·) and max (·) denote the minimum and maximum eigenvalues, respectively. Based on the above result, if A is positive definite, then the circulant preconditioner c(A) is also positive definite. In particular, if the circulant preconditioner is such that the spectra of the preconditioned system is clustered around one, then the convergence of the solution will be fast. The circulant preconditioner t (A) [16] is based on the idea of minimizing the superoptimal norm I − C−1 A F over all non-singular circulant matrices C. Since the asymptotic convergence of the t (A) preconditioned system is same as c(A) for large N, in this study, we limit ourselves to the investigation of preconditioned systems using c(A) given by Equation (11). The computational cost associated with the solution of the preconditioned system c(A)z = r is the initialization cost of nnz(A) for setting the first column of c(A) using Equation (11) during the first iteration, and O(N log N ) during every iteration step. Since the Laplacian operator on a discrete lattice network results in block structure in the stiffness matrix, we use block-circulant matrices [17, 21] as preconditioners to the stiffness matrix. In order to distinguish the block-circulant preconditioners that follow from the abovedescribed circulant preconditioners, we refer henceforth to the above preconditioners as pointcirculant preconditioners. 4.1. Block-circulant preconditioners Let the matrix A be partitioned into r × r blocks such that each block is an s × s matrix. That is, N = rs, and 

A1,1

 A  2,1  A=   ..  .  Ar,1

A1,2 A2,2 .. . Ar,2

···

A1,r



 · · · A2,r    ..  ..  . .   · · · Ar,r

(13)

Although the point-circulant preconditioner c(A) defined by Equation (11) can be used as a preconditioner, in general, the block structure is not restored by using c(A) as a preconditioner. In contrast, the block-circulant preconditioners obtained by using circulant approximations for each of the blocks restore the block structure of A. The block-circulant preconditioner of A Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1995

can be expressed as 

c(A1,1 )

  c(A )  2,1  cB (A) =   ..  .  c(Ar,1 )

c(A1,2 ) c(A2,2 ) .. . c(Ar,2 )

···

c(A1,r )



 · · · c(A2,r )    ..  ..  . .   · · · c(Ar,r )

(14)

It is the minimizer of C − A F over all matrices C that are r × r block matrices with s × s circulant blocks. The spectral properties given by Equation (12) for point-circulant preconditioners also extend to the block-circulant preconditioners [17, 21]. That is, min (A)  min (cB (A))  max (cB (A))  max (A)

(15)

In particular, if A is positive definite, then the block-preconditioner cB (A) is also positive definite. The computational cost associated with the block-circulant preconditioners can be estimated as follows. Since the stiffness matrix A is real symmetric for the type of problems considered in this study, in the following, we assume block symmetric structure for A, i.e. Aj,i = Ati,j . In forming the block-circulant preconditioner given by Equation (14), it is necessary to obtain point-circulant preconditioners for each of the r × r block matrices of order s. The pointcirculant approximation for each of the s × s blocks requires O(s log s) operations. This cost is in addition to the cost of forming the first column vectors (Equation (11)) for each of the c(Ai,j ) blocks, which is given by nnz(A) operations. Since there are (r(r + 1))/2 blocks, we need O(r 2 s log s) operations to form         FA1,1 F∗  FA1,2 F∗ · · ·  FA1,r F∗      FA F∗   FA F∗  · · ·  FA F∗     2,1 2,2 2,r    = (I ⊗ F)cB (A)(I ⊗ F∗ ) =  (16)  .. .. .. ..   .   . . .          FAr,1 F∗  FAr,2 F∗ · · ·  FAr,r F∗ where ⊗ refers to the Kronecker tensor product and I is an r × r identity matrix. In order to solve the preconditioned equation cB (A)z = r, Equation (16) is permuted to obtain a blockdiagonal matrix of the form   ˜ 1,1 0 ··· 0      ˜ 2,2 · · · 0   0   ∗ ˜ = P P =   (17)  .. ..   .. ..  . . . .    ˜ s,s 0 0 ···  Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1996

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

where P is the permutation matrix such that      ˜ k,k =  FAi,j F∗  kk ij

∀1  i, j  r, 1  k  s

(18)

During each iteration, in order to solve the preconditioned system cB (A)z = r, it is necessary to ˜ This task can be performed by first factorizing each of solve with the block-diagonal matrix . ˜ the k,k blocks during the first iteration, and then subsequently using these factored matrices to do the triangular solves. Hence, without considering the first factorizing cost of each of the block diagonals, during each iteration, the number of operations involving the solves with ˜ is  s      delops = O (19) L˜ k,k  k=1

˜ k,k . Therefore, where L˜ k,k denotes the number of non-zeros in the Cholesky factorization of  the system cB (A)z = r can be solved in O(rs log s) + delops operations per iteration. Thus, we conclude that for the block-circulant preconditioner, the initialization cost is nnz(A) + O(r 2 s log s) plus the cost associated with the factorization of each of the diagonal blocks ˜ k,k during the first iteration, and O(rs log s) + delops during every other iteration. Although from the operational cost per iteration point of view, the point-circulant preconditioner may prove advantageous for some problems, it is not clear whether point-circulant or block-circulant is closest to the matrix A in terms of the number of CG iterations necessary for convergence. Hence, we investigate both point- and block-circulant preconditioners in obtaining the solution of the linear system Ax = b using iterative techniques. In addition, we also employ the commonly used point and block versions of the incomplete Cholesky preconditioners to solve Ax = b. Remark 1 In the case of a 2D discrete lattice network with periodic boundary conditions in the horizontal direction and a constant voltage difference between the top and bottom of the lattice network, A is a block tri-diagonal real symmetric matrix. Under these circumstances, the initialization ˜ k,k is tri-diagonal, during cost is nnz(A) + O(rs log s). Since each of the diagonal blocks  ˜ can be obtained in O(rs) operations. Thus, the cost per each iteration the solution involving  iteration is O(rs log s) + O(rs) = O(rs log s) operations. The total computational cost involved in using the block-circulant preconditioner for a symmetric block tri-diagonal matrix is the initialization cost of nnz(A) + O(rs log s), and O(rs log s) operations per iteration step. This is significantly less than the cost of using a generic block-circulant preconditioner. It should be noted that the block tri-diagonal structure of A does not change the computational cost associated with using a point-circulant preconditioner to solve Ax = b. 5. MODEL PROBLEM: TRIANGULAR LATTICE SYSTEM Consider a 2D random threshold fuse lattice system of size L × L with periodic boundary conditions in the horizontal direction and a unit voltage difference, V = 1, applied between the top and bottom of lattice system bus bars (see Figure 1). For the triangular lattice topology, Nel = (3L + 1)(L + 1), and N = L(L + 1). The model problem considered in this study is well Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1997

described in References [22, 23, 1, p. 45, 2, p. 231], and the references therein. The elastic response of each bond in the lattice is linear up to an assigned threshold value, at which brittle failure of the bond occurs. The disorder in the system is introduced by assigning random maximum threshold current values t, (which is equivalent to the breaking stress in mechanical problem) to each of the fuses (bonds) in the lattice, based on an assumed probability distribution. The electrical conductance (stiffness in the mechanical problem) is assumed to be the same and equal to unity for all the bonds in the lattice. This is justified because the conductance (or stiffness) of a heterogeneous solid converges rapidly to its scale-independent continuum value. A uniform probability distribution, which is constant between 0 and 1, is chosen as the probability distribution of failure thresholds. A broad thresholds distribution represents large disorder and exhibits diffusive damage (uncorrelated burning of fuses) leading to progressive damage localization, whereas a very narrow thresholds distribution exhibits brittle failure in which a single crack propagation causes material failure. This relatively simple model has been extensively used in the literature [1, 2, 22–25] for simulating the fracture and progressive damage evolution in brittle materials, and provides a meaningful benchmark for comparing the performance of different numerical algorithms. As mentioned earlier, damage evolution in materials is simulated by slowly increasing the external voltage on the lattice system until the current i flowing in an individual fuse exceeds its breaking threshold t. At this point, the fuse is burnt irreversibly. Each time a fuse is removed, the electrical current is instantaneously redistributed and a new system of Kirchhoff equations is solved to determine the fuse that is going to burn up under the redistributed currents. The simulation is initiated with an intact lattice, and the burning of fuses under a gradual increase of external voltage is repeated until the entire lattice system falls apart. Figure 2 presents the snapshots of progressive damage evolution for the case of a uniformly distributed random thresholds model problem in a triangular fuse lattice system of size L = 256. Similarly, Figures 3 and 4 present the snapshots of damage and the scaled stress distribution, respectively, in a triangular spring lattice system of size L = 256 (see Appendix A for algorithmic details of Cholesky downdating scheme for simulating a lattice system with spring elements). 5.1. Brief summary of the solver types used in the numerical simulations For the numerical simulation of a random thresholds fuse model, we use the following four alternate solver types presented once again for quick reference: • Solver Type 1: Given the factor Lm of Am , the rank-1 sparse Cholesky update /downdate (algorithm 1) presented in Section 3.1 is used to downdate the factorization Ln+1 for all subsequent values of n = m, m + 1, . . . . Once the factorization Ln+1 of An+1 is obtained, t the solution vector xn+1 is obtained by two triangular solves of Ln+1 Ln+1 xn+1 = bn+1 . • Solver Type 2: Given the factor Lm of Am , the solution vector xn+1 after the (n + 1)th fuse is burnt, for n = m, m+1, . . . , m+maxupd, is obtained using the algorithm presented in Section 3.2. The multiple-rank sparse Cholesky update/downdate algorithm presented in Section 3.1 is used to downdate the factorization Lm → Lm+maxupd+1 after every maxupd steps. • Solver Type 3: Assuming that the factor Lm of matrix Am is available after the mth fuse is burnt, the implementation follows the algorithm 4 presented in Section 3.2 to determine the solution vector xn+1 after the (n+1)th fuse is burnt, for n = m, m+1, . . . , m+maxupd. Factorization of the matrix A is performed every maxupd steps. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

1998

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

Figure 2. Snapshots of damage in a typical triangular lattice system of size L = 256. Number of broken bonds at the peak load and at failure are 22 911 and 24 918, respectively. (a) – (i) represent the snapshots of damage after nb bonds are broken: (a) nb = 6250; (b) nb = 12 500; (c) nb = 18 750; (d) nb = 22 911 (peak load); (e) nb = 23 500; (f) nb = 24 000; (g) nb = 24 500; (h) nb = 24 750; and (i) nb = 24 918 (failure).

• Solver Type 4: Assuming that the factor Lm of Am is available, the dense matrix update (algorithm 5) presented in Appendix C is used to obtain the solution vector xn+1 after the (n + 1)th fuse is burnt, for n = m, m + 1, . . . , m + maxupd. Once again, factorization of the matrix A is performed every maxupd steps. In addition to the above four types of direct solvers, we have performed numerical simulations using PCG iterative solvers. The first iterative solver is CG without any preconditioner. An Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

1999

Figure 3. Snapshots of damage (broken bonds) evolution in a typical triangular spring lattice system of size L = 256. Number of broken bonds at the peak load and at failure are 13 864 and 16 695, respectively: (1) – (9) represent the snapshots of damage after nb bonds are broken: (1) nb = 5000; (2) nb = 10 000; (3) nb = 12 000; (4) nb = 13 000; (5) nb = 14 000 (just after peak load); (6) nb = 15 000; (7) nb = 15 500; (8) nb = 16 000; and (9) nb = 16 500 (close to failure).

incomplete Cholesky with a drop tolerance of 10−3 is used as the second PCG iterative solver. The drop tolerance sets the small elements of the Cholesky factor to zero after they are computed but before they update other coefficients. Elements are dropped if they are smaller than drop tolerance times the norm of the column of the Cholesky factor and they are not on the diagonal and they are not in the non-zero pattern of An . In addition, the factorization is also modified so that the row sums of Ln Lnt are equal to the row sums of An . Circulant (Equation (11)) and block-circulant (Equation (14)) PCG have also been used to accelerate the iterative solution. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

2000

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

Figure 4. Scaled stress distribution corresponding to the snapshots of damage in Figure 3.

Once again, it should be noted that the Fourier accelerated PCG presented in References [7–9] is not optimal in the sense described in References [17–20], and hence is expected to take more CG iterations compared with the optimal and superoptimal circulant preconditioners. In the numerical simulations using solver types 1–4, the maximum number of vector updates, maxupd, is chosen to be a constant for a given lattice size L. We choose maxupd = 25 for L = {4, 8, 16, 24, 32}, maxupd = 50 for L = 64, and maxupd = 100 for L = {128, 256, 512}. For L = 512, maxupd is limited to 100 by memory constraints. By keeping the maxupd value constant, it is possible to compare realistically the computational cost associated with different solver types. Moreover, the relative cpu times taken by these algorithms remain the same even when the simulations are performed on different platforms. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

2001

It is well known that the convergence rate of an iterative solver depends on the quality of starting vector. When PCG iterative solvers described in Section 4 are used in updating the solution xn → xn+1 , we use either the solution from the previous converged state, i.e. xn , or the zero vector, 0, as the starting vectors for the CG iteration of xn+1 . The choice of using either xn or 0 as the starting vectors depends on which vector is closer to bn+1 in 2-norm. That is, if bn+1 − An xn 2  bn+1 2 , then xn is used, else 0 is used as the starting vector for the xn+1 CG iteration. A relative residual tolerance  = 10−12 is used in CG iteration. 5.2. Numerical simulation results Tables I–IV present the cpu and wall-clock times taken for one configuration (simulation) using the solver types 1–4, respectively. These tables also indicate the number of configurations, Table I. Solver type 1. Size

CPU (s)

Wall (s)

Simulations

32 64 128 256 512

0.592 10.72 212.2 5647 93 779

0.687 11.26 214.9 5662 96 515

20 000 4000 800 96 16

Table II. Solver type 2. Size

CPU (s)

Wall (s)

Simulations

32 64 128 256

0.543 11.15 211.5 6413

0.633 12.01 214.1 6701

20 000 4000 800 96

Table III. Solver type 3. Size

CPU (s)

Wall (s)

Simulations

32 64 128 256

0.566 9.641 203.1 6121

0.655 10.59 213.4 6139

20 000 4000 800 96

Table IV. Solver type 4. Size

CPU (s)

Wall (s)

Simulations

32 64 128 256

0.679 11.18 254.4 6112

0.771 12.28 260.2 6147

20 000 4000 800 96

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

2002

Nconfig , over which ensemble averaging of the numerical results is performed. The cpu and wall-clock times taken by the iterative solvers are presented in Tables V–VIII. For iterative solvers, the number of iterations presented in Tables V–VIII denote the average number of total iterations taken to break one intact lattice configuration until it falls apart. In the case of iterative solvers, some of the simulations for larger lattice systems were not performed either because they were expected to take larger cpu times or the numerical results do not influence the conclusions drawn in this study. Based on the results presented in Tables I–VIII, it is clear that for modelling the breakdown of disordered media as in starting with an intact lattice and successive breaking of bonds until the lattice system falls apart, the solver types 1–4 based direct solvers are superior to the iterative PCG solver techniques. In particular, for larger lattice systems, Solver Type 1 based on algorithm 1 presented in Section 3.1 for rank-1 sparse Cholesky downdate of the Cholesky factor L is superior to the Solver Types 2–4. It should be noted that for larger lattice systems, limitations on the available memory of the processor may decrease the allowable maxupd Table V. CG iterative solver (no preconditioner). Size 32 64

CPU (s)

Wall (s)

Iterations

Simulations

7.667 203.5

8.016 205.7

66 254 405 510

20 000 1600

Table VI. CG iterative solver (incomplete Cholesky). Size

CPU (s)

Wall (s)

Iterations

Simulations

32 64 128

2.831 62.15 1391

3.008 65.61 1430

5857 29 496 148 170

20 000 4000 320

Table VII. T. Chan’s optimal circulant PCG. Size

CPU (s)

Wall (s)

Iterations

Simulations

32 64 128

11.66 173.6 7473

12.26 178.8 7725

25 469 120 570 622 140

20 000 1600 128

Table VIII. T. Chan’s block-circulant PCG. Size

CPU (s)

Wall (s)

Iterations

Simulations

32 64 128 256

10.00 135.9 2818 94 717

10.68 139.8 2846 96 500

11 597 41 207 147 510

20 000 1600 192 32

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

2003

value, as in the case of L = 256 using Solver Types 2 and 3. However, this is not a concern for simulations performed using Solver Type 1. In the case of Fourier accelerated solvers (circulant and block-circulant) for the model problem considered, the block-circulant solver is superior to T. Chan’s optimal circulant PCG solver.

6. CONCLUSIONS The paper presents a computational methodology based on a rank-p update of the stiffness matrix that alleviates the computational complexities involved in simulating fracture and damage evolution in disordered quasi-brittle materials using discrete lattice networks. Numerical simulation results using 2D random resistor networks show that the present algorithms based on direct sparse solvers Solver Types 1–4 are computationally superior to the commonly used Fourier accelerated preconditioned conjugate gradient iterative solver. In particular, for large system sizes, Solver Type 1 based on successive Cholesky factor downdates is superior to other solver types. Furthermore, these algorithms completely eliminate the critical slowing down observed in fracture simulations using the conventional iterative schemes. Given the factorization Lm of Am after the mth bond is broken, the computational cost associated with breaking the (n + 1)th bond can be estimated as • For Solver Type 1, the algorithm 1 requires a rank-1 sparse Cholesky downdate and two triangular solves on a sparse vector with at most two non-zeros. The computational cost associated with rank-1 sparse Cholesky downdate is at most O(nnz(Lm )), and that associated with two triangular solves is much less than O (nnz (Lm )) because of the sparsity of the RHS vector. • For Solver Type 2, which is based on the combination of algorithms 1 and 4, the computational complexity is (n + 2 − m) saxpy vector updates, one vector inner product, and two triangular solves on a sparse vector with at most two non-zeros. The computational cost of the associated triangular solves is much less than (O(nnz(Lm ))) because of the sparsity of the RHS vector. The Cholesky factor Lm → Lm+maxupd+1 is updated after every maxupd steps using the algorithm 1. • For Solver Type 3 based on Algorithm 4, the associated computational cost is exactly the same as that of Solver Type 2, except that after every maxupd steps, Solver Type 3 re-factorizes the matrix Am+maxupd+1 , whereas Solver Type 2 performs a rank-maxupd + 1 sparse Cholesky downdate of Lm → Lm+maxupd+1 . • For Solver Type 4, the associated computational cost is much higher than the Solver Type 3, the details of which are presented in Appendix C. The results presented in Tables I–VIII indicate that algorithms based on direct solvers (solver types 1–4) are superior to the iterative PCG solver techniques. In particular, for large lattice systems, Solver Type 1 based on Algorithm 1 is superior to the Solver Types 2–4. In the case of iterative solvers, the block-circulant solver is computationally superior to T. Chan’s optimal circulant PCG solver, which in turn is superior to the Fourier accelerated PCG solvers used in the simulation of lattice systems [7–9]. In fracture simulations using discrete lattice networks, ensemble averaging of numerical results is necessary to obtain a realistic representation of the lattice system response. In this regard, for very large lattice systems, this methodology is especially advantageous as the Cholesky Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

2004

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

factorization of the system of equations can be performed using a parallel implementation on multiple processors. Subsequently, this Cholesky decomposition can then be distributed to each of the processors to continue with independent fracture simulations that only require less intensive triangular solves. This technique is particularly advantageous for investigating the effects of various disorders on material fracture characteristics.

APPENDIX A: ALGORITHM FOR CENTRAL-FORCE (SPRING) MODELS For the case of central-force models, let PQ denote the (n + 1)th bond (spring) that is broken when the externally applied displacement (or force) is increased gradually. Let (Px , Py , Pz ) and (Qx , Qy , Qz ) refer to the x, y, and z degrees of freedom associated with the nodes P and Q, respectively. Let the direction cosines of the bond PQ be given by (nx , ny , nz ). The new stiffness matrix An+1 of the lattice system after the bond PQ is broken is given by An+1 = An − kPQ vvt

(A1)

where kPQ refers to the stiffness of the bond PQ, and the vector v is given by vt = {0 · · · nx ny nz · · · −nx −ny −nz · · · 0 }

(A2)

The rest of the procedure for updating the solution xn → xn+1 is similar to Section 2. Whenever any of the degrees of freedom are constrained or prescribed an externally applied displacement (see Appendix B, Remark 3), the corresponding elements of the vector v have zero entries, and the rest of the procedure is identical to the one described in Section 2. The treatment of periodic boundary conditions is also identical to the procedure described in Remark 4 of Appendix B. In the case of updating the load vector bn after the bond PQ is broken, for presentation purposes, let Q denote the node at which an external displacement is prescribed. Although it is not necessary, let us also assume that the external displacement (x , y , z ) is prescribed in all the x, y, and z directions. Then the updated load vector bn+1 is given by bn+1 = bn + w, where    x {0 0 · · · −n2x −nx ny −nx nz · · · 0 0 }        t 2 w = kPQ + y {0 0 · · · −nx ny −ny −ny nz · · · 0 0 }         +  {0 2 0 · · · −nx nz −ny nz −nz · · · 0 0 } z When external displacement is prescribed only in certain directions at the node Q, a similar procedure is followed for updating the load vector bn+1 . Once again, the implementation of the algorithm takes into account the sparse nature of the vectors v and w in all of its calculations. Therefore, based on the above presentation, the computational cost associated with the centralforce (spring) models is same as that for scalar fuse models. Remark 2 When the elastic response of the individual elements of the lattice system is modelled by beam or bond-bending models, breaking a bond PQ, in most cases, does not lead to a rank-1 update of the stiffness matrix. In the case of beam elements, breaking the (n + 1)th bond PQ is Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

2005

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

equivalent to updating the stiffness matrix An by a matrix B such that An+1 = An + B is given by B = −Tt Et Kcor ET



B, and (A3)

where Kcor represents the 6 × 6 corotational stiffness matrix of the beam, E represents the 12 × 6 local to corotational transformation  matrix, and T represents the 12 × 12 global to local transformation matrix. The symbol assembles the matrix B into the matrix An to obtain An+1 . The reader is referred to Reference [26] and the references therein for a detailed exposition on 3D corotational beam element formulations. In the case of 3D beam elements, the matrix Kcor is given by Kcor =

 2EIz   2EIy  EA GJ y1 y1t + y2 y2t + 3y3 y3t + y4 y4t + 3y5 y5t + y6 y6t L L L L

= YCcor Yt

(A4)

  where Ccor = diag EA/L, GJ /L, 6EIy /L, 2EIy /L, 6EIz /L, 2EIz /L , E and G denote the Young’s modulus and shear modulus of the material (positive constants), and L, A, J , Iy and Iz denote the length, area, St Venant torsion constant, y moment of inertia, and z moment of inertia of the beam cross-section, respectively. The matrix Y is given by Y = [y1 , y2 , y3 , y4 , y5 , y6 ], and the mutually orthogonal vectors y1−6 are given by y1 = {1, 0, 0, 0, 0, 0}t

y2 = {0, 1, 0, 0, 0, 0}t

y3 = {0, 0, 1, 0, 1, 0}t

y4 = {0, 0, 1, 0, −1, 0}t

y5 = {0, 0, 0, 1, 0, 1}t

y6 = {0, 0, 0, 1, 0, −1}t

Hence,B is a rank-6 matrix, and the updating of An to An+1√can be envisioned as An+1 = t An + Zn+1 Ztn+1 = An + Vn+1 Vn+1 , where Zn+1 = Tt Et Y Ccor is a 12 × 6 matrix, and √  t Vn+1 is the assembled version, i.e. Vn+1 Vn+1 = Zn+1 Ztn+1 . The matrix Ccor is obtained by taking the square root of each of the diagonal entries of Ccor . A dense matrix update similar to Algorithm 5 in Appendix Cmay be used to solve An+1 xn+1 = bn+1 . However, since the matrix update An+1 = An + B is a rank-6 update, the computational cost associated with updating the solution xn+1 based on the factor Lm of Am for m  n, increases rapidly as the number of updates, p = (n + 1 − m), increases. Under these circumstances, it may be advantageous to block-update the factor Ln after the nth bond is broken using Vn+1 based on Algorithm 3. That is, Ln+1 = SpChol (Ln , Vn+1 , ). APPENDIX B: SPECIAL CASES OF BOND BREAKING Remark 3 (Breaking of a fuse attached to a constrained /prescribed degree of freedom) In Section 2, we have described a methodology for updating the factors of a stiffness matrix after breaking a fuse ij (nth broken bond) that connects two ‘free’ degrees of freedom i Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

2006

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

and j . In the following, the methodology is applied to a special case when the fuse that is broken is attached to a constrained /prescribed degree of freedom. Without loss of generality, let us assume that the degree of freedom j is either constrained or prescribed by an externally applied voltage. Then, the vector vn in Algorithm 1 is given by   i vnt = 0 · · · 1 · · · 0 (B1) Remark 4 Consider the case of a broken fuse j k that is attached to a slave degree of freedom k whose master degree of freedom is i. This type of situation is particularly important in the case of lattice models with periodic boundary conditions. Under these circumstances, the methodology presented in Remark 3 is applicable in a straightforward manner if it is understood that breaking the fuse j k is equivalent to breaking the fuse ij . Remark 5 (Updating the load vector bn after breaking a fuse) Without loss of generality, let us assume that the external loading on the lattice system is voltage controlled, i.e. a constant voltage difference is imposed on the lattice system. Let bn denote the load vector on the lattice system after n fuses have been burnt because of the externally applied voltage conditions. In the following, we investigate how the load vector bn+1 needs to be updated following breaking of the (n + 1)th fuse ij . The load vector bn+1 will differ from the load vector bn only if the (n + 1)th broken fuse ij is attached to a prescribed degree of freedom, where a constant voltage difference is imposed. Once again, for presentation purposes, let us assume that j is such a prescribed degree of freedom. Then the load vector bn+1 is given by (B2) bn+1 = bn + wn where

  i wnt = kn 0 0 · · · −1 · · · 0 0

(B3)

If neither i nor j is a prescribed degree of freedom, then wn = 0. In the implementation of the algorithms, only the non-zero value of wn (ith entry) is stored, and the corresponding value (bn+1 )i = (bn )i + (wn )i is updated directly. APPENDIX C: AN ALTERNATIVE DIRECT METHOD USING DENSE MATRIX UPDATE t . The objective is to find the solution x Consider the Cholesky decomposition Am = Lm Lm n+1 of the equation An+1 xn+1 = bn+1 knowing Lm and the N × p update matrix Yn+1 . We have t An+1 = Am + Yn+1 Yn+1 , and p = (n + 1 − m). For an update  = + 1 and is equal to −1 for a downdate. Let Yn+1 = [Yn | y], where Yn is the update matrix after the nth fuse is broken and y is the rank-1 update vector such that t An+1 = Am + Yn+1 Yn+1

= Am + Yn Ynt + yyt = An + yyt Copyright 䉷 2005 John Wiley & Sons, Ltd.

(C1) Int. J. Numer. Meth. Engng 2005; 62:1982–2008

ALGORITHM FOR MODELLING PROGRESSIVE DAMAGE ACCUMULATION

2007

In terms of the factor Lm , the matrix An+1 can be expressed as t An+1 = Am + Yn+1 Yn+1   t t Lm = Lm I + Wn+1 Wn+1

(C2)

where Lm Wn+1 = Yn+1 . That is, based on the decomposition of Yn+1 = [Yn | y], we have Wn+1 = [Wn | w], where Lm Wn = Yn and Lm w = y. Based on the factorization given by Equation (C2), the solution of An+1 xn+1 = bn+1 can be obtained using Algorithm 5. In this algorithm, steps 1 and 3 together amount to a triangular t solve using the current factor Lm . The solution of I + Wn+1 Wn+1 z2 = z1 is obtained by as z = z − W z , and then obtaining z3 by solving first expressing the solution z 2 2 1 n+1 3   t t I + Wn+1 Wn+1 z3 = Wn+1 z1 . Algorithm 5 (Dense Matrix Update) 1: Solve Lm z1 = bn+1 t 2: Solve (I + Wn+1 Wn+1 )z2 = z1 t z3 = Wn+1 z1 t x 3: Solve Lm n+1 = z2

using z2 = z1 − Wn+1 z3 , where



t I+Wn+1 Wn+1



t In the following, we obtain the Cholesky factorization, Lwn+1 , of (I + Wn+1 Wn+1 ) simply t by updating the Cholesky factorization, Lwn , of the matrix (I + Wn Wn ). We have, ! ! ! t t Lwn 0 Lwn Lw Wnt w Lw s n n t (C3) = Wn+1 ) = (I + Wn+1 0  st  wt Wn (1 + wt w)

where Lwn s = Wnt w, and 2 = (1 + wt w − st s). Hence, ! Lwn 0 Lwn+1 = st 

(C4)

t t and the solution z3 is obtained by backsolving Lwn+1 Lw z = Wn+1 z1 . n+1 3 The computational cost associated with this dense matrix update is significantly higher than the algorithm based on successive rank-1 downdates of Cholesky factor (Solver Type 1). The computational cost of the algorithm is two triangular solves using the already computed factor  Lm , (3p+1) vector inner products, and nnz(Lm )+nnz(Lwn )+2nnz Lwn+1 operations in solving w, s, and z3 , respectively. If the load vector bn+1 = bn , then the operations count reduces by p vector inner products.

ACKNOWLEDGEMENTS

This research is sponsored by the Mathematical, Information and Computational Sciences Division, Office of Advanced Scientific Computing Research, U.S. Department of Energy under contract number DE-AC05-00OR22725 with UT-Battelle, LLC. The first author wishes to thank Dr. Ed F. D’Azevedo for many helpful discussions and excellent suggestions. The authors thank the reviewers for their comments and suggestions in improving the presentation style and the quality of the manuscript. Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008

2008

P. K. V. V. NUKALA, S. ŠIMUNOVIC´ AND M. N. GUDDATI

REFERENCES 1. Hansen A, Roux S. Statistical Toolbox for Damage and Fracture. Springer: New York, 2000; 17 – 101. 2. Herrmann HJ, Roux S (eds). Statistical Models for the Fracture of Disordered Media. North-Holland: Amsterdam, 1990. 3. Sahimi M. Non-linear and non-local transport processes in heterogeneous media from long-range correlation percolation to fracture and materials breakdown. Physics Reports 1998; 306:213 – 395. 4. Chakrabarti BK, Benguigui LG. Statistical Physics of Fracture and Breakdown in Disordered Systems. Oxford Science Publications: Oxford, 1997. 5. de Arcangelis L, Hansen A, Herrmann HJ, Roux S. Scaling laws in fracture. Physical Review B 1989; 40(1):877 – 880. 6. Delaplace S, Pijaudier-Cabot G, Roux S. Progressive damage evolution in discrete lattice models and consequences on continuum modelling. Journal of Mechanics and Physics of Solids 1996; 44:99 – 136. 7. Batrouni GG, Hansen A, Nelkin M. Fourier acceleration of relaxation processes in disordered systems. Physical Review Letters 1986; 57:1336 – 1339. 8. Batrouni GG, Hansen A. Fourier acceleration of iterative processes in disordered-systems. Journal of Statistical Physics 1988; 52:747 – 773. 9. Batrouni GG, Hansen A. Fracture in three-dimensional fuse networks. Physical Review Letters 1998; 80:325 – 328. 10. Davis TA, Hager WW. Modifying a sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications 1999; 20(3):606 – 627. 11. Davis TA, Hager WW. Multiple-rank modifications of a sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications 2001; 22(4):997 – 1013. 12. Hughes TJR. The Finite Element Method. Prentice-Hall: Englewood Cliffs, NJ, 1987. 13. Gill PR, Golub GH, Murray W, Saunders MA. Methods for modifying matrix factorizations. Mathematics of Computation 1974; 28:505 – 535. 14. Gill PE, Murray W, Saunders MA. Methods for computing and modifying the LDV factors of a matrix. Mathematics of Computation 1975; 29:1051 – 1077. 15. Golub GH, van Loan CF. Matrix Computations. The Johns Hopkins University Press: Baltimore, MD, 1996. 16. Tyrtyshnikov E. Optimal and superoptimal circulant preconditioners. SIAM Journal on Matrix Analysis and Applications 1992; 13:459 – 473. 17. Chan RH, Ng MK. Conjugate gradient methods for Toeplitz systems. SIAM Review 1996; 38(3):427 – 482. 18. Chan T. An optimal circulant preconditioner for Toeplitz systems. SIAM Journal on Scientific and Statistical Computing 1988; 9:766 – 771. 19. Chan RH. Circulant preconditioners for Hermitian Toeplitz systems. SIAM Journal on Matrix Analysis and Applications 1989; 10:542 – 550. 20. Chan R, Chan T. Circulant preconditioners for elliptic problems. Numerical Linear Algebra with Applications 1992; 1:77 – 101. 21. Chan RH, Jin XQ. A family of block preconditioners for block systems. SIAM Journal on Scientific and Statistical Computing 1992; 13:1218 – 1235. 22. de Arcangelis L, Redner S, Herrmann HJ. A random fuse model for breaking processes. Journal of Physics (Paris) Letters 1985; 46(13):585 – 590. 23. Sahimi M, Goddard JD. Elastic percolation models for cohesive mechanical failure in heterogeneous systems. Physical Review B 1986; 33:7848 – 7851. 24. Duxbury PM, Beale PD, Leath PL. Size effects of electrical breakdown in quenched random media. Physical Review Letters 1986; 57(8):1052 – 1055. 25. Duxbury PM, Leath PL, Beale PD. Breakdown properties of quenched random systems: the random-fuse network. Physical Review B 1987; 36:367 – 380. 26. Crisfield MA. A consistent co-rotational formulation for nonlinear, three-dimensional, beam elements. Computer Methods in Applied Mechanics and Engineering 1990; 81:131.

Copyright 䉷 2005 John Wiley & Sons, Ltd.

Int. J. Numer. Meth. Engng 2005; 62:1982–2008