Partial Refactorization in Sparse Matrix Solution : A New Possibility for Faster Nonlinear Finite Element Analysis

This paper proposes a partial refactorization for faster nonlinear analysis based on sparse matrix solution, which is nowadays the default solution choice in finite element analysis and can solve finite element models up to millions degrees of freedom. Among various fill-in’s reducing strategies for sparse matrix solution, the graph partition is in general the best in terms of resultant fillins and floating-point operations and furthermore produces a particular graph of sparse matrix that prevents local change of entries from wide spreading in factorization. Based on this feature, an explicit partial triangular refactorization with local change is efficiently constructed with limited additional storage requirement in row-sparse storage scheme. The partial refactorization of the changed stiffness matrix inherits a big percentage of the original factor and is carried out only on partial factor entries. The proposedmethod provides a new possibility for faster nonlinear analysis and ismainly suitable formaterial nonlinear problems and optimization problems. Compared to full factorization, it can significantly reduce the factorization time and can make nonlinear analysis more efficient.


Introduction
Nonlinearities [1,2] occur in practical applications of many engineering fields.For example, in design of steel structures, elastoplastic analyses are necessary to compute the limit loads of truss, frame, or shell structures.In area of concrete structural design or soil mechanics, complicated nonlinear constitutive relations have to be involved for realistic descriptions.In case of cable design, geometrically nonlinear effects have to be considered for large displacements.In numerical simulation of rainfall effect on stability of slope, the changes in pore-water pressure will alter the elastoplastic behavior of geologic media.In analysis of seepage [3], the coefficient of permeability changes with capillary pressure and can be also treated as nonlinear problems.
The nonlinear equilibrium equations are typified by the nonlinear system of equations: R (x) = 0. (1) Generally, this can be achieved in finite element methods by the so-called Newton-Raphson method, which involves the linearization of the equilibrium equations, given by where R(x) [u] indicates directional derivative of R at x in the direction of an increment u and K indicates the tangent matrix.In each iteration step , a linear system of equations has to be solved.Nowadays, for up to about two million equations, direct methods may be still dominant for the solution, compared with iterative methods.As stated in [1], the advantage of direct solvers lies on their capability to solve even ill-conditioned and indefinite system of equations, which is especially interesting for nonlinear applications.Moreover, direct methods are generally robust.
The convergence behavior of Newton-Raphson method is advantageous.Generally only a few iterations are needed to obtain the solution.However, in every iteration step, triangularization of the tangent matrix K(x  ) in (2) involves extensive computational effort for large finite element models.
To reduce time consumption of triangular factorization of K, the modified Newton method is proposed in which K is not changed in several successive iteration steps.Therefore the number of refactorizations of K decreases and total computing time might be saves.In practice, this saving may be counterbalanced by the fact that modified Newton method converges much slower than Newton-Raphson method, while repetitive residuum load vector building, forward reduction, and back substitution in each iteration step become then time consuming.
Fortunately, it can be noticed that between two adjacent iteration steps, nonlinear phenomenon usually appears locally.For example, stress concentration often occurs around one or several points; plastic deformation usually starts from local elements and then spreads around; in analysis of seepage, only elements around the interface of liquid need to change the coefficient of permeability, and so forth.All these features are equivalent to local change of tangent matrix K; that is to say, only a small or very small percentage (but the number might be large such as five thousand rows) of stiffness coefficients will be changed in each linearized step.
In light of this, a new algorithm is proposed in this study to refactorize tangent matrix with local change.Our goal is to save time of triangular factorization, to make Newton-Raphson method more efficient and thus to provide a new possibility for faster nonlinear analysis.Generally speaking, local change occurs frequently in material nonlinear problems and optimization problems; thus the proposed algorithm is mainly suitable for them.In the worst case, a global change causes that the proposed algorithm degrades to full factorization.
An outline of this paper follows.In Section 2, background on sparse direct solution techniques is given.In Section 3, we discuss the effect propagation of local change in the matrix factor and conclude that only part of matrix factor needs to be recalculated.Based on all preparations, a new algorithm for refactorization is described in Section 4. The cost of computation is predictably reduced, and the precision is unaffected.In Section 5, numerical examples are given to illustrate performance of the proposed algorithm.Finally in Section 6, it is concluded that the algorithm proposed in this paper is significantly efficient and suitable for local change of structures.This method can be applied to a wide range of engineering problems and could be the foundation of faster nonlinear finite element analysis.

Sparse Direct Solution Techniques
Since 1990s, direct solution techniques of finite element analysis [4] have developed from conventional variable bandwidth [5] or frontal solutions [6] to various sparse solutions [5,[7][8][9][10][11], such as backward-reference (left-looking), forwardreference (right-looking), and multifrontal [12,13] methods.They have yielded a breakthrough in terms of solution speed and storage requirement in finite element analysis.It is quite safe to conclude that solvers based on various row pivoting sparse storage schemes are, in terms of the solution time, memory requirement and scale of problems, much more efficient than those based on variable bandwidth or skyline storage schemes.With the achievements in hardware, the analysis of 10,000 to 500,000 nodes finite element calculations on microcomputers becomes popular.Since 1995, commercial FEA packages have turned to sparse direct solvers and gained a speedup of more than 10 times.The scale of problems that can be solved on certain machines increased about 3 to 10 times.
Sparse solution has two essential features.One is sparse index storage scheme, also called compact storage scheme, storing nonzero entries of matrix in a row or column compact form, which significantly improves the efficiency of time and space because only nonzero entries are considered in triangular factorization.The other is fill-in's reducing, an optimization procedure before numerical factorization, which finds an optimized pivoting sequence by permuting rows and columns of matrix, so that fill-ins as well as floatpoint operations of factorization will be decreased greatly.
Generally, a stiffness matrix of finite element analysis can be considered as an adjacency list of graph, in which a vertex presents an equation and a nonzero off-diagonal entry implies that the two corresponding vertices are adjacent.At present, one of practical fill-reducing algorithms is the so-called multilevel graph partition [14][15][16], a special case of graph partition.It is a divide and conquer algorithm, which partitions global optimization cost functions into some unrelated cost functions of subproblems and additional cost function between these subproblems.Approximately, the global cost function is minimized if the additional cost function is minimized and the scales of subproblems are roughly equal.The division guarantees that local change on one side does not spread to the other side.In other words, the graph partition prevents local change from wide spreading.
Taking advantage of both row-sparse solution and fillreducing, a new algorithm to refactorize the tangent matrix for local change is designed.Only a part of rows of the matrix factor involves recalculation, and therefore the computational cost decreases.It is to note that no approximation is assumed in the presented algorithm; therefore accuracy is guaranteed as the same of full refactorization.
For simplicity, we assume in following discussions that a pivoting sequence of fill-in's reducing of a positive definite global stiffness matrix has been already found through a graph partition algorithm, such as METIS [19].

Effect of Local Change in Sparse Solution
In each iteration step of Newton-Raphson method, nonlinear finite element analysis yields to a system of linear equations as (2) or closely where K(x  ) denotes the tangent stiffness matrix, which is symmetric positive definite generally in finite element method, −R(x  ) the residuum vector, and u the displacement increment vector.At present, general solution procedure of finite element equation ( 3) is LDL T , which decomposes K into product do row  = 1,  do  = all appropriate rows (  ̸ = 0) RowTask (, ) end do RowTask (, ) end do Algorithm 1: Simple  form of LDL T factorization [17,18]. of lower triangular matrix L, diagonal matrix D, and upper triangular matrix L T , as With the result of LDL T , the displacement u can be obtained easily by forward reduction and back substitution.In the process, factorization of stiffness matrix K is the most time consuming.In nonlinear analysis, repetitive factorizations are carried out at each iteration step.It is to point out that, in two adjacent iteration steps, the tangent stiffness matrices differ frequently only from each other on a small portion of elements.Besides, the difference is frequently local.
Suppose K = (  ) ∈ R × , the factors L = (  ) ∈ R × , and D = diag(  ) ∈ R × , which are a lower triangular matrix with unit diagonal and a diagonal matrix, respectively, where  is the number of equations.With regard to memory management, U shares the same memory locations as L T and D together, since the unit diagonal of L T does not necessarily occupy any storage space.Practically, we denote K before factorization and U/DL T after factorization as A = (  ) ∈ R × .The bulk of the work in the LDL T factorization occurs in a triple nested loop around a single statement [20]: There are six different forms [17] to implement LDL T factorization, which are six permutations of the indices , , .Row-sparse factorization [21] and its improvement [8,18] are foundations of the present sparse solution implementation, in which  form and its variations are used.The simple  form can be expressed in terms of tasks [18,22]: (1) RowTask (, ): reduction of the target th row by a multiple of the th row,  < , that is, elimination of   ; (2) RowTask (, ): division of the off-diagonals of the target th row by its diagonal.
The RowTask (, ) itself involves an -loop from  to .We can symbolically write this procedure as shown in Algorithm 1.
As stiffness matrix is sparse, most entries in K as well as in L or U vanish, and therefore, only nonzero entries are actually involved in calculation.Besides nonzero entries in K, the factor L or U contains additional nonzero entries generated by factorization.Thus, it is necessary to predetermine symbolically, not numerically, which entries will become nonzero in L or U.This step is called symbolic analysis.After this, it can be analyzed by (5) how the variation of entries of stiffness matrix K will affect the factors L or U and D and how the effect will propagate in the L or U.
Since the matrices differ on a small portion of elements, consider a change of row  of U which is caused by previous eliminations or entries change of K directly.It will cause change of row  that   ̸ = 0 as shown in Algorithm 1.Take (6), for example.Change of row 1 generally causes change of row 3 and row 7, since change of  13 and  17 spreads to  3 and  7 .Change of row 3 spreads changes to row 5 and row 7: ) ) ) ) .
In general, change of diagonal entry directly affects value of all nonzero entries of this row in the matrix, and therefore, for convenience, let a row be the smallest unit of change, corresponding to an equation or a general displacement of the structure.Through the previous analysis, the rule of direct effect of change is summarized as follows.
Rule.In LDL T factorization, change of a row in stiffness matrix K directly affects values of this row in factorization results U(L T ) and D, and then affects rows corresponding to the columns where nonzero entries of this row are.
Consider (6) again.The change of row 1 will affect rows 3 and 7.The effect will propagate in the matrix, as row 3 will affect rows 5 and 7, and row 5 will affect rows 6 and 7. Due to sparseness of stiffness matrix, the number of nonzero entries in each row is finite.As a result, change of one row is able to affect only part of the matrix.In this case, change of row 1 will eventually affect rows 3, 5, 6, and 7, whereas rows 2 and 4 remain unchanged.
In practice, fill-ins' reducing, which minimizes the number of fill-ins (additional nonzero entries produced by factorization) through equation reordering, is always required before numerical factorization.A typical optimized result delivered by the divide and conquer graph partition algorithm is as follows: ) .(7) In this case, if stiffness of group 2 is changed, only groups 2, 3, and 7 are necessary to be recalculated; similarly, if stiffness of group 6 is changed, only groups 6 and 7 are affected.Now it is concluded that if the stiffness matrix changes locally, based on fill-ins' reducing, only part of values in L Step 1. Input original stiffness matrix K, upper triangular matrix U and diagonal matrix D; Step 2. Create array CHANGE() with the initial value zero, where  is the number of equations.CHANGE() = 0 indicates row  is not changed, while CHANGE() = 1 indicates row  is changed; Step 3. Input the change of each element stiffness matrix and assemble them to original total stiffness matrix.Then let CHANGE() be 1 if row  is changed; Step 4. According to Rule, spread the effect of changes over the whole matrix and mark all the changed rows: Step 5. Assemble matrix to the original factor.If CHANGE() = 0, fill in row  with original factorization result; if CHANGE() = 1, fill in row  with entries of new stiffness matrix; Step 6. Numerically factorize the matrix, only recalculating entries in changed rows (CHANGE() = 1) according to (5).Only changed row is added to the elimination tree, in implementation being represented by the linked list.
Step 7. The solution procedure after factorization stays the same.and D need to be recalculated.The cost of computation is predictably reduced, and the precision is unaffected.

Algorithm Design for Local Change of Sparse Matrix
Algorithm for refactorization involves sparse storage scheme, which takes extremely advantage of sparseness of numerical part.However, index can be compressed furthermore, and indirect addressing of data access can be reduced substantially [5].Row-sparse factorization [21] and its improvement [18] are the foundation of present sparse solution, which consist of four steps: symbolic assembly, symbolic factorization, numerical assembly, and numerical factorization.Instead of the elimination tree, an updated linked list is used in factorization.
In this refactorization algorithm, structural topology is unchanged, so symbolic assembly and symbolic factorizationinherit the original results.Thus, three steps need to be redesigned.The detailed steps of this algorithm are shown as Algorithm 2.
For further illustration steps 4 and 6 are refined as follows in Algorithms 3 and 4.
In step 6 (Algorithm 4), only parts of factor are recalculated for elimination, and as a result, computation efficiency compared to full refactorization is improved.

Numerical Examples and Analysis
Two building structures are selected to show the efficiency of proposed algorithm, in size of 321,210 and 442,331 equations or DOFs (degrees of freedom), in which up to 5000 DOFs are changed.Both numerical examples are tested on Windows 7 system with Intel Xeon CPU E5-2620 (2.00 GHz, 2 × 6 cores, but only one core used) and 32 GB memory.The codes have not integrated in nonlinear solver, and therefore only refactorization part is presented.Moreover, only computational effort and elapsed time of factorization are discussed, since the algorithm proposed in this paper involves no approximation and the solution is as accurate as a full recalculation.
Example 1.A multitower building, as shown in Figure 1(a), is calculated.The number of DOFs or equations is 321,210; the number of nonzero entries in stiffness matrix is 17,169,579, and that in factor is 62,281,728.Consider that nonlinear phenomenon occurs between two adjacent iteration steps.For example, part of steel structures reaches plastic phase.For simplicity, in each test one group's element stiffness is changed to 90% of standard value, and one group here normally consists of a certain number (from 1 to 2000) of well-connected elements in one floor which means a local change.
The performance of the proposed algorithm is shown in Table 1.The first row of data shows the result of full refactorization, a violent method to factorize the new matrix, introduced as reference of comparison.The rest of the rows show the result of partial refactorization using the proposed algorithm.For each case (change of certain number of elements) about 30 samples are selected randomly.For statistical purpose, minimum, maximum, and average affected DOFs, (million floating-point operations), MFLOP and computing time are given.The percentage next to average values is the ratio between this item and corresponding value of full refactorization.
The numerical results indicate that the local change, up to 2% DOFs in this example, will only spread to a little bit large percentage of DOFs, up to 10% here.The elapsed time or computational effort is usually less than half of full recalculation, which demonstrates that this method is more efficient than full recalculation.The MFLOP ratio of this algorithm and full recalculation is always much larger than the ratio of affected DOFs and total DOFs, because reduction of some DOFs or equations refers a large number of DOFs, both changed and unchanged.Also, it is shown that the MFLOP ratio is coincident with elapsed time ratio.
Comparing the cases of different numbers of changed elements, it is concluded that computational effort of refactorization depends slightly on the number of modified/affected DOFs and the relation is roughly monotonic but not necessarily linear.In this numerical example, changing less than 200 elements, the computational efforts of refactorization are almost the same, about 15%, and even changing 500 or more elements, the computational effort does not increase much.In fact, in the worst case, changing all DOFs, it degrades to full refactorization.
Example 2. A two-tower building, as shown in Figure 1 The results demonstrate again that the proposed algorithm is efficient and factorization time decreases.Furthermore, in each case of these two examples, maximum (affected DOFs, MFLOP, and computing time) is usually less than twice the average, which indicates that the algorithm is robust.
Summarizing the results of numerical tests yields qualitatively the relationship between computational effort ratio and ratio of changed DOFs in Figure 2. It shows clearly that the algorithm proposed in this paper is efficient most of the time.

Conclusions
In this paper, a partial refactorization algorithm based on row-sparse solution and fill-in's reducing is proposed for nonlinear finite element analysis, especially material nonlinear problems and optimization problems.Instead of full refactorization in traditional nonlinear analysis, the proposed procedure finds the changed factor of the tangent matrix with much lower cost.
It is concluded that this algorithm can significantly improve the efficiency compared with full refactorization.There are several advantages as follows.
(a) The algorithm does not affect the precision of final results, since it involves no approximation, just skipping repetitive computation.
(b) A large number of elements or DOFs can be changed.
(c) The amplitude of change is not limited to being small, as usually being required by approximate approaches.
(d) Only one additional array CHANGE is required to perform the computation.
(e) The implementation is simple, if one understands row-sparse solution procedure.
We expect the proposed algorithm could be integrated in nonlinear analysis, especially Newton-Raphson method.It can significantly reduce the refactorization time and make Newton-Raphson method more efficient.In other words, it opens a new possibility for faster nonlinear analysis.The proposed algorithm can also be applied to many engineering problems, in which a series of linear systems of equations with step-by-step local change has to be solved, including structural optimization, progressive collapse analysis, and analysis of seepage.

( a )
Modification analysis: determines which rows will be changed in upper triangular matrix U. (b) Numerical assembly: reassembles entries of matrix in rows which are changed.(c) Numerical factorization: modifies rows of upper triangular matrix U and recalculates the value of corresponding entries.

Figure 1 :
Figure 1: Schematic diagrams of two numerical examples.

Table 1 :
Statistic results of refactorization of Example 1.

Table 2 :
Statistic results of refactorization of Example 2.

Table 2 .
For each case about 100 samples are selected randomly.