12.4 Linear Equation Solver Selection
LINEAR_SOLVER
LINEAR_SOLVER solver_name = list , etc...
Specify the type of linear equation solver to use for all equations belonging to any given solution stagger.
(1)  Solver_type            list      [direct]   Solver type
                                                    direct / iterative

(2)  Symmetric_matrix       list      [on]       Symmetric matrix option
                                                    on / off

     Non_symmetric_matrix   list      [off]      Nonsymmetric matrix option
                                                    on / off

     Solver_name            list      [*]        Linear equation solution procedure
                                                    (see Table 12.4)
• Direct Solvers

  Symmetric Solvers
     Crout_column           Crout column solver
     Crout_block            Crout block solver
     Frontal                Frontal solver
     Cholesky_1_sparse      Sparse Cholesky with multiple minimum degree ordering
     Cholesky_2_sparse      Sparse Cholesky with general quotient minimum degree ordering
     Cholesky_3_sparse      Sparse Cholesky with nested dissection ordering

  Nonsymmetric Solvers
     Crout_column           Crout column solver
• Iterative Solvers

  Symmetric Solvers
     CG_EBE                 Conjugate gradient
     CG_Crout_EBE           Conjugate gradient (Crout preconditioner)
     CG_Crout_GS_EBE        Conjugate gradient (Crout Gauss-Seidel preconditioner)
     CG_LU_EBE              Conjugate gradient (LU preconditioner)
     CG_LU_GS_EBE           Conjugate gradient (LU Gauss-Seidel preconditioner)
     CG_sparse              Conjugate gradient
     CG_IC_sparse           Conjugate gradient (incomplete Cholesky)
     CG_Mf                  Conjugate gradient
     CG_Shanno_Mf           Shanno conjugate gradient
     CG_Beale_Shanno_Mf     Shanno CG with Beale's restart

  Nonsymmetric Solvers
     GMRES_EBE              GMRES
     GMRES_LU_EBE           GMRES (LU preconditioner)
     GMRES_LU_GS_EBE        GMRES (LU Gauss-Seidel preconditioner)
     GMRES_sparse           GMRES
     GMRES_ILU_sparse       GMRES (incomplete LU)
     GMRES_Mf               GMRES
     BCG_EBE                Biconjugate gradient
     BCG_sparse             Biconjugate gradient
     BCG_ILU_sparse         Biconjugate gradient (incomplete LU)
     BCG_Mf                 Biconjugate gradient
     BCGS_sparse            Squared biconjugate gradient
     BCGS_ILU_sparse        Squared biconjugate gradient (incomplete LU)
     Jacobi_sparse          Jacobi iterations
     Gauss_Seidel_sparse    Gauss-Seidel iterations
     ILU_sparse             Incomplete LU iterations
     Broyden_Mf             Broyden iterations

• MPI Solvers
     MPI_Cholesky           Direct Cholesky solver
     MPI_FGMRES             Iterative GMRES solver
• Direct Profile Solver Options

     Profile_minimizer       list      [on]       Equation numbering profile minimizer
                                                     on / off

• Iterative Solver Options

     Preconditioner_type     list      [*]        Type of global preconditioner to use
                                                     diagonal         Nodal diagonal scaling
                                                     block_diagonal   Nodal block-diagonal scaling

• Conjugate Gradient Solver Options

     Max_cg_iterations       integer   [Neq]      Maximum number of iterations
     ILU_level               integer   [25]       Incomplete LU factorization level
     Convergence_tol_cg      real      [1.E-8]    Convergence tolerance

• GMRES Solver Options

     Max_gmres_iterations    integer   [Neq]      Maximum number of iterations
     Max_gmres_cycles        integer   [10]       Maximum number of Krylov subspace restarts
     Num_gmres_iterations    integer   [20]       Dimension of Krylov subspace
     Convergence_tol_gmres   real      [1.E-8]    Convergence tolerance

• MPI Iterative Solver Options

     Preconditioner_type     list      [*]        Type of global preconditioner
                                                     add_schwarz
                                                     right_schur
                                                     multic_schwarz
EXAMPLE
Linear_Solver /
   Solver_name = CG_Crout_EBE /        # Conjugate gradient solver with Crout
                                       # element-by-element preconditioner request
   Symmetric_matrix = on /             # Symmetric matrix
   Preconditioner_type = diagonal /    # Diagonal preconditioning
   Convergence_tol = 1.e-6             # Specified convergence tolerance for CG iterations
Table 12.4. Linear Solvers

 No.  Solver_name           Type       Isymm*  Storage             Description
  0   Crout_column          direct     0, 1    profile             Crout column solver
  1   Crout_block           direct     0       out_of_core         Crout block solver
 29   Cholesky_1_sparse     direct     0       sparse              Sparse Cholesky with multiple minimum degree ordering
 30   Cholesky_2_sparse     direct     0       sparse              Sparse Cholesky with general quotient minimum degree ordering
 31   Cholesky_3_sparse     direct     0       sparse              Sparse Cholesky with nested dissection ordering
  3   CG_EBE                iterative  0       element_by_element  Conjugate gradient
  4   CG_Crout_EBE          iterative  0       element_by_element  Conjugate gradient (Crout preconditioner)
  5   CG_Crout_GS_EBE       iterative  0       element_by_element  Conjugate gradient (Crout Gauss-Seidel preconditioner)
  6   CG_LU_EBE             iterative  0       element_by_element  Conjugate gradient (LU preconditioner)
  7   CG_LU_GS_EBE          iterative  0       element_by_element  Conjugate gradient (LU Gauss-Seidel preconditioner)
  8   GMRES_EBE             iterative  0, 1    element_by_element  GMRES
  9   GMRES_LU_EBE          iterative  0, 1    element_by_element  GMRES (LU preconditioner)
 10   GMRES_LU_GS_EBE       iterative  0, 1    element_by_element  GMRES (LU Gauss-Seidel preconditioner)
 11   BCG_EBE               iterative  0, 1    element_by_element  Biconjugate gradient
 12   CG_Mf                 iterative  0       matrix_free         Conjugate gradient
 13   CG_Shanno_Mf          iterative  0       matrix_free         Shanno conjugate gradient
 14   CG_Beale_Shanno_Mf    iterative  0       matrix_free         Shanno CG with Beale's restart
 15   BCG_Mf                iterative  0, 1    matrix_free         Biconjugate gradient
 16   GMRES_Mf              iterative  0, 1    matrix_free         GMRES
 17   Broyden_Mf            iterative  0, 1    matrix_free         Broyden iterations
 18   CG_sparse             iterative  0       sparse              Conjugate gradient
 19   CG_IC_sparse          iterative  0       sparse              Conjugate gradient (incomplete Cholesky)
 20   BCG_sparse            iterative  0, 1    sparse              Biconjugate gradient
 21   BCG_ILU_sparse        iterative  0, 1    sparse              Biconjugate gradient (incomplete LU)
 22   BCGS_sparse           iterative  0, 1    sparse              Squared biconjugate gradient
 23   BCGS_ILU_sparse       iterative  0, 1    sparse              Squared biconjugate gradient (incomplete LU)
 24   GMRES_sparse          iterative  0, 1    sparse              GMRES
 25   GMRES_ILU_sparse      iterative  0, 1    sparse              GMRES (incomplete LU)
 26   Jacobi_sparse         iterative  0, 1    sparse              Jacobi iterations
 27   Gauss_Seidel_sparse   iterative  0, 1    sparse              Gauss-Seidel iterations
 28   ILU_sparse            iterative  0, 1    sparse              Incomplete LU iterations
Notes /
* Isymm = 0 Solver applicable to symmetric matrices
Isymm = 1 Solver applicable to nonsymmetric matrices
(1) The application of the finite element discretization to the governing equation(s) of a field theory generates a matrix system of equations. For instance, nonlinear transient finite element dynamics are characterized by the following second-order semi-discrete balance equation:
M a + N(v, d) = f    (1)
where M is the mass matrix, N is the vector of internal forces and f = f(t) is the time-dependent external force vector. In Eq. 1, d, v, and a are the vectors of nodal displacements, velocities and accelerations, respectively. Eq. 1 is solved by applying a step-by-step time integration procedure resulting in a system of nonlinear algebraic equations. The solution of this system is obtained by using Newton-like iterations or related schemes. At the heart of this iteration is a set of linear equations
Ax = b (2)
where x is the correction to the approximate solution vector in the nonlinear iteration loop. The matrix A is definite and sparse (but not necessarily symmetric) and, for the case of Eq. 1, is obtained as

(3)

where the coefficients are algorithmic time-integration parameters [see Section 12.3], and [see Section 12.4]

(4)
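The explicit expressions in Eqs. 3 and 4 depend on the time-integration scheme selected. As an illustration only (an assumed Newmark-type a-form linearization, not a reproduction of Eqs. 3-4), when d and v are expressed in terms of the acceleration correction the iteration matrix and right-hand side take the form

   A = M + γ Δt (∂N/∂v) + β Δt² (∂N/∂d) ,      b = f - M a - N(v, d)

where β and γ are the algorithmic parameters of Section 12.3.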
DYNAFLOW provides the capability of solving this system (Eq. 2) by either direct or iterative methods. Direct methods include:
• Profile in-core Crout
• Profile out-of-core Crout block
• Frontal in-core / out-of-core
• Explicit
and iterative methods include:
• Conjugate Gradient (CG)
• Generalized Minimum Residual (GMRES)
Direct solvers have the advantage of being relatively simple and robust; solutions are obtained in a predictable, finite number of operations. However, in large three-dimensional simulations, storage requirements and the number of floating-point operations can become prohibitive.
These problems can be alleviated by using iterative solvers. However, iterative solvers are not necessarily robust, and preconditioning matrices are often needed to counteract problems of ill-conditioning.
Direct Methods:
Direct methods are well established and documented in standard texts (see e.g. [1, 4, 6]). In order to generate matrices with small bandwidth and profile, a profile minimizer [5, 9] option is available.
Iterative Methods:
The convergence of both the Conjugate Gradient and GMRES algorithms is greatly influenced by the condition number of the matrix A. In order to reduce this sensitivity, it is useful to precondition the original system by using a symmetric, positive-definite matrix. In DYNAFLOW, two levels of preconditioning are available: (1) a global preconditioner, and (2) a local preconditioner (for example, an element-by-element preconditioner). These are discussed hereafter.
1. Global Preconditioner
Two global preconditioners are available as follows:
The first preconditioner available is the nodal diagonal scaling matrix W1 defined as:

W1 = diag(A)    (5)

and we rewrite Eq. 2 in a scaled form:

Ã x̃ = b̃    (6)

where

Ã = W1^(-1/2) A W1^(-1/2)    (7)

x̃ = W1^(1/2) x    (8)

b̃ = W1^(-1/2) b    (9)

With this definition of W1, diag(Ã) becomes an identity matrix.
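As an illustration of the operations in Eqs. 5-9, a minimal sketch in generic NumPy (not DYNAFLOW source; the function and array names are hypothetical):

   import numpy as np

   def diagonal_scale(A, b):
       """Nodal diagonal scaling (Eqs. 5-9): returns the scaled system and W1^(1/2)."""
       w = np.sqrt(np.diag(A))          # W1^(1/2); assumes a positive diagonal
       A_s = A / np.outer(w, w)         # A~ = W1^(-1/2) A W1^(-1/2), so diag(A~) = I
       b_s = b / w                      # b~ = W1^(-1/2) b
       return A_s, b_s, w

   def recover_solution(x_s, w):
       """Recover x from the scaled solution x~ = W1^(1/2) x."""
       return x_s / w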
The second preconditioner available is the nodal block-diagonal scaling matrix W2 defined as:

W2 = block_diagonal(A)    (10)

where the operator "block_diagonal" assigns to W2 the nodal block-diagonal matrix consisting of the m x m nodal diagonal blocks of A, where m = number of equations per node (m ≤ Ndof). Nodal diagonal blocks of A are symmetric and positive-definite. Therefore, W2 possesses a well-defined Cholesky factorization:

W2 = UTU    (11)

and we rewrite Eq. 2 in a scaled form:

Ã x̃ = b̃    (12)

where:

Ã = U^(-T) A U^(-1)    (13)

x̃ = U x    (14)

b̃ = U^(-T) b    (15)

With this definition of W2, block_diagonal(Ã) becomes an identity matrix. Note that if A is symmetric and positive-definite, Ã inherits these properties. An important attribute of block-diagonal scaling is its ability to nondimensionalize the original system of equations. This is not necessary if the system is dimensionally homogeneous. However, when this is not the case (for example, when using structural elements with both translational and rotational degrees of freedom), the 2-norm of the residual is not well-defined. To ensure against this, A can be scaled at the element level; that is, each element matrix Ae is replaced with U^(-T) Ae U^(-1), where U is the Cholesky factor of W2.
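The corresponding block-diagonal scaling (Eqs. 10-15) can be sketched as follows (generic NumPy, not DYNAFLOW source; m equations per node and contiguous nodal blocks are assumed):

   import numpy as np

   def block_diagonal_scale(A, b, m):
       """Nodal block-diagonal scaling via Cholesky factors of the m x m diagonal blocks."""
       n = A.shape[0]
       U = np.zeros_like(A)                               # Cholesky factor of W2 (Eq. 11)
       for k in range(0, n, m):                           # loop over nodal blocks
           Wk = A[k:k + m, k:k + m]                       # nodal diagonal block (SPD)
           U[k:k + m, k:k + m] = np.linalg.cholesky(Wk).T # Wk = Uk^T Uk
       Ui = np.linalg.inv(U)                              # block-diagonal, cheap to invert
       A_s = Ui.T @ A @ Ui                                # A~ = U^(-T) A U^(-1)  (Eq. 13)
       b_s = Ui.T @ b                                     # b~ = U^(-T) b         (Eq. 15)
       return A_s, b_s, U                                 # solve A~ x~ = b~, then x = U^(-1) x~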
Remark: The nodal diagonal scaling or nodal block-diagonal scaling is embedded into the solution algorithm before entering either the preconditioned Conjugate Gradient or the preconditioned GMRES iterative process. This requires forming the scaled system (Eq. 6 or Eq. 12) before entering either process and subsequently recovering x from x̃ after the iterative solution has been obtained. Since nodal diagonal or nodal block-diagonal scaling is always used, this global preconditioning is often called pre-preconditioning or simply scaling, to distinguish it from the (optional) preconditioning used in the Conjugate Gradient or GMRES iterations. Also, in the following, to simplify the notation, the tilde is dropped on A, b and x, and the scaled system is written hereafter as:
Ax = b (16)
2. Conjugate Gradient
The Conjugate Gradient method [7] is restricted to linear systems Ax = b for which A is symmetric positive-definite. This is typically the case in solid mechanics problems. Solving such systems can be viewed as minimizing the associated potential function
Φ(x) = 1/2 xTAx - xTb    (17)

over all x in RN, where N = number of equations (A is an N x N matrix). An iterative strategy is used that begins with an initial vector x0 and computes a series of approximate solutions {x1, x2, ...} which progressively reduce Φ. The essence of the Conjugate Gradient method consists of obtaining a new vector xi+1 from xi along a direction pi at a step length αi:

xi+1 = xi + αi pi    (18)
The direction vectors pi are mutually conjugate and satisfy the relationships:

piT A pj = 0 ,   i ≠ j    (19)

and

pi+1 = ri+1 + βi pi    (20)

The step length αi is fixed by the condition that the new residual ri+1 be orthogonal to pi (i.e., Φ is minimized along pi), while the parameter βi is determined by the A-conjugate property of the search direction pi. The algorithm is summarized as follows:
Given:        scaled linear system:  Ax = b
              preconditioner:        B
              with A and B symmetric positive-definite

Initialize:   x0 = 0 ;   r0 = b ;   solve B z0 = r0 ;   p0 = z0

Iterate:      do i = 0, 1, 2, ..., imax
                 αi   = (ri . zi) / (pi . A pi)
                 xi+1 = xi + αi pi
                 ri+1 = ri - αi A pi
                 Check convergence: if || ri+1 || ≤ ε || r0 ||, return
                 solve B zi+1 = ri+1
                 βi   = (ri+1 . zi+1) / (ri . zi)
                 pi+1 = zi+1 + βi pi
              enddo
In the above, ε = convergence_tol is a user-specified convergence tolerance, and B is a symmetric positive-definite conditioner matrix. As with scaling, B is selected such that (1) B significantly contracts the spectrum of A, i.e., the condition number of B-1A is much smaller than that of A; and (2) solving the linear system Bz = r is relatively inexpensive. Of course, using B = A would yield convergence to the exact solution in a single iteration, but this is not considered a viable choice due to its associated computational cost. Also, using B = I = identity matrix, one recovers the unpreconditioned Conjugate Gradient method. The various options for defining B are described later in this section.
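The iteration above can be summarized in a short sketch (generic NumPy, not DYNAFLOW source; the preconditioner solve Bz = r is supplied as a callable, and the stopping test uses the relative residual):

   import numpy as np

   def pcg(A, b, solve_B, tol=1.0e-8, max_iter=None):
       """Preconditioned Conjugate Gradient for symmetric positive-definite A."""
       n = b.size
       max_iter = n if max_iter is None else max_iter   # default imax = Neq
       x = np.zeros(n)
       r = b - A @ x
       z = solve_B(r)                                   # solve B z = r
       p = z.copy()
       rz = r @ z
       r0 = np.linalg.norm(r)
       for _ in range(max_iter):
           Ap = A @ p
           alpha = rz / (p @ Ap)                        # step length along p
           x += alpha * p
           r -= alpha * Ap
           if np.linalg.norm(r) <= tol * r0:            # convergence check
               break
           z = solve_B(r)
           rz_new = r @ z
           beta = rz_new / rz                           # enforces A-conjugacy of p
           p = z + beta * p
           rz = rz_new
       return x

For example, solve_B = lambda r: r / np.diag(A) corresponds to a diagonal (Jacobi) conditioner, while solve_B = lambda r: r recovers the unpreconditioned method.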
3. GMRES
The Generalized Minimal Residual (GMRES) method of Saad and Schultz [10] can be applied to any linear system Ax = b, but is usually reserved for those systems in which A is non-symmetric. The essence of the method is to obtain a trial solution vector x0 + z that minimizes the 2‑norm of the residual r = b - A(x0 + z); that is to find
min over z in K of || b - A(x0 + z) ||2    (21)

where x0 is an initial guess and z is in the k-dimensional Krylov space K:

K = span {r0, Ar0, ..., A^(k-1) r0}    (22)
and r0 = b - Ax0. The algorithm employs an orthonormal basis of K which is obtained by the modified Gram-Schmidt procedure, and the minimization is performed using the Q-R algorithm [9]. An excellent general reference for the numerical algebraic procedures employed is [6]. Convergence of the GMRES method is also influenced by the condition number of the coefficient matrix A. Condition improvements are accomplished through the use of an appropriate preconditioning matrix B = LU = a positive-definite matrix, where L and U are referred to as the left and right preconditioning matrices, respectively. The preconditioned linear system then takes the form:
(L-1AU-1)(Ux) = (L-1b) (23)
It should be emphasized that the matrix (L-1AU-1) is never formed nor stored as a single matrix. Rather, L-1 and U-1 are viewed as representing operations carried out during the GMRES solution process. The algorithm is summarized as follows:
1. Given:       scaled linear system:  Ax = b
                preconditioner:        B = LU

2. Initialize:  x = 0 ;   b = L-1b ;   r = b

3. GMRES cycles:
   do l = 0, 1, 2, ..., lmax                          (GMRES cycles loop)
      u1 = b - L-1AU-1x
      do i = 0, 1, 2, ..., kmax                       (GMRES iterations loop)
         ui+1 = L-1AU-1ui
         do j = 1, ..., i                             (modified Gram-Schmidt orthogonalization)
            hj,i = uj . ui+1
            ui+1 = ui+1 - hj,i uj
         enddo on j
         hi+1,i = || ui+1 || ;   ui+1 = ui+1 / hi+1,i
         Update Q-R factorization of Hi
         Check convergence:
            if satisfied, exit GMRES iterations loop i
      enddo on i                                      (end GMRES iterations loop)
      Solve:  Hi y = e                                (using factorized Hi)
      Update solution:  x = x + [u1, ..., ui] y
      Check convergence:
         if satisfied, exit GMRES cycles loop l
   enddo on l                                         (end GMRES cycles loop)

4. Recover scaled system solution vector:  x = U-1x
The GMRES iteration loop consists of the formation of an orthogonal basis for the Krylov space by the modified Gram-Schmidt procedure, triangularization of the Hessenberg matrix Hi by the Q-R algorithm, and back-substitution. The maximum number of GMRES cycles allowed is lmax = max_gmres_cycles, a user-specified value. The dimension of the Krylov space, k = num_gmres_iterations, is a user-specified value, as is the convergence tolerance convergence_tol.
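A compact sketch of the restarted, preconditioned GMRES cycle (generic NumPy, not DYNAFLOW source; for brevity the incremental Q-R update of Hi is replaced by a dense least-squares solve, and left/right preconditioning is applied through the callables solve_L and solve_U):

   import numpy as np

   def gmres(A, b, solve_L, solve_U, k=20, max_cycles=10, tol=1.0e-8):
       """Restarted GMRES with modified Gram-Schmidt orthogonalization (sketch)."""
       n = b.size
       xh = np.zeros(n)                           # unknowns of the preconditioned system
       bh = solve_L(b)                            # b <- L^-1 b
       bnorm = np.linalg.norm(bh)
       for _ in range(max_cycles):                # GMRES cycles (restarts)
           r = bh - solve_L(A @ solve_U(xh))
           beta = np.linalg.norm(r)
           if beta <= tol * bnorm:
               break
           Q = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
           Q[:, 0] = r / beta
           y = np.zeros(0)
           for i in range(k):                     # build the Krylov basis
               w = solve_L(A @ solve_U(Q[:, i]))
               for j in range(i + 1):             # modified Gram-Schmidt
                   H[j, i] = Q[:, j] @ w
                   w -= H[j, i] * Q[:, j]
               H[i + 1, i] = np.linalg.norm(w)
               if H[i + 1, i] > 0.0:
                   Q[:, i + 1] = w / H[i + 1, i]
               e1 = np.zeros(i + 2); e1[0] = beta
               y = np.linalg.lstsq(H[:i + 2, :i + 1], e1, rcond=None)[0]
               if np.linalg.norm(H[:i + 2, :i + 1] @ y - e1) <= tol * bnorm:
                   break                          # converged within this cycle
           xh += Q[:, :y.size] @ y                # update preconditioned solution
       return solve_U(xh)                         # recover x = U^-1 xh

With solve_L = solve_U = identity, the sketch reduces to unpreconditioned restarted GMRES.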
4. Implementation Issues
Central to both CG and GMRES methods is the computation of the matrix-vector product Ap of the global matrix A by a vector p, at every iteration. A naive way of performing Ap-products would be to initially compute and store all the entries in the global matrix A and as needed, compute explicitly the product for each vector p. This obviously would entail extremely large costs both with regard to computation and storage requirements. However, more efficient methods for handling Ap-products that alleviate these problems are available. DYNAFLOW provides two such methods: (1) element-by-element (EBE) storage of A; and (2) direct matrix-free computations of the Ap-product. In the first option, A is stored as a series of unassembled element matrices, whereas in the second option, A is never formed as discussed hereafter.
Element-by-Element (EBE) Ap-product: The method is due to Fox and Stanton [2] and Fried [3]. The matrix A is stored in terms of unassembled element matrices Ae, and the Ap-product is computed via the decomposition:
Ap = Σ (e = 1, ..., Nel) Ae pe    (24)
where Nel = number of elements and pe denotes the restriction of p to the equations of element e. Let Nee = number of element equations (Nee = Ned x Nen, where Ned = number of degrees of freedom per node and Nen = number of nodes per element); then the data structure requires:
for symmetric Ae: Nee x (Nee+1)/2 words
for nonsymmetric Ae: Nee x Nee words
and the calculation in Eq. 24 requires at most Nel x Nee^2 floating-point operations (half for symmetric matrices).
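A sketch of the EBE product in Eq. 24 (generic NumPy, not DYNAFLOW source; conn is a hypothetical element-to-global equation connectivity array and Ae holds the stacked, unassembled element matrices):

   import numpy as np

   def ebe_matvec(Ae, conn, p):
       """Element-by-element product Ap = sum over e of Ae pe (Eq. 24); A is never assembled.
       Ae:   (Nel, Nee, Nee) unassembled element matrices
       conn: (Nel, Nee)      global equation numbers for each element
       p:    (Neq,)          global vector"""
       Ap = np.zeros_like(p)
       for e in range(Ae.shape[0]):
           dofs = conn[e]
           pe = p[dofs]                        # gather the local vector
           np.add.at(Ap, dofs, Ae[e] @ pe)     # local product and scatter-add
       return Ap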
Matrix-free Ap-product: In the matrix-free method, evaluation and storage of the components of A are avoided altogether, and an explicit forward differencing scheme is used to approximate the product. This is possible because A is a Jacobian-like matrix [see Section 12.3], thus a residual-like vector r can always be defined such that:
Ap ≈ [ r(u + ε p) - r(u) ] / ε    (25)

where u is the solution vector, and ε is a "small" number related to εm = computing machine precision. The vector r(u + εp) is sometimes referred to as the perturbed residual. Criteria for determining an optimum value for ε have been implemented in DYNAFLOW (see Johan [8] for details).
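A sketch of the forward-difference approximation in Eq. 25 (generic NumPy, not DYNAFLOW source; the simple machine-precision scaling of ε used here stands in for the optimum-ε criteria of Johan [8]):

   import numpy as np

   def matrix_free_matvec(residual, u, p):
       """Approximate Ap = (dr/du) p by forward differencing the residual (Eq. 25)."""
       p_norm = np.linalg.norm(p)
       if p_norm == 0.0:
           return np.zeros_like(p)
       eps_m = np.finfo(float).eps                                  # machine precision
       eps = np.sqrt(eps_m) * (1.0 + np.linalg.norm(u)) / p_norm    # simple choice of epsilon
       return (residual(u + eps * p) - residual(u)) / eps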
5. Preconditioning
In the preconditioned Conjugate Gradient algorithm, the preconditioning matrix B is directly involved in the solution of the system Bz = r. In the GMRES algorithm, the left and right preconditioning matrices L and U obtained from B = LU are involved in the computation of the L-1AU-1p terms. Two types of preconditioning are available in DYNAFLOW: (1) element-by-element (EBE) preconditioners; and (2) global incomplete factorization-based (ILU) preconditioners.
Element-by-Element (EBE) Preconditioning: The terminology refers to the fact that the preconditioners are constructed from the individual element matrices Ae (recall that Ae is the e-th element's scaled contribution to the scaled global matrix A). The choice of element-by-element preconditioners is motivated by the element-based local data structure inherent to finite element codes. Two kinds of EBE preconditioners are available: (1) factorized, and (2) Gauss-Seidel. Both preconditioners are product decomposition approximations of the global matrix A. Since the matrices Ae are usually rank-deficient, some regularization of Ae is needed to remove the zero eigenvalues. This is accomplished via diagonal or Winget [12] regularization, where the regularized matrix is computed as:

(26)

The EBE preconditioners are constructed from these regularized element matrices. Details can be found in [11, 12] for both symmetric and nonsymmetric matrices.
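Eq. 26 is not reproduced above. As an illustration only, one Winget-type regularization commonly quoted in the EBE literature (an assumption, not necessarily the exact DYNAFLOW expression) replaces each scaled element matrix by an identity plus its scaled off-diagonal part:

   import numpy as np

   def winget_regularize(Ae, dofs, diag_A):
       """Assumed Winget-type regularization of one element matrix:
       Abar_e = I + W^(-1/2) (Ae - diag(Ae)) W^(-1/2), with W = diag(A) global."""
       w = np.sqrt(diag_A[dofs])                # element part of the global diagonal
       off_diag = Ae - np.diag(np.diag(Ae))     # remove the element diagonal
       return np.eye(Ae.shape[0]) + off_diag / np.outer(w, w)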
Global Incomplete Factorization-Based (ILU) Preconditioning: An alternative preconditioning method is based on an incomplete factorization of A. The goal is to obtain a reasonably accurate factorization of A without having to form the complete factors of A or generate too many fill-ins. In the current implementation, an incomplete factorization by "position" has been adopted, in which the user may select the maximum height (user-specified ILU_level) of the columns used to store the incomplete factors of A. Thus for ILU_level = 0 only the diagonal of A is used to form B, whereas for ILU_level > 0 the terms in the j-th column of A from row i = j (diagonal) up to row i = j - ILU_level are used to form B.
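A sketch of such a position-based incomplete factorization for the symmetric case (generic NumPy, not DYNAFLOW source; dense storage is used for clarity and pivot protection is rudimentary):

   import numpy as np

   def incomplete_cholesky_by_position(A, ilu_level):
       """Incomplete Cholesky keeping, in column j, only rows j - ilu_level .. j.
       ilu_level = 0 retains the diagonal of A only, so B = diag(A)."""
       n = A.shape[0]
       U = np.zeros_like(A)                               # upper-triangular incomplete factor
       for j in range(n):
           lo = max(0, j - ilu_level)
           for i in range(lo, j):                         # retained off-diagonal positions
               U[i, j] = (A[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
           d = A[j, j] - U[:j, j] @ U[:j, j]
           U[j, j] = np.sqrt(max(d, np.finfo(float).tiny))   # guard against loss of positivity
       return U                                           # B = U.T @ U approximates A

Forward and back substitution with U.T and U then provide the preconditioner solves Bz = r.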
(2) For certain problems, a nonsymmetric matrix may be required. See element data in Chapter 9 and material data in Chapter 10 for details.
References / Bibliography
1. Duff, I.S., A.M. Erisman and J.K. Reid, Direct Methods for Sparse Matrices, Oxford Science Publications, Clarendon Press, Oxford, England, (1989).
2. Fox, R.L. and E.L. Stanton, "Developments in Structural Analysis by Direct Energy Minimization," AIAA Journal, 6, (1968), 1036-1042.
3. Fried, I.
4. George, J.A. and J.W-H. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, (1981).
5. Gibbs, N.E., W.G. Poole Jr., and P.K. Stockmeyer, "An Algorithm for Reducing the Bandwidth and Profile of a Sparse Matrix," SIAM J. Numer. Anal., 13, (1976), 236-250.
6. Golub, G.H. and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press.
7. Hestenes, M.R. and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Standards, 49, (1952), 409-436.
8. Johan, Z., "Data Parallel Finite Element Techniques for Large Scale Computational Fluid Dynamics," Ph.D. Thesis, Stanford University, (1992).
9. Lewis, J.G., "Implementation of the Gibbs-Poole-Stockmeyer and Gibbs-King Algorithms," ACM Trans. Math. Softw., 8, 2, (1982), 180-189.
10. Saad, Y. and M.H. Schultz, "GMRES: A Generalized Minimum Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput., 7, (1986), 856-869.
11. Shakib, F., T.J.R. Hughes and Z. Johan, "A Multi-Element Group Preconditioned GMRES Algorithm for Nonsymmetric Systems Arising in Finite Element Analysis," Comp. Meth. Appl. Mech. Eng., 75, (1989), 415-456.
12. Shanno, D.F., "On the Convergence of a New Conjugate Gradient Algorithm."
13. Shanno, D.F., "Conjugate Gradient Methods with Inexact Searches," Mathematics of Operations Research, 3, No. 3, (1978), 244-256.
14. Winget, J.M. and T.J.R. Hughes, "Solution Algorithms for Nonlinear Transient Heat Conduction Analysis Employing Element-by-Element Strategies," Comp. Meth. Appl. Mech. Eng., 52, (1985), 711-815.