12.4 Linear Equation Solver Selection
LINEAR_SOLVER
LINEAR_SOLVER solver_name = list , etc...
Specify the type of linear equation solver to use for all equations belonging to any given solution stagger.
(1)  Solver_type            list      [direct]   Solver type
                                                    direct / iterative

(2)  Symmetric_matrix       list      [on]       Symmetric matrix option
                                                    on / off

     Non_symmetric_matrix   list      [off]      Nonsymmetric matrix option
                                                    on / off

     Solver_name            list      [*]        Linear equation solution procedure
                                                    (see Table 12.4)
• Direct Solvers

  Symmetric Solvers
     Crout_column           Crout column solver
     Crout_block            Crout block solver
     Frontal                Frontal solver
     Cholesky_1_sparse      Sparse Cholesky with multiple minimum degree ordering
     Cholesky_2_sparse      Sparse Cholesky with general quotient minimum degree ordering
     Cholesky_3_sparse      Sparse Cholesky with nested dissection ordering

  Nonsymmetric Solvers
     Crout_column           Crout column solver
• Iterative Solvers

  Symmetric Solvers
     CG_EBE                 Conjugate gradient
     CG_Crout_EBE           Conjugate gradient (Crout preconditioner)
     CG_Crout_GS_EBE        Conjugate gradient (Crout Gauss-Seidel preconditioner)
     CG_LU_EBE              Conjugate gradient (LU preconditioner)
     CG_LU_GS_EBE           Conjugate gradient (LU Gauss-Seidel preconditioner)
     CG_sparse              Conjugate gradient
     CG_IC_sparse           Conjugate gradient (incomplete Cholesky)
     CG_Mf                  Conjugate gradient
     CG_Shanno_Mf           Shanno conjugate gradient
     CG_Beale_Shanno_Mf     Shanno CG with Beale's restart

  Nonsymmetric Solvers
     GMRES_EBE              GMRES
     GMRES_LU_EBE           GMRES (LU preconditioner)
     GMRES_LU_GS_EBE        GMRES (LU Gauss-Seidel preconditioner)
     GMRES_sparse           GMRES
     GMRES_ILU_sparse       GMRES (incomplete LU)
     GMRES_Mf               GMRES
     BCG_EBE                Biconjugate gradient
     BCG_sparse             Biconjugate gradient
     BCG_ILU_sparse         Biconjugate gradient (incomplete LU)
     BCG_Mf                 Biconjugate gradient
     BCGS_sparse            Squared biconjugate gradient
     BCGS_ILU_sparse        Squared biconjugate gradient (incomplete LU)
     Jacobi_sparse          Jacobi iterations
     Gauss_Seidel_sparse    Gauss-Seidel iterations
     ILU_sparse             Incomplete LU iterations
     Broyden_Mf             Broyden iterations

• MPI Solvers
     MPI_Cholesky           Direct Cholesky solver
     MPI_FGMRES             Iterative GMRES solver
• Direct Profile Solver Options

     Profile_minimizer       list      [on]       Equation numbering profile minimizer
                                                     on / off

• Iterative Solver Options

     Preconditioner_type     list      [*]        Type of global preconditioner to use
                                                     diagonal         Nodal diagonal scaling
                                                     block_diagonal   Nodal block-diagonal scaling

• Conjugate Gradient Solver Options

     Max_cg_iterations       integer   [Neq]      Maximum number of iterations
     ILU_level               integer   [25]       Incomplete LU factorization level
     Convergence_tol_cg      real      [1.E-8]    Convergence tolerance

• GMRES Solver Options

     Max_gmres_iterations    integer   [Neq]      Maximum number of iterations
     Max_gmres_cycles        integer   [10]       Maximum number of Krylov subspace restarts
     Num_gmres_iterations    integer   [20]       Dimension of Krylov subspace
     Convergence_tol_gmres   real      [1.E-8]    Convergence tolerance

• MPI Iterative Solver Options

     Preconditioner_type     list      [*]        Type of global preconditioner
                                                     add_schwarz
                                                     right_schur
                                                     multic_schwarz
EXAMPLE
Linear_Solver /
   Solver_name = CG_Crout_EBE /        # Conjugate gradient solver with Crout
                                       # element-by-element preconditioner request
   Symmetric_matrix = on /             # Symmetric matrix
   Preconditioner_type = diagonal /    # Diagonal preconditioning
   Convergence_tol = 1.e-6             # Specified convergence tolerance for CG iterations
Table 12.4. Linear Solvers

 No.  Solver_name           Type       Isymm*  Storage             Description
  0   Crout_column          direct     0, 1    profile             Crout column solver
  1   Crout_block           direct     0       out_of_core         Crout block solver
 29   Cholesky_1_sparse     direct     0       sparse              Sparse Cholesky with multiple minimum degree ordering
 30   Cholesky_2_sparse     direct     0       sparse              Sparse Cholesky with general quotient minimum degree ordering
 31   Cholesky_3_sparse     direct     0       sparse              Sparse Cholesky with nested dissection ordering
  3   CG_EBE                iterative  0       element_by_element  Conjugate gradient
  4   CG_Crout_EBE          iterative  0       element_by_element  Conjugate gradient (Crout preconditioner)
  5   CG_Crout_GS_EBE       iterative  0       element_by_element  Conjugate gradient (Crout Gauss-Seidel preconditioner)
  6   CG_LU_EBE             iterative  0       element_by_element  Conjugate gradient (LU preconditioner)
  7   CG_LU_GS_EBE          iterative  0       element_by_element  Conjugate gradient (LU Gauss-Seidel preconditioner)
  8   GMRES_EBE             iterative  0, 1    element_by_element  GMRES
  9   GMRES_LU_EBE          iterative  0, 1    element_by_element  GMRES (LU preconditioner)
 10   GMRES_LU_GS_EBE       iterative  0, 1    element_by_element  GMRES (LU Gauss-Seidel preconditioner)
 11   BCG_EBE               iterative  0, 1    element_by_element  Biconjugate gradient
 12   CG_Mf                 iterative  0       matrix_free         Conjugate gradient
 13   CG_Shanno_Mf          iterative  0       matrix_free         Shanno conjugate gradient
 14   CG_Beale_Shanno_Mf    iterative  0       matrix_free         Shanno CG with Beale's restart
 15   BCG_Mf                iterative  0, 1    matrix_free         Biconjugate gradient
 16   GMRES_Mf              iterative  0, 1    matrix_free         GMRES
 17   Broyden_Mf            iterative  0, 1    matrix_free         Broyden iterations
 18   CG_sparse             iterative  0       sparse              Conjugate gradient
 19   CG_IC_sparse          iterative  0       sparse              Conjugate gradient (incomplete Cholesky)
 20   BCG_sparse            iterative  0, 1    sparse              Biconjugate gradient
 21   BCG_ILU_sparse        iterative  0, 1    sparse              Biconjugate gradient (incomplete LU)
 22   BCGS_sparse           iterative  0, 1    sparse              Squared biconjugate gradient
 23   BCGS_ILU_sparse       iterative  0, 1    sparse              Squared biconjugate gradient (incomplete LU)
 24   GMRES_sparse          iterative  0, 1    sparse              GMRES
 25   GMRES_ILU_sparse      iterative  0, 1    sparse              GMRES (incomplete LU)
 26   Jacobi_sparse         iterative  0, 1    sparse              Jacobi iterations
 27   Gauss_Seidel_sparse   iterative  0, 1    sparse              Gauss-Seidel iterations
 28   ILU_sparse            iterative  0, 1    sparse              Incomplete LU iterations
Notes /
* Isymm = 0 Solver applicable to symmetric matrices
Isymm = 1 Solver applicable to nonsymmetric matrices
(1) The application of the finite element discretization to the governing equation(s) of a field theory generates a matrix system of equations. For instance, nonlinear transient finite element dynamics are characterized by the following second-order semi-discrete balance equation:
M a + N(v, d) = f    (1)
where M is the mass matrix, N is the vector of internal forces and f = f(t) is the time-dependent external force vector. In Eq. 1, d, v, and a are the vectors of nodal displacements, velocities and accelerations, respectively. Eq. 1 is solved by applying a step-by-step time integration procedure resulting in a system of nonlinear algebraic equations. The solution of this system is obtained by using Newton-like iterations or related schemes. At the heart of this iteration is a set of linear equations
Ax = b (2)
where x is the correction to the approximate solution vector in the nonlinear iteration loop. The matrix A is definite and sparse (but not necessarily symmetric) and, for the case of Eq. 1, is obtained as

(3)

where the coefficients are algorithmic time-integration parameters [see Section 12.3], and [see Section 12.4]

(4)
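The explicit expressions in Eqs. 3 and 4 depend on the time-integration scheme selected. As an illustration only (an assumed Newmark-type a-form linearization, not a reproduction of Eqs. 3-4), when d and v are expressed in terms of the acceleration correction the iteration matrix and right-hand side take the form

   A = M + γ Δt (∂N/∂v) + β Δt² (∂N/∂d) ,      b = f - M a - N(v, d)

where β and γ are the algorithmic parameters of Section 12.3.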
DYNAFLOW provides the capability of solving this system (Eq. 2) by either direct or iterative methods. Direct methods include:
• Profile in-core Crout
• Profile out-of-core Crout block
• Frontal in-core / out-of-core
• Explicit
and iterative methods include:
• Conjugate Gradient (CG)
• Generalized Minimum Residual (GMRES)
Direct solvers have the advantage of being relatively simple and robust; solutions are obtained in a predictable, finite number of operations. However, in large three-dimensional simulations, storage requirements and the number of floating-point operations can become prohibitive.
These problems can be alleviated by using iterative solvers. However, iterative solvers are not necessarily robust, and preconditioning matrices are often needed to counteract problems of ill-conditioning.
Direct Methods:
Direct methods are well established and documented in standard texts (see e.g. [1, 4, 6]). In order to generate matrices with small bandwidth and profile, a profile minimizer [5, 9] option is available.
Iterative Methods:
The convergence of both the Conjugate Gradient and GMRES algorithms is greatly influenced by the condition number of the matrix A. In order to reduce this sensitivity, it is useful to precondition the original system by using a symmetric, positive-definite matrix. In DYNAFLOW, two levels of preconditioning are available: (1) a global preconditioner, and (2) a local preconditioner (for example, an element-by-element preconditioner). These are discussed hereafter.
1. Global Preconditioner
Two global preconditioners are available as follows:
The first preconditioner available is the nodal diagonal scaling matrix W1 defined as:

W1 = diag(A)    (5)

and we rewrite Eq. 2 in a scaled form:

Ã x̃ = b̃    (6)

where

Ã = W1^(-1/2) A W1^(-1/2)    (7)

x̃ = W1^(1/2) x    (8)

b̃ = W1^(-1/2) b    (9)

With this definition of W1, diag(Ã) becomes an identity matrix.
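As an illustration of the operations in Eqs. 5-9, a minimal sketch in generic NumPy (not DYNAFLOW source; the function and array names are hypothetical):

   import numpy as np

   def diagonal_scale(A, b):
       """Nodal diagonal scaling (Eqs. 5-9): returns the scaled system and W1^(1/2)."""
       w = np.sqrt(np.diag(A))          # W1^(1/2); assumes a positive diagonal
       A_s = A / np.outer(w, w)         # A~ = W1^(-1/2) A W1^(-1/2), so diag(A~) = I
       b_s = b / w                      # b~ = W1^(-1/2) b
       return A_s, b_s, w

   def recover_solution(x_s, w):
       """Recover x from the scaled solution x~ = W1^(1/2) x."""
       return x_s / w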
The second preconditioner available is the nodal block-diagonal scaling matrix W2 defined as:

W2 = block_diagonal(A)    (10)

where the operator "block_diagonal" assigns to W2 the nodal block-diagonal matrix consisting of the m x m nodal diagonal blocks of A, where m = number of equations per node (m ≤ Ndof). Nodal diagonal blocks of A are symmetric and positive-definite. Therefore, W2 possesses a well-defined Cholesky factorization:

W2 = UTU    (11)

and we rewrite Eq. 2 in a scaled form:

Ã x̃ = b̃    (12)

where:

Ã = U^(-T) A U^(-1)    (13)

x̃ = U x    (14)

b̃ = U^(-T) b    (15)

With this definition of W2, block_diagonal(Ã) becomes an identity matrix. Note that if A is symmetric and positive-definite, Ã inherits these properties. An important attribute of block-diagonal scaling is its ability to nondimensionalize the original system of equations. This is not necessary if the system is dimensionally homogeneous. However, when this is not the case (for example, when using structural elements with both translational and rotational degrees of freedom), the 2-norm of the residual is not well-defined. To ensure against this, A can be scaled at the element level; that is, each element matrix Ae is replaced with U^(-T) Ae U^(-1), where U is the Cholesky factor of W2.
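The corresponding block-diagonal scaling (Eqs. 10-15) can be sketched as follows (generic NumPy, not DYNAFLOW source; m equations per node and contiguous nodal blocks are assumed):

   import numpy as np

   def block_diagonal_scale(A, b, m):
       """Nodal block-diagonal scaling via Cholesky factors of the m x m diagonal blocks."""
       n = A.shape[0]
       U = np.zeros_like(A)                               # Cholesky factor of W2 (Eq. 11)
       for k in range(0, n, m):                           # loop over nodal blocks
           Wk = A[k:k + m, k:k + m]                       # nodal diagonal block (SPD)
           U[k:k + m, k:k + m] = np.linalg.cholesky(Wk).T # Wk = Uk^T Uk
       Ui = np.linalg.inv(U)                              # block-diagonal, cheap to invert
       A_s = Ui.T @ A @ Ui                                # A~ = U^(-T) A U^(-1)  (Eq. 13)
       b_s = Ui.T @ b                                     # b~ = U^(-T) b         (Eq. 15)
       return A_s, b_s, U                                 # solve A~ x~ = b~, then x = U^(-1) x~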
Remark: The nodal diagonal scaling or nodal block-diagonal scaling is embedded into the solution algorithm before entering either the preconditioned Conjugate Gradient or the preconditioned GMRES iterative process. This requires forming the scaled system (Eq. 6 or Eq. 12) before entering either process and subsequently recovering x from x̃ after the iterative solution has been obtained. Since nodal diagonal or nodal block-diagonal scaling is always used, this global preconditioning is often called pre-preconditioning or simply scaling, to distinguish it from the (optional) preconditioning used in the Conjugate Gradient or GMRES iterations. Also, in the following, to simplify the notation, the tilde is dropped on A, b and x, and the scaled system is written hereafter as:
Ax = b (16)
2. Conjugate Gradient
The Conjugate Gradient method [7] is restricted to linear systems Ax = b for which A is symmetric positive-definite. This is typically the case in solid mechanics problems. Solving such systems can be viewed as minimizing the associated potential function
Φ(x) = 1/2 xTAx - xTb    (17)

over all x in RN, where N = number of equations (A is an N x N matrix). An iterative strategy is used that begins with an initial vector x0 and computes a series of approximate solutions {x1, x2, ...} which progressively reduce Φ. The essence of the Conjugate Gradient method consists of obtaining a new vector xi+1 from xi along a direction pi at a step length αi:

xi+1 = xi + αi pi    (18)
The direction vectors pi are mutually conjugate and satisfy the relationships:

piT A pj = 0 ,   i ≠ j    (19)

and

pi+1 = ri+1 + βi pi    (20)

The step length αi is fixed by the condition that the new residual ri+1 be orthogonal to pi (i.e., Φ is minimized along pi), while the parameter βi is determined by the A-conjugate property of the search direction pi. The algorithm is summarized as follows:
Given:        scaled linear system:  Ax = b
              preconditioner:        B
              with A and B symmetric positive-definite

Initialize:   x0 = 0 ;   r0 = b ;   solve B z0 = r0 ;   p0 = z0

Iterate:      do i = 0, 1, 2, ..., imax
                 αi   = (ri . zi) / (pi . A pi)
                 xi+1 = xi + αi pi
                 ri+1 = ri - αi A pi
                 Check convergence: if || ri+1 || ≤ ε || r0 ||, return
                 solve B zi+1 = ri+1
                 βi   = (ri+1 . zi+1) / (ri . zi)
                 pi+1 = zi+1 + βi pi
              enddo
In the above, ε = convergence_tol is a user-specified convergence tolerance, and B is a symmetric positive-definite conditioner matrix. As with scaling, B is selected such that (1) B significantly contracts the spectrum of A, i.e., the condition number of B-1A is much smaller than that of A; and (2) solving the linear system Bz = r is relatively inexpensive. Of course, using B = A would yield convergence to the exact solution in a single iteration, but this is not considered a viable choice due to its associated computational cost. Also, using B = I = identity matrix, one recovers the unpreconditioned Conjugate Gradient method. The various options for defining B are described later in this section.
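The iteration above can be summarized in a short sketch (generic NumPy, not DYNAFLOW source; the preconditioner solve Bz = r is supplied as a callable, and the stopping test uses the relative residual):

   import numpy as np

   def pcg(A, b, solve_B, tol=1.0e-8, max_iter=None):
       """Preconditioned Conjugate Gradient for symmetric positive-definite A."""
       n = b.size
       max_iter = n if max_iter is None else max_iter   # default imax = Neq
       x = np.zeros(n)
       r = b - A @ x
       z = solve_B(r)                                   # solve B z = r
       p = z.copy()
       rz = r @ z
       r0 = np.linalg.norm(r)
       for _ in range(max_iter):
           Ap = A @ p
           alpha = rz / (p @ Ap)                        # step length along p
           x += alpha * p
           r -= alpha * Ap
           if np.linalg.norm(r) <= tol * r0:            # convergence check
               break
           z = solve_B(r)
           rz_new = r @ z
           beta = rz_new / rz                           # enforces A-conjugacy of p
           p = z + beta * p
           rz = rz_new
       return x

For example, solve_B = lambda r: r / np.diag(A) corresponds to a diagonal (Jacobi) conditioner, while solve_B = lambda r: r recovers the unpreconditioned method.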
3. GMRES
The Generalized Minimal Residual (GMRES) method of Saad and Schultz [10] can be applied to any linear system Ax = b, but is usually reserved for those systems in which A is non-symmetric. The essence of the method is to obtain a trial solution vector x0 + z that minimizes the 2‑norm of the residual r = b - A(x0 + z); that is to find
min over z in K of || b - A(x0 + z) ||2    (21)

where x0 is an initial guess and z is in the k-dimensional Krylov space K:

K = span {r0, Ar0, ..., A^(k-1) r0}    (22)
and r0 = b - Ax0. The algorithm employs an orthonormal basis of K which is obtained by the modified Gram-Schmidt procedure, and the minimization is performed using the Q-R algorithm [9]. An excellent general reference for the numerical algebraic procedures employed is [6]. Convergence of the GMRES method is also influenced by the condition number of the coefficient matrix A. Condition improvements are accomplished through the use of an appropriate preconditioning matrix B = LU = a positive-definite matrix, where L and U are referred to as the left and right preconditioning matrices, respectively. The preconditioned linear system then takes the form:
(L-1AU-1)(Ux) = (L-1b) (23)
It should be emphasized that the matrix (L-1AU-1) is never formed nor stored as a single matrix. Rather, L-1 and U-1 are viewed as representing operations carried out during the GMRES solution process. The algorithm is summarized as follows:
1. Given:       scaled linear system:  Ax = b
                preconditioner:        B = LU

2. Initialize:  x = 0 ;   b = L-1b ;   r = b

3. GMRES cycles:
   do l = 0, 1, 2, ..., lmax                          (GMRES cycles loop)
      u1 = b - L-1AU-1x
      do i = 0, 1, 2, ..., kmax                       (GMRES iterations loop)
         ui+1 = L-1AU-1ui
         do j = 1, ..., i                             (modified Gram-Schmidt orthogonalization)
            hj,i = uj . ui+1
            ui+1 = ui+1 - hj,i uj
         enddo on j
         hi+1,i = || ui+1 || ;   ui+1 = ui+1 / hi+1,i
         Update Q-R factorization of Hi
         Check convergence:
            if satisfied, exit GMRES iterations loop i
      enddo on i                                      (end GMRES iterations loop)
      Solve:  Hi y = e                                (using factorized Hi)
      Update solution:  x = x + [u1, ..., ui] y
      Check convergence:
         if satisfied, exit GMRES cycles loop l
   enddo on l                                         (end GMRES cycles loop)

4. Recover scaled system solution vector:  x = U-1x
The GMRES iteration loop consists of the formation of an orthogonal basis for the Krylov space by the modified Gram-Schmidt procedure, triangularization of the Hessenberg matrix Hi by the Q-R algorithm, and back-substitution. The maximum number of GMRES cycles allowed is lmax = max_gmres_cycles, a user-specified value. The dimension of the Krylov space, k = num_gmres_iterations, is a user-specified value, as is the convergence tolerance convergence_tol.
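A compact sketch of the restarted, preconditioned GMRES cycle (generic NumPy, not DYNAFLOW source; for brevity the incremental Q-R update of Hi is replaced by a dense least-squares solve, and left/right preconditioning is applied through the callables solve_L and solve_U):

   import numpy as np

   def gmres(A, b, solve_L, solve_U, k=20, max_cycles=10, tol=1.0e-8):
       """Restarted GMRES with modified Gram-Schmidt orthogonalization (sketch)."""
       n = b.size
       xh = np.zeros(n)                           # unknowns of the preconditioned system
       bh = solve_L(b)                            # b <- L^-1 b
       bnorm = np.linalg.norm(bh)
       for _ in range(max_cycles):                # GMRES cycles (restarts)
           r = bh - solve_L(A @ solve_U(xh))
           beta = np.linalg.norm(r)
           if beta <= tol * bnorm:
               break
           Q = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
           Q[:, 0] = r / beta
           y = np.zeros(0)
           for i in range(k):                     # build the Krylov basis
               w = solve_L(A @ solve_U(Q[:, i]))
               for j in range(i + 1):             # modified Gram-Schmidt
                   H[j, i] = Q[:, j] @ w
                   w -= H[j, i] * Q[:, j]
               H[i + 1, i] = np.linalg.norm(w)
               if H[i + 1, i] > 0.0:
                   Q[:, i + 1] = w / H[i + 1, i]
               e1 = np.zeros(i + 2); e1[0] = beta
               y = np.linalg.lstsq(H[:i + 2, :i + 1], e1, rcond=None)[0]
               if np.linalg.norm(H[:i + 2, :i + 1] @ y - e1) <= tol * bnorm:
                   break                          # converged within this cycle
           xh += Q[:, :y.size] @ y                # update preconditioned solution
       return solve_U(xh)                         # recover x = U^-1 xh

With solve_L = solve_U = identity, the sketch reduces to unpreconditioned restarted GMRES.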
4. Implementation Issues
Central to both CG and GMRES methods is the computation of the matrix-vector product Ap of the global matrix A by a vector p, at every iteration. A naive way of performing Ap-products would be to initially compute and store all the entries in the global matrix A and as needed, compute explicitly the product for each vector p. This obviously would entail extremely large costs both with regard to computation and storage requirements. However, more efficient methods for handling Ap-products that alleviate these problems are available. DYNAFLOW provides two such methods: (1) element-by-element (EBE) storage of A; and (2) direct matrix-free computations of the Ap-product. In the first option, A is stored as a series of unassembled element matrices, whereas in the second option, A is never formed as discussed hereafter.
Element-by-Element (EBE) Ap-product: The method is due to Fox and Stanton [2] and Fried [3]. The matrix A is stored in terms of unassembled element matrices Ae, and the Ap-product is computed via the decomposition:
Ap = Σ (e = 1, ..., Nel) Ae pe    (24)
where Nel = number of elements and pe denotes the restriction of p to the equations of element e. Let Nee = number of element equations (Nee = Ned x Nen, where Ned = number of degrees of freedom per node and Nen = number of nodes per element); then the data structure requires:
for symmetric Ae: Nee x (Nee+1)/2 words
for nonsymmetric Ae: Nee x Nee words
and the calculation in Eq. 24 requires at most Nel x Nee^2 floating-point operations (half for symmetric matrices).
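A sketch of the EBE product in Eq. 24 (generic NumPy, not DYNAFLOW source; conn is a hypothetical element-to-global equation connectivity array and Ae holds the stacked, unassembled element matrices):

   import numpy as np

   def ebe_matvec(Ae, conn, p):
       """Element-by-element product Ap = sum over e of Ae pe (Eq. 24); A is never assembled.
       Ae:   (Nel, Nee, Nee) unassembled element matrices
       conn: (Nel, Nee)      global equation numbers for each element
       p:    (Neq,)          global vector"""
       Ap = np.zeros_like(p)
       for e in range(Ae.shape[0]):
           dofs = conn[e]
           pe = p[dofs]                        # gather the local vector
           np.add.at(Ap, dofs, Ae[e] @ pe)     # local product and scatter-add
       return Ap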
Matrix-free Ap-product: In the matrix-free method, evaluation and storage of the components of A are avoided altogether, and an explicit forward differencing scheme is used to approximate the product. This is possible because A is a Jacobian-like matrix [see Section 12.3], thus a residual-like vector r can always be defined such that:
Ap ≈ [ r(u + ε p) - r(u) ] / ε    (25)

where u is the solution vector, and ε is a "small" number related to εm = computing machine precision. The vector r(u + εp) is sometimes referred to as the perturbed residual. Criteria for determining an optimum value for ε have been implemented in DYNAFLOW (see Johan [8] for details).
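A sketch of the forward-difference approximation in Eq. 25 (generic NumPy, not DYNAFLOW source; the simple machine-precision scaling of ε used here stands in for the optimum-ε criteria of Johan [8]):

   import numpy as np

   def matrix_free_matvec(residual, u, p):
       """Approximate Ap = (dr/du) p by forward differencing the residual (Eq. 25)."""
       p_norm = np.linalg.norm(p)
       if p_norm == 0.0:
           return np.zeros_like(p)
       eps_m = np.finfo(float).eps                                  # machine precision
       eps = np.sqrt(eps_m) * (1.0 + np.linalg.norm(u)) / p_norm    # simple choice of epsilon
       return (residual(u + eps * p) - residual(u)) / eps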
5. Preconditioning
In the preconditioned Conjugate Gradient algorithm, the preconditioning matrix B is directly involved in the solution of the system Bz = r. In the GMRES algorithm, the left and right preconditioning matrices L and U obtained from B = LU are involved in the computation of the L-1AU-1p terms. Two types of preconditioning are available in DYNAFLOW: (1) element-by-element (EBE) preconditioners; and (2) global incomplete factorization-based (ILU) preconditioners.
Element-by-Element (EBE) Preconditioning: The terminology refers to the fact that the preconditioners are constructed from the individual element matrices Ae (recall that Ae is the e-th element's scaled contribution to the scaled global matrix A). The choice of element-by-element preconditioners is motivated by the element-based local data structure inherent to finite element codes. Two kinds of EBE preconditioners are available: (1) factorized, and (2) Gauss-Seidel. Both preconditioners are product decomposition approximations of the global matrix A. Since the matrices Ae are usually rank-deficient, some regularization of Ae is needed to remove the zero eigenvalues. This is accomplished via diagonal or Winget [12] regularization, where the regularized matrix is computed as:

(26)

The EBE preconditioners are constructed from these regularized element matrices. Details can be found in [11, 12] for both symmetric and nonsymmetric matrices.
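Eq. 26 is not reproduced above. As an illustration only, one Winget-type regularization commonly quoted in the EBE literature (an assumption, not necessarily the exact DYNAFLOW expression) replaces each scaled element matrix by an identity plus its scaled off-diagonal part:

   import numpy as np

   def winget_regularize(Ae, dofs, diag_A):
       """Assumed Winget-type regularization of one element matrix:
       Abar_e = I + W^(-1/2) (Ae - diag(Ae)) W^(-1/2), with W = diag(A) global."""
       w = np.sqrt(diag_A[dofs])                # element part of the global diagonal
       off_diag = Ae - np.diag(np.diag(Ae))     # remove the element diagonal
       return np.eye(Ae.shape[0]) + off_diag / np.outer(w, w)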
Global Incomplete Factorization-Based (ILU) Preconditioning: An alternative preconditioning method is based on an incomplete factorization of A. The goal is to obtain a reasonably accurate factorization of A without having to form the complete factors of A or generate too many fill-ins. In the current implementation, an incomplete factorization by "position" has been adopted, in which the user may select the maximum height (user-specified ILU_level) of the columns used to store the incomplete factors of A. Thus for ILU_level = 0 only the diagonal of A is used to form B, whereas for ILU_level > 0 the terms in the j-th column of A from row i = j (diagonal) up to row i = j - ILU_level are used to form B.
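A sketch of such a position-based incomplete factorization for the symmetric case (generic NumPy, not DYNAFLOW source; dense storage is used for clarity and pivot protection is rudimentary):

   import numpy as np

   def incomplete_cholesky_by_position(A, ilu_level):
       """Incomplete Cholesky keeping, in column j, only rows j - ilu_level .. j.
       ilu_level = 0 retains the diagonal of A only, so B = diag(A)."""
       n = A.shape[0]
       U = np.zeros_like(A)                               # upper-triangular incomplete factor
       for j in range(n):
           lo = max(0, j - ilu_level)
           for i in range(lo, j):                         # retained off-diagonal positions
               U[i, j] = (A[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
           d = A[j, j] - U[:j, j] @ U[:j, j]
           U[j, j] = np.sqrt(max(d, np.finfo(float).tiny))   # guard against loss of positivity
       return U                                           # B = U.T @ U approximates A

Forward and back substitution with U.T and U then provide the preconditioner solves Bz = r.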
(2) For certain problems, a nonsymmetric matrix may be required. See element data in Chapter 9 and material data in Chapter 10 for details.
References / Bibliography
1. Duff, I.S., A.M. Erisman and J.K. Reid, Direct Methods for Sparse Matrices, Oxford Science Publications, Clarendon Press, Oxford, England, (1989).
2. Fox, R.L. and E.L. Stanton, "Developments in Structural Analysis by Direct Energy Minimization," AIAA Journal, 6, (1968), 1036-1042.
3. Fried, I.
4. George, J.A. and J.W-H. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, (1981).
5. Gibbs, N.E., W.G. Poole Jr., and P.K. Stockmeyer, "An Algorithm for Reducing the Bandwidth and Profile of a Sparse Matrix," SIAM J. Numer. Anal., 13, (1976), 236-250.
6. Golub, G.H. and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press.
7. Hestenes, M.R. and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Nat. Bur. Standards, 49, (1952), 409-436.
8. Johan, Z., "Data Parallel Finite Element Techniques for Large Scale Computational Fluid Dynamics," Ph.D. Thesis, Stanford University, (1992).
9. Lewis, J.G., "Implementation of the Gibbs-Poole-Stockmeyer and Gibbs-King Algorithms," ACM Trans. Math. Softw., 8, 2, (1982), 180-189.
10. Saad, Y. and M.H. Schultz, "GMRES: A Generalized Minimum Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput., 7, (1986), 856-869.
11. Shakib, F., T.J.R. Hughes and Z. Johan, "A Multi-Element Group Preconditioned GMRES Algorithm for Nonsymmetric Systems Arising in Finite Element Analysis," Comp. Meth. Appl. Mech. Eng., 75, (1989), 415-456.
12. Shanno, D.F., "On the Convergence of a New Conjugate Gradient Algorithm."
13. Shanno, D.F., "Conjugate Gradient Methods with Inexact Searches," Mathematics of Operations Research, 3, No. 3, (1978), 244-256.
14. Winget, J.M. and T.J.R. Hughes, "Solution Algorithms for Nonlinear Transient Heat Conduction Analysis Employing Element-by-Element Strategies," Comp. Meth. Appl. Mech. Eng., 52, (1985), 711-815.