Iterative methods are currently the solvers of choice for large sparse linear systems of equations. However, it is well known that the key factor for accelerating, or even enabling, convergence is the preconditioner. Research on preconditioning techniques has dominated the last two decades. Nowadays, a number of different options must be considered when choosing the most appropriate preconditioner for the specific problem at hand. The present work provides an overview of the most popular algorithms available today, emphasizing their respective merits and limitations. The overview is restricted to algebraic preconditioners, that is, general-purpose algorithms requiring knowledge of the system matrix only, independently of the specific problem it arises from. Along with the traditional distinction between incomplete factorizations and approximate inverses, the most recent developments are considered, including the scalable multigrid and parallel approaches that represent the current frontier of research. A separate section devoted to saddle-point problems, which arise in many different applications, closes the paper.

The term “preconditioning” generally refers to

The development of preconditioning techniques for large sparse linear systems is closely connected to the history of iterative methods. As mentioned by Benzi [

As has often happened in history, a dramatic and tragic event like World War II increased research funding for military reasons and helped accelerate development and progress in the field of numerical linear algebra as well. World War II spurred the development of the first automatic computing machines, which later led to the modern digital electronic era. With the availability of such new computing tools, interest in the numerical solution of relatively large linear systems of equations suddenly grew. The most significant advances were obtained in the field of direct methods. Between the 40s and the 70s the algorithms of choice for solving a linear system automatically were based on Gaussian elimination, with the development of several important variants which greatly improved the native method. First, it was recognized that pivoting techniques are of paramount importance for stabilizing the numerical computation of the triangular factors of the system matrix and reducing the effects of rounding errors, for example, [

In the meantime, iterative methods occupied a kind of niche, despite the publication in 1952 of the famous papers by Hestenes and Stiefel [

In the 60s the Finite Element (FE) method was introduced in structural mechanics. The new method was a great success and gave rise to a novel set of large matrices which were very sparse but neither diagonally dominant nor characterized by property A. Unfortunately, SOR techniques were not reliable for many of these problems, either converging very slowly or even diverging. Therefore, it is no surprise that direct methods soon became the reference techniques in this field. The irregular sparsity pattern of FE-discretized structural problems gave renewed impulse to the development of more appropriate ordering strategies based on graph theory and of more efficient direct algorithms, leading in the 70s and early 80s to the formulation of the modern multifrontal solvers [

In contrast, during the 60s and 70s iterative solvers were still in their infancy. CG was rehabilitated as an iterative method by Reid in 1971 [

The 80s and 90s are the years of the great development of Krylov subspace methods. In 1986 Saad and Schultz introduced the Generalized Minimal Residual (GMRES) method [

Initially, Krylov subspace methods were viewed with some suspicion by practitioners, especially those coming from structural mechanics, because of their apparent lack of reliability. Numerical experiments in different fields, such as FE elastostatics, geomechanics, consolidation of porous media, fluid flow, and transport, for example, [

This is why research on the construction of effective preconditioners has grown significantly over the last two decades, while advances on Krylov subspace methods have progressively faded. Currently, preconditioning appears to be a much more active and promising research field than either direct or iterative solution methods, particularly within the context of the fast evolution of hardware technology. On one hand, this is due to the understanding that there are virtually no limits to the available options for obtaining a good preconditioner. On the other hand, it is also generally recognized that an optimal general-purpose preconditioner is unlikely to exist, so that the solver efficiency can be improved in different ways for any specific problem at hand within any specific computing environment. Generally, the knowledge of the governing physical processes, the structure of the resulting system matrix, and the available computer technology are factors that cannot be ignored in the design of an appropriate preconditioner. It is also recognized that theoretical results are few, and frequently “empirical” algorithms may work surprisingly well despite the lack of a rigorous foundation. This is why finding a good preconditioner for solving a sparse linear system can be viewed rather as

Roughly speaking, preconditioning means transforming system (

Writing the preconditioned Krylov subspace algorithms is relatively straightforward. Simply, the basic algorithms can be implemented by replacing
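As an illustrative sketch (not taken from the paper; all names here are assumed), the preconditioned Conjugate Gradient below shows the only change with respect to the basic algorithm: the residual is multiplied by the inverse of the preconditioner through a user-supplied routine, here a simple Jacobi (diagonal) preconditioner.

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-10, maxit=200):
    """Preconditioned Conjugate Gradient: each iteration applies the
    preconditioner M^{-1} to the residual via apply_prec(r)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# small SPD test system with a Jacobi (diagonal) preconditioner
A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 5.]])
b = np.array([1., 2., 3.])
x = pcg(A, b, lambda r: r / np.diag(A))
```

Any other preconditioner is plugged in by changing `apply_prec` only, which is precisely why the preconditioned algorithms are straightforward to write.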

Generally speaking, there are three basic requirements for obtaining a good preconditioner:

the preconditioned matrix should have a clustered eigenspectrum away from 0,

the preconditioner should be as cheap to compute as possible,

its application to a vector should be cost-effective.

The importance of conditions (ii) and (iii) depends on the specific problem at hand and may be highly influenced by the computer architecture. From a practical point of view, they can also be more important than condition (i). For example, if a sequence of linear systems has to be solved with the same matrix

An easy way to build a preconditioner is based on the decomposition of

Because of their simplicity, Jacobi, Gauss-Seidel, SOR, and Symmetric SOR preconditioners are still used in some applications. Assume, for instance, that matrix
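As a sketch of how these classical preconditioners are applied (assumed names, dense storage for clarity only), the SSOR preconditioner built from the splitting A = L + D + U requires one forward and one backward triangular sweep:

```python
import numpy as np
from scipy.linalg import solve_triangular

def ssor_apply(A, r, omega=1.0):
    """Apply z = M^{-1} r for the SSOR preconditioner
    M = (1/(omega*(2-omega))) * (D + omega*L) D^{-1} (D + omega*U),
    where A = L + D + U (strictly lower / diagonal / strictly upper).
    omega = 1 gives the symmetric Gauss-Seidel preconditioner."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    U = np.triu(A, 1)
    y = solve_triangular(D + omega * L, r, lower=True)    # forward sweep
    y = omega * (2 - omega) * (np.diag(A) * y)            # diagonal scaling
    z = solve_triangular(D + omega * U, y, lower=False)   # backward sweep
    return z

A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 5.]])
r = np.array([1., -1., 2.])
z = ssor_apply(A, r, omega=1.0)
```

No extra memory beyond A itself is needed, which explains the enduring appeal of these simple preconditioners.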

Though the algorithms above cannot compete with more sophisticated tools in terms of global performance [

For the developers of computer codes, however, using physically based preconditioners is not always desirable. In fact, it is usually simpler to introduce the linear solver as a black box which is expected to work independently of the specific problem at hand, especially in less specialized codes such as commercial ones. This is why great interest has also arisen in the development of purely algebraic preconditioners that aim to be as “general purpose” as possible. Algebraic preconditioners are usually robust algorithms which require the knowledge of the matrix

In this work, the traditional distinction between incomplete factorizations and approximate inverses is followed, describing the most successful algorithms belonging to each class and the most significant variants developed to address particular occurrences and increase efficiency. Then, two additional sections will be devoted to the most recent results obtained in the fields that currently appear to be the most active frontier of research in the area of algebraic preconditioning, that is, multigrid techniques and parallel algorithms. Finally, a few words will be spent on a special class of problems characterized by indefinite saddle-point matrices, which arise quite frequently in the applications and have attracted much interest in recent years.

Given a nonsingular matrix

The native ILU algorithm runs as follows. Define a set

The procedure described above is clearly an incomplete Gauss elimination. Hence, it is no surprise that several implementation tricks developed to improve the efficiency of direct solution methods can be borrowed to reduce the computational cost of the ILU computation. For example, the algorithm can follow either the KIJ, or the IKJ, or the IJK elimination variants. To give an idea, Algorithm

IKJ variant of ILU factorization.

The several existing versions of the ILU preconditioner basically differ in the rules followed to select the retained entries in
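To fix ideas, a dense-storage sketch of the IKJ-ordered ILU(0) factorization is given below (illustrative only; the names are assumptions, and a production code would of course work on sparse storage):

```python
import numpy as np

def ilu0(A):
    """ILU(0) in IKJ order: incomplete LU keeping only the nonzero
    pattern of A. The unit lower factor L and the upper factor U are
    returned packed in a single matrix (L strictly below the diagonal)."""
    n = A.shape[0]
    F = A.astype(float)
    P = A != 0                      # retained pattern = pattern of A
    for i in range(1, n):
        for k in range(i):
            if not P[i, k]:
                continue
            F[i, k] /= F[k, k]      # multiplier l_ik
            for j in range(k + 1, n):
                if P[i, j]:         # update only entries inside the pattern
                    F[i, j] -= F[i, k] * F[k, j]
    return F
```

A defining property of ILU(0), useful as a correctness check, is that the product LU reproduces A exactly on the retained pattern, the discarded fill-in being confined outside it.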

Unfortunately, there are a number of problems where the ILU(0) preconditioner converges very slowly or even fails. There are several different reasons for this, depending on both the unstable computation and the unstable application of

There are several different ways for enlarging the initial pattern

In most applications, ILU

Similarly to the level-of-fill approach, a drawback of the drop tolerance strategy is that the amount of fill-in of

Retain the

As the drop tolerance may be quite sensitive to the specific problem and so difficult to implement in a black box solver, ILU variants that use only the fill-in parameter
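With SciPy available, a dual-threshold incomplete LU in the spirit of ILUT can be sketched through `scipy.sparse.linalg.spilu`, whose `drop_tol` and `fill_factor` arguments play the roles of the drop tolerance and of the fill-in bound (an illustration under these assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spilu, LinearOperator, cg

# 1D Poisson matrix as a simple SPD test problem
n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# drop_tol discards small entries; fill_factor bounds the total fill-in
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator((n, n), matvec=ilu.solve)

# preconditioned Conjugate Gradient with the incomplete factors as M
x, info = cg(A, b, M=M)
```

Tightening `drop_tol` and enlarging `fill_factor` moves the factorization toward an exact (and expensive) LU; loosening them gives a cheaper but weaker preconditioner.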

Both Algorithms

Meijerink and van der Vorst [

A famous stabilization technique to ensure the IC existence is the one advanced by Ajiz and Jennings [

Unstable factors and negative pivots typically arise if the diagonal terms of
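A common remedy can be sketched as follows, with plain Cholesky standing in for the incomplete factorization (a simplification assumed here for brevity): if the factorization breaks down, it is recomputed on the diagonally shifted matrix with a progressively larger shift.

```python
import numpy as np

def shifted_cholesky(A, alpha0=1e-3, growth=2.0, maxtries=20):
    """Diagonal-shift strategy: if the factorization breaks down,
    retry on A + alpha*diag(A) with increasing alpha. Plain Cholesky
    stands in for the incomplete factorization in this sketch."""
    alpha = 0.0
    for _ in range(maxtries):
        try:
            As = A + alpha * np.diag(np.diag(A))
            return np.linalg.cholesky(As), alpha
        except np.linalg.LinAlgError:
            alpha = alpha0 if alpha == 0.0 else alpha * growth
    raise RuntimeError("no successful shift found")
```

The larger the final shift, the farther the factored matrix is from A, so the smallest successful shift is generally preferred.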

If

Incomplete factorizations can be very powerful preconditioners; however, the issues concerning their existence and numerical stability can undermine their robustness in several cases. One reason for such a weakness can be understood using the following simple argument. Consider the incomplete factorization

To cope with these drawbacks, researchers developed a second big class of preconditioners with the aim of computing an explicit form for

Early algorithms for computing an approximate inverse via the Frobenius norm minimization appeared as far back as the 70s [

Schematic representation of a matrix-vector product subject to sparsity constraints.

The main difference among the algorithms developed within this class lies in how the nonzero pattern

Compute

Compute

Gather

Solve the least square problem

Enlarge

Gather

Solve the least square problem
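The least-squares step at the heart of this family can be sketched, for a static (fixed) pattern, as follows (illustrative code with assumed names; each column is computed independently, which is also what makes the approach attractive in parallel):

```python
import numpy as np

def spai_static(A, pattern):
    """Static sparse approximate inverse: minimize ||A M - I||_F
    column by column over a fixed sparsity pattern.
    pattern[j] lists the row indices allowed in column j of M."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        J = pattern[j]                       # allowed nonzeros in column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        # least-squares problem on the columns of A gathered by J
        m, *_ = np.linalg.lstsq(A[:, J], e_j, rcond=None)
        M[J, j] = m
    return M
```

With the full pattern the minimization recovers the exact inverse, which provides a simple sanity check; in practice, of course, the pattern is kept very sparse.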

A different way to build an approximate inverse of

Set

Drop the

Quite obviously, if

Compute

Set

Gather

Solve
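For the SPD case, the rows of the FSAI factor G can be computed independently by solving small dense systems gathered from A; a sketch under these assumptions (names not from the paper) is:

```python
import numpy as np

def fsai(A, pattern):
    """FSAI sketch: lower triangular G approximating L^{-1} (A = L L^T)
    by minimizing ||I - G L||_F over a prescribed lower triangular
    pattern. pattern[i] lists the columns j <= i allowed in row i,
    sorted in increasing order and ending with i itself."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        J = pattern[i]
        k = len(J)
        # gather the small dense system A[J, J] and solve against e_k
        g = np.linalg.solve(A[np.ix_(J, J)], np.eye(k)[:, -1])
        G[i, J] = g / np.sqrt(g[-1])   # scale so that diag(G A G^T) = 1
    return G
```

Note that the Cholesky factor L itself is never needed: only small submatrices of A are gathered, one per row, with no sequential dependence between rows.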

The native FSAI algorithm has been developed for SPD matrices. A generalization to nonsymmetric matrices is possible and quite straightforward [

A key factor for the efficiency of approximate inverse preconditioning based on the Frobenius norm minimization is the choice of the nonzero pattern

With respect to the nonzero pattern selection, approximate inverses can be defined as

Dynamic approximate inverses are based on adaptive strategies which start from a simple initial guess, for example, a diagonal pattern, and progressively enlarge it until a certain criterion is satisfied. For instance, a typical dynamic algorithm is the one proposed by Grote and Huckle [

In contrast with approximate inverses based on the Frobenius norm minimization, the AINV algorithm does not need the selection of the nonzero pattern, so it can be classified as a purely dynamic preconditioner. However, its computation turns out to be quite similar to that of an incomplete factorization; see Algorithms

Theoretically, for an arbitrary nonzero pattern

Numerical experiments in different problems typically show that, whenever a stable ILU-type preconditioner can be computed, that is probably the most efficient preconditioner on a scalar machine [

It is well known that the convergence rate of any Krylov subspace method preconditioned by either an incomplete factorization or an approximate inverse tends to slow down as the problem size increases. The convergence deterioration along with the increased number of operations per iteration may lead to an unacceptably high computational cost, thus limiting de facto the size of the simulated model even though large memory resources are available. This is why in recent years much work has been devoted to the so-called

Multigrid methods can provide an answer to such a demand. Pioneering works on multigrid are due to Brandt [

Recalling the previous observations, the basic idea of multigrid proceeds as follows. The operator

Multigrid methods were introduced as solvers for discretized PDEs of elliptic type, and in such problems they indeed soon proved to be largely superior to existing algorithms. The first idea for extending their use to other applications was to regard multigrid as a purely algebraic solver, where one has to define the smoother, restriction, and prolongation operators knowing the system matrix

Solve

Apply

Call

Apply
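The cycle above can be illustrated by a bare two-grid sketch (assumed names; damped Jacobi is taken as the smoother and a simple unsmoothed aggregation as the prolongation, with the coarse problem solved exactly):

```python
import numpy as np

def two_grid(A, b, x, P, n_smooth=2, omega=0.8):
    """One two-grid cycle: damped-Jacobi pre-/post-smoothing plus a
    coarse-grid correction with prolongation P, restriction P^T, and
    Galerkin coarse operator A_c = P^T A P."""
    D = np.diag(A)
    for _ in range(n_smooth):                        # pre-smoothing
        x = x + omega * (b - A @ x) / D
    A_c = P.T @ A @ P                                # Galerkin coarse operator
    e_c = np.linalg.solve(A_c, P.T @ (b - A @ x))    # coarse correction
    x = x + P @ e_c                                  # prolongate and correct
    for _ in range(n_smooth):                        # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x
```

The smoother damps the high-frequency error components, while the coarse-grid correction removes the smooth ones that the smoother leaves almost untouched; their combination is what makes the iteration count insensitive to the problem size.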

Basically, AMG preconditioners vary according to the choice of both the restriction operator and the smoother, while the prolongation operator is often defined as the transpose of the restriction operator. The basic idea for defining a restriction is to coarsen the native pattern of

The last decade has witnessed an explosion of research on AMG techniques. The key factor leading to such a great interest is basically their potential scalability with the size of the problem to be solved, in the sense that the iteration count to converge for a given problem does not depend on the number of the mesh nodes. Several theoretical results have been obtained with the aim of generalizing as much as possible AMG to nonelliptic problems, for example, [

Parallel computing is widely accepted as the only pathway toward the possibility of managing millions of unknowns [

ILU-based preconditioners are highly sequential in both their computation and their application. Nonetheless, some degree of parallelism can be achieved through graph coloring techniques. These strategies have long been known to numerical analysts, for example, [

In contrast, approximate inverses are intrinsically much more parallel than ILU factorizations, as they can be applied by a matrix-vector product instead of forward and backward substitutions. The construction of an approximate inverse, however, can be difficult to parallelize. For example, the AINV Algorithm

The theoretical scalability properties of AMG make it very attractive for parallel computations. This is why in the last few years the research on AMG techniques has concentrated on high-performance massively parallel implementations. The main difficulties may arise in parallelizing the Gauss-Seidel smoother and the coarsening stage. As mentioned before, these problems can be overcome by using naturally parallel smoothers, such as Jacobi, relaxed Jacobi, polynomial or a static approximate inverse smoother, for example, [

An alternative strategy for developing parallel preconditioners relies on building new algorithms consisting of matrix-vector products and local triangular solves only. Perhaps the earliest technique of this kind belongs to the class of polynomial preconditioners, first introduced by Cesari as far back as 1937 [
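As a sketch (assumed names), a truncated Neumann series preconditioner applies a degree-k polynomial in A through matrix-vector products only, with no triangular solves, and is therefore naturally parallel:

```python
import numpy as np

def neumann_prec(A, r, k=4):
    """Truncated Neumann series preconditioner: writing A = D(I - N)
    with N = I - D^{-1}A, approximate
        A^{-1} ~ (I + N + ... + N^k) D^{-1},
    applied to r via k matrix-vector products (Horner scheme)."""
    Dinv = 1.0 / np.diag(A)
    N = np.eye(len(r)) - Dinv[:, None] * A    # N = I - D^{-1} A
    z = Dinv * r
    v = z.copy()
    for _ in range(k):
        v = z + N @ v                          # v = (I + N + ... + N^k) z
    return v
```

The series converges only when the spectral radius of N is below one (e.g., for diagonally dominant matrices), which is the classical limitation of this approach.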

Another popular parallel preconditioner is based on the Additive Schwarz (AS) methods [

Matrix partitioning into possibly overlapping subdomains.
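A one-level additive Schwarz application can be sketched as follows (assumed names; the possibly overlapping index sets play the role of the subdomains in the partitioning above, and the local solves are fully independent, hence naturally parallel):

```python
import numpy as np

def additive_schwarz(A, r, subdomains):
    """One-level additive Schwarz: z = sum_i R_i^T A_i^{-1} R_i r,
    where R_i restricts to the (possibly overlapping) index set S_i
    and A_i = A[S_i, S_i] is the local subdomain matrix."""
    z = np.zeros_like(r)
    for S in subdomains:
        z[S] += np.linalg.solve(A[np.ix_(S, S)], r[S])
    return z

# non-overlapping partition: additive Schwarz reduces to block Jacobi
A = np.array([[4., 1., 0., 0.],
              [1., 3., 1., 0.],
              [0., 1., 5., 1.],
              [0., 0., 1., 4.]])
r = np.array([1., 2., 3., 4.])
z = additive_schwarz(A, r, [np.array([0, 1]), np.array([2, 3])])
```

Enlarging the overlap between the index sets typically improves the convergence rate at the price of more local work and communication.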

A recent parallel preconditioner which tries to combine the positive features of both approximate inverses based on the Frobenius norm minimization and AS methods is Block FSAI [

Elements belonging to

The Block FSAI preconditioner described above can be improved by defining

Block FSAI has the theoretical property of minimizing

Extract

Gather

Solve

Set

Compute

Update

Gather

Solve

Compute the diagonal term

Compute

Matrix

Block FSAI coupled with a block diagonal IC factorization, for example, as implemented in [

In recent years the interest in the solution of saddle-point problems has grown, with many contributions from different fields. The main reason is the great variety of applications requiring the solution of linear systems with a saddle-point matrix. For example, such problems arise in the discretization of compressible and incompressible Navier-Stokes equations in computational fluid dynamics [

A saddle-point problem is defined as a linear system where the matrix

A natural way to solve the problem

Quite obviously, there are several different constraint preconditioners according to how
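To illustrate, the application of a constraint preconditioner with a diagonal approximation of the leading block can be sketched as follows (assumed names; the sketch forms the small Schur complement explicitly, which is affordable only when the approximation is easily invertible, e.g., diagonal):

```python
import numpy as np

def constraint_prec(G_hat, B, r):
    """Apply z = M^{-1} r for the constraint preconditioner
    M = [[G_hat, B^T], [B, 0]], via block elimination with the
    Schur complement S = -B G_hat^{-1} B^T. G_hat is an easily
    invertible approximation of the (1,1) block."""
    n = G_hat.shape[0]
    r1, r2 = r[:n], r[n:]
    Ginv = np.linalg.inv(G_hat)          # cheap when G_hat is diagonal
    S = -B @ Ginv @ B.T                  # Schur complement
    y2 = np.linalg.solve(S, r2 - B @ (Ginv @ r1))
    y1 = Ginv @ (r1 - B.T @ y2)
    return np.concatenate([y1, y2])
```

Note that the constraint block B is kept exact, which is precisely what preserves the favorable spectral properties of the preconditioned saddle-point matrix.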

Despite its remarkable properties, ECP has at least two severe limitations. First,

To make the preconditioner application cheaper, a substitute for (

Similar to ECP, ICP also requires an explicit approximation

It is quite well accepted that constraint preconditioners outperform any other “global” preconditioning approach applied to saddle-point matrices. This remarkable property has attracted the theoretical interest of several researchers who have investigated the spectral properties of constraint-preconditioned saddle-point matrices [

Quite naturally, there is a distinct interest in implementing constraint preconditioners on parallel computers. The main difficulties in developing efficient parallel variants of constraint preconditioners are twofold. First, the constraint preconditioning concept is inherently sequential, as it involves the preliminary computation of the Schur complement and then the subsequent approximate solution of two inner systems. Thus, a standard parallel approach where the system matrix is distributed among the processors as stripes of contiguous rows is not feasible, because all the processors owning the second block of unknowns are idle while the other processors operate on the first block, and vice versa. Second, efficient parallel approximations of

The development and implementation of efficient preconditioners are the key factors for improving the performance and robustness of Krylov subspace methods in the solution of large sparse linear systems. Such an issue is a central task in a great number of different applications, not all necessarily related to PDEs, so it is no surprise that much research has been devoted to it. After a period in which most efforts were focused on direct and iterative solvers, the last two decades can be denoted the “preconditioning era.” The number of papers, projects, and international conferences addressing this topic has largely exceeded those aimed at the development of new solvers, and this trend is not likely to fade in the next few years. This is mainly due to the fact that an optimal general-purpose preconditioner does not exist and that any specific problem can offer the opportunity to develop an ad hoc strategy.

The present paper is intended to offer an overview of the most important algebraic preconditioners available today for the solution of sparse linear systems. Hence, it cannot exhaust the subject of preconditioning, as the entire class of problem-specific algorithms has been omitted. Algebraic preconditioners have been chosen for their “universality,” in that they can be effectively used as black-box tools in general-purpose codes requiring knowledge of the system matrix only. A classification of algebraic preconditioners has been attempted based on the most significant representatives of each group along with their prominent features:

the potentially high scalability of

the current hardware development is leading to a much more pronounced parallel degree of any computer. Besides the parallelization of existing algorithms, this trend is going to enforce the development of totally new techniques which can make sense in a parallel computing environment only. This is why

The author is very thankful to Giuseppe Gambolati and Carlo Janna for their careful reading of the paper and helpful comments.