Reusable object-oriented solutions for numerical simulation of PDEs in a high performance environment

Object-oriented platforms developed for the numerical solution of PDEs must combine flexibility and reusability, in order to ease the integration of new functionalities and algorithms. When designing such frameworks, built-in support for high performance should be provided and enforced transparently, especially in parallel simulations. The paper presents solutions developed to effectively tackle these and other more specific problems (data handling and storage, implementation of physical models and numerical methods) that have arisen in the development of COOLFluiD, an environment for PDE solvers. Particular attention is devoted to describing a data storage facility, highly suitable for both serial and parallel computing, and to discussing the application of two design patterns, Perspective and Method-Command-Strategy, which support extensibility and run-time flexibility in the implementation of physical models and generic numerical algorithms respectively.


Introduction
COOLFluiD (Computational Object-Oriented Library for Fluid Dynamics) [13] is a framework originally developed to solve fluid-dynamics related PDEs (Partial Differential Equations) on unstructured grids by means of state-of-the-art numerical techniques.
Similarly to many other scientific platforms, such as DiffPack [4], Overture [10], ELEMD [16], MOUSE [24] and OpenFoam [25], COOLFluiD is implemented in C++, in order to take advantage of both classical object-oriented (OO) techniques [15,20,21] and generic template-based programming [1,22,23]. Effort has been devoted to finding and implementing long-term architectural solutions, in order to allow both run-time flexibility and high performance, as required by the target applications.
In particular, while designing the kernel interfaces and components of COOLFluiD, no strict a priori assumption has been made on the kind of PDE-solving algorithms and discretization techniques to support. Consequently, a multitude of numerical solvers for unstructured grids can be implemented within the platform. This algorithm-independent design policy marks in fact a sound difference between COOLFluiD and the majority of other available OO frameworks for Computational Fluid Dynamics (CFD), which are usually tailored towards specific space discretization schemes (e.g. Finite Volume for [24,25], Finite Volume and Finite Difference on block-structured meshes for [10], cell-vertex schemes for [16], etc.).

COOLFluiD architecture
COOLFluiD consists of a collection of dynamically linked libraries and application codes, organized in a multiple-layer structure, as shown in Fig. 1. Two kinds of libraries are identifiable: kernel libraries and modules (plug-ins). The former supply basic functionalities, data structures and abstract interfaces, without actually implementing any scientific algorithm, while the latter add effective simulation capabilities. The framework was developed according to a plug-in policy, based on the self-registration [3,13] and self-configuration [13] techniques, which allow new components or even third-party extensions to be integrated at run-time, while exposing only the API to developers. From top to bottom of the COOLFluiD architecture, one finds progressively more functional layers: the kernel defines the interfaces, the modules implement them and the application codes can select and load on demand the libraries needed for an actual simulation. This multi-layer structure guarantees flexibility and modularity at the highest design level and helps manage the development process, since the layers automatically reflect different implementation levels.

Outline
Designing a framework for generic PDE solvers is not a trivial task in itself, and additional complexity comes, in our case, from the goal of dealing efficiently with arbitrary numerical algorithms and discretization schemes requiring different data structures.
According to a simplified but sufficiently complete conceptual view, we can consider a numerical solver for PDEs to be made of a limited number of independent building blocks: some mesh-related data, typically numerics-dependent, that represent a discretized view of the computational domain in terms of both geometry and solution; an equation set describing the physical phenomena under study; a group of interacting numerical algorithms to solve the PDEs on the given discretized domain.
We can therefore identify three issues of fundamental importance for developing an environment for solving PDEs:
- handling and storage of bulk serial and distributed data;
- implementation of a physical model;
- implementation of a numerical method.
In our opinion, a good OO design should be able to tackle these three aspects orthogonally, as independently from one another as possible, in order to minimize coupling and maximize the reusability of the single components. To this aim, a clear separation between physics and numerics should be enforced: the first is the description of the properties, constitutive relations and quantities that characterize the equation system; the second provides the mathematical tools (algorithms, algebraic and differential operators) to discretize and solve those equations. This article will focus on analyzing each of the three above-mentioned problems in a separate section (2, 3, 4), and on presenting corresponding, possibly effective, solutions, as they were implemented in COOLFluiD. Some illustrative examples (3.1, 4.1) will also be given in order to show the concrete applicability of the proposed ideas.

Data handling and storage
PDE-driven simulations typically involve a considerable amount of data related to the discretization of the computational domain, such as coordinates, state variables, geometric entities, etc., all grouped inside container objects (arrays, lists, ...) on which numerical actions are performed. Additional complexity appears when computing on unstructured grids, as in the case of COOLFluiD, since different numerical methods can require the storage and usage of different kinds of connectivity information. As a result, genericity in both quantity and types of data must be addressed.
Moreover, in a scalable and safe design of the data handling, framework components should be able to share data without violating encapsulation.
In a high performance environment, developers of new modules should be able to have newly defined data types stored, handled efficiently, and shared in parallel communication. Ideally, local and distributed data should be treated uniformly from the developer's point of view, and support for different communication strategies (message passing and shared memory) should be offered.

Data storage
In COOLFluiD, all data whose size scales with the problem complexity are encapsulated by a MeshData object, that, in particular, aggregates and offers safe access to a number of underlying instances of DataStorage.

template <class TAG> class DataStorage {};
DataStorage is defined as an empty template class parameterized with a lightweight tag policy, TAG, which specifies the actual parallel implementation or communication type (e.g. MPI, SHM). As explained in [12], these tags do not appear explicitly in the code, but are aliased at compile time to more general concepts, such as GLOBAL for intra-communication or REMOTE for inter-communication, according to the available models and the user's preferences. In COOLFluiD, a parallel layer is defined with the task of concealing implementation details that rely upon specific parallel paradigms. To this aim, only the aliased tags are used outside the parallel layer, i.e. in the numerical modules of the framework.
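As a minimal sketch of how this compile-time aliasing could look (the native tag names follow the text, while the preprocessor guard CF_HAVE_MPI is an assumption):

class LOCAL {}; // native tag: serial storage
class MPI {};   // native tag: message-passing storage
class SHM {};   // native tag: shared-memory storage

#ifdef CF_HAVE_MPI
typedef MPI   GLOBAL; // parallel build: GLOBAL data live in ParVectorMPI
#else
typedef LOCAL GLOBAL; // serial build: GLOBAL degenerates to LOCAL
#endif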
To make DataStorage instantiable, explicit template specializations [20,22] must be defined, one for each native tag class.

Local data storage
The class definition of DataStorageInternal is parameterized with the storage type, STYPE; the derived DataStorageLocal specifies the latter to be an Evector [11], which offers an improved and more efficient implementation of std::vector [6,20,21]. In particular, Evector supports back insertion/deletion of elements in constant time and on-the-fly memory allocation to accommodate new entries with minimal overhead.
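A minimal sketch of how these two classes could be laid out, inferred from the description above and below (interpreting STYPE as a storage-type policy; all names other than createData()/deleteData() and Evector are assumptions):

#include <cstddef>
#include <map>
#include <string>

template <template <class> class STYPE>
class DataStorageInternal {
public:
  template <class T>
  STYPE<T>* createData(const std::string& name, size_t size) {
    STYPE<T>* data = new STYPE<T>(size);
    m_storage[name] = static_cast<void*>(data); // type-erased registration
    return data;
  }
  template <class T>
  void deleteData(const std::string& name) {
    delete static_cast<STYPE<T>*>(m_storage[name]); // explicit deletion
    m_storage.erase(name);
  }
private:
  std::map<std::string, void*> m_storage; // arrays registered by name
};

// DataStorageLocal fixes the storage type to Evector [11]
class DataStorageLocal : public DataStorageInternal<Evector> {};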
DataStorageInternal behaves as an archive where arrays of data with a certain type and size are registered under a user-chosen name in an associative container like a std::map [20,21]. Pointers to the arrays in question are statically cast to void*, in order to allow different types to coexist in the same instance of DataStorageInternal. No built-in garbage collection or deletion policy based on some form of reference counting is provided by DataStorageInternal. This implies that data created with createData() must be consistently deleted with an explicit call to deleteData(): ad-hoc setup and unsetup Commands fulfill this task in each numerical module (see 4.1).
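The intended create/use/delete cycle, assuming the hypothetical signatures sketched above, could then read:

const size_t nbEqs = 5, nbStates = 1000; // illustrative sizes
DataStorageLocal storage;
// a setup Command registers a named array of residuals
Evector<double>* rhs = storage.createData<double>("rhs", nbEqs * nbStates);
(*rhs)[0] = 0.0; // ... numerical Commands read and write the array ...
// the matching unsetup Command must release it: no garbage collection
storage.deleteData<double>("rhs");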
Exceptions are thrown if client code attempts to register twice data with the same name or to delete unavailable data. In any case, just before the end of the simulation, a general clean-up is performed by MeshData, and all still-reachable entries in all instantiated DataStorages are removed.

MPI data storage
DataStorage, which is a proxy for DataStorageInternal, provides a uniform, user-friendly interface to create and manage arrays of generic data types that are meant to be shared among different numerical components. Moreover, DataStorage offers the same interface to treat both local and distributed data uniformly, even though the actual implementation of its member functions depends on the tag class. In case the MPI tag is used, a DataStorageMPI is defined. DataStorageMPI does not derive from DataStorageInternal, but provides its own implementation of the same interface, so that the actual differences between one DataStorage and another remain completely invisible to the client code. The main difference between DataStorageMPI and DataStorageInternal is the presence of a communicator, needed by ParVectorMPI, which is the container type for the data registered in DataStorageMPI.
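A possible shape for this class, consistent with the description (the member list and signatures are assumptions; ParVectorMPI is the document's parallel vector type):

#include <map>
#include <string>
#include <mpi.h>

class DataStorageMPI {
public:
  explicit DataStorageMPI(MPI_Comm comm) : m_comm(comm) {}
  template <class T>
  ParVectorMPI<T>* createData(const std::string& name, size_t size) {
    ParVectorMPI<T>* data = new ParVectorMPI<T>(size, m_comm); // needs comm
    m_storage[name] = static_cast<void*>(data);
    return data;
  }
  template <class T>
  void deleteData(const std::string& name) {
    delete static_cast<ParVectorMPI<T>*>(m_storage[name]);
    m_storage.erase(name);
  }
private:
  MPI_Comm m_comm;                        // communicator for ParVectorMPI
  std::map<std::string, void*> m_storage; // same registry scheme as local
};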

Data handle
Once registered in either a local or an MPI DataStorage, data can be accessed through a smart pointer, DataHandle, that ensures safety by preventing the data array from being accidentally deleted. It also provides inlinable accessor/mutator functions for the individual array entries and offers some additional functionalities. DataHandle is partially specialized on the tag class, LOCAL or MPI. DataHandleMPI declares functions that deal with the synchronization and handling of data belonging to the overlap region; the actual synchronization and communication are delegated to the acquainted parallel vector, ParVectorMPI, in which all MPI calls are encapsulated. The same functions must also be implemented in DataHandleLocal, in order to allow all the code to compile even without MPI, that is, when the GLOBAL tag is aliased to LOCAL. For the sake of completeness, both of the presented DataHandles derive from a common DataHandleInternal, whose core member is a bare pointer to the handled array:

StorageType* m_ptr; // pointer to the array to be handled
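Putting the pieces together, a compact sketch of this hierarchy could read as follows (beginSync()/endSync() are named later in the text; the template parameters and the delegation details are assumptions):

template <class T, class STORAGE>
class DataHandleInternal {
public:
  explicit DataHandleInternal(STORAGE* ptr) : m_ptr(ptr) {}
  T& operator[](size_t i) { return (*m_ptr)[i]; } // inlinable accessor/mutator
protected:
  STORAGE* m_ptr; // pointer to the array to be handled
};

template <class T, class TAG> class DataHandle;

template <class T>
class DataHandle<T, LOCAL> : public DataHandleInternal<T, Evector<T> > {
public:
  explicit DataHandle(Evector<T>* p) : DataHandleInternal<T, Evector<T> >(p) {}
  void beginSync() {} // no-ops: serial data need no synchronization,
  void endSync()   {} // but the calls must compile when GLOBAL == LOCAL
};

template <class T>
class DataHandle<T, MPI> : public DataHandleInternal<T, ParVectorMPI<T> > {
public:
  explicit DataHandle(ParVectorMPI<T>* p)
    : DataHandleInternal<T, ParVectorMPI<T> >(p) {}
  void beginSync() { this->m_ptr->beginSync(); } // all MPI calls are
  void endSync()   { this->m_ptr->endSync(); }   // inside ParVectorMPI
};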

Current and future applications
The parallel implementation of data handling and storage described so far is currently available only for standard MPI [14], even though other message-passing libraries, such as PVM [8], could be easily integrated. In the latter case, a new set of template specializations (DataHandlePVM, DataStoragePVM and ParVectorPVM) with specific PVM bindings would be required, with no impact on the client code of DataStorage, which would keep dealing with LOCAL and GLOBAL data.
The described scheme allows the user to combine differently tagged models within the same run: a LOCAL (serial) and a GLOBAL (MPI) model coexist in a parallel multi-processor SPMD (Single Program Multiple Data) COOLFluiD simulation.
The MeshData object aggregates two instances of DataStorage, one for LOCAL data and one for GLOBAL data. The former includes not only data that are created by numerical modules and are not meant to be shared among different processors, but also a local representation of some distributed data. An example is given by the storages of State and Node objects, which represent solution state vectors and geometric coordinates, i.e. the degrees of freedom of the computational grid. States and Nodes are both proxies [7] for tiny arrays of floats, on which convenient symbolic mathematical operations can be performed efficiently using the Expression Templates technique [9,22,23]. The raw memory of these tiny arrays, which are interfaced by State (or Node) objects, belongs to a single GLOBAL preallocated one-dimensional ParVectorMPI of floats, on which synchronization and communication are performed. During the computation, the synchronization and communication processes involve only a relatively small fraction of the data, namely only the updatable and non-updatable States and Nodes which belong to the overlap region in each process. The size of the overlap region is determined by the needs of the selected space discretization algorithm. The mesh is partitioned on a cell basis by means of a polymorphic MeshPartitioner object: a self-rolled random partitioner and wrappers for the METIS and ParMETIS libraries [18] are currently available. The overlap region is normally chosen to include all the vertex neighbors of the local cells attached to the partition boundary in each process, as illustrated by Fig. 2. This choice allows numerical algorithms to work solely on LOCAL data which already include the full computational stencil, without needing parallel communication, except at the end of each physical time step, or pseudo time step, when synchronization of remote data is needed and so is access to the GLOBAL data storage.

Fig. 2. Cell-wise mesh partitioning showing updatable and ghost states in the overlap region.
The next challenge for COOLFluiD is to deal with MPMD (Multiple Program Multiple Data) schemes in loosely coupled multi-physics applications, such as Aeroelasticity, where the equations of Fluid Dynamics are solved weakly coupled with the Elasticity equations and the computational mesh is deformed accordingly.
At the time of writing, hybrid parallel simulations are being investigated. As a preliminary step, intra-simulation communication will be based on shared memory, which corresponds to the already mentioned SHM tag in the COOLFluiD terminology, and inter-simulation communication will be tackled with the already available MPI implementation. The final goal is to perform multi-physics computations in which both intra-communication among processes (processors) associated with a certain group (corresponding to a simulation on a certain domain, physics and numerics) and inter-communication among different groups will be handled by MPI (or PVM, or a hybrid of the two).
In order to allow the coexistence of shared memory and MPI communication, an additional DataStorage for REMOTE data will be added in MeshData and additional DataStorageSHM, DataHandleSHM and ParVectorSHM will be implemented.Furthermore, the concept of groups needs to be introduced in the design.

Performance issues and results
The scheme and implementation presented so far do not allow run-time selection of the parallel models or paradigms, but this is not really a limitation, since compile-time selection offers much better support for aggressive optimization (e.g. inlining) and high performance, helping to avoid run-time overhead that could otherwise very likely compromise the overall efficiency. In order to demonstrate this with a realistic quantitative example, we have tested a polymorphic DataHandleInternal with virtual accessor and mutator functions for the overloaded [] and () operators. As a result of this simple experiment, an increase in global computational time of approximately 15% has been observed in all testcases, due to the high call frequency of those tiny virtual functions, which otherwise could have been easily inlined by any available C++ compiler. Therefore, the idea of supporting run-time selection of parallel models has been discarded. The performance of the numerical solvers implemented in COOLFluiD relies heavily upon DataHandles, since all mesh data are accessed through them. Since all accessor/mutator functions defined in DataHandles are inlinable (the profiler confirms this), efficiency is guaranteed. Compared to the direct usage of raw data, accessing data through the DataHandle interface costs only a single pointer indirection.
It is therefore not surprising that good performance is achieved in our computations, as demonstrated by Fig. 3, where the speedup results of two simulations are presented. The first testcase is an implicit hypersonic Navier-Stokes computation on a double ellipsoid geometry (1972384 cells, 344315 nodes) with a cell-centered Finite Volume (FV) scheme. The second is an implicit transonic Euler simulation on an F15 geometry (3558667 cells, 621636 nodes) with cell-vertex Residual Distribution Schemes (RDS). All computations have been performed on a cluster of Intel Pentium 4 machines with 2.8 or 3.4 GHz and 2 GB of RAM, and MPI wall time has been used for the timing. The results show an almost perfect linear speed-up for the FV testcase and a close-to-linear speed-up for the RDS one. This different behaviour can, in our opinion, be explained by two reasons. First, the size of the overlap region, which includes all the vertex neighbors of the nodes on the partition boundary, is currently overestimated for a cell-vertex method like RDS. This leads to useless additional work in both numerical computation and parallel communication that can impact the overall efficiency, especially considering that in relatively small problems the overlap size can become comparable with the local size of the "updatable" mesh in each processor, as is the case here. Second, the currently available space discretization algorithm for computing the residual and its perturbations on the partition boundaries is face-based in FV and can simply ignore useless faces, while in RDS it is cell-based and also takes into account vertices that should in fact be neglected. This clearly suggests that a further optimization of the RDS implementation is needed.

Implementation of physical models
Scientists and researchers in the field of Fluid Dynamics regularly deal with complex systems of PDEs expressing conservation laws that can be formulated as

∂U/∂t + ∇ · F_c = ∇ · F_v + S,

where U (state vector of unknown variables), F_c (convective fluxes), F_v (viscous fluxes) and S (source or reaction term) depend on the chosen physical model. The latter can be seen as a composition of entities, e.g. transport coefficients, thermodynamic properties and quantities that contribute to define the mathematical description of some physical phenomena. It should be noticed that one can look at the same physics through different formulations of the corresponding equations, which involve the use of different sets of variables, transformations, or other adaptations tailored, e.g., towards a specific numerical method. Moreover, in a framework for solving PDEs which supports different numerical techniques and run-time loading of components, the modules where algorithms are implemented should be as independent as possible from the modules dealing with the physics description. Therefore, when new physical models or numerical components are integrated, the need for modifications in the kernel, where most of the key abstractions are defined, should be avoided. Ideally, it should be possible to build more complex models by simply extending and reusing available ones, or to solve the same equations by means of different numerical techniques, without requiring changes in the existing abstract interfaces and without establishing a high-level direct coupling between physics and algorithms. Widely used OO CFD platforms like the ones described in [4,25] lack these last features, since the implementation of the physics is directly tangled with the numerical solver, in particular with the space discretization. This is surely a reasonable, down-to-earth solution, and probably an efficient one, but it does not sufficiently promote reusability and interchangeability between physics and numerics. The Perspective pattern, described next, proposes, in our opinion, a structural way to overcome the above-mentioned pitfalls and to facilitate dynamic incremental changes and code reuse.

Perspective pattern
Trying to define a single abstract interface for a generic physical model would be quite a demanding task and would probably lead to an unmaintainable solution, especially keeping in mind our concern with run-time flexibility. In COOLFluiD, this hypothetical hard-to-define interface is therefore broken into a number of fine-grained, independent Perspective objects [13], each one offering a different view of the same physics, according to the specific needs of different numerical algorithms. For example, convective, diffusive, inertial, reaction or source terms of the equations can all be Perspectives with a certain abstract interface. Moreover, if one implements a new numerical scheme that needs to make use of an available physical model, but requires something not foreseen a priori and, therefore, not offered by the available interfaces, a new Perspective can be created, without requiring a modification of the existing ones.
Figure 4 shows the Object Modeling Technique (OMT) [7] class diagram of a Perspective pattern applied to a generic physical model.The base PhysicalModel defines a very general abstract interface.ConcreteModel derives from it, implements the virtual methods of the parent class and defines another interface to which the concrete Perspective objects (and only those) are statically bound.This other interface offers data and functionalities which are typical of a certain physics, but invariant to all its possible Perspectives.
The resulting pattern lets the numerical client code make use of the physical model through an abstract layer, dynamically enlargeable if required, given by a number of Perspective objects, while all their collaborations with the ConcreteModel are completely hidden. The pattern reflects the composition-based Adapter described in [7], but with a single shared Adaptee object, namely ConcreteModel, and multiple abstract Targets (called Perspectives here), each one with a number of derived classes (Adapters). The fact that several objects may be defined to describe the same physics may look like a disadvantage, but, in our experience, it improves reusability, makes it much easier to decouple physics from numerics and gives better support to a run-time plug-in policy.
Unlike other OO CFD platforms [5,25], where no clear distinction between numerical algorithms and physical description is provided and where a close form of reliance (inheritance) binds the equation model and the scheme, in COOLFluiD physics is completely independent and unaware of numerical methods (space and time discretizations, linear system solving, etc.). However, a design based on Perspective objects does not prevent having numerical objects, such as the Strategies or Commands of 4.1, with static binding to the actual physics, e.g. in the case of some special schemes or boundary conditions. Even in that case, the use of Perspectives can still help to limit dependencies, improve reusability and avoid excessive sub-classing.
Some code examples are now presented in order to show the concrete applicability of the described pattern and its suitability for implementing complex physical models. A polymorphic BaseTerm object and a corresponding enumerated variable are assigned to each independent physical term of the equation (convection, diffusion, reaction, dispersion, inertia, etc.) and are registered in an associative container, such as a std::map. BaseTerm manages a number of one-dimensional arrays of floats (RealVector), where some physics-dependent data (thermodynamic quantities, transport coefficients, parameters useful for calculating fluxes, eigenvectors, eigenvalues, etc.) are stored and continuously recomputed during the simulation. SafePtr is a wrapper class around a bare pointer that provides safe copying, prevents accidental deletion and does not imply ownership. In order not to affect the total memory requirements of the computation, the maximum number of the above-mentioned physical data arrays is determined by the numerical method and scales with the number of degrees of freedom in one or more geometric entities (cells, faces, etc.), depending on the required computational stencil. For example, for a cell-vertex based algorithm (Finite Element, Residual Distribution, etc.), the total number of physical data arrays stored in BaseTerm is the maximum number of quadrature points in a cell. The size and content of these arrays are defined by the classes deriving from BaseTerm.
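A minimal sketch of what BaseTerm could look like, based solely on the description above (all member names are assumptions; RealVector is the document's array-of-floats type):

#include <vector>

class BaseTerm {
public:
  virtual ~BaseTerm() {}
  // size of one physical data array; defined by the concrete term
  virtual size_t getDataSize() const = 0;
  // allocate nbArrays data arrays; nbArrays is chosen by the numerical
  // method (e.g. the maximum number of quadrature points in a cell)
  void resizePhysicalData(size_t nbArrays) {
    m_physicalData.resize(nbArrays, RealVector(getDataSize()));
  }
  RealVector& getPhysicalData(size_t i) { return m_physicalData[i]; }
private:
  std::vector<RealVector> m_physicalData; // recomputed during the run
};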
A template compositor object is defined for each possible combination of physical equation terms (Convection, Diffusion, ConvectionDiffusion, etc.). It derives from PhysicalModel and aggregates generic policies [1], one for each equation term. The equation terms aggregated by the compositor are registered in the parent class and each one of them is made accessible polymorphically through the getTerm() method defined in PhysicalModel.
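A hedged sketch of such a compositor, under the assumption that the terms are passed as template policies and registered through a hypothetical registerTerm() helper with equally hypothetical CONVECTION/DIFFUSION enumerators:

template <class CONV, class DIFF>
class ConvectionDiffusion : public PhysicalModel {
public:
  ConvectionDiffusion() {
    // register each aggregated term in the parent class, so that client
    // code can retrieve it polymorphically via PhysicalModel::getTerm()
    registerTerm(CONVECTION, &m_convTerm);
    registerTerm(DIFFUSION,  &m_diffTerm);
  }
private:
  CONV m_convTerm; // convective term policy (a BaseTerm subclass)
  DIFF m_diffTerm; // diffusive term policy  (a BaseTerm subclass)
};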

Navier-Stokes model
We can now compose a Navier-Stokes model from its convective and diffusive equation terms. NavierStokesModel can exploit the knowledge of both its concrete convective and diffusive terms to compute, for instance, adimensionalizing coefficients.
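A possible definition, reusing the compositor sketched above (EulerTerm appears later in the text, while the name NavierStokesTerm is an assumption):

class NavierStokesModel
  : public ConvectionDiffusion<EulerTerm, NavierStokesTerm> {
public:
  // with both concrete terms statically known, model-wide quantities
  // such as adimensionalizing coefficients can be computed here
  void computeAdimensionalCoefficients();
};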

Variable sets
Numerical algorithms are not direct clients of the ConcreteModel. As explained previously, the latter is used through a layer of polymorphic Perspective objects, whose collaborations are statically bound to the ConcreteModel and therefore potentially very efficient. A variable set, VarSet, is a Perspective that decouples the usage of a physical model from the knowledge of the variables in use. This, for instance, allows the client code to solve equations formulated in conservative variables, while performing intermediate algorithmic steps and updating the solution in other ones (primitive, characteristic, etc.); Figure 5 illustrates this for the Navier-Stokes model. Among all the methods declared in the VarSet class definitions, particular attention is due to setPhysicalData() and its dual setFromPhysicalData(): the former takes a given state vector (unknown variables) and sets the corresponding physics-dependent data, whose entry pattern is defined by the subclasses of BaseTerm; the latter does the opposite. In the case of the EulerTerm, for instance, if we assume that a State holds primary variables (pressure, velocity components, temperature), setPhysicalData() will compute an array of physical quantities (density, pressure, total enthalpy, total internal energy, sound speed, temperature, velocity module and components) from the given state by means of thermodynamic relations. The resulting data array will then be reused to compute eigenvalues, fluxes, etc. A key point for the flexibility of the design is that each VarSet has acquaintance of the corresponding physical equation term (see Fig. 5), but not of the resulting ConcreteModel: this allows the developer to compose progressively more complex models with full reuse of the individual equation terms. Some functionalities associated with a VarSet are independent of the variables in which one works: advective fluxes, for instance, can always be computed once physical data like pressure, enthalpy and velocity are known, because their formulation is always the same due to the conservation property.
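A sketch of such a VarSet interface, as it can be inferred from the description (setPhysicalData() and setFromPhysicalData() are named in the text; the parameter names v and w and the remaining members are assumptions):

class EulerVarSet { // Perspective over the EulerTerm
public:
  virtual ~EulerVarSet() {}
  // map the state vector v (unknown variables) to the physical data w,
  // whose entry pattern is defined by the acquainted EulerTerm
  virtual void setPhysicalData(const State& v, RealVector& w) = 0;
  // inverse mapping: rebuild the state vector from the physical data
  virtual void setFromPhysicalData(const RealVector& w, State& v) = 0;
  // variable-independent: advective fluxes computed from physical data
  void computeFlux(const RealVector& w);
  // variable-dependent: jacobians differ per formulation (see below)
  virtual void computeJacobians(const RealVector& w) = 0;
};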
However, as a counter-example, the Euler equations can be written in conservative, symmetrizing, entropy, characteristic (...) variables, and each of these formulations defines different jacobian matrices for the convective fluxes. The VarSet abstraction is particularly useful in these more complex cases, since we can let variable-dependent subclasses of EulerVarSet implement the jacobian matrices, according to the chosen formulation. Therefore one can have EulerCons (conservative), EulerChar (characteristic), EulerSymm (symmetrizing), etc.
It is helpful to have derived classes for NavierStokesVarSet too, since, for instance, the transport properties can be computed differently in chemically reactive or non-reactive flows, in local thermal equilibrium (LTE) or non-equilibrium (NE). The method computeFlux() that calculates the diffusive flux is a template method [7] in NavierStokesVarSet and delegates the computation of the transport properties to subclasses, which have more specific knowledge. Some possible subclasses are NavierStokesCons (non-reactive conservative variables), NavierStokesLTEPvt (pressure, velocity, temperature variables for LTE), NavierStokesNERivt (densities of chemical species, velocity, temperature variables for NE), etc. Another polymorphic Perspective is the VariableTransformer. This allows the client code to apply linear matrix transformations (∂U/∂V) between variables and to compute one set of variables from another (e.g. Euler primitive from Euler conservative and vice versa). The combined use of VarSets and VariableTransformers gives the freedom to use different variables to update the solution, compute or distribute the residual, and linearize jacobians, without requiring any modification in the client numerical algorithms, which work with dynamically bound Perspective objects, completely unaware of the actual physics.
The support for variable-independent algorithms offered by COOLFluiD represents an important feature that none of the other well-known CFD platforms, such as [4,25], provides.

Turbulent k-ε multi-species model
To verify the extensibility and soundness of the Perspective pattern, we can show an application to a more challenging case: a multi-species, chemically reactive, turbulent k-ε model. The compositor object is now a ConvectionDiffusionReaction and the ConcreteModel aggregates three terms: EulerTurbKETerm, TurbKEDiffTerm and TurbKESourceTerm. It can be noticed that full reuse of the existing terms is achieved: the data set is simply extended to accommodate the new model. The same reusability and flexibility applies also to the VarSets and VariableTransformers. In particular, EulerKEVarSet derives from EulerVarSet, acquaints EulerTurbKETerm, calls the parent method to compute the first five flux components and implements two additional ones for k and ε, plus all the partial-density fluxes. Similarly, TurbKEDiffVarSet inherits from NavierStokesVarSet, acquaints TurbKEDiffTerm, simply extends the implementation of the fluxes and computes the turbulent transport properties. Additionally, a TurbKESourceVarSet, which points to TurbKESourceTerm and implements a variable-dependent behaviour for the reaction source term, needs to be implemented.
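A hedged sketch of how these classes could be composed (the three term names are taken from the text; the base class NavierStokesTerm and the compositor signature are assumptions carried over from the earlier sketches):

class EulerTurbKETerm  : public EulerTerm        { /* + k, eps data    */ };
class TurbKEDiffTerm   : public NavierStokesTerm { /* + turbulent data */ };
class TurbKESourceTerm : public BaseTerm         { /* reaction sources */ };

class TurbKEModel : public ConvectionDiffusionReaction<
    EulerTurbKETerm, TurbKEDiffTerm, TurbKESourceTerm> { /* ... */ };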

Implementation of numerical methods
The solution of PDEs requires the implementation of different numerical techniques that deal with time or space discretizations, linear system solving, mesh adaptation algorithms, error estimation, mesh generation, etc.
In a typical OO design, e.g. like the ones proposed by [5,10,16,24], each of these simulation steps is enclosed in a separate object (or polymorphic hierarchy of objects) and different patterns are applied to make them all interact.
It would be advantageous to have a single, highly reusable pattern that allows the developer to encapsulate different numerical methods uniformly, and lets him/her focus more on the algorithm itself than on its already well-defined surroundings. This pattern should ease the integration of new algorithms, but also the implementation of different versions or parts of the same ones, while looking for optimal solutions and/or tuning for run-time performance.
One of the main achievements of COOLFluiD, as opposed to other similar OO environments, lies in having developed a uniform high level structural solution, potentially able to encapsulate and tackle efficiently a wide range of numerical algorithms: the compound Method-Command-Strategy pattern.

Method-Command-Strategy pattern
The Method-Command-Strategy (MCS) pattern [13,17] provides a uniform way to implement numerical algorithms and to compose them according to specific needs. As sketched in Fig. 6, BaseMethod defines an abstract interface for a specific type of algorithm (e.g. MeshCreator, SpaceMethod, ConvergenceMethod, ErrorEstimator, LinearSystemSolver). ConcreteMethod implements the virtual functions of the corresponding parent class by delegating tasks to ad-hoc Commands [1,7] that share a tuple, ConcreteMethodData, aggregating multiple polymorphic receivers (Strategies). Three levels of abstraction and flexibility can be identified in this pattern: BaseMethod, Command and Strategies can all be overridden, allowing one to implement the same task, at the corresponding level, in different ways. This kind of behavioral modularity allows developers to easily re-implement or tune components (Methods, Commands or Strategies), and gives them the freedom to move code from one layer to another, according to convenience, taste or profiling-driven indications. The fast-path code, critical for overall performance, can be wrapped inside Commands or Strategies and substituted with more efficient implementations without implying changes in the upper layer.
Moreover, while freedom is left to define the abstract interface of a new polymorphic BaseMethod or Strategy, the interface of a Command consists of only three actions:

virtual void setup();   // setup private data
virtual void unsetup(); // unsetup private data
virtual void execute(); // execute the action

In COOLFluiD, managing the collaboration between different numerical methods is eased by the fact that Commands can create their own local or distributed data and share them with other Commands defined within other Methods, by making use of the DataStorage facility, whose details are presented in Section 2.1.
Moreover, Perspective objects, as described in 3.1, can be used within the MCS pattern, at the Strategy level, as part of the ConcreteMethodData, in order to bind a numerical algorithm to the physics polymorphically. This collaboration between the MCS and Perspective patterns is an effective solution to minimize the coupling between physics and numerics. It makes it possible, for instance, to employ a certain space method to discretize equations related to different physical models, but also to apply different space methods that require completely different data structures to discretize a single set of equations.
Interchangeability of Methods, Commands and Strategies can be facilitated and maximized by making them self-registering and self-configurable objects [3,13]: this gives users and developers full control on each of them.

Self-registering objects
The self-registration technique [3,13] automatizes the creation of polymorphic objects and reduces implementation and compilation dependencies. A generic ConcreteObj of polymorphic type BaseObj can be registered by simply instantiating the corresponding ObjectProvider in the implementation file:

ObjectProvider<BaseObj, ConcreteObj> myProvider("objName");

The string "objName", accepted by the provider constructor, can then be used as a key to ask a singleton template Factory [7,13] to create the corresponding polymorphic object:

BaseObj* obj = Factory<BaseObj>::getProvider("objName")->create();

Self-configurable objects
In COOLFluiD, objects can be self-configurable [13], i.e. they can create and set their own data. An object is made self-configurable by deriving it from a parent class ConfigObject and by adding, in its constructor, a call to

addConfigOption("OptionKey","option description", &configData);

for each configurable data member configData. In particular, OptionKey is the configuration key string, used to map the value of configData. This technique allows the user to input any kind of data (including analytical functions) from file, environment variables or command-line options. As a basic example, consider what could appear in a configuration file for a concrete SpaceMethod object (see the sketch below): MySM is the self-configuration value for SpaceMethod, which is the configuration key for the homonymous polymorphic object. MySM is also the self-registration key for the concrete space method class, MySpaceMethod, which will then be instantiated and will configure itself with the specified setup Command, named MySetUp, and data. The latter include two polymorphic strategies, whose selected instances are called MyA and MyB.
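A hypothetical configuration fragment matching this description (the exact key syntax is an assumption, not taken from an actual COOLFluiD input file):

SpaceMethod = MySM        # selects MySpaceMethod via its registration key
MySM.SetupCom = MySetUp   # setup Command instance to create
MySM.StrategyA = MyA      # first polymorphic strategy
MySM.StrategyB = MyB      # second polymorphic strategy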
A deeper analysis of the MCS pattern and more implementation details are provided in [17].Next we consider some possible applications.

Example of SpaceMethod: FiniteVolume
We define the interface of a SpaceMethod, which takes care of the spatial discretization of the given set of PDEs, according to a specified numerical scheme, on a chosen mesh:

class SpaceMethod : public Method {
public:
  SpaceMethod(string name);        // constructor
  virtual ~SpaceMethod();          // virtual destructor
  virtual void setMethod() = 0;    // setup the method
  virtual void unsetMethod() = 0;  // unsetup the method
  // initialize the solution
  virtual void initializeSolution(CFbool isRestart) = 0;
  // compute the space part of the residual and jacobian matrix
  virtual void computeSpaceResidual(CFreal factor = 1.0) = 0;
  // compute the time dependent part of the residual and jacobian matrix
  virtual void computeTimeResidual(CFreal factor = 1.0) = 0;
  virtual void applyBC() = 0;      // apply boundary conditions
};

As shown in the sample code above, SpaceMethod inherits from a non-instantiable Method object that provides configuration functionalities meant to be reused by all its children, namely all the possible BaseMethods in Fig. 6. Figure 7 shows a simplified class diagram of the cell-centered Finite Volume (FV) module. CellCenterFVM is a concrete instance of SpaceMethod. Specific Commands, CellCenterFVMCom, are associated with actions like setup and unsetup (creation and destruction of data needed by the employed scheme), application of boundary conditions, and computation of the residual and jacobian contributions to the system matrix, if required by implicit time stepping. All self-registering polymorphic objects, including Commands, are aggregated by SelfRegistPtrs, i.e. smart pointers with intrusive reference counting [1] that keep ownership of them. In order to take full profit of the self-registration and self-configuration features, all ConcreteMethods, including CellCenterFVM, hold the Command names (the second entry in the std::pair tuples), which are used as keys for the polymorphic creation and configuration of the Commands themselves. Let us consider the constructor of CellCenterFVM:

CellCenterFVM::CellCenterFVM(string name) :
  SpaceMethod(name),
  m_data(new CellCenterFVMData())
{
  m_setupStr = "StdSetup"; // default name
  addConfigOption("SetupCom","Setup",&m_setup.second);
  m_unSetupStr = "StdUnSetup"; // default name
  addConfigOption("UnSetupCom","UnSetup",&m_unSetup.second);
  m_computeSpaceRHSStr = "FVMCCRhs"; // default name
  addConfigOption("ComputeRHS", "Compute space residual",
                  &m_computeSpaceRHS.second);
  // ...
}

A concrete ConvergenceMethod, in turn, holds within its data tuple handles to its collaborator Methods, for instance:

MultiMethodHandle<LinearSystemSolver> m_lss;

In the code above, MultiMethodHandle is a lightweight proxy object that hides the knowledge of multiplicity: it controls the access to one or more underlying Methods with the same polymorphic type and dispatches specified actions on all of them sequentially, similarly to what a std::for_each function would do [20]. The purpose of accessing Method objects through MultiMethodHandles is to offer transparent support for weakly coupled simulations, where, in the same process, two or more different linear systems are assembled by one or more SpaceMethods and must be solved one after the other.
SpaceMethod (SM) and LinearSystemSolver (LSS) are the collaborator Methods: a concrete ConvergenceMethod uses them polymorphically via MultiMethodHandles, without knowledge of their concrete types or their actual number.
As a result, concrete collaborator Methods are completely interchangeable with other ones with the same polymorphic type.
In a parallel simulation, the norm of the residual is calculated with a collective operation placed between the beginning and the end of the synchronization process, in order to overlap communication and computation and maximize the efficiency of the operation; this is the task of a function synchAndComputeRes(). When running serially, the implementation of synchAndComputeRes() remains unchanged, but the calls to beginSync() and endSync() have no effect.
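A minimal sketch of how such a function could be organized (beginSync()/endSync() come from the DataHandle interface described in Section 2; the reduction details, MPI_COMM_WORLD and the mapping of CFreal to MPI_DOUBLE are assumptions):

#include <cmath>
#include <mpi.h>

CFreal synchAndComputeRes(DataHandle<CFreal, GLOBAL>& states) {
  states.beginSync();   // start the synchronization of overlap data
  CFreal localNorm = 0.;
  // ... accumulate the local contribution to the residual norm here,
  //     overlapping this computation with the ongoing communication ...
  CFreal globalNorm = 0.;
  MPI_Allreduce(&localNorm, &globalNorm, 1, MPI_DOUBLE,
                MPI_SUM, MPI_COMM_WORLD); // collective part of the norm
  states.endSync();     // complete the synchronization
  return std::sqrt(globalNorm);
}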
We conclude with the solve Command of the PETSc-based linear system solver, whose execute() performs the final assembly and asks PETSc to solve the system (see the sketch below). Data stored in MeshData are allowed to cross the Method scope and can be used in Commands belonging to different Methods. The keys for the data exchange are the storage name and type, which must both match in all the Method Commands that need the same data.
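A hedged sketch of such a Command body, using only standard PETSc calls (the accessors on the method data tuple, and the tuple name itself, are assumptions):

#include <petscksp.h>

void StdSolveSys::execute() {
  Mat mat = getMethodData().getMatrix();    // hypothetical accessors to
  Vec rhs = getMethodData().getRhsVector(); // the PETSc objects stored
  Vec sol = getMethodData().getSolVector(); // in the LSS data tuple
  KSP ksp = getMethodData().getKSP();
  // perform final assembly and ask PETSc to solve the system
  MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);
  KSPSolve(ksp, rhs, sol); // operators are assumed set during setup()
}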
In parallel simulations nothing changes, since the Commands implementing numerical algorithms work only with LOCAL data, as they would in a serial run. Functions that demand access to the GLOBAL storage and perform some parallel action, like the above-mentioned synchAndComputeRes(), are the exception.
Furthermore, the example of the collaboration between ConvergenceMethod and LSS demonstrates the suitability of the MCS pattern to interface existing libraries, PETSc in this case, without exposing any detail of their actual implementation to their clients.
The Trilinos package [19] has also been successfully integrated in COOLFluiD by means of the MCS pattern. The class definitions of the corresponding concrete linear system solver Method and the related Commands are essentially similar to the PETSc ones, while TrilinosLSSData exposes the Trilinos-specific objects through its interface (see the sketch below). To summarize, the application of the MCS pattern helps to encapsulate not only each numerical algorithm but also its collaborations with other algorithms. This contributes to enforcing the multi-component-oriented character of the COOLFluiD framework, where each component can be replaced by another one with the same polymorphic type, without affecting the client code.
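A plausible shape for that interface, using standard Epetra/AztecOO types (the base class LSSData and the accessor and member names are assumptions):

class TrilinosLSSData : public LSSData {
public:
  Epetra_CrsMatrix* getMatrix()    { return m_mat; }    // system matrix
  Epetra_Vector*    getRhsVector() { return m_rhs; }
  Epetra_Vector*    getSolVector() { return m_sol; }
  AztecOO*          getSolver()    { return m_solver; } // iterative solver
private:
  Epetra_CrsMatrix* m_mat;
  Epetra_Vector*    m_rhs;
  Epetra_Vector*    m_sol;
  AztecOO*          m_solver;
};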

Conclusions
In the available literature on the OO simulation of PDEs, to the knowledge of the authors, it is difficult to find design solutions that offer, at least in principle, flexibility in serial and parallel data handling, run-time interchangeability between physics and numerics, and structural support to encapsulate generic numerical algorithms in a flexible and reusable way, comparable with the ones presented in this article. As a matter of fact, the proposed design and implementation ideas do not necessarily need to be employed within COOLFluiD: they can be reused independently and in other scientific computing contexts, even unrelated to the solution of PDEs.
The flexible and reusable design solutions presented in this paper have allowed COOLFluiD to enlarge its target applications beyond the scope of Computational Fluid Dynamics alone. Several components have already been integrated in the framework: explicit (Runge-Kutta) and implicit (Newton, Crank-Nicholson, Three Point Backward) time stepping, different spatial discretizations (FV, RD, Space-Time RD, FE), different physical models (Euler, Navier-Stokes, ideal Magneto-Hydro-Dynamics, Linear Elasticity, Heat transfer, etc.), wrappers for different linear system solvers (PETSc [2], Trilinos [19]), a flexible parallel data structure supporting the use of hybrid meshes, etc.
The implementation of many other functionalities is underway: Aero-Thermo-Chemical models, incompressible plasma flows, error estimation, mesh movement and adaptation, loosely coupled fluid-structure interaction.

Fig. 3. Speed-up results for two numerical simulations, one performed with the FV and the other with the RD method, compared with the ideal linear behaviour.

Fig. 4. Perspective pattern applied to a physical model.

Fig. 5. Example of the Perspective pattern applied to the Navier-Stokes model.