A Multidimensional Data-Driven Sparse Identification Technique: The Sparse Proper Generalized Decomposition

cited. Sparse modelidentification by means of data is especially cumbersomeif thesoughtdynamics live in a high dimensional space.This usually involves the need for large amount of data, unfeasible in such a high dimensional settings. This well-known phenomenon, coined as the curse of dimensionality, is here overcome by means of the use of separate representations. We present a technique based on the same principles of the Proper Generalized Decomposition that enables the identification of complex laws in the low-data limit. We provide examples on the performance of the technique in up to ten dimensions.


Introduction
In recent years there has been a growing interest in incorporating data-driven techniques into the field of mechanics.While almost classical in other domains of science like economics, sociology, etc., big data has arrived with important delay to the field of computational mechanics.It is worth noting that, in our field, the amount of data available is very often no so big, and therefore we speak of data-driven techniques instead of big-data techniques.
Among the first in incorporating data-driven technologies to the field of computational mechanics one can cite the works of Kirchdoerfer et al. [1,2], or the ones by Brunton et al. [3][4][5].Previous attempts exist; however, to construct data-driven identification algorithms, see, for instance [6,7].More recently, the issue of compliance with general laws like the ones of thermodynamics has been also achieved, which is a distinct feature of data-driven mechanics [8].Other applications include the identification of biological systems [9] or financial trading [10], to name but a few.
The problem with high dimensional systems is that data in these systems is often sparse (due precisely to the high dimensional nature of the phase space) while the system has, on the contrary, low dimensional features-at least very frequently.Based on this, a distinction should be made between methods that require an a priori structure of the sampling points and others which do not require such a regularity.
Regarding the methods that need a rigid structure in the sampling points, the Nonintrusive Sparse Subspace Learning (SSL) method is a novel technique which has proven to be very effective [11].The basic ingredient behind such a technique is that the parametric space is explored in a hierarchical manner, where sampling points are collocated at the Gauss-Lobato-Chebychev integration points.Also, using a hierarchical base allows improving the accuracy adding more hierarchical levels without perturbing the previous ones.To achieve such hierarchical property, just the difference at a given point between the real function minus the estimated value, using the precedent hierarchical levels, is propagated.For more details about the method, the reader is referred to [11].However, in the high-dimensional case, this technique shows severe limitations, as will be detailed hereafter.

Complexity
On the other hand, nonstructured data-driven techniques are commonly based on Delaunay triangularization techniques, providing an irregular mesh whose nodes coincides with the sampling points.Afterwards, depending on the degree of approximation inside each one of the Delaunay triangles, it gives rise to different interpolation techniques; i.e., linear, nearest, cubic, and natural, among other techniques, are commonly used.Apart from techniques that depend on a given triangularization, it is worth mentioning Kriging interpolants as an appealing technique to provide response surfaces from nonstructured data points.The key ingredient behind such technique is that each sampling point is considered as a realization of a random process.Therefore, defining a spatial correlation function allows to infer the position of unknown points just like providing confidence intervals based on the distance to the measured points.Nevertheless, the calibration of the correlation matrix has an important impact in the performance of the method itself.
Kriging also possesses a very interesting property: it is able to efficiently filter noise and outliers.Therefore, it is expected that it also could help us in problems with noise in the data.
However, in high dimensional settings, all of the just mentioned techniques fail to identify the nature of the system due precisely to the curse of dimensionality.A recent alternative for such a system could be Topological Data Analysis (TDA), which is based on the use of algebraic topology and the concept of persistent homology [12].A sparse version of this technique also exists [13].
Hence, if a competitive data-driven identification technique is desired, such a technique should meet the following requirements: (i) Nonstructured data set: this characteristic provides versatility to the method.Indeed, when evaluating the response surface requiring a lot of computational effort, recycling previous evaluations of the response surface, which do not coincide with a given structure of the data, may be very useful.In addition, the SSL technique establishes sampling points at locations in the phase space with no physical meaning in an industrial setting.
(ii) Robustness with respect to high dimensionality: triangularization-based techniques suffer when dealing with multidimensional data just because a high dimensional mesh has to be generated.Nevertheless, the separation of variables could be an appealing technique to circumvent the problem of generating such a high dimensional mesh.
(iii) Curse of dimensionality: all previous techniques suffer when dealing with high dimensional data.For instance, the SSL needs 2  sampling points just to reach the first level of approximation.Thus, when dealing with high dimensional data ( > 10 uncorrelated dimensions) plenty of sampling points are required to properly capture a given response surface.
In what follows we present a method based on the concept of separate representations to overcome the curse of dimensionality.Such separate representation has previously been employed by the authors to construct a priori reducedorder modeling techniques, coined as Proper Generalized Decompositions [14][15][16][17][18][19][20].This will give rise to a sparse Proper Generalized Decomposition (s-PGD in what follows) approach to the problem.We then analyze the just developed technique through a series of numerical experiments in Section 4, showing the performance of the method.Examples in up to ten dimensions are shown.The paper is completed with some discussions.

A Sparse PGD (s-PGD) Methodology
. .Basics of the Technique.In this section we develop a novel methodology for sparse identification in high dimensional settings.For the ease of the exposition and, above all, representation, but without loss of generality, let us begin by assuming that the unknown objective function (, ) lives in R 2 and that is to be recovered from sparse data.As in previous references, see, for instance [21]; we have chosen to begin with a Galerkin projection, in the form where Ω ⊂ R 2 stands for the-here, still twodimensional-domain in which the identification is performed and  * (, ) ∈ C 0 (Ω) is an arbitrary test function.Finally, (, ) will be the obtained approximation to (, ), still to be constructed.In previous works of the authors [8] as well as in other approaches to the problem (e.g., [21]), this projection is subject to additional constraints of thermodynamic nature.In this work no particular assumption is made in this regard, although additional constraints could be imposed to the minimization problem.
Following the same rationale behind the Proper Generalized Decomposition (PGD), the next step is to express the approximated function   (, ) ≈ (, ) as a set of separate one-dimensional functions, The determination of the precise form of functional pairs   ()  (),  = 1, . . ., , is done by first projecting them on a finite element basis and by employing a greedy algorithm such that once the approximation up to order −1 is known, the new th order term is found by any nonlinear solver (Picard, Newton,. ..).
It is well-known that this approach produces optimal results for elliptic operators (here, note that we have in fact an identity operator acting on ) in two dimensions, see [14] and references therein.There is no proof, however, that this separate representation will produce optimal results (in other words, will obtain parsimonious models) in dimensions higher than two.In two dimensions and with  * =  * it provides the singular value decomposition of (, ) [15].Our experience, nevertheless, is that it produces almost optimal results in the vast majority of the problems tested so far.
It is worth noting that the product of the test function  * (, ) times the objective function (, ) is only evaluated at few locations (the ones corresponding to the experimental measurements) and that, in a general high dimensional setting, we will be in the low-data limit necessarily.Several options can be adopted in this scenario.For instance, the objective function can be first interpolated in the high dimensional space (still 2D in this introductory example) and then integrated together with the test function.Indeed, this will be the so-called PGD in approximation [15], commonly used when either (, ) is known everywhere and a separated representation is sought or if (, ) is known in a separated format but a few pairs  are needed for any reason.Under this rationale the converged solution (, ) tries to capture the already interpolated solution in the high dimensional space but in a more compact format.As a consequence, the error due to interpolation of experimental measurements on the high dimensional space will persist in the final separate identified function.
In order to overcome such difficulties, we envisage a projection followed by interpolation method.However since information is just known at  sampling points (  ,   ),  = 1, . . ., , it seems reasonable to express the test function not in a finite element context, but to express it as a set of Dirac delta functions collocated at the sampling points, giving rise to The choice of the test function  * (, ) in the form dictated by ( 4) is motivated by the desire of employing a collocation approach while maintaining the symmetry of standard Bubnov-Galerkin projection operation.
. .Matrix Form.Let us detail now the finite element projection of the one-dimensional functions   () and   (),  = 1, . . ., , (often referred to as modes) appearing in (2).Several options can be adopted, ranging from standard piecewise linear shape functions, global nonlinear shape functions, maximum entropy interpolants, splines, kriging, etc. Regarding the kind of interpolant to use, an analysis will be performed in the sequel.Nevertheless, no matter which precise interpolant is employed, it can be expressed in matrix form as where    and    ,  = 1, . . ., , represent the degrees of freedom of the chosen approximation.We employ N k as the most usual nomenclature for the shape function vector.It is important to remark that the approximation basis could even change from mode to mode (i.e., for each ).For the sake of simplicity we take the same number of terms for both   () and   (), namely, .
By combining (1), ( 2), ( 4), (6), and (7) a nonlinear system of equations is derived, due to products of terms in both spatial directions.An alternate direction scheme is here preferred to linearize the problem, which is also a typical choice in the PGD literature.Note that, when computing modes   (), the variation in the other spatial direction vanishes,  * () = 0, and vice versa.
In order to fully detail the matrix form of the resulting problem, we first employ the notation "⊗" as the standard tensorial product (i.e., b ⊗ c =     ) and define the following matrices For the sake of simplicity but without loss of generality, evaluations of the former operators at point (  ,   ) are denoted as Equations ( 10)-( 11) below show the discretized version of the terms appearing in the weak form, (1), when computing modes in the  direction.Again,  stands for the number of modes in the solution (, ) while  denotes the number of sampling points.
Hence, by defining allows to write a system of algebraic equations Exactly the same procedure is followed to obtain an algebraic system of equations for b  .This allows performing an alternating directions scheme to extract a new couple of   () and   () modes.
This formulation has several aspects that deserve to be highlighted: (1) No assumption about (, ) has been made other than assuming known its value at sampling points.Indeed, both problems of either interpolating or making a triangulation in a high dimensional space are circumvented due to the separation of variables.
(2) The operator M  is composed of  rank-one updates, meaning that the rank of such operator is at most .Furthermore, if a subset of measured points share the same coordinate   , the entire subset will increase the rank of the operator in one unity.
(3) The position of the sampling points will constrain the rank of the PGD operators.That is the reason why even if the possibility of having a random sampling of points is available, it is always convenient to perform a smart sampling technique such that the rank in each direction tends to be maximized.Indeed, the higher the rank of the PGD operator is, the more cardinality of a and b can be demanded without degenerating into an underdetermined system of equations.
There are plenty of strategies to smartly select the position of the sampling points.They are based on either knowing an a priori error indicator or having a reasonable estimation of the sought response surface.Certainly, an adaptive strategy based on the gradient of the precomputed modes could be envisaged.However, the position of the new sampling points will depend on the response surface calculated using the previous sampling points, making parallelization difficult.That is the reason why Latin hypercube is chosen in the present work.Particularly, Latin hypercube tries to collocate  sampling points in such a way that the projection of those points into  and  axis are as far as possible.
. .Choice of the D Basis.In the previous section, nothing has been specified about the basis in which each one of the one-dimensional modes was expressed.In this subsection, we will use an interpolant based on Kriging techniques.Simple Kriging has been used throughout history in order to get relatively smooth solutions, avoiding spurious oscillations characteristic of high order polynomial interpolation.This phenomena is called Runge's phenomenon.It appears due to the fact that the sampling point locations are not chosen properly; i.e., they will not be collocated, in general, at the Gauss-Lobato-Chebychev quadrature points.Kriging interpolants consider each point as a realization of a Gaussian process, so that high oscillations are considered as unlikely events.
Hence, by defining a spatial correlation function based on the relative distance between two points, D(  −   ) = D  , an interpolant is created over the separated 1D domain, where ( −   ) is a weighting function which strongly depends on the definition of the correlation function and the   coefficients are the nodal values associated to the   Kriging control points.Note that these control points are not the sampling points.We have chosen this strategy so as to allow us to accomplish an adaptivity strategy that will be described next.In the present work, these control points are uniformly distributed along the 1D domain.Although several definitions of the correlation function exist, a Gaussian distribution is chosen as where  is the variance of the Gaussian distribution.Several a priori choices can be adopted to select the value of the variance based on the distance between two consecutive control points, e.g.,  = ℎ √ ( +1 −   ) 2 .The magnitude of ℎ should be adapted depending on the desired global character of the support.To ensure the positivity of the variance, ℎ should be in the interval ]0, +∞[.Let us define now a set of  control points and the  sampling points Let us define in turn a correlation matrix between all control points and a correlation matrix between the control points and the sampling points as Under these settings, we define a weighting function for each control point and for each sampling point as where ( If we reorganize the terms in the same way that we did in the previous section to have a compact and close format of the shape function N   , we arrive to where each shape function is given by Figures 1 and 2 depict the appearance of the simple Kriging interpolants using 7 control points uniformly distributed along the domain, for ℎ = 1 and ℎ = 1/3, respectively.It can be highlighted that both the Kronecker delta (i.e., strict interpolation) and the partition of unity properties are satisfied for any value of ℎ.Moreover, it is worth noting that the higher the variance the correlation function has, the more global the shape functions are.Furthermore, it is known that 99 per cent of the probability of a Gaussian distribution is comprised within an interval of [ − 3,  + 3], being  the mean value of the distribution.This issue explains perfectly well why the support of each Gaussian distribution takes 2 elements for the case, where ℎ = 1/3.Indeed, the shape of the interpolants is quite similar to standard finite element shape functions, but with a Gaussian profile.The remaining 1 per cent of probability is comprised in the small ridges happening in the middle of the elements.
In light of these results, a family of interpolants based on Kriging can be easily created just selecting the value of the variance within the correlation function.Therefore, globality of the support can be easily adjusted always under the framework of the partition of unity. . .Modal Adaptivity Strategy.In a standard PGD framework, the final solution is approximated as a sum of  modes or functional products; see (2).Each one of the separated modes must be projected onto a chosen basis to render the problem finite dimensional.A standard choice is to select the same basis for each one of the modes: Despite of the fact that this choice seems reasonable, when dealing with nonstructured sparse data, it may not be such.In the previous section we proved that the rank of the separated system strongly depends on the distribution of the data sampling.Therefore, the cardinality of the interpolation basis must not exceed the maximum rank provided by the data sampling.Indeed, this constraint, which provides an upper bound to build the interpolation basis, only guarantees that the minimization is satisfied at the sampling points, without saying anything out of the measured points.Hence, if sampling points are not abundant, in the limit of low-data regime, high oscillations may appear out of these measured points.These oscillations are not desirable since the resulting prediction properties of the proposed method could be potentially decimated.
In order to tackle this problem, we take advantage of the residual-based nature of the PGD.Indeed, the greedy PGD algorithm tries to enrich a solution composed by  modes, just by looking at the residual that accounts for the contribution of the previous modes, as shown in 8).
Therefore, an appealing strategy to minimize spurious oscillations out of the sampling points is to start the PGD algorithm looking for modes with relatively smooth basis (for instance, Kriging interpolants with a few control points).Therefore, an indicator in order to make an online modal adaptive strategy is required.In the present work, we use the norm of the PGD residual, where P is the set of  measured points and (, ) is the function to be captured.
In essence, when the residual norm stagnates, a new control mesh is defined, composed by one more control point and always uniformly distributed, following By doing this, oscillations are reduced, since higher-order basis will try to capture only what remains in the residual.Here,   is a tolerance defining the resilience of the sPGD to increase the cardinality of the interpolation basis.The lower   is, the more resilient method is to increase the cardinality.
To better understand the method, we will quantify the error for two set of points: the first set is associated with the sampling points, P, where (  ,   ) is assumed not to vanish and where L also includes points other than the sampling points.This is done in order to validate the algorithm, by evaluating the reference solution-which is a priori unknown in a general setting-at points different to the sampling ones, Since the s-PGD algorithm minimizes the error only at the sampling points P it is reasonable to expect that E P ≤ E L .

. . A Preliminary Example.
To test the convergence of the just presented algorithm, we consider which presents a quite oscillating behavior along the  direction, whereas the  direction is quadratic.We are interested in capturing such a function in the domain Figures 3 and 4 show the errors E P and E L in identifying the function  1 (, ).In this case, we consider two distinct   possibilities: no modal adaptivity at all and a modal adaptivity based on the residual, respectively.Several aspects can be highlighted.The first one is that E P (asterisks) decreases much faster when there is no modal adaptivity.This is expected, since we are minimizing with a richer basis since the very beginning, instead of starting with smooth functions like in the residual based approach.However, even if the minimization in the sampling points is well achieved, when no modal adaptivity is considered, the error out of the sampling points may increase as the solution is enriched with new modes.Nevertheless, the residual-based modal adaptivity alleviates this problem.As it can be noticed, starting with relatively smooth functions drives the solution out of the sampling points to be smooth as well, avoiding the problem of high oscillations appearing out of the sampling points.

A Local Approach to s-PGD
It is well-known that, as in POD, reduced basis or, in general, any other linear model reduction technique, PGD gives poor results-in the form of a nonparsimonious prediction-when the solution of the problem lives in a highly nonlinear manifold.Previous approaches to this difficulty included the employ of nonlinear dimensionality reduction techniques such as Locally Linear Embeddings [22], kernel Principal Component Analysis [23,24] or isomap techniques [25].Another, distinct, possibility, is to employ a local version of PGD [18], in which the domain is sliced so that at every subregion PGD provides optimal or nearly optimal results.We explore this last option here for the purpose of sparse regression, although a bit modified, as will be detailed hereafter.
The approach followed herein is based on the employ of the partition of unity property [26,27].In essence, it is well-known that any approximating function (like finite element shape functions, for instance) that forms a partition of unity can be enriched with an arbitrary function such that the resulting approximation inherits the smoothness of the partition of unity and the approximation properties of the enriching function.
With this philosophy in mind, we proposed to enrich a finite element mesh with an s-PGD approximation.The resulting approximation will be local, due to the compact support of finite element approximation, while inheriting the good approximation properties, already demonstrated, of s-PGD.In essence, what we propose is to construct an approximation of the type where I represents the node set in the finite element mesh, I enr the set of enriched nodes,   are the nodal degrees of freedom of the mesh, I  is the number of finite elements covered by node  shape function's support and    () and    () functions are the -th one-dimensional PGD modes enriching node , that in fact constitute an enriching function  enr (, ).
Of course, as already introduced in Eqs. ( 6) and ( 7), every PGD mode is in turn approximated by Galerkin projection on a judiciously chosen basis.In other words, with a   and b   the nodal values describing each onedimensional PGD mode.
In this framework, the definition of a suitable test function can be done in several ways.As a matter of fact, the test function can be expressed as the sum of a finite element and a PGD contribution, so that an approach similar to that of Eq. ( 4) can be accomplished.
An example of the performance of this approach is included in Section 4.4.

Numerical Results
The aim of this section is to compare the ability of sparse model identification for different interpolation techniques.On one hand, the performance of standard techniques based on Delaunay triangulation such as linear, nearest neighbor or cubic interpolation is compared.Even though these techniques are simple, they allow to have a nonstructured sampling point set since they rely on a Delaunay triangulation.On the other hand, the results are compared to the Sparse Subspace Learning (SSL) [11].The convergence and robustness of this method is proven to be very effective since the points are collocated at the Gauss-Lobato-Chebychev points.However, two main drawbacks appear considering this method.The first one is that there is a high concentration of points in the boundary of the domain, so that this quadrature is meant for functions that vary mainly along the boundary.Indeed, if the variation of the function appears in the middle of the domain, many sampling points will be required to converge to the exact function.The second one is that the sampling points have to be located at specific points in the domain.The s-PGD method using simple Kriging interpolants will be compared as well.
The numerical results are structured as follows: first two synthetic 2D functions are analyzed; secondly, two 2D response surfaces coming from a thermal problem and a Plastic Yield function are reconstructed; finally, a 10D synthetic function is reconstructed by means of the s-PGD algorithm.
. .D Synthetic Functions.The first considered function is  1 (, ), as introduced in the previous section.Figure 5 shows the reconstruction error (E L ) of  1 (, ) for different sampling points.As it can be noticed, the s-PGD algorithm performs well for a wide range of sampling points.Nevertheless, the SSL method is the one presenting the lower error level when there are more than 150 sampling points.
A second synthetic function is defined as  2 (, ) = cos (3) + log ( +  + 2.05) + 5. ( This function is intended to be reconstructed in the domain relatively smooth in the center of the domain, whereas the main variation is located along the boundary of the domain.Indeed, this function is meant to show the potential of the SSL technique. Figure 6 shows the reconstruction error of the  2 (, ) function for different interpolation techniques.As it can be noticed, both SSL and s-PGD methods are the ones that present the best convergence properties.If the number of points is increased even more, the SSL method is the one that presents the lowest interpolation error.They are followed by linear and natural neighbor interpolations.Finally, the nearest neighbor method is the one presenting the worst error for this particular case.

. . D Response Surfaces Coming from Physical Problems.
Once the convergence of the methods have been unveiled for synthetic functions, it is very interesting to analyze the power of the former methods by trying to identify functions that are coming from either simulations or models popular in the computational mechanics community.Indeed, two functions will be analyzed: the first one is an anisotropic Plastic Yield function, whereas the second one is a solution coming from a quasistatic thermal problem with varying source term and conductivity.
Figure 7 shows the Yld2004-18p anisotropic plastic yield function, defined by Barlat et al. in [28].Under plane stress hypothesis, this plastic yield function is a convex and closed surface defined in a three-dimensional space.Therefore, the position vector of an arbitrary point in the surface can be easily parameterized in cylindrical coordinates as (,   ).The (,   ) function for the Yld2004-18p is shown in Figure 8, where anisotropies can be easily seen.Otherwise, the radius function will be constant for a given   .
Figure 9 shows the error in the identification of Barlat's plastic yield function Yld2004-18p.As it can be noticed, the s-PGD technique outperforms the rest of techniques.Indeed, the s-PGD is exploiting the fact that the response surface is highly separable.
As mentioned above, the second problem is the sparse identification of the solution of a quasistatic thermal problem modeled by and the source term varies in time.Homogeneous Dirichlet boundary conditions are imposed at both spatial boundaries and no initial conditions are required due to quasistationarity assumptions.
Figure 10 shows the evolution of the temperature field as a function of space time for the set of (33)-(35).It can be noticed how the variation of the temperature throughout time is caused mainly due to the source term.However, conductivity modifies locally the curvature of the temperature along the spatial axis.Symmetry with respect the  = 0 axis is preserved due to the fact that the conductivity presents a symmetry along the same axis.
Figure 11 shows the performance of each one of the techniques when trying to reconstruct the temperature field from certain sampling points.As can be noticed, the s-PGD in conjunction with Kriging interpolants is the one that presents the fastest convergence rate to the actual function, which is considered unknown.It is followed by linear and natural interpolations.The SSL method presents a slow convergence rate in this case, due to the fact that the main variation of the function (, ) is happening in the center of the domain and not in the boundary. . .A D Multivariate Case.In this subsection, we would like to show the scalability that s-PGD presents when dealing with relatively high-dimensional spaces.Since our solution is expressed in a separated format, an  dimensional problem () is solved as a sequence of  1D problems, which are solved using a fixed-point algorithm in order to circumvent the nonlinearity of the separation of variables.
The objective function that we have used to analyze the properties of the s-PGD is defined as with  = 10 in this case.
Figure 12 shows the error convergence in both sampling points (E P , asterisks) and points out of the sampling (E L , filled points).The L data set was composed by 3000 points, and the P data subset for the s-PGD algorithm was composed by 500 points.The number of points required to properly capture the hypersurface has increased with respect to the 2D examples due to the high dimensionality of the problem.Special attention has to be paid when increasing the cardinality of the interpolant basis without many sampling points, because the problem of high oscillations outside the control points may be accentuated. .

. An Example of the Performance of the Local s-PGD.
The last example corresponds to the sparse regression of an intricate surface that has been created by mixing three different Gaussian surfaces so as to generate a surface with   no easy separate representation (a nonparsimonious model, if we employ a different vocabulary).The appearance of this synthetic surface is shown in Figure 13.
The sought surface is defined in the domain Ω = [0, 1] 2 , which has been split into the finite element mesh shown in Figure 14.Every element in the mesh has been colored according to the number of enriching PGD functions, ranging from a single one to four.The convergence plot of this example as a function of the number of PGD modes added to the approximating space is included in Figure 15.

Conclusions
In this paper we have developed a data-based sparse reducedorder regression technique under the Proper Generalized Decomposition framework.This algorithm combines the robustness typical of the separation of variables together with properties of collocation methods in order to provide with parsimonious models for the data at hand.The performance of simple Kriging interpolation has proven to be effective when the sought model presents some regularity.Furthermore, a modal adaptivity technique has been proposed in order to avoid high oscillations out of the sampling points, characteristic of high-order interpolation methods when data is sparse.For problems in which the result lives in a highly nonlinear manifold, a local version of the technique, which makes use of the partition of unity property, has also been developed.This local version outperforms the standard one for very intricate responses.
The s-PGD method has been compared advantageously versus other existing methods for different example functions.Finally, the convergence of s-PGD method for a high dimensional function has been demonstrated as well.
Although the sparsity of the obtained solution could not seem evident for the reader, we must highlight the fact that the very nature of the PGD strategy a priori selects those terms on the basis that it plays a relevant role in the approximation.So to speak, PGD algorithms automatically discard those terms that in other circumstances will be weighted by zero.Sparsity, in this sense, is equivalent in this context to the number of sums in the PGD separated approximation.If only a few terms are enough to reconstruct the data-as is almost always the case-then sparsity is guaranteed in practice.
Sampling strategies other than the Latin hypercube method could be examined as well.This and the coupling with error indicators to establish good stopping criteria constitute our effort of research at this moment.In fact, the use of reliable error estimators could even allow for the obtention of adaptive samplings in which the cardinality of the basis could be different along different directions.

Figure 13 :Figure 14 :Figure 15 :
Figure 13: A synthetic surface generated by superposition of three different Gaussians, that is to be approximated by local s-PGD techniques.