From the Kalman Filter to the Particle Filter: A Geometrical Perspective of the Curse of Dimensionality

The aim of this contribution is to describe the difference between the Kalman filter (KF) and the particle filter (PF) when the state space is of high dimension. In the Gaussian framework, the KF and the PF give the same theoretical result. However, in high dimension and with a finite sample of the Gaussian distribution, the PF is not able to reproduce the solution produced by the KF. This discrepancy is explained by the convergence of the Gaussian law toward a hypersphere: in high dimension, any finite sample of a Gaussian law lies close to a hypersphere centered at the mean of the Gaussian law, with a radius equal to the square root of the trace of the covariance matrix. This concentration of probability suggests using the norm as a criterion to discriminate whether a forecast sample can be compatible or not with a given analysis state. The contribution illustrates important characteristics that have to be considered in high dimension but does not introduce a new approach to face the curse of dimensionality.


Introduction
Following the work of Bengtsson et al. [1] and Snyder et al. [2], the particle filter is known to suffer from the "curse of dimensionality" [3,4], in the sense that, when the problem is of high dimension, the particle ensemble must be exponentially large, which leads to the failure of direct Monte-Carlo strategies.
More precisely, in the particle filter, a weight is computed for each particle from the observational error distribution. This weight measures the proximity of the particle to a given set of observations. If the ensemble size does not increase exponentially with the dimension, then the maximum weight is systematically close to 1, meaning that only one particle is compatible with the observations (or, at least, is the least distant from the observations among all particles): this leads to particle degeneracy [5]. While the observation error distribution is often assumed Gaussian in data assimilation, it may be replaced by a heavier-tailed distribution to limit particle degeneracy in realistic applications [6]. However, this replacement should not be used in a theoretical study, where it would introduce an inconsistency between the way the observations are assumed to be distributed and the way they are assimilated. Moreover, Snyder et al. [2] have shown that particle degeneracy still occurs with heavier-tailed distributions, for example, the Cauchy distribution.
In the demonstration proposed by Snyder et al. [2], the curse of dimensionality of the particle filter is described in terms of weights, which are related to the observational space. Hence, it remains difficult to understand what happens in the state space, and what the real difference is between the ensemble version of the Kalman equations (EnKF), introduced by Evensen [7], and the particle filter approach (PF), introduced by Gordon et al. [8]. The question is not whether the EnKF converges toward the PF in the general framework, for which known results exist: (a) the PF is known to converge toward the nonlinear filter in the limit of a large number of interacting particles (Del Moral [9]); (b) the EnKF is known to converge toward a solution different from the nonlinear filter [10,11]. Rather, how can we make sense of the paradox that the two algorithms, the EnKF and the PF under Gaussian assumption, should deliver the same result while, in the practical high-dimensional case, they do not?

Advances in Meteorology
In the present contribution, important characteristics that have to be considered in high dimension are illustrated, but no new approach is introduced to face the curse of dimensionality. This work investigates another point of view, which relies on multidimensional spheres, or hyperspheres. A similar but different approach has been considered by Chorin and Morzfeld [12]. This perspective, mentioned in the conclusion of Snyder et al. [2], helps to understand in a direct way the effect of the dimension on the difference between the Kalman filter and the particle approximation of the Bayes rule under Gaussian assumption. The main result used in this work is that a normalized Gaussian law in high dimension is not a simple higher-dimensional extension of the usual 2D scatter plot but converges toward the uniform law on a sphere in dimension $n$, a phenomenon known as the Poincaré lemma [13]. This phenomenon already occurs for a "high dimension" of 100, which is very small compared with the $O(10^7)$ degrees of freedom encountered in geophysical applications: the constraints due to the dimension are therefore quite strong in this area.
Understanding the behaviour in high dimension is required when using alternatives to classical data assimilation to update the probabilistic information contained within an ensemble. Such an update can be motivated by the wish to increase the ensemble size by merging lagged ensembles [14,15], which raises the question of how to merge ensembles from different forecast times [16]. It can also be due to real-time constraints: ensemble prediction systems can now rely on an analysis ensemble [17], but in practice there can be a significant delay between the production of the operational analysis (which relies on the Kalman filter formalism) and that of the ensemble analysis. As a result, an analysis state exists that could be used to update the last available forecast ensemble, for example, by using the particle filter approach [18,19], where an adaptation of the metric used to compute the weights is also introduced. This kind of situation motivates clarifying the differences between the Kalman filter and the particle filter, at least under Gaussian assumption.
To tackle this issue, we first recall the behaviour of Gaussian distributions in high dimension in Section 2. In Section 3, we review the filtering theory under Gaussian assumption, detail the asymptotic consequences of high dimension, and derive the constraints implied on the background and analysis distributions, which constitute the difference between the EnKF and the PF. The conclusions are reported in Section 4.

Concentration of Gaussian Law in High Dimension.
Human intuition of high-dimensional behaviour is often an extrapolation of low-dimensional experience. This leads us to assume that graphical representations in low dimension remain valid in high dimension. This is particularly true for Gaussian distributions, where the similarity of scatter plots in dimensions lower than 3 invites us to speculate that the distribution in high dimension should not be so different (see Figure 1). But this is actually completely wrong and leads to a misleading interpretation of what a Gaussian distribution really is, despite its intensive use in large systems such as those encountered in the geophysical sciences, and especially in data assimilation.
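If $\varepsilon^b \in \mathbb{R}^n$ is a random Gaussian vector with zero mean and covariance matrix $\mathbf{B}$ (in data assimilation, $\varepsilon^b$ is the statistical model for the background error), denoted by $\varepsilon^b \sim \mathcal{N}(0, \mathbf{B})$, then the asymptotic distribution of the norm $|\varepsilon^b|$ is given by (see Appendix A)

$$|\varepsilon^b| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{B})},\ \frac{1}{2}\,\frac{\operatorname{tr}(\mathbf{B})}{n_{\mathrm{ef}}(\mathbf{B})}\right), \qquad (1)$$

where $\operatorname{tr}(\cdot)$ denotes the trace operator and where

$$n_{\mathrm{ef}}(\mathbf{B}) = \frac{\operatorname{tr}(\mathbf{B})^2}{\operatorname{tr}(\mathbf{B}^2)}$$

is the effective dimension, which provides a quantitative measure of the dimensionality [20]. This simple formula reveals a deep contradiction between natural intuition and the reality of Gaussian laws. At the same time, it provides the correct intuition, as now detailed.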
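As a quick numerical check of (1), the following Python sketch (assuming NumPy; the diagonal covariance with variances spread between 0.5 and 1.5 is a hypothetical choice, not taken from the text) compares the empirical mean and standard deviation of $|\varepsilon^b|$ with $\sqrt{\operatorname{tr}(\mathbf{B})}$ and $\sqrt{\tfrac12\operatorname{tr}(\mathbf{B})/n_{\mathrm{ef}}(\mathbf{B})}$.

```python
import numpy as np

def effective_dimension(B):
    """Effective dimension n_ef(B) = tr(B)^2 / tr(B^2)."""
    return np.trace(B) ** 2 / np.trace(B @ B)

def predicted_norm_law(B):
    """Mean and standard deviation of |eps| for eps ~ N(0, B), from the
    high-dimensional approximation (1): N(sqrt(tr B), tr(B) / (2 n_ef(B)))."""
    tr = np.trace(B)
    return np.sqrt(tr), np.sqrt(0.5 * tr / effective_dimension(B))

rng = np.random.default_rng(0)
n = 500
variances = np.linspace(0.5, 1.5, n)    # hypothetical, illustrative variances
B = np.diag(variances)

# eps = B^{1/2} z with z ~ N(0, I); for a diagonal B, B^{1/2} is the elementwise sqrt.
eps = rng.standard_normal((4000, n)) * np.sqrt(variances)
norms = np.linalg.norm(eps, axis=1)

mean_th, std_th = predicted_norm_law(B)
print(f"n_ef(B) = {effective_dimension(B):.1f}")
print(f"mean |eps|: empirical {norms.mean():.3f}, predicted {mean_th:.3f}")
print(f"std  |eps|: empirical {norms.std():.3f}, predicted {std_th:.3f}")
```

The variances differ, so $n_{\mathrm{ef}}(\mathbf{B})$ is somewhat smaller than $n$, and the predicted thickness is correspondingly larger than $1/\sqrt{2}$.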

Asymptotic Convergence of Normal Law toward a Spherical Shell.
For the particular case where $\varepsilon \sim \mathcal{N}(0, \mathbf{I})$ is a normal law (reduced and centered Gaussian law), the effective dimension is $n_{\mathrm{ef}}(\mathbf{I}) = n$ and then

$$|\varepsilon| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{n},\ \frac{1}{2}\right).$$

This distribution means that, in high dimension, all samples of $\varepsilon$ are concentrated within a spherical shell of radius $\sqrt{n}$ and thickness $1/\sqrt{2} \approx 0.71$. Note that the thickness of the shell is independent of the dimension $n$ for the norm $|\varepsilon|$ (while this is not true for $|\varepsilon|^2$). As a simplified representation, Figure 2 mimics the behaviour of the Gaussian sample as the dimension $n$ increases. For each panel, a sample $\zeta = (\zeta_1, \ldots, \zeta_n)^T$ of the normal law $\mathcal{N}(0, \mathbf{I})$ is represented by the point $(x, y) = |\zeta|(\cos\theta, \sin\theta)$, where $\theta = \arg(z)$ is the argument of the complex number $z = |z|e^{i\theta} = \zeta_1 + i\zeta_2$ formed from the first two components of $\zeta$ (here $i^2 = -1$). For $n = 2$ this representation verifies $(x, y) = (\zeta_1, \zeta_2)$, that is, the usual dispersion of a Gaussian sample (see Figure 2). It is striking to notice the absence of any sample inside the ball bounded by the spherical shell, in particular near the mean. The position of a sample seems binary: either it is at the right distance in the state space, or it is not really a sample of the Gaussian that was supposed to be sampled. This simplified illustration provides a graphical tool that supports our intuition of what normal Gaussian laws are in high dimension. We now consider what happens for a general Gaussian law by taking into account the possible correlations between the components of the random vector.
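The Figure 2 representation can be sketched as follows, assuming NumPy; `figure2_coordinates` is a hypothetical helper name. For each dimension, the loop prints the mean radius of the planar points, to be compared with $\sqrt{n}$, and their spread, to be compared with the asymptotic thickness $1/\sqrt{2}$.

```python
import numpy as np

def figure2_coordinates(z):
    """Map samples z of shape (N, n) to the planar points |z| (cos theta, sin theta),
    where theta = arg(z_1 + i z_2) is taken from the first two components."""
    theta = np.angle(z[:, 0] + 1j * z[:, 1])
    r = np.linalg.norm(z, axis=1)
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(1)
for n in (2, 10, 100, 1000):
    z = rng.standard_normal((3000, n))
    x, y = figure2_coordinates(z)
    r = np.hypot(x, y)                       # planar radius, equal to |z|
    print(f"n = {n:4d}: mean radius {r.mean():7.3f} (sqrt(n) = {np.sqrt(n):7.3f}), "
          f"std {r.std():.3f} (1/sqrt(2) = {1 / np.sqrt(2):.3f})")
```

For $n = 2$ the mapping reproduces the usual scatter plot exactly; for $n = 1000$ even the smallest radius stays a few units below $\sqrt{n}$, so no point appears near the center.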

Asymptotic Convergence of Gaussian Law toward a Spherical Shell.
To illustrate the behaviour of a Gaussian distribution $\varepsilon \sim \mathcal{N}(0, \mathbf{B})$ in high dimension, one first needs to construct a nontrivial covariance matrix $\mathbf{B}$ and then to generate samples of $\varepsilon$.
In order to construct a correlation matrix $\mathbf{B}$ similar to those encountered in data assimilation, we first consider an Earth great circle of radius $a = 6400$ km, discretized with $n = 1000$ grid points $x_i = i\,\delta x$ with $\delta x = (1/n)\,2\pi a \approx 40$ km. Then, following the background covariance modelling based on the diffusion equation [21,22], which produces Gaussian correlation functions, a square-root matrix $\mathbf{B}^{1/2}$ is specified in terms of the correlation length-scale $l_h$ [23]. The resulting matrix $\mathbf{B} = \mathbf{B}^{1/2}\mathbf{B}^{T/2}$ is a covariance matrix whose diagonal terms (the variances) are all set to 1, so that $\operatorname{tr}(\mathbf{B}) = n$. This theoretical one-dimensional setting allows all the quantities required below to be generated and computed. As specified above, $\mathbf{B}$ is a correlation matrix; some correlation functions are represented in Figure 3(a) for the length-scale values $l_h \in \{10, 100, 500, 1000\}$ km: the larger the length-scale, the broader the correlation function. The effective dimension $n_{\mathrm{ef}}(\mathbf{B}) = \operatorname{tr}(\mathbf{B})^2/\operatorname{tr}(\mathbf{B}^2)$ (Figure 3(b)) illustrates that the larger the length-scale, the lower the number of significant principal directions of $\mathbf{B}$: the effective dimension is equal to 1000 (i.e., $n$) for $l_h = 10$ km, while it is below 100 for length-scales of 500 km and larger.
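A minimal sketch of this setting, under an explicit assumption: rather than the diffusion-based square root of [21,22], whose formula is not reproduced here, the matrix below uses the Gaussian correlation function $\exp(-d^2/2l_h^2)$ directly, with $d$ the chordal distance between grid points so that the matrix stays positive semi-definite. The length-scale convention may therefore differ from the one behind Figure 3, and the printed $n_{\mathrm{ef}}$ values are only indicative (NumPy assumed).

```python
import numpy as np

A_EARTH_KM = 6400.0   # radius of the great circle (text value)
N_POINTS = 1000       # grid points, so dx = 2 pi a / n ~ 40 km

def gaussian_correlation_on_circle(l_h, n=N_POINTS, a=A_EARTH_KM):
    """Correlation matrix with entries exp(-d_ij^2 / (2 l_h^2)).

    Assumption: d_ij is the chordal distance 2 a |sin((theta_i - theta_j) / 2)|;
    the Gaussian kernel is positive definite in R^3, so the matrix is a valid
    correlation matrix. This stands in for the diffusion-based B^{1/2} of the text.
    """
    angles = 2.0 * np.pi * np.arange(n) / n
    d = 2.0 * a * np.abs(np.sin((angles[:, None] - angles[None, :]) / 2.0))
    return np.exp(-0.5 * (d / l_h) ** 2)

def effective_dimension(B):
    """n_ef(B) = tr(B)^2 / tr(B^2); for symmetric B, tr(B^2) is the sum of squared entries."""
    return np.trace(B) ** 2 / np.sum(B * B)

for l_h in (10.0, 100.0, 500.0, 1000.0):
    B = gaussian_correlation_on_circle(l_h)
    print(f"l_h = {l_h:6.0f} km: tr(B) = {np.trace(B):.0f}, n_ef(B) = {effective_dimension(B):7.1f}")
```

With unit diagonal the trace is always $n$, so only $n_{\mathrm{ef}}$ changes with $l_h$, decreasing as the correlation broadens.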
In practice, samples of $\varepsilon$ can be obtained by the transformation $\varepsilon = \mathbf{B}^{1/2}\zeta$ of samples of a random vector following the normal law $\zeta \sim \mathcal{N}(0, \mathbf{I})$.
Similarly to the schematic representation of Figure 2, we illustrate the distribution of $N_e = 6400$ samples of $\varepsilon \sim \mathcal{N}(0, \mathbf{B})$ in Figure 4. For the short length-scale $l_h = 10$ km, the grid points are almost decorrelated and the samples are uniformly distributed within a spherical shell of radius $\sqrt{\operatorname{tr}(\mathbf{B})} = \sqrt{n}$ and thickness $(1/\sqrt{2})\sqrt{\operatorname{tr}(\mathbf{B})/n_{\mathrm{ef}}(\mathbf{B})} = 1/\sqrt{2}$, since $n_{\mathrm{ef}}(\mathbf{B}) = n$ for this length-scale (see Figure 3). When the length-scale increases, the correlation between the first two components $\varepsilon_1$ and $\varepsilon_2$ of $\varepsilon$ increases. For $l_h = 100$ km, the correlation is higher than 0.9 and the distribution of the sampling points within the spherical shell becomes heterogeneous, with a concentration along the first diagonal. This heterogeneity is even more pronounced for $l_h = 500$ km, where the points appear to be concentrated onto a subspace of an equatorial plane: this corresponds to the concentration of the sampling points within a spherical shell of the same radius $\sqrt{\operatorname{tr}(\mathbf{B})} = \sqrt{n}$ but within a subspace of lower dimension (the effective dimension being a quantitative approximation of this dimension).
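While the radius of the spherical shell remains at the fixed value $\sqrt{\operatorname{tr}(\mathbf{B})}$, its thickness increases. This is a consequence of the decrease of the effective dimension, as shown by the asymptotic standard deviation $(1/\sqrt{2})\sqrt{\operatorname{tr}(\mathbf{B})/n_{\mathrm{ef}}(\mathbf{B})} = (1/\sqrt{2})\sqrt{n/n_{\mathrm{ef}}(\mathbf{B})}$: if $n_{\mathrm{ef}}(\mathbf{B})$ decreases, then $1/n_{\mathrm{ef}}(\mathbf{B})$ increases, leading to an increase of the thickness. For instance, with $n = 1000$ and $n_{\mathrm{ef}} \approx 40$ (the value quoted below for $l_h = 500$ km), the thickness grows from $1/\sqrt{2} \approx 0.71$ to about $(1/\sqrt{2})\sqrt{25} \approx 3.5$.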
Hence, as the effective dimension decreases, the Gaussian distribution converges toward another Gaussian law with the same total variance but in a lower-dimensional linear subspace. The graphical representation of the Gaussian law as a spherical shell still holds.
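The shell radius and thickness can be checked by sampling $\varepsilon = \mathbf{B}^{1/2}\zeta$. This self-contained sketch (NumPy assumed) repeats the assumed Gaussian-correlation construction on the great circle, builds a symmetric square root by eigendecomposition (one possible choice of $\mathbf{B}^{1/2}$; any square root gives the same law), and compares the empirical mean and standard deviation of $|\varepsilon|$ with $\sqrt{\operatorname{tr}(\mathbf{B})}$ and $(1/\sqrt{2})\sqrt{\operatorname{tr}(\mathbf{B})/n_{\mathrm{ef}}(\mathbf{B})}$ for a short and a long length-scale.

```python
import numpy as np

def gaussian_correlation_on_circle(l_h, n=1000, a=6400.0):
    """Assumed construction: Gaussian correlation exp(-d^2 / (2 l_h^2)), chordal distance d."""
    angles = 2.0 * np.pi * np.arange(n) / n
    d = 2.0 * a * np.abs(np.sin((angles[:, None] - angles[None, :]) / 2.0))
    return np.exp(-0.5 * (d / l_h) ** 2)

def sqrt_psd(B):
    """Symmetric square root B^{1/2}; round-off negative eigenvalues are clipped to 0."""
    w, V = np.linalg.eigh(B)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(2)
results = {}
for l_h in (10.0, 500.0):
    B = gaussian_correlation_on_circle(l_h)
    eps = rng.standard_normal((2000, B.shape[0])) @ sqrt_psd(B)   # rows: B^{1/2} zeta
    norms = np.linalg.norm(eps, axis=1)
    tr = np.trace(B)
    n_ef = tr ** 2 / np.sum(B * B)
    radius, thickness = np.sqrt(tr), np.sqrt(0.5 * tr / n_ef)
    results[l_h] = (norms.mean(), norms.std(), radius, thickness)
    print(f"l_h = {l_h:5.0f} km: n_ef = {n_ef:6.1f}, mean |eps| = {norms.mean():.2f} "
          f"(radius {radius:.2f}), std |eps| = {norms.std():.2f} (thickness {thickness:.2f})")
```

Both length-scales share the radius $\sqrt{1000} \approx 31.6$; only the thickness changes, as stated above.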
Note that, in the continuous limit where the dimension tends to infinity (vectors tending to functions), the above discussion remains true as long as the covariance matrix converges toward a covariance operator of finite trace. In particular, the continuous limit of the reduced and normalized Gaussian vector $\mathcal{N}(0, \mathbf{I})$ does not exist, since it would imply an operator of infinite trace (the radius $\sqrt{n}$ of its hypersphere increases to infinity). But the continuous limit of the correlated Gaussian vector $\mathcal{N}(0, \mathbf{B})$ exists (the trace of the limit operator is finite): for $l_h > 0$, the Fourier spectrum, which is also Gaussian, is summable. Figure 4(b) helps to interpret the finiteness of the trace of the covariance operator: all the pertinent statistical information is contained within a finite-dimensional subspace of the functional space.

From these results, the consequences for data assimilation are now described.

Consequences of the Concentration of the Measure in Data Assimilation
Data assimilation faces a paradox, recognized as resulting from the curse of dimensionality [2]: while the particle filter and the ensemble Kalman filter are supposed to be two possible algorithms for the discretization of nonlinear filtering under Gaussian assumption, the particle filter fails to produce the a posteriori distribution in high dimension. This issue and the consequences of the concentration of the Gaussian law are now explored.

Nonlinear Filtering.
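Data assimilation aims to provide the probability of the true state of a system given the past/present forecasts and observations. Without any assumption of linearity of the dynamics or Gaussianity of the statistics, the general formalism reduces to the Bayes rule

$$p(\mathbf{X}_q \mid \mathbf{Y}^o_q) = \frac{p(\mathbf{Y}^o_q \mid \mathbf{X}_q)\, p(\mathbf{X}_q)}{\int p(\mathbf{Y}^o_q \mid \mathbf{X})\, p(\mathbf{X})\, d\mathbf{X}},$$

where $p(\mathbf{X}_q)$ is the (density of) probability distribution of finding the true state $\mathbf{X}^t_q$ at time $q$ in the vicinity of $\mathbf{X}_q$, and $p(\mathbf{Y}^o_q \mid \mathbf{X}_q)$ is the (density of) probability distribution of measuring $\mathbf{Y}^o_q$ at time $q$ when the true state $\mathbf{X}^t_q = \mathbf{X}_q$ is known. This corresponds to the analysis step (in the Bayesian data assimilation setting). The vector space of $\mathbf{X}$ ($\mathbf{Y}^o$) is the state space (observational space), denoted by $\mathbb{R}^n$ ($\mathbb{R}^p$), where $n$ ($p$) is its dimension.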
The forecast step transports the conditional distribution from time $q$ to time $q+1$ by using the model propagator $\mathbf{X}_{q+1} = \mathcal{M}_{q+1 \leftarrow q}(\mathbf{X}_q)$. Since we are interested in comparing the respective analysis procedures of the EnKF and of the PF, the forecast step is not detailed further.

Gaussian Assumption and Concentration Consequences.
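Under the usual additional Gaussian assumption, the two distributions $p(\mathbf{X}_q)$ and $p(\mathbf{Y}^o_q \mid \mathbf{X}_q)$ take the form

$$p(\mathbf{X}_q) \propto \exp\left(-\frac{1}{2}\|\mathbf{X}_q - \mathbf{X}^b_q\|^2_{\mathbf{B}_q^{-1}}\right), \qquad p(\mathbf{Y}^o_q \mid \mathbf{X}_q) \propto \exp\left(-\frac{1}{2}\|\mathbf{Y}^o_q - \mathbf{H}_q\mathbf{X}_q\|^2_{\mathbf{R}_q^{-1}}\right),$$

where $\|\mathbf{x}\|^2_{\mathbf{M}} = \mathbf{x}^T\mathbf{M}\mathbf{x}$ and $\mathbf{X}^b_q$ is the background state at time $q$. $\varepsilon^b_q = \mathbf{X}^t_q - \mathbf{X}^b_q$ is the background error, modelled as a Gaussian random vector of zero mean and covariance matrix $E[\varepsilon^b_q(\varepsilon^b_q)^T] = \mathbf{B}_q$, and $\mathbf{Y}^o_q = \mathbf{H}_q\mathbf{X}^t_q + \varepsilon^o_q$ with $\varepsilon^o_q \sim \mathcal{N}(0, \mathbf{R}_q)$, where $\mathbf{H}_q$ is the observational operator that maps the state space into the observational space. In general, an observational operator is nonlinear, but here it is assumed to be linear. The a posteriori distribution is then Gaussian, $p(\mathbf{X}_q \mid \mathbf{Y}^o_q) \propto \exp\left(-\frac{1}{2}\|\mathbf{X}_q - \mathbf{X}^a_q\|^2_{\mathbf{A}_q^{-1}}\right)$, with the analysis state

$$\mathbf{X}^a_q = \mathbf{X}^b_q + \mathbf{K}_q(\mathbf{Y}^o_q - \mathbf{H}_q\mathbf{X}^b_q),$$

where $\varepsilon^a_q = \mathbf{X}^t_q - \mathbf{X}^a_q$ is the analysis error, a Gaussian random vector of zero mean and covariance matrix $\mathbf{A}_q = (\mathbf{I} - \mathbf{K}_q\mathbf{H}_q)\mathbf{B}_q$, and where $\mathbf{K}_q = \mathbf{B}_q\mathbf{H}_q^T(\mathbf{H}_q\mathbf{B}_q\mathbf{H}_q^T + \mathbf{R}_q)^{-1}$ is the gain matrix. Direct consequences of the concentration of the Gaussian law in high dimension are that, according to (1),

$$|\mathbf{X}^t_q - \mathbf{X}^b_q| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{B}_q)},\ \frac{1}{2}\frac{\operatorname{tr}(\mathbf{B}_q)}{n_{\mathrm{ef}}(\mathbf{B}_q)}\right), \qquad |\mathbf{X}^t_q - \mathbf{X}^a_q| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{A}_q)},\ \frac{1}{2}\frac{\operatorname{tr}(\mathbf{A}_q)}{n_{\mathrm{ef}}(\mathbf{A}_q)}\right).$$

Other consequences of the concentration property may be obtained, for example, for the tuning of the observational variance (see Appendix B). We now consider the discretization of nonlinear filtering as it should behave when the Gaussian assumptions hold.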
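The Gaussian analysis equations can be checked numerically with the following sketch (NumPy assumed; the helper `kalman_analysis` and the random test matrices are hypothetical). It verifies the particular case $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$, for which $\mathbf{K} = \mathbf{A} = \mathbf{I}/2$, and that $\mathbf{X}^a_q$ and $\mathbf{A}_q$ coincide with the mean and covariance of the normalized product of the two Gaussian densities, that is, with the Bayes rule.

```python
import numpy as np

def kalman_analysis(xb, B, yo, H, R):
    """Linear-Gaussian analysis step.

    K = B H^T (H B H^T + R)^{-1}, X^a = X^b + K (Y^o - H X^b), A = (I - K H) B.
    """
    S = H @ B @ H.T + R
    K = np.linalg.solve(S, H @ B).T          # = B H^T S^{-1} (B and S symmetric)
    xa = xb + K @ (yo - H @ xb)
    A = (np.eye(B.shape[0]) - K @ H) @ B
    return xa, K, A

# Particular case B = H = R = I: the gain and the analysis covariance are both I/2.
I = np.eye(5)
_, K_id, A_id = kalman_analysis(np.zeros(5), I, np.ones(5), I, I)
print(np.allclose(K_id, I / 2), np.allclose(A_id, I / 2))

# Generic case: compare with the information form of the Bayes posterior,
# covariance (B^{-1} + H^T R^{-1} H)^{-1} and mean A (B^{-1} X^b + H^T R^{-1} Y^o).
rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)                  # hypothetical SPD background covariance
H = np.eye(n)[::3]                           # observe one grid point in three
R = 0.5 * np.eye(H.shape[0])
xb, yo = rng.standard_normal(n), rng.standard_normal(H.shape[0])
xa, K, A = kalman_analysis(xb, B, yo, H, R)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
A_bayes = np.linalg.inv(Binv + H.T @ Rinv @ H)
xa_bayes = A_bayes @ (Binv @ xb + H.T @ Rinv @ yo)
print(np.allclose(A, A_bayes), np.allclose(xa, xa_bayes), np.trace(A) < np.trace(B))
```

The trace reduction $\operatorname{tr}(\mathbf{A}) < \operatorname{tr}(\mathbf{B})$ is what shrinks the analysis hypersphere relative to the background one.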

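Discretization of the Nonlinear Filtering.
Under Gaussian assumption, a discrete version of the prior distribution $p(\mathbf{X}_q)$ is given by an ensemble of samples $(\mathbf{X}^b_{q,k})_{k \in [1, N_e]}$ such that

$$\mathbf{X}^b_{q,k} = \mathbf{X}^b_q + \varepsilon^b_{q,k},$$

where the $\varepsilon^b_{q,k}$ are independent samples of the Gaussian law $\mathcal{N}(0, \mathbf{B}_q)$. The empirical distribution is given by $p^{N_e}(\mathbf{X}_q) = (1/N_e)\sum_k \delta(\mathbf{X}_q - \mathbf{X}^b_{q,k})$. The convergence of the empirical distribution toward the distribution $p(\mathbf{X}_q)$ has to be understood as weak convergence, that is, $\lim_{N_e \to \infty} p^{N_e}(f) = p(f)$ for all $f$ of polynomial type, where $p(f) = \int f(\mathbf{X})\,p(\mathbf{X})\,d\mathbf{X}$. From the asymptotic concentration of the Gaussian law in high dimension, it follows that

$$|\mathbf{X}^b_{q,k} - \mathbf{X}^b_q| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{B}_q)},\ \frac{1}{2}\frac{\operatorname{tr}(\mathbf{B}_q)}{n_{\mathrm{ef}}(\mathbf{B}_q)}\right). \qquad (13)$$

A numerical experiment supports this asymptotic distribution, as illustrated in Figure 5. At first order, the histogram of the normalized quantity $\left(|\mathbf{X}^b_{q,k} - \mathbf{X}^b_q| - \sqrt{\operatorname{tr}(\mathbf{B}_q)}\right)\big/\sqrt{\tfrac{1}{2}\operatorname{tr}(\mathbf{B}_q)/n_{\mathrm{ef}}(\mathbf{B}_q)}$ fits the theoretical normal law very well, whatever the length-scale $l_h$. As a result, the background samples $\mathbf{X}^b_{q,k}$ are all distributed within a spherical shell of radius $\sqrt{\operatorname{tr}(\mathbf{B}_q)}$ and thickness $\sqrt{\tfrac{1}{2}\operatorname{tr}(\mathbf{B}_q)/n_{\mathrm{ef}}(\mathbf{B}_q)}$. One can notice a slight asymmetry as the length-scale increases (see Figures 5(c) and 5(d)). This is related to the asymmetry of the chi-squared distribution, which is only roughly approximated by a Gaussian distribution (see Appendix A). This approximation is not accurate for the small effective dimensions encountered for $l_h = 500$ km and $l_h = 1000$ km, where $n_{\mathrm{ef}}(500\ \text{km}) \approx 40$ and $n_{\mathrm{ef}}(1000\ \text{km}) \approx 20$ (see Figure 3(b)).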
Note that (13) is useful to understand the effect of localization based on the Schur product, often used in the EnKF to limit the spurious long-distance correlations resulting from sampling noise in the estimation of the background covariance matrix [24]. In this framework, a localized background covariance matrix $\mathbf{B}^{\mathrm{loc}}_q$ is built from the element-wise product $\circ$ of a correlation matrix $\mathbf{C}$ with the ensemble-estimated background covariance matrix $\mathbf{B}^e_q$; that is, $\mathbf{B}^{\mathrm{loc}}_q = \mathbf{C} \circ \mathbf{B}^e_q$. Since $\mathbf{C}$ is a correlation matrix, its diagonal coefficients are all equal to 1; as a result, the traces of $\mathbf{B}^{\mathrm{loc}}_q$ and $\mathbf{B}^e_q$ are equal. Moreover, since the correlation functions in $\mathbf{C}$ are chosen as compactly supported functions, the effective dimensions differ, with $n_{\mathrm{ef}}(\mathbf{B}^{\mathrm{loc}}_q) > n_{\mathrm{ef}}(\mathbf{B}^e_q)$. Hence, from (13), the variance of the distance associated with the localized matrix is smaller. As an illustration from Figure 5, the effect of localization is equivalent to the transformation from the distribution in Figure 5(d) to the distribution in Figure 5(a). Hereafter, localization is not considered further.
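The effect of localization on the trace and on the effective dimension can be illustrated as follows. Since $\mathbf{C}_{ii} = 1$ and $|\mathbf{C}_{ij}| \le 1$ for any correlation matrix, one has $\operatorname{tr}(\mathbf{C}\circ\mathbf{B}^e_q) = \operatorname{tr}(\mathbf{B}^e_q)$ and $\sum_{ij}(\mathbf{C}_{ij}\mathbf{B}^e_{ij})^2 \le \sum_{ij}(\mathbf{B}^e_{ij})^2$, so $n_{\mathrm{ef}}$ cannot decrease. The sketch assumes NumPy, the Gaspari-Cohn function as the compactly supported correlation (a common choice, not necessarily the one intended in the text), a hypothetical 200-point circle, a 10-member ensemble, and a 1000 km localization half-width.

```python
import numpy as np

def gaspari_cohn(r, c):
    """Gaspari-Cohn fifth-order piecewise rational correlation function.

    Compactly supported: it vanishes for |r| >= 2c, and takes values in [0, 1].
    """
    z = np.abs(np.asarray(r, dtype=float)) / c
    out = np.zeros_like(z)
    near = z <= 1.0
    far = (z > 1.0) & (z < 2.0)
    zn, zf = z[near], z[far]
    out[near] = -0.25 * zn**5 + 0.5 * zn**4 + 0.625 * zn**3 - (5.0 / 3.0) * zn**2 + 1.0
    out[far] = (zf**5 / 12.0 - 0.5 * zf**4 + 0.625 * zf**3 + (5.0 / 3.0) * zf**2
                - 5.0 * zf + 4.0 - 2.0 / (3.0 * zf))
    return out

def effective_dimension(B):
    return np.trace(B) ** 2 / np.sum(B * B)   # np.sum(B * B) = tr(B^2) for symmetric B

rng = np.random.default_rng(6)
n, Ne, a = 200, 10, 6400.0
angles = 2.0 * np.pi * np.arange(n) / n
chord = 2.0 * a * np.abs(np.sin((angles[:, None] - angles[None, :]) / 2.0))

# Hypothetical "true" covariance (Gaussian correlation, l_h = 500 km) and a small ensemble.
B_true = np.exp(-0.5 * (chord / 500.0) ** 2)
w, V = np.linalg.eigh(B_true)
B_half = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
ens = rng.standard_normal((Ne, n)) @ B_half
pert = ens - ens.mean(axis=0)
B_e = pert.T @ pert / (Ne - 1)               # ensemble estimate, rank <= Ne - 1

C = gaspari_cohn(chord, c=1000.0)            # hypothetical localization half-width
B_loc = C * B_e                              # Schur (element-wise) product

print(f"tr(B_e) = {np.trace(B_e):.3f}, tr(B_loc) = {np.trace(B_loc):.3f}")
print(f"n_ef(B_e) = {effective_dimension(B_e):.1f}, n_ef(B_loc) = {effective_dimension(B_loc):.1f}")
```

The traces match exactly, while the localized matrix has a larger effective dimension and hence, by (13), a thinner shell.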
At this stage, there is no difference between the EnKF and the PF strategies; the difference lies in the way the Bayes rule is applied.

Analysis Step in the EnKF.
In this section we do not detail the practical implementation of the EnKF but only the formalism (see, e.g., Houtekamer and Mitchell [24] and Evensen [25] for practical aspects).
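For the EnKF, the analysis $\mathbf{X}^a_q = \mathbf{X}^b_q + \mathbf{K}_q(\mathbf{Y}^o_q - \mathbf{H}_q\mathbf{X}^b_q)$ is computed and, when using a "perturbation of observations" method [26], an ensemble of analysis perturbations is generated according to

$$\varepsilon^a_{q,k} = (\mathbf{I} - \mathbf{K}_q\mathbf{H}_q)\,\varepsilon^b_{q,k} + \mathbf{K}_q\,\varepsilon^o_{q,k},$$

where $\varepsilon^o_{q,k}$ is an ensemble of independent observational errors, randomly generated from the true covariance matrix $\mathbf{R}_q$ as $\varepsilon^o_{q,k} = \mathbf{R}_q^{1/2}\zeta_k$ with random normal samples $\zeta_k \sim \mathcal{N}(0, \mathbf{I}_p)$. From computation [26], it follows that the covariance of these perturbations is $(\mathbf{I} - \mathbf{K}_q\mathbf{H}_q)\mathbf{B}_q(\mathbf{I} - \mathbf{K}_q\mathbf{H}_q)^T + \mathbf{K}_q\mathbf{R}_q\mathbf{K}_q^T = \mathbf{A}_q$. The analysis ensemble is then constructed as $\mathbf{X}^a_{q,k} = \mathbf{X}^a_q + \varepsilon^a_{q,k}$ and then, from the asymptotic concentration of the Gaussian law,

$$|\mathbf{X}^a_{q,k} - \mathbf{X}^a_q| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{A}_q)},\ \frac{1}{2}\frac{\operatorname{tr}(\mathbf{A}_q)}{n_{\mathrm{ef}}(\mathbf{A}_q)}\right). \qquad (17)$$

Similarly to the validation of the asymptotic distribution (13), a numerical experiment supports the asymptotic distribution (17), as illustrated in Figure 6. In this experiment, an observation is placed at every 3 grid points, and the observational covariance matrix is set as $\mathbf{R}_q = \sigma_o^2\mathbf{I}_p$, where $\sigma_o = 1$ and $p = n/3$ is the dimension of the observational space. The matrix operations (trace, product, ...) are computed directly within the numerical setting. At first order, the histogram of the normalized quantity fits the theoretical normal law very well, whatever the length-scale $l_h$. One can again notice a slight asymmetry as the length-scale increases (see Figures 6(c) and 6(d)).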
We observe that the analysis samples $\mathbf{X}^a_{q,k}$ are all distributed within a spherical shell of radius $\sqrt{\operatorname{tr}(\mathbf{A}_q)}$ and thickness $\sqrt{\tfrac{1}{2}\operatorname{tr}(\mathbf{A}_q)/n_{\mathrm{ef}}(\mathbf{A}_q)}$. As expected, this shows that, from the distance point of view, the covariance matrix of the ensemble $\mathbf{X}^a_{q,k}$ is $\mathbf{A}_q$. The EnKF transforms an ensemble of background samples $\mathbf{X}^b_{q,k}$ distributed on a spherical shell of radius $\sqrt{\operatorname{tr}(\mathbf{B}_q)}$ into an ensemble of analysis samples $\mathbf{X}^a_{q,k}$ distributed on a spherical shell of radius $\sqrt{\operatorname{tr}(\mathbf{A}_q)}$. This is the exact solution of the Bayes rule, as supported by the numerical experiment. Hence, the statistics of the two distances appear to be a strong constraint, whose strength increases with the dimension of the problem. This is the main property we investigate in order to explore the difference between the EnKF and the PF.
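A minimal sketch of the perturbed-observation analysis and of the check of (17), assuming NumPy. For brevity it uses a hypothetical exponential correlation on a 900-point line instead of the great-circle setting, one observation every 3 grid points and $\mathbf{R} = \mathbf{I}$; the truth and observations are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n, Ne = 900, 400
idx = np.arange(n)
# Hypothetical background covariance: exponential correlation on a line (positive definite).
B = np.exp(-0.2 * np.abs(idx[:, None] - idx[None, :]))
H = np.eye(n)[::3]                     # one observation every 3 grid points (p = n/3)
p = H.shape[0]
R = np.eye(p)                          # R = sigma_o^2 I with sigma_o = 1

w, V = np.linalg.eigh(B)
B_half = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

K = np.linalg.solve(H @ B @ H.T + R, H @ B).T    # K = B H^T (H B H^T + R)^{-1}
A = (np.eye(n) - K @ H) @ B

xb = np.zeros(n)
xt = xb + B_half @ rng.standard_normal(n)        # synthetic truth, X^t - X^b ~ N(0, B)
yo = H @ xt + rng.standard_normal(p)             # synthetic observations
xa = xb + K @ (yo - H @ xb)

# Perturbed-observation EnKF: X^a_k = X^b_k + K (Y^o + eps^o_k - H X^b_k).
Xb = xb + rng.standard_normal((Ne, n)) @ B_half  # rows: X^b + B^{1/2} zeta_k
eps_o = rng.standard_normal((Ne, p))             # R^{1/2} zeta_k with R = I
Xa = Xb + (yo + eps_o - Xb @ H.T) @ K.T

dist = np.linalg.norm(Xa - xa, axis=1)
n_ef_A = np.trace(A) ** 2 / np.trace(A @ A)
radius, thickness = np.sqrt(np.trace(A)), np.sqrt(0.5 * np.trace(A) / n_ef_A)
print(f"tr(B) = {np.trace(B):.1f}, tr(A) = {np.trace(A):.1f}, n_ef(A) = {n_ef_A:.1f}")
print(f"|X^a_k - X^a|: mean {dist.mean():.2f} vs {radius:.2f}, std {dist.std():.2f} vs {thickness:.2f}")
```

Each row satisfies $\mathbf{X}^a_k - \mathbf{X}^a = (\mathbf{I}-\mathbf{KH})\varepsilon^b_k + \mathbf{K}\varepsilon^o_k$, so the ensemble lands on the analysis shell of radius $\sqrt{\operatorname{tr}(\mathbf{A})}$.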
The analysis step for the PF is now detailed.

Analysis Step in the PF.
Several strategies exist for the particle discretization of the Bayes rule. However, as far as we know, most of these variants suffer from the curse of dimensionality, for example, the importance sampling version described by Snyder [27]. Hence, in this note, only the bootstrap filter is considered. The analysis step in the PF [5,6,8,9] follows a quite different route, as it directly relies on the Bayes rule: for each member $\mathbf{X}^b_{q,k}$, one computes the weight

$$w_k = \frac{p(\mathbf{Y}^o_q \mid \mathbf{X}^b_{q,k})}{\sum_{j=1}^{N_e} p(\mathbf{Y}^o_q \mid \mathbf{X}^b_{q,j})},$$

from which the a posteriori distribution is deduced as

$$p^a_{N_e}(\mathbf{X}_q) = \sum_{k=1}^{N_e} w_k\, \delta(\mathbf{X}_q - \mathbf{X}^b_{q,k}),$$

where $\delta(\mathbf{X})$ stands for the Dirac distribution positioned at 0.
For the bootstrap particle filter [8], an ensemble of analyses $\mathbf{X}^a_{q,k}$ is generated as $N_e$ random samples of the distribution $p^a_{N_e}$.
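The bootstrap weighting and resampling can be sketched as follows (NumPy assumed; the helper names and the configuration $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$ with $N_e = 100$ are hypothetical). Weights are computed in log space to avoid underflow, and the effective sample size $1/\sum_k w_k^2$, a standard degeneracy indicator that is not used in the text, is printed for $n = 2$ and $n = 1000$.

```python
import numpy as np

def bootstrap_weights(Xb, yo, H, R_inv):
    """Normalized weights w_k proportional to exp(-0.5 d_k^T R^{-1} d_k), d_k = Y^o - H X^b_k.

    Log-weights are shifted by their maximum before exponentiation to avoid underflow.
    """
    d = yo - Xb @ H.T                                   # innovations, shape (Ne, p)
    logw = -0.5 * np.sum((d @ R_inv) * d, axis=1)
    w = np.exp(logw - logw.max())
    return w / w.sum()

def multinomial_resample(Xb, w, rng):
    """Bootstrap step: draw Ne members with replacement, with probabilities w."""
    idx = rng.choice(len(w), size=len(w), p=w)
    return Xb[idx], idx

rng = np.random.default_rng(8)
Ne = 100
for n in (2, 1000):
    # Hypothetical configuration B = H = R = I with background X^b = 0.
    H, R_inv = np.eye(n), np.eye(n)
    xt = rng.standard_normal(n)
    yo = H @ xt + rng.standard_normal(n)
    Xb = rng.standard_normal((Ne, n))
    w = bootstrap_weights(Xb, yo, H, R_inv)
    Xa, idx = multinomial_resample(Xb, w, rng)
    ess = 1.0 / np.sum(w ** 2)
    print(f"n = {n:4d}: max weight {w.max():.3f}, ESS {ess:.1f}, "
          f"distinct members kept {len(np.unique(idx))}")
```

In the high-dimensional case, the log-likelihoods of the members differ by tens of units, so essentially one member carries all the weight and the resampled ensemble collapses onto it.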
Hence, the difference between the EnKF and the PF is that, for the EnKF, background members are corrected to build analysis members, while, for the PF, the analysis ensemble is a resampling of the background ensemble using the a posteriori weights. A natural question is to know when a background sample can be an analysis sample of the particle filter. This is now addressed.

Compatibility Relation and Minimum Ensemble Size.
We want to characterize in which case a background sample $\mathbf{X}^b_{q,k}$ can be considered as an analysis sample $\mathbf{X}^a_{q,k}$. This is achieved by giving some compatibility relations, described in the first subsection. They are followed by two subsections that lead to a constraint on the ensemble size.

Compatibility Relations.
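As shown for the EnKF, all the analysis members $\mathbf{X}^a_{q,k}$ are at a given fixed distance $\sqrt{\operatorname{tr}(\mathbf{A}_q)}$ from the analysis $\mathbf{X}^a_q$. More precisely, since the distance $|\mathbf{X}^a_{q,k} - \mathbf{X}^a_q|$ asymptotically follows the Gaussian distribution (17), the maximum value $\delta^+_{N_e,a}$ that this distance can reach for an ensemble size $N_e$ is given in Appendix C. From the EnKF equations, $\mathbf{X}^b_{q,k} - \mathbf{X}^a_q = \varepsilon^b_{q,k} - \mathbf{K}_q(\mathbf{Y}^o_q - \mathbf{H}_q\mathbf{X}^b_q)$, from which it results that $\mathbf{X}^b_{q,k} - \mathbf{X}^a_q$ is a Gaussian random vector of mean $-\delta\mathbf{X}^a_q$ and covariance matrix $\mathbf{B}_q$, where $\delta\mathbf{X}^a_q = \mathbf{K}_q(\mathbf{Y}^o_q - \mathbf{H}_q\mathbf{X}^b_q)$ is the analysis increment. The minimum value $\delta^-_{N_e,b}(\delta\mathbf{X}^a_q)$ reached by the distance $|\mathbf{X}^b_{q,k} - \mathbf{X}^a_q|$ over $N_e$ samples is therefore a quantity that depends on the analysis increment $\delta\mathbf{X}^a_q$. Hence, for a background sample $\mathbf{X}^b_{q,k}$ to be likely an analysis sample $\mathbf{X}^a_{q,k}$, the following distance inequality should necessarily be verified:

$$\delta^-_{N_e,b}(\delta\mathbf{X}^a_q) \le \delta^+_{N_e,a}. \qquad (24)$$

If not, the two sample distributions of $\mathbf{X}^b_{q,k}$ and $\mathbf{X}^a_{q,k}$ are so different that no background sample can be retained as an element of the analysis samples. Of course, this result only holds for finite samples. In finite dimension, if the ensemble size is infinite, then the EnKF and the PF provide the same a posteriori distribution in this Gaussian case, which corresponds to $\delta^-_{N_e,b}(\delta\mathbf{X}^a_q) = 0$ and $\delta^+_{N_e,a} = +\infty$. This trade-off is illustrated from a numerical experiment in Figure 8. Where (24) is verified, it implies almost surely the existence of a background sample $\mathbf{X}^b_{q,k}$ that can be considered as an analysis sample; that is, within a PF, such a point $\mathbf{X}^b_{q,k}$ can be observed. However, this is no longer the case for higher length-scale values, where one observes that $\delta^+_{N_e,a} < \delta^-_{N_e,b}$. The consequence is that the number of samples $N^-_e$ required to reverse the inequality is the integer $N_e$ for which equality holds, that is, $\delta^-_{N_e,b}(\delta\mathbf{X}^a_q) = \delta^+_{N_e,a}$. Hence, $N^-_e$ is an exponentially large number, which is a function of the analysis increment $\delta\mathbf{X}^a_q$. Note that it is also a function of $\mathbf{B}_q$ because of (22) and the dependence of $\mathbf{A}_q$ on $\mathbf{B}_q$.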
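The compatibility test (24) can be illustrated by Monte Carlo in the simplest configuration $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$ (so $\mathbf{K} = \mathbf{A} = \mathbf{I}/2$), an assumption that differs from the experiments of Figure 8. In this sketch (NumPy assumed), analysis samples are drawn directly from $\mathcal{N}(\mathbf{X}^a, \mathbf{A})$, which has the same law as the EnKF ensemble, and $\delta^-$ and $\delta^+$ are taken as the empirical minimum and maximum distances over $N_e = 2000$ samples.

```python
import numpy as np

rng = np.random.default_rng(5)
n, Ne = 1000, 2000
# Hypothetical configuration B = H = R = I, hence K = A = I/2.
xb = np.zeros(n)
xt = xb + rng.standard_normal(n)            # truth with X^t - X^b ~ N(0, I)
yo = xt + rng.standard_normal(n)            # observations with R = I
dxa = 0.5 * (yo - xb)                       # analysis increment K (Y^o - H X^b)
xa = xb + dxa

Xb = xb + rng.standard_normal((Ne, n))                  # background samples N(X^b, B)
Xa = xa + np.sqrt(0.5) * rng.standard_normal((Ne, n))   # analysis samples N(X^a, A)

d_b = np.linalg.norm(Xb - xa, axis=1)       # background samples seen from X^a
d_a = np.linalg.norm(Xa - xa, axis=1)       # analysis samples seen from X^a
delta_minus, delta_plus = d_b.min(), d_a.max()
print(f"|dX^a| = {np.linalg.norm(dxa):.2f}")
print(f"delta^- (closest background sample) = {delta_minus:.2f}")
print(f"delta^+ (farthest analysis sample)  = {delta_plus:.2f}")
print("inequality (24) satisfied:", delta_minus <= delta_plus)
```

Here the background samples sit near $\sqrt{1500} \approx 38.7$ from $\mathbf{X}^a$, whereas the analysis shell has radius $\sqrt{500} \approx 22.4$, so no background member can be selected.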
The dependence on $\delta\mathbf{X}^a_q$ can be removed by considering a climatological framework, as follows.
3.6.2. Climatological Approximation.
In order to eliminate the dependence on the analysis increment, an average estimate of the minimum ensemble size $N^-_e(\delta\mathbf{X}^a_q)$ can be computed under the additional assumption that the statistics are samples of a climatology, which removes the time index $q$. If $\mathbf{B}$ ($\mathbf{R}$) denotes the climatological background (observational) error covariance matrix, then we set $\mathbf{B}_q = \mathbf{B}$ and $\mathbf{R}_q = \mathbf{R}$. Hence, the analysis increment of a given day is the random Gaussian vector $\delta\mathbf{X}^a \sim \mathcal{N}(0, \mathbf{KHB})$, leading, in high dimension, to the asymptotic distribution

$$|\delta\mathbf{X}^a| \underset{n \gg 1}{\sim} \mathcal{N}\left(\sqrt{\operatorname{tr}(\mathbf{KHB})},\ \frac{1}{2}\frac{\operatorname{tr}(\mathbf{KHB})}{n_{\mathrm{ef}}(\mathbf{KHB})}\right).$$

Moreover, the average values of the mean and of the variance of the distance between the background samples and the analysis state can be deduced from a direct computation, from which the average value $N^-$ of $N^-_e$ follows. From general properties of the Gaussian law, more than 99% of the samples of $|\mathbf{X}^a_k - \mathbf{X}^a| \sim \mathcal{N}(\sqrt{\operatorname{tr}\mathbf{A}}, \tfrac{1}{2}\operatorname{tr}\mathbf{A}/n_{\mathrm{ef}}(\mathbf{A}))$ lie within $\sqrt{\operatorname{tr}\mathbf{A}} \pm 3\sqrt{\tfrac{1}{2}\operatorname{tr}\mathbf{A}/n_{\mathrm{ef}}(\mathbf{A})}$; another ensemble size $N_{99\%}$ can be deduced, which corresponds to an ensemble size able to cover the analysis sample with more than 99% probability (see Appendix C). This ensemble size may be considered as an upper bound for the ensemble size required to sample the analysis distribution with the PF using the weighting and resampling strategy. The ensemble sizes $N^-$ (solid line with diamonds) and $N_{99\%}$ (solid line with triangles) are represented in Figure 9 as a function of the length-scale. The solid line without marks represents the ensemble size of 6400. In this experiment, the required ensemble size is maximal for medium length-scale values. For the small length-scale $l_h = 10$ km, the minimal ensemble size $N^-$ is below 6400. This relatively small value is due to the observational network: only one in three grid points is observed; hence, when there is no correlation in the background, the posterior distribution provided by the few observations is not so different from the prior distribution. When the length-scale is very large, for example, $l_h \in \{500, 1000\}$ km, the effective dimension is small (see Figure 3), and only a relatively small ensemble size is required. The case $l_h = 100$ km is intermediate: the background error correlations couple the grid points while the effective dimension remains large. Note that these results are in accordance with those presented in Figure 8. $N_{99\%}$ follows a similar behaviour, but the gap with $N^-$ is not constant and varies with the length-scale.
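The exponential growth of the required ensemble size can be conveyed by a rough heuristic, which is an assumption of this sketch and not the formula of Appendix C. It treats $|\mathbf{X}^b_k - \mathbf{X}^a|$ as Gaussian with climatologically averaged moments (mean square $\operatorname{tr}[(\mathbf{I}+\mathbf{KH})\mathbf{B}]$, variance of the square $2\operatorname{tr}(\mathbf{B}^2) + 4\operatorname{tr}(\mathbf{KHB}^2)$). It takes $\delta^+ = \sqrt{\operatorname{tr}\mathbf{A}} + 3\sqrt{\tfrac12\operatorname{tr}\mathbf{A}/n_{\mathrm{ef}}(\mathbf{A})}$ as in the 99% band above, and it estimates the number of draws needed to see one background sample below $\delta^+$ as $1/\Phi(-t)$, with $t$ the standardized gap. NumPy and the standard `math` module are assumed; `heuristic_ensemble_size` is a hypothetical name.

```python
import math
import numpy as np

def heuristic_ensemble_size(B, H, R):
    """Order of magnitude of the ensemble size needed before one background sample
    falls inside the analysis shell (heuristic, not the formula of Appendix C).

    Assumes |X^b_k - X^a| ~ N(mu_b, sigma_b^2) with climatologically averaged moments,
    and takes delta^+ = mu_a + 3 sigma_a for the analysis shell.
    """
    K = np.linalg.solve(H @ B @ H.T + R, H @ B).T
    KHB = K @ H @ B
    A = B - KHB
    tr = np.trace
    # |eps^b - dX^a|^2 has mean tr(B) + tr(KHB) and variance ~ 2 tr(B^2) + 4 tr(KHB B).
    mu_b = math.sqrt(tr(B) + tr(KHB))
    sigma_b = math.sqrt((2.0 * tr(B @ B) + 4.0 * tr(KHB @ B)) / (4.0 * mu_b ** 2))
    # Analysis shell from (17): radius sqrt(tr A), variance tr(A^2) / (2 tr A).
    mu_a = math.sqrt(tr(A))
    sigma_a = math.sqrt(0.5 * tr(A @ A) / tr(A))
    t = (mu_b - (mu_a + 3.0 * sigma_a)) / sigma_b
    if t <= 0.0:
        return 1.0                              # the shells already overlap
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(Z < -t) for Z ~ N(0, 1)
    return math.inf if tail == 0.0 else 1.0 / tail

sizes = {}
for n in (10, 100, 1000):
    I = np.eye(n)
    sizes[n] = heuristic_ensemble_size(I, I, I)   # hypothetical case B = H = R = I
    print(f"n = {n:4d}: N ~ {sizes[n]:.2e}")
```

With $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$, $\sigma_b$ and $\sigma_a$ do not depend on $n$ while the gap between $\sqrt{1.5n}$ and $\sqrt{0.5n}$ grows like $\sqrt{n}$, so the Gaussian tail makes the estimate explode with the dimension.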
The magnitude reached by the minimal ensemble size $N^-$ illustrates the limits of the Monte-Carlo strategy underlying the PF algorithm and the pitfall it presents.

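Illustration of the Hyperspheres and Orthogonality Relation.
From the averaged values of (27) and (29), the hyperspheres that contain the typical samples are represented in Figure 10. On all the panels, the + symbol denotes the background $\mathbf{X}^b$ and the × symbol denotes the average position of the analysis state $\mathbf{X}^a$ (the average value of $|\mathbf{X}^a - \mathbf{X}^b|$ is $\sqrt{\operatorname{tr}(\mathbf{KHB})}$). The hypersphere representing the typical position of samples of the background distribution $\mathcal{N}(\mathbf{X}^b, \mathbf{B})$ is the circle centered at $\mathbf{X}^b$ with radius $\sqrt{\operatorname{tr}(\mathbf{B})}$, denoted by $\mathcal{C}_1 = \mathcal{C}(\mathbf{X}^b, \sqrt{\operatorname{tr}(\mathbf{B})})$. The typical position of samples of the analysis distribution $\mathcal{N}(\mathbf{X}^a, \mathbf{A})$ is the circle with triangles centered at $\mathbf{X}^a$ with radius $\sqrt{\operatorname{tr}(\mathbf{A})}$, denoted by $\mathcal{C}_2 = \mathcal{C}(\mathbf{X}^a, \sqrt{\operatorname{tr}(\mathbf{A})})$. The circle with diamonds represents the hypersphere $\mathcal{C}_3 = \mathcal{C}(\mathbf{X}^a, \sqrt{\operatorname{tr}[(\mathbf{I} + \mathbf{KH})\mathbf{B}]})$ that contains the typical positions deduced from the distribution of $|\mathbf{X}^a - \mathbf{X}^b_k|$. Since $\mathbf{X}^b_k$ should be typical of both $\mathcal{C}_1$ and $\mathcal{C}_3$, the typical positions of the point $\mathbf{X}^b_k$ viewed from $\mathbf{X}^a$ do not lie on the whole hypersphere but only at the intersection of the two hyperspheres $\mathcal{C}_1$ and $\mathcal{C}_3$, that is, on a hypersphere contained in the equatorial plane of $\mathcal{C}_1$ (here, the hypersphere of intersection, centered at $\mathbf{X}^b$, reduces to the two points of intersection). In particular, it appears that the analysis increment $\delta\mathbf{X}^a$ is orthogonal to most of the background sample errors $\varepsilon^b_k = \mathbf{X}^b_k - \mathbf{X}^b$. This is another effect of the high dimension of the problem, which can be understood from the computation of the cosine

$$\cos(\delta\mathbf{X}^a, \varepsilon^b_k) = \frac{\langle \delta\mathbf{X}^a, \varepsilon^b_k \rangle}{|\delta\mathbf{X}^a|\,|\varepsilon^b_k|}. \qquad (32)$$

The histogram of the cosine distribution (32) is illustrated in Figure 11 for the various length-scale experiments (these are the unnormalized distributions, in order to appreciate the concentration of the cosine around its average value 0). For all the length-scales, the distributions are centered on zero, meaning that the angle between $\delta\mathbf{X}^a$ and $\varepsilon^b_k$ is typically close to $\pi/2$. An averaged theoretical distribution for $\cos(\delta\mathbf{X}^a, \varepsilon^b_k)$ can be deduced from (28), leading to the Gaussian $\mathcal{N}\left(0, \operatorname{tr}(\mathbf{KHB}^2)/[\operatorname{tr}(\mathbf{KHB})\operatorname{tr}(\mathbf{B})]\right)$. For the particular case where $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$ with $\mathbf{KH} = \mathbf{A} = \mathbf{I}/2$, the variance reduces to $\operatorname{tr}(\mathbf{I}/2)/[\operatorname{tr}(\mathbf{I}/2)\,n] = 1/n$, so that the distribution of the angle cosine is well approximated by $\cos(\delta\mathbf{X}^a, \varepsilon^b_k) \sim \mathcal{N}(0, 1/n)$. Since $\mathbf{B} = \mathbf{I}$ corresponds to the limit of vanishing length-scale, this means that the orthogonality is more pronounced for small length-scales.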
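The near-orthogonality can be checked by sampling in the particular case $\mathbf{B} = \mathbf{H} = \mathbf{R} = \mathbf{I}$, where $\delta\mathbf{X}^a \sim \mathcal{N}(0, \mathbf{KHB}) = \mathcal{N}(0, \mathbf{I}/2)$ is independent of the ensemble perturbations $\varepsilon^b_k \sim \mathcal{N}(0, \mathbf{I})$. This sketch (NumPy assumed, sample sizes hypothetical) compares the empirical spread of the cosine (32) with the predicted standard deviation $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 1000, 2000
# Case B = H = R = I (K H = A = I/2): dX^a ~ N(0, KHB) = N(0, I/2), eps^b_k ~ N(0, I).
dxa = np.sqrt(0.5) * rng.standard_normal((N, n))
eps_b = rng.standard_normal((N, n))
cos = np.sum(dxa * eps_b, axis=1) / (
    np.linalg.norm(dxa, axis=1) * np.linalg.norm(eps_b, axis=1))

# Averaged theoretical variance tr(KHB^2) / (tr(KHB) tr(B)), here equal to 1/n.
KHB, B = 0.5 * np.eye(n), np.eye(n)
var_th = np.trace(KHB @ B) / (np.trace(KHB) * np.trace(B))
print(f"mean cos = {cos.mean():+.4f}, std cos = {cos.std():.4f}, "
      f"predicted std = {np.sqrt(var_th):.4f}")
```

With $n = 1000$ the cosines spread only about $\pm 0.03$ around zero, so the angles stay within a couple of degrees of $\pi/2$.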

Recommendation for PF and Beyond the Gaussian Framework.
A known result is that, while the PF is attractive for non-Gaussian data assimilation, it faces the curse of dimensionality for problems of large size. The present contribution illustrates a geometrical interpretation of the curse of dimensionality, considering the favorable case where the distribution is Gaussian, so that the EnKF and the PF should produce the same analysis distribution. The limit of the use of the PF in high dimension is that the distance between the background samples and the typical support of the analysis distribution is too large to select some background samples as analysis samples. Of course, in geophysical data assimilation, where $n$ is of order $10^8$ (at the time of writing), we can expect the distributions to be highly non-Gaussian, and this could be an advantage for the PF. However, for such a high dimension, the diagnostic of the distance is entirely governed by the central limit theorem (or by the large deviation principle if correlations have to be considered; see Appendix A), so the non-Gaussian distribution should lie within the hypersphere associated with the equivalent Gaussian distribution, that is, the Gaussian distribution whose covariance matrix is the covariance matrix of the non-Gaussian distribution (we assume that the latter exists). Hence, even if non-Gaussian, the analysis distribution should lie within a hypersphere of smaller radius due to the data assimilation. As a consequence, the selection of background samples to produce analysis samples will face the same difficulties as in the Gaussian case illustrated in Sections 3.6.1 and 3.6.2.

Figure 10: Illustration of the hyperspheres that contain the typical samples of the prior and a posteriori Gaussian distributions valid for the EnKF and the PF in the Gaussian setting, where + denotes the background $\mathbf{X}^b$, × denotes the analysis state $\mathbf{X}^a$, the circle without marks denotes the background distribution, the circle with triangle marks denotes the analysis distribution, and the circle with diamond marks is centered on the analysis state with a radius equal to the typical distance between the analysis state and the background samples. These hyperspheres are reproduced for the length-scale parameters (a) $l_h = 10$ km, (b) $l_h = 100$ km, (c) $l_h = 500$ km, and (d) $l_h = 1000$ km, with the dimension set to $n = 1000$.
In this contribution, and considering the illustration in Figure 2, it appears that high dimension starts at a surprisingly small effective dimension: $n_{\mathrm{ef}}(\mathbf{B}) = 100$ can already be considered as high dimension. Hence, for problems of typical effective dimension larger than 100, we recommend not using the PF algorithm, since it cannot be used with a reasonable ensemble size such as those encountered in the EnKF (a few dozen members at the time of writing). Note that this recommendation is stated in terms of the effective dimension of the problem, not the dimension $n$ of the space $\mathbb{R}^n$ where the state vector lies.
As a consequence, and this is our strong recommendation, any new PF algorithm that tries to tackle the high-dimension issue should prove its ability to face the intrinsic limit due to the dimension, as explored here in terms of distance. In other words, the distance can be used as a diagnostic to test whether a new algorithm copes with the curse of dimensionality. If the new algorithm suffers from the consequences of the distance, it cannot be used in high dimension. If it is compatible with them, nothing more can be said from the distance point of view, and diagnostics other than the distance should be employed to conclude.
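The distance diagnostic can be written as a small test applied to the output ensemble of a candidate algorithm. The following is a minimal sketch. The function `distance_diagnostic` is hypothetical, and it assumes a diagonal analysis covariance A. It also uses the Gaussian approximation that the distance to the analysis state has mean √Tr(A) and standard deviation √(Tr(A²)/(2 Tr(A))). The toy problem (prior N(0, I), H = I, R = rI) and the naive bootstrap PF are illustrative assumptions.

```python
import math
import random


def distance_diagnostic(samples, analysis_state, analysis_var_diag, k=3.0):
    """Fraction of samples whose distance to the analysis state lies in the
    analysis shell sqrt(Tr A) +/- k * s, with s = sqrt(Tr(A^2) / (2 Tr A)).

    A is assumed diagonal (entries analysis_var_diag) to keep the sketch short.
    """
    tr_a = sum(analysis_var_diag)
    tr_a2 = sum(v * v for v in analysis_var_diag)
    radius = math.sqrt(tr_a)
    spread = math.sqrt(tr_a2 / (2.0 * tr_a))
    inside = 0
    for x in samples:
        dist = math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, analysis_state)))
        if abs(dist - radius) <= k * spread:
            inside += 1
    return inside / len(samples)


# Toy Gaussian problem (assumptions): prior N(0, I), H = I, R = r I.
rng = random.Random(2)
n, n_e, r = 1000, 100, 1.0
gain = 1.0 / (1.0 + r)   # Kalman gain B (B + R)^-1 for B = I, R = r I
a_var = r / (1.0 + r)    # analysis variance (1 - gain) * 1
truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
obs = [t + rng.gauss(0.0, math.sqrt(r)) for t in truth]
x_a = [gain * y for y in obs]  # analysis state (background mean is 0)

background = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_e)]

# Naive bootstrap PF: weight by the observation likelihood, then resample.
log_w = [-0.5 * sum((y - p) ** 2 for y, p in zip(obs, x)) / r for x in background]
m = max(log_w)
weights = [math.exp(lw - m) for lw in log_w]
resampled = rng.choices(background, weights=weights, k=n_e)

# Reference: exact samples of the Gaussian analysis distribution N(x_a, a_var I).
exact = [[xa + rng.gauss(0.0, math.sqrt(a_var)) for xa in x_a] for _ in range(n_e)]

frac_pf = distance_diagnostic(resampled, x_a, [a_var] * n)
frac_exact = distance_diagnostic(exact, x_a, [a_var] * n)
print(f"bootstrap PF: {frac_pf:.2f} of members in the analysis shell")
print(f"exact analysis samples: {frac_exact:.2f}")
```

The resampled members are copies of background samples, so they stay on the background side of the distance gap and fail the test. Exact analysis samples pass it.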

Conclusion
In this work, the convergence of a Gaussian law toward a hypersphere (or part of it) in high dimension has been used to discuss the difference between the ensemble Kalman filter and the particle filter.
Both algorithms correspond to discretizations of the Bayes rule (the nonlinear filter), but they are known to lead to different results in practice. This is mainly due to the curse of dimensionality: the particle filter requires a number of samples that grows exponentially with the dimension of the problem.
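The weight degeneracy behind this exponential requirement can be reproduced in a few lines. The following is a toy sketch under stated assumptions: prior N(0, I_n), particles drawn from the prior, H = I, R = I, and 100 particles. These settings are illustrative and are not the experimental setup of this paper. The sketch reports the largest normalized weight, averaged over a few trials, for increasing n at a fixed ensemble size.

```python
import math
import random


def max_normalized_weight(n, n_particles, obs_var, rng):
    """Largest normalized PF weight in a toy Gaussian problem.

    Assumptions: prior N(0, I_n), particles drawn from the prior, H = I,
    R = obs_var * I, and one synthetic observation of a prior-drawn truth.
    """
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    obs = [t + rng.gauss(0.0, math.sqrt(obs_var)) for t in truth]
    log_w = []
    for _ in range(n_particles):
        particle = [rng.gauss(0.0, 1.0) for _ in range(n)]
        misfit = sum((y - p) ** 2 for y, p in zip(obs, particle))
        log_w.append(-0.5 * misfit / obs_var)
    # Subtract the max before exponentiating to avoid underflow to zero.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return max(w) / sum(w)


rng = random.Random(3)
avg_max_weight = {}
for n in (1, 10, 100, 1000):
    trials = [max_normalized_weight(n, 100, 1.0, rng) for _ in range(5)]
    avg_max_weight[n] = sum(trials) / len(trials)
    print(f"n = {n:5d}: average max weight = {avg_max_weight[n]:.3f}")
```

With the ensemble size held fixed, the maximum weight moves toward 1 as n grows. This is the particle degeneracy described in the introduction.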
The consequence of the concentration of Gaussian probability distributions on hyperspheres has been verified in an experimental setting. The distance between the mean of a Gaussian and one of its samples is close to the square root of the trace of the covariance matrix. As the dimension increases, the fluctuation of this distance around the square-root value becomes very small relative to it. The computation of the variance of the distance provides an approximation of the effective dimension of the covariance matrix.
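Both properties, the radius √Tr(B) and the link between the spread of the distance and the effective dimension, can be checked by sampling. The following sketch assumes a diagonal covariance B and the definition n_ef = Tr(B)²/Tr(B²). Under that definition, a first-order expansion gives Var(|x|) ≈ Tr(B)/(2 n_ef), which reduces to 1/2 (a thickness of 1/√2) for B = I. The decaying spectrum is an arbitrary illustrative choice.

```python
import math
import random
import statistics


def distance_stats(variances, n_samples, rng):
    """Mean and standard deviation of |x| for x ~ N(0, diag(variances))."""
    stds = [math.sqrt(v) for v in variances]
    dists = [
        math.sqrt(sum((s * rng.gauss(0.0, 1.0)) ** 2 for s in stds))
        for _ in range(n_samples)
    ]
    return statistics.fmean(dists), statistics.stdev(dists)


def effective_dimension_from_spread(trace_b, std_dist):
    """Invert Var(|x|) ~ Tr(B) / (2 n_ef), assuming n_ef = Tr(B)**2 / Tr(B**2)."""
    return trace_b / (2.0 * std_dist ** 2)


rng = random.Random(4)
n = 1000
cases = {
    "identity": [1.0] * n,
    "decaying": [math.exp(-i / 100.0) for i in range(n)],  # illustrative spectrum
}
summary = {}
for name, variances in cases.items():
    tr_b = sum(variances)
    exact_n_ef = tr_b ** 2 / sum(v * v for v in variances)
    mean_d, std_d = distance_stats(variances, 800, rng)
    est_n_ef = effective_dimension_from_spread(tr_b, std_d)
    summary[name] = (mean_d / math.sqrt(tr_b), std_d, exact_n_ef, est_n_ef)
    print(f"{name}: mean/sqrt(Tr B) = {summary[name][0]:.3f}, std = {std_d:.3f}, "
          f"n_ef exact = {exact_n_ef:.0f}, estimated = {est_n_ef:.0f}")
```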
These properties suggest that the distance can serve as a diagnostic tool to determine whether a background sample can be compatible with a given analysis state. The ensemble Kalman filter (EnKF) is able to transform a background sample from the background hypersphere to the analysis hypersphere. If the prior and posterior distributions are too different, no background sample can be selected in the particle filter (PF), at least when the ensemble size does not grow exponentially with the dimension of the state space. A formula for the lower bound of the ensemble size has been obtained from the concentration behaviour.
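The transfer of EnKF members from the background hypersphere to the analysis hypersphere can be illustrated with a stochastic (perturbed-observation) EnKF on a toy problem. The following sketch assumes prior N(0, I_n), H = I, R = rI, and r = 1. These settings are illustrative and are not the paper's configuration. For this problem, the analysis radius is √(n r/(1 + r)).

```python
import math
import random

# Toy Gaussian problem (assumptions): prior N(0, I_n), H = I, R = r I,
# stochastic EnKF with perturbed observations.
rng = random.Random(5)
n, n_e, r = 1000, 50, 1.0
gain = 1.0 / (1.0 + r)  # K = B (B + R)^-1 with B = I and R = r I

truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
obs = [t + rng.gauss(0.0, math.sqrt(r)) for t in truth]
x_a = [gain * y for y in obs]  # analysis state, since the background mean is 0


def dist(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


bg_dist, an_dist = [], []
for _ in range(n_e):
    x_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # EnKF update of one member with a perturbed observation y + eta.
    x_an = [
        xb + gain * (y + rng.gauss(0.0, math.sqrt(r)) - xb)
        for xb, y in zip(x_b, obs)
    ]
    bg_dist.append(dist(x_b, x_a))
    an_dist.append(dist(x_an, x_a))

print(f"background -> analysis state: {min(bg_dist):.1f} .. {max(bg_dist):.1f}")
print(f"EnKF members -> analysis state: {min(an_dist):.1f} .. {max(an_dist):.1f}")
print(f"analysis radius sqrt(n r / (1 + r)) = {math.sqrt(n * r / (1 + r)):.1f}")
```

Every updated member lies closer to the analysis state than any background sample. A PF that can only reweight background samples therefore has no candidate on the analysis hypersphere.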
Some recommendations have been stated for the non-Gaussian case, where the geometrical constraints associated with the distance imply a test that any PF implementation for high-dimensional problems should pass.
The concentration behaviour is particularly relevant given the emergence of ensemble weighting strategies, which rely on the particle filter formalism to update the probabilistic information contained in ensembles of different origins. When an analysis state is available, the distance between the ensemble members and the analysis state can be diagnosed to position the ensemble onto the right hypersphere.

B. Observational Variance Tuning
Observational variance tuning is not directly related to the difference between the EnKF and the PF for small ensemble sizes. However, it is of crucial importance in data assimilation, where little is known about the quality of the observations. Since observational variance tuning illustrates an interesting use of the hypersphere constraints, we have chosen to discuss this application here.
Observational variance tuning aims to provide an inflation factor α such that the true observational error covariance matrix equals αR, where R is the observational error covariance matrix as implemented in a given data assimilation system. If […] (B.5). Hence, these two quantities are estimators of the inflation factor α.
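The hypersphere constraint suggests a simple way to estimate α. If the innovation d = y − H x^b has covariance HBH^T + αR, its squared norm concentrates near Tr(HBH^T) + α Tr(R). The following sketch illustrates that idea only and is not claimed to be either of the estimators of (B.5). The setting is hypothetical: H = I, B = I, implemented R = I, a true factor of 2, and n = 4000 observations.

```python
import random

# Hypothetical setting: n observed scalars, H = I, B = b_var * I,
# implemented R = r_var * I, true observation-error variance alpha_true * r_var.
rng = random.Random(6)
n, b_var, r_var, alpha_true = 4000, 1.0, 1.0, 2.0

truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_b = [t + rng.gauss(0.0, b_var ** 0.5) for t in truth]
y = [t + rng.gauss(0.0, (alpha_true * r_var) ** 0.5) for t in truth]

# Innovation d = y - H x_b; E|d|^2 = Tr(H B H^T) + alpha * Tr(R).
d2 = sum((yi - xi) ** 2 for yi, xi in zip(y, x_b))
alpha_hat = (d2 - n * b_var) / (n * r_var)
print(f"estimated alpha = {alpha_hat:.2f} (true value {alpha_true})")
```

This estimate is only as good as the assumed B. Any error in Tr(HBH^T) is absorbed into the estimated α.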

C. Bounds for Gaussian Samples
The behaviour of the maximum of Gaussian samples of a random variable x following a Gaussian law N(μ, σ²) of mean μ and variance σ² has been characterized by Berman [29]. Berman has shown that, for an ensemble of N_e samples x_k,

$$\frac{\max_{1 \le k \le N_e} x_k - \mu}{\sigma\sqrt{2\ln N_e}} \longrightarrow 1 \quad \text{as } N_e \to \infty .$$

This is a classical result. It means that the maximum distance |x − μ| that can be reached by a sampling ensemble of size N_e is $d = \sigma\sqrt{2\ln N_e}$, and that the maximum (minimum) sample value is μ + d (μ − d).
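The leading-order extreme-value scale d = σ√(2 ln N_e) and its inverse N_e = exp(d²/(2σ²)) can be checked by simulation. The following is a minimal sketch. The helper names `expected_max_distance` and `min_ensemble_size` are hypothetical. The sketch looks at the one-sided maximum of independent samples, and the values μ = 0, σ = 2, and N_e = 10000 are arbitrary. The observed ratio approaches 1 only slowly, because lower-order logarithmic corrections remain at finite N_e.

```python
import math
import random


def expected_max_distance(sigma, n_e):
    """Leading-order extreme-value scale: d = sigma * sqrt(2 ln n_e)."""
    return sigma * math.sqrt(2.0 * math.log(n_e))


def min_ensemble_size(sigma, d):
    """Inverse relation: ensemble size needed to reach distance d, exp(d**2 / (2 sigma**2))."""
    return math.exp(d * d / (2.0 * sigma * sigma))


rng = random.Random(7)
mu, sigma, n_e = 0.0, 2.0, 10_000
observed = []
for _ in range(20):
    samples = [rng.gauss(mu, sigma) for _ in range(n_e)]
    observed.append(max(samples) - mu)
ratio = (sum(observed) / len(observed)) / expected_max_distance(sigma, n_e)
print(f"mean observed max / (sigma * sqrt(2 ln N_e)) = {ratio:.3f}")
print(f"N_e needed for d = 4 sigma: {min_ensemble_size(sigma, 4 * sigma):.0f}")
```

Because N_e grows like exp(d²/(2σ²)), reaching a distance of a few more standard deviations requires an exponentially larger ensemble.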
Conversely, for a given threshold d, the expected minimum ensemble size N^- required to observe a maximum distance |x − μ| of magnitude d is given by $N^- = \exp\left(d^2/(2\sigma^2)\right)$.

[…] (a)). However, for larger values of n, natural intuition fails to anticipate the spherical shell illustrated in Figures 2(b)-2(d). The thickness of the shell is theoretically constant at 1/√2, yet it appears to decrease as n increases. This is due to the scaling used for the representation, and it corresponds to the relative error of the fluctuation over the radius: (1/√2)/√n.

Figure 1: Wrong intuitive extrapolation of the distribution of Gaussian samples from dimension 1 (a) to dimension n (c).

Figure 2: Schematic view of the concentration of 6400 samples of a normal law N(0, I) toward a spherical shell of radius √n and thickness 1/√2 as the dimension increases from n = 2 (a) to n = 100 (b), n = 1000 (c), and n = 10000 (d) (see text for the details of the construction).

Figure 7: Histogram of the normalized distance (the distance minus its expected value, divided by its expected standard deviation) deduced from the EnKF (and valid here for the PF owing to the equivalence in the Gaussian setting). It is compared with the theoretical asymptotic Gaussian law N(0, 1) (solid line) and estimated from an ensemble of 6400 samples. Results are shown for the length-scale values (a) 10 km, (b) 100 km, (c) 500 km, and (d) 1000 km, for the dimension set to n = 1000.

Figure 8: Histogram of the distance between the analysis samples deduced from the EnKF and the analysis state (valid here for the PF owing to the equivalence in the Gaussian setting). It is compared with the histogram of the distance between the background samples and the analysis state, for an ensemble of 6400 samples. The maximum expected distance between the background samples and the analysis state is the solid vertical segment, and the minimum expected distance is the dashed vertical segment. The results are shown for the length-scale values (a) 10 km, (b) 100 km, (c) 500 km, and (d) 1000 km, for the dimension set to n = 1000.

Figure 9: Illustration of the ensemble size deduced from the averaged values N^- (solid line with diamond marks) and N^99% (solid line with triangle marks), for the length-scale values 10, 100, 500, and 1000 km, with the dimension set to n = 1000. The solid line without marks represents the ensemble size 6400.

Figure 11: Histogram of the cosine of the angle between the analysis increment and the background samples, deduced from the EnKF (and valid here for the PF owing to the equivalence in the Gaussian setting), for the length-scales (a) 10 km, (b) 100 km, (c) 500 km, and (d) 1000 km, for the dimension set to n = 1000.