Independent Subspace Analysis of the Sea Surface Temperature Variability : Non-Gaussian Sources and Sensitivity to Sampling and Dimensionality

We propose an expansion ofmultivariate time-series data intomaximally independent source subspaces.The search ismade among rotations of prewhitened data which maximize non-Gaussianity of candidate sources. We use a tensorial invariant approximation of the multivariate negentropy in terms of a linear combination of squared coskewness and cokurtosis. By solving a high-order singular value decomposition problem, we extract the axes associated with most non-Gaussianity. Moreover, an estimate of the Gaussian subspace is provided by the trailing singular vectors.The independent subspaces are obtained through the search of “quasiindependent” components within the estimated non-Gaussian subspace, followed by the identification of groups with significant joint negentropies. Sources result essentially from the coherency of extremes of the data components.Themethod is then applied to the global sea surface temperature anomalies, equatorward of 65, after being tested with non-Gaussian surrogates consistent with the data anomalies. The main emerging independent components and subspaces, supposedly generated by independent forcing, include different variability modes, namely, The East-Pacific, the Central Pacific, and the Atlantic Niños, the Atlantic Multidecadal Oscillation, along with the subtropical dipoles in the Indian, South Pacific, and South-Atlantic oceans. Benefits and usefulness of independent subspaces are then discussed.


Introduction
The climate system is a highly complex dynamical system with many nonlinearly interacting modes of variability.A characteristic feature of the system is its non-Gaussianity.The nonnormality of climate is an important determinant of extreme events and may provide insights into understanding the system dynamics [1][2][3].For example, it is generally accepted that understanding non-Gaussian statistics of weather and climate has important consequences in atmospheric research not least because weather/climate risk assessment depends on understanding the tail of the system probability density function (PDF).Because of the complex interactions involved in the system, climate signals tend to be mixed.An important problem in climate research is to be able to disentangle main (or independent) signals from the mixture.
Empirical orthogonal functions (EOFs) and closely related methods (e.g., [4]) find signals that explain maximum variance but do not address the problem of mixing.An appropriate method that addresses the latter problem is independent component analysis (ICA).ICA is a method for separating mixed signals into their sources.It is based on the assumption that different sources, stemming from different physical and dynamical processes, can be considered to be statistically independent and can be separated by ICA.
ICA can be relevant to climate research to identify and separate possible anomaly patterns that may have different forcing [5].ICA has been used in climate research only lately.For example, [6] applied ICA to the study of the West African vegetation on intraseasonal and interannual time scales.Reference [7] analyzed sea level pressure via ICA to identify the main contributors to the Arctic Oscillation signal.

The Method
2.1.Cumulant-Based Independent Subspace Analysis.We designate by X ∈ R   ×  our (  ×   ) space-time data matrix anomaly (of rank   ), representing   realizations of a random vector x() ∈ R   ( = 1, . . .,   ).Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) [20] are used to construct the PCs: x PC () ≡ [ PC,1 (), . . .,  PC,  ()]  with variances:  1 ≥ ⋅ ⋅ ⋅ ≥    .The method described below is applied to any whitened or isotropic random vector y of dimension  ≤   obtained from x PC () or to a subsequent orthogonal rotation z = R  y (RR  = 1  ) (note that since y is isotropic and standard multivariate Gaussian, then the same holds for z).The source separation, or the factorization of the pdf of y,  y into partial pdfs is not a trivial problem when y is non-Gaussian.In fact, this is the case when the joint negentropy (NE), [21], which is invariant under any linear homeomorphism, is strictly positive, that is, (z) ≡ [log( z / z )] = (y) ≡  rot > 0 where  z is the multinormal probability density function (pdf) with the same first two moments as z.
Gaussian distributed sources, (if they exist) are necessarily unidimensional and span the so-called Gaussian subspace  g (z) of dimension  g .Its orthogonal complement (with respect to the covariance inner product), of dimension  ng =  −  g , is  ng (z) =  g (z) ⊥ , that is,  g (z) ⊕  ng (z) = R  .The explicit MI dependence on R (with components   , . . ., ,  = 1, . . ., ) is quite difficult to implement in general.Here, we use a cumulant-based contrast function, relying on the Edgeworth expansion approximation of the pdf of y [23], namely:  y ≈ (y)[1+ Ed (y)], where  Ed is a multivariate linear combination of Hermite polynomials depending on cumulants up to a given  =  max maximum order, set here to  max = 4.The third and fourth-order cumulants of y (components of the coskewness and cokurtosis tensors) are written in terms of R entries for , , , and  in {1, . . ., }.Cumulant tensors of any order  ≥ 3 aggregate measures of the joint non-Gaussianity of all possible subvectors w of y.They vanish for all w when y is multinormal or for those w intersecting independent subvectors.To approximate NE, we introduce a positive defined matrix by contracting S and K as and similarly, for z, that is, M(z)  = ∑  ,=1     M(y)  .The Edgeworth-based NE approximation is given by where, for example, ‖S(z)‖ 2  stands for the Frobenius' squared norm (sum of all squared components) of the tensor S. Note that there are examples of non-Gaussian pdfs verifying  Ed (y) = 0, including isotropic pdfs, for which nonlinear methods must be used [24,25], such as the nonlinear-PCA (NL-PCA) [26] which has been applied to climatic data [27][28][29][30][31].
The approximation ( 5) is scaled as  rot −  Ed (z) = ( −1 eq ), if z behaves like an average of an equivalent number  eq of independent and identically distributed (iid) random variables. Ed (z) is appropriate to be used in ISA since it satisfies to an equivalent separation theorem (1): , z (2) , . . ., z (  ) ) , (6) where  Ed collects all the coskewness and cokurtosis mixing components of the different candidate sources.Therefore, ISA holds when  Ed is minimum over the manifold () of orthogonal rotations in R  .This is related to the Joint Approximated Diagonalization Eigen-matrices' (JADE) method [32].
The rotation matrix R = (r 1 , r 2 , . . ., r  ) composed of the Singular vectors (SVecs) of M(y) (in columns) provides the Tucker decomposition [33] or high-order singular value decomposition (HOSVD) [34].The HOSVD applied to cumulant tensors lead to the Principal Cumulant Component Analysis (PCCA) [35] and to the Joint Moment Component Analysis (JMCA), [36].Those techniques find an optimal reduced basis accounting for most of the variance and negentropy explained by a set of cumulants, making thus a generalization of PCA.
The singular values (SVs) of M(y):  1, ≥  2, ≥ ⋅ ⋅ ⋅ ≥  , can be expressed as which merge all the negentropy terms where the projection of z on the th SVec:   = r   y takes part.It provides the interpretation for the leading SVecs as being the main axes where non-Gaussianity is sought.Note also that there are other scalar tensor invariants characterizing the joint non-Gaussianity, relying on linear combinations of Tr[M  (z)],  ≤  (e.g., determinant of M).
The vectors corresponding to nonnull SVs span the estimated non-Gaussian subspace  ng-Ed (z) whereas the kernel Ker(M) spanned by the trailing SVecs associated with zero SVs spans the estimated Gaussian subspace  g-Ed (z) as far as third-and fourth-order cumulants are concerned.Note, however, that the method provides an overestimation of the Gaussian manifold  g-Ed (z) ⊇  g (z) and an underestimation of the non-Gaussian manifold  ng-Ed (z) ⊆  ng (z).
To summarize, the first step of source separation is to discard the Gaussian subspace and restrict the analysis to vectors in ] that leads to separated true non-Gaussian sources, that is,  Ed (z (1) , z (2) , . . ., z (  ) ) = 0, makes the invertible matrix M(z ng ) blockdiagonal with   blocks.The SVD of M(z ng ) yields blocks of SVecs: w () ;  = 1, . . .,   , spanning the same source's subspaces (i.e., Span[w () ] = Span[z () ];  = 1, . . .,   ) (up to rotation, permutation, or inversion of axes within each source).This, however, does not provide information on how to group the SVecs onto sources (source configuration indeterminacy), which is a complex problem.
In the presence of nonseparable multivariate sources, one will still be interested in finding, within each source, axes maximizing the single NEs.In other words, one solves the ICA problem in  ng-Ed by maximizing the NE sum Ed (  ) in (6), that is, solving an ICA problem [37,38] providing independent components (ICs)   .
For a large class of pdfs, [13] has showed and conjectured that the ISA solution results in general from the grouping of ICs into independent subspaces.
The general problem of grouping ICs coming from HOSVD followed by ICA benefits from a special decomposition of  Ed (z ng ) into positive terms, hereafter called interactivities (ITIs).The decomposition runs through all possible nonnull  int ≡ 2  − 2 −min(, max ) subsets u ⊆ z ng with dimension 1 ≤ |u| ≤ min(,  max = 4), where ( ≡  ng-Ed ).Therefore,

Complexity
As an example, if u = ( 1 ,  2 )  and keeping in mind the hypersymmetry, we get The ITIs, for 1D, that is, dim(u) = 1 (self-interactivities), coincide with single NEs whereas for dim(u) ≥ 2 they coincide with joint-interactivities, which provide a measure of the non-linear statistical relationships within u contributing to  Ed (z ng ).Using cumulant properties [39], it is straight forward to show that  Ed-IT (u) = 0 when u has non-null intersections with different independent sources.Therefore, after computing all ITIs, the source identification consists of determining a partition {u (1) , u (2) , . . ., u (  ) } of z composed of subvectors of minimal dimension such that  Ed-IT (k) = 0 for all k intersecting at least two subvectors u () and u () ,  ̸ = .We note that the NE decomposition is not unique.In fact, the concept of interaction information [40][41][42] provides a general frame for the decomposition of (z ng ) in terms of interaction information, defined for every u ⊆ z ng , which are given by linear combinations of mutual information of subvectors k ⊆ u.See [43] for a proper application of ITIs on nonlinear fluid dynamics for the assessment of nonlinear wave resonances.

Statistical Tests of Multivariate Non-Gaussianity.
The above source separation method depends crucially on the approximation of the expectation of any function (z) of z-function: [(z)] (e.g., the cumulants (2), the HOSVD matrix M(z) (4), the negentropy (5), and its interactivities (8)), through its estimation obtained by sample averages: (z).The sample high-order cumulant estimators may be quite biased due to the eventual presence of single and joint outliers on short samples.This is particularly relevant to weather/climate with relatively short datasets with nonnegligible serial correlations yielding a smaller effective number of degrees of freedom:  dof ≤   .The bias becomes even larger when we deal with high-order cumulants.In fact, the th order (hypersymmetric) cumulant tensor in  dimensions has  +−1  = ( +  − 1)!/[( − 1)!!] independent entries, which scales as a power law of .This can lead to a high bias of the cumulants of the leading SVs of M(z) and to a possible fictitious non-Gaussianity of the pdf, resulting from the "curse of dimensionality."Briefly speaking, the "false" non-Gaussianity may be due to either small sample size, high serial correlation, high dimension , or high-order statistics .To avoid non-Gaussianity overestimation, some procedures are usually applied including (i) data projection onto lowdimensional subspaces of high relevance (projection pursuit (PP) rationale [44,45]); (ii) outlier's removal; and (iii) use of resistant (or order) statistics considering reasonable values of the order .
Another strategy that we will also follow here is to evaluate the distribution, moments, and quantiles of the distribution of the estimators (z)  , necessary to perform ISA, under the null hypothesis (  ) of multivariate Gaussianity of z.All quantities under   will hereafter be noted with the subscript "."For iid realizations, all the above statistics are scaled as functions of   and , for which one has obtained analytic asymptotic formulas.In the presence of serial correlation, the statistics are derived from a Monte-Carlo procedure by taking an ensemble of surrogate Gaussian time-series with the same  dof , taken here as simple red noise AR(1) processes fitted to each z component as ( + 1) = ĉ1 () + (1 − ĉ2 1 ) 1/2 ();  = 0, . . .,   , where () is a standard Gaussian white noise process and ĉ1 is the lag-one autocorrelation.For the significance tests of non-Gaussianity, the upper quantiles (e.g., 90%-99%) statistics are taken from a large ensemble of surrogates.
The first step of source decomposition is to test the null hypothesis   .The finite-sample estimator of  Ed , hereafter denoted as JEd (all quantities based on sample statistics are overlined with a tilde) and others based on truncated NE approximations are conservative tests of   (or less powerful tests of its negation: ∼  ), because they reject non-Gaussianity (Type II error) resulting from discarded cumulants of order  >  max = 4. Now, we discuss the behavior of the estimator JEd: of  Ed: (under   ) computed from   -sized samples of a vector of  independent scalars.The pdf of JEd: is positively skewed, depending on the joint distribution of S and K , which are in general not independent since they satisfy a number of inequality constraints (e.g., K 1111 − S 2  111 + 2 ≥ 0).The bias of JEd: is given by where the first and second square-bracket terms of (10) represent, respectively, scaled versions of [‖ S ‖ 2  ] and [‖ K ‖ 2  ].The bias (10) tends asymptotically to   Ed ()/ dof .It also provides a lower-bound of the positive JEd bias for non-Gaussian pdfs, as it occurs for any negentropy estimator [46].
The biases from the Frobenius norms were obtained by multiplying the biases of the different squared cumulant estimators by their multiplicities [36]  standard Gaussian: ( 2 ) = (2)!/(2 !). Figure 1 shows the dependence on   and ĉ1 for  = 4, of the ratio between the actual bias and the asymptotic bias of JEd: (at ĉ1 = 0), obtained from 1000 Monte-Carlo replicates.Moreover, the dependence of  Ed on  is quite negligible with relative deviations less than 5% (not shown).
From the analysis of Figure 1(a) and as expected, the ratio  Ed remains close to 1 when ĉ1 is small and   is large.For   ≥ 500, the ratio has reached the asymptotic value, where it becomes dependent only on ĉ1 with  dof =   /  Ed (∞, ĉ1 , ) <   , where  Ed is interpreted as the "average interval between independent realizations."For small sample sizes,  Ed <  Ed (∞, ĉ1 , ), due to a negative higher-order term in (10).
The variance of JEd: involves covariances between generic cumulants, for example, cm(u 1 ) and cm(u 2 ), of subvectors = ⌀ and  = ∞ otherwise.As for the bias, the quantiles  sig ∈ [0, 1] of JEd: hereby noted   sig ( JEd: ) scale asymptotically as  −1  .Figure 1(b) shows the dependence on  and ĉ1 for   = 1000, of the ratio  Ed- sig ≡   sig ( JEd: )[  / Ed ()].This figure provides a kind of a rule of thumb thresholds of JEd for rejecting   in the asymptotic regime.

Estimation of the Gaussian Subspace by HOSVD.
Following a rejection of   , the next step is to estimate the Gaussian subspace  g and its dimension  g .With finite samples, one computes the sample-based matrix M(z) of M(z) (4).However,  g can no longer be estimated by Ker( M) (see Section 2.1) but through the subspace spanned by SVecs associated with the trailing (not significant) SVs.
Under   , the (estimated) SVs η,: , (see ( 7)), with quantiles   sig (η ,: ),  = 1, . . ., , are asymptotically ( dof −1 ) and the SVs become undistinguishable for large  dof [36].However, the SVs are not independent and consequently their marginal quantiles cannot be estimated independently for each .To overcome this difficulty, a sequence of statistical tests is applied, as discussed in the next subsection.
In summary, we first consider the vectors z = ( 1 , . . .,   )  = R  y and z  ≡ ( ,1 , . . .,  , )  = R   y  , where y  is a generic surrogate of y and R and R  are matrices filled by the ranked SVecs of M(y) and M(y  ), respectively.For each  in {1, . . ., }, we construct the vector b  ≡ ( 1 , . . .,  −1 ,  , , . . .,  , ), where the trailing , . . .,  components are swapped by the associated ranked surrogates.Then we compute their NE contributions M (b  ) (5) and finally the quantile   sig [ M (b  )] evaluated from a large ensemble of surrogates.The test then proceeds backward, starting from the smallest SV ( = ).Therefore, if M (z) <   sig [ M (b  )], for all  ≥ , then the null hypothesis  , (  undistinguishable from Gaussian surrogates  , for all  ≥ ) cannot be rejected.The smallest  for which  , is not rejected determines the estimated Gaussian subspace  g-Ed with dimension  g-Ed (spanned by the trailing  g-Ed SVecs of R), and  ng-Ed ≡  g-Ed ⊥ (spanned by the leading  ng-Ed ≡  −  g-Ed SVecs of R).

2.2.3.
Estimation of Multivariate Sources.The next step is to identify the non-Gaussian sources within the vector of non-Gaussian components z ng ≡ ( 1 , . . .,   ng-Ed )  spanning  ng-Ed .Since z ng is a subvector of z, we easily get JEd (z ng ) ≤ JEd (z).
We test the separability of  ng-Ed into scalar and/or subvector sources by looking for a new orthogonal rotation ,ISA , . . ., z  (  ),ISA ]  composed of a maximal number   ≥ 2 of subvector candidate sources z (),ISA ,  = 1, . . .,   , such that the joint pdf satisfies where the norm | ISA | of the pdf separation error  ISA is below the threshold of rejection of the null hypothesis of nonseparability (  = 1).There is an a priori source configuration indeterminacy.For instance, for  ng-Ed = 4, there are five possible ways, namely: 4 scalars (ICA case); 2 scalars and a dyad; 2 dyads; a triad plus a scalar and the global quartet.
For a chosen source configuration, an appropriate contrast function to maximize is where ĨEd must be below some significance threshold, passing the separation statistical test.However, an undecidable situation may occur when multiple configurations pass the test.To avoid the case of multiple configuration indeterminacy, we solve instead the ICA problem providing the vector of best scalar sources: Finally, we build multivariate sources forming appropriate groups of z ICA components, according to the conjecture of [13], and as below.
We start with the NE decomposition into  int interactivities (8) as To compare and account for the statistical significance of u-interactivities in (15) with different dimensions dim(u), we introduce scaled interactivities JEd-ITsig (u) through its comparison with the quantiles of the NE JEd-IT (u  ) of the ranked (by HOSVD) Gaussian surrogates u  of u.It is is defined as When dim(u) > 1 in ( 16), w is the rotation of u Gs towards isotropy (e.g., by Gram Schmidt orthogonalization) with u Gs being composed of the standard Gaussianized components of u.Since u  has Gaussian marginals, this procedure leads to a fairer and unbiased test when u has non-Gaussian pdf marginals.This is also supported by the fact that the mutual information of u is where the second term is the Gaussian MI component [47,48], which depends on the correlation matrix C Gs of u Gs and the third term is the sum of the NEs of the components of u Gs .The above measure is therefore resistant to single outliers of the marginal pdfs.
The different  int scaled interactivities ( 16) are then sorted by decreasing order, retaining uniquely the leading ones verifying JEd-ITsig (u) ≥  sig where  sig is a fixed level of acceptance.The emerging large interactivities normally yield ICs, which contribute to the multivariate sources.The NE interactivities are then grouped along the relevant interactivities until the stopping criteria are reached, for example, ∑ u JEd-ITsig (u) < JEd (z ng ) − [ JEd (z ng )], which will be adopted in practice.

Analysis of Sea Surface Temperature Data
3.1.Data Description and Processing.We consider the monthly SST data from the "Extended Reconstruction Sea Surface Temperature" (ERSST) version v3b product (https:// www.ncdc.noaa.gov/oa/climate/research/sst/ersstv3.php), from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) release 2.4.The SST data are on 2 ∘ × 2 ∘ latitude-longitude resolution and span the period 1875 to present [49].We only consider the period 1910-2011 (102 years), restricted to the region equatorward of 65 ∘ .At each grid point, the climatology and the (12-month moving average) linear trend in addition to the mean seasonal cycle are removed from the data [50], thus yielding   = 1224 monthly SST anomalies (SSTAs), on   = 8714 grid points.A EOF analysis is then applied [51] and we only use the space spanned by the leading  max = 12 mode for the application of ICA/ISA.Its output is independent from PC variances, even when the used PCs are quasi-degenerated.The total variance is   2(a) and Table 2 show the % of explained variance of the 12 leading EOFs with their confidence intervals [52] and Figure 2   Figure 3 shows the leading 12 EOFs which are described, with references in Table 1.Based on ĉ1 , an estimate of the "average interval between independent realizations" is Δ ind =   / dof =  Ed (  , ĉ1 , ), (Figure 1(a)) which is not too different from (−1.)/ ln(ĉ 1 ) (e.g., [53]).
Figure 2(b) shows that Δ ind varies from about 3 months (PC10) to more than 12 months (PC2).For our simulation later, we will use the geometric average of the 12 ĉ1 values, that is, 0.86 (see Section 3.3).

Negentropy Distribution and Non-Gaussian Subspace Estimation. Non-Gaussianity and negentropy of a field
x have counterparts both on a local and on a PC basis.In fact, local cumulants are obtained from cumulants of the whitened PCs composing vector y, through linear operators.Let us consider, for example, the local-basis coskewness tensor S(x).Its Frobenius norm is the area integral of the local squared skewness.By using the EOF orthogonality property, we may express this as a weighted sum over all grid point triads (, , ): where â stand for the relative area at grid point  and σ2  is the fractional local variance.The contributions to (17) of a triad of whitened PCs are proportional to their variance fractions.The Frobenius norm ( 17) is dominated by contributions primarily from regions of high variance and absolute skewness, namely, the positively skewed El Niño zone [19,54], in addition to to the positive skewness southwards of 50 ∘ S [17].Now, the NE JEd (y) can be decomposed using M(y) (5).For each  = 1,..., max = 12, we denote by y () = ( 1 , . . .,   ) the set of  leading whitened PCs. Figure 4 shows the diagonal terms ( M [y () ]), Figure 4 = ), providing therefore good candidates for the Gaussian subspace of y () .For example, with  = 12, we have  = 2, 7, 8, 9, and 12 (Figure 4(a)).Statistical significance of NE components and totals are assessed by using quantiles computed from Gaussian surrogates (Figure 4).Notice, for instance, the confidence level <70% of M22 [y () ] of the PC2 due to its small degrees of freedom.
Figure 4(a) shows that, for all , the largest NE component is M11 [y () ], resulting mostly from the nonlinear correlations of PC1 with other PCs and from its self-NE, due to its positive skewness (Table 2).
Since M12,12 [y (12) ] is quite low and not highly significant (Figure 4(a)), we have decided to apply ICA and ISA in the 11D subspace (see Section 3.4) in which the total NE reach a confidence level > 95%.Besides the  g test to limit  [55] presents an alternative limit.The dimensionality reduction, however, is performed below via the statistical detection of the non-Gaussian subspace as detailed in Section 2.2.2.
Therefore, we now consider the state vector y (11) and the associated vector z = ( 1 , . . .,  11 )  , yielding JEd [z] = 2.55, and a bias of 1.73, leaving ∼0.82 for "true non-Gaussianity."   of z  all lie below the above thresholds, supporting the consistency of the Gaussianity test.Figure 6 also suggests that the estimated Gaussian subspace  g-Ed of the SSTA is spanned by the last 6 SVecs (with a confidence level  sig = 95%) and so  g-Ed = 6 and,  ng-Ed = 5 for the non-Gaussian subspace  ng-Ed .The total NE restricted to z ng ≡ ( 1 , . . .,  5 )  is JEd [z ng ] = 1.11 with a bias of 0.14.The "useful" true non-Gaussianity is 0.97 which is of the order of the above estimated value (0.82).The NE components M (z ng ) (open black circles), with their low values compared to M (z),  = 1, . . ., 5, are also shown in Figure 6.
Figure 7(a) shows the Monte-Carlo average [ JEd (y)] obtained from 1000 surrogates.As expected, the mean NE increases with  2 (nonlinearity effect) and  g (dimensionality effect), though at a much smaller rate.The bias [ JEd: (y)] is the value reached at  2 = 0 in agreement with (11).

Gaussian Subspace Recovery.
We investigate the strength of JEd (y) as a function of  2 and  g .The null hypothesis   is rejected at the confidence level  sig if JEd (y) > [ Ed ()  −1 ] Ed- sig using  Ed- sig (Figure 1(b)).
The Monte-Carlo fraction  of   rejection is shown in Figure 7(b) for  sig = 99%.The contour line at  = 0.99 provides the graph  2 min ( g ) of  2 for the recovery of the non-Gaussian sources.As expected,  2 min increases with  g , starting at  2 min = 20% and at  g = 0 and reaching  2 min ∼ 45% at  g = 6.This means that the larger , the stronger (through  2 ) the non-Gaussian embedded sources.Moreover,  2 min diminishes under a less conservative test (lower  sig ) or when the effective number of temporal degrees of freedom increases.This puts a limit to what is recoverable from the available (often short) weather/climate data, and, therefore, less conservative tests of Gaussianity or data dimension reduction must be considered (e.g.,  sig = 90%) [47].
Figures 8(a) and 8(b) are similar to Figure 6 but for the cases (a) and (b), respectively.The analysis yields the correct dimension in both cases:  ng-Ed = 5 with  sig = 90% despite the fact that the SVs in the "high noise" case (a) are smaller.The score of the non-Gaussian subspace recovery is assessed using the inner products  ng-Ed ⋅  ng (compared to 5 in the ideal recovery).Those products are, respectively, 3.99 and 4.49 for cases (a) and (b).Lower values of  2 lead to a subestimation of  ng and to a worse estimation of  ng .Now, by applying ICA (see Section 2.2.3 and Appendix A) we get ICs making the vector z ICA = (R  ICA R  )y = ( 1,ICA , . . .,   Ng-Ed ,ICA )  .Tables 3 and 4 give, respectively, for cases (a) and (b), a summary of ICA with the sorted maximized self-NEs JEd ( ,ICA ), the NE components M (z ICA ), and the IC  vector loadings in RR ICA .A few conclusions can be drawn from these tables; (i) the sum of self-NEs of ICs (2nd column) cannot explain the total NE (3rd column).This corroborates the existence of NE originating from IC interactivities and (ii) Up to a certain statistical significance, a given IC projects uniquely onto a certain source as seen through the dominant projections of ICs (in bold font).From these tables, the estimated dyad z Ed-dyad is spanned by IC1 and IC4 whereas the estimated triad z Ed-triad is spanned by IC2, IC3, and IC5.That supports the conjecture of [13] about the source emerging by IC grouping.Recovered sources may be rotated or inverted with respect to initial data.The distribution of ICs among sources may differ for smaller  2 .The squared projections onto the true sources are, respectively, 1.35 (on a maximum of 2) and 2.42 (on a maximum of 3) in case (a) and 1.68 and 2.75 in case (b).Like the non-Gaussian subspace, the recovery of individual sources is more skillful in case (b).The final step consists of merging the ICs as outlined in Section 2.2.3.Figure 9 presents the results for both cases, where the self-interactivities ( 16) are the most relevant ITIs due to their maximization by ICA (except IC5 in case (a)).The cumulated NE reaches the unbiased NE value (dashed line) near the threshold of the ITI acceptance (scaled-ITI > 1).In case (a) (Figure 9(a)), the accepted ITIs are for ICs 1-5, and dyadic links (IC3, IC2), (IC5, IC3) yielding the emergence of the true triad (IC2, IC3, and IC5) (see Table 3).The marginally accepted ITIs holds for (IC4, IC3, and IC2) (false triad) and (IC1, IC4) (true dyad).Excluded ITIs remain at the level of the NE bias.For smaller values of  2 (not shown), the recovered sources may be mixed, partly restored, or even not detected.For case (b) (Figure 9(b)) the source recovery improves.In fact, the significant accepted ITIs are for ICs 1-5, the links for (IC5, IC3), (IC5, IC2), (IC3, IC2) with the true emerging triad (IC2, IC3, and IC5), and for the true dyad (IC4, IC1).The sum of ITIs associated with ICA and ISA is, respectively, 0.84 and 1.13, showing clearly the advantage of ISA in this case.

Application of ICA and ISA to SSTA Data.
Using the estimated non-Gaussian subspace (Section 3.2), we apply here ICA to the leading  ICA = 5 SVecs of the SSTA dataset (see Figure 6).Table 5 gives a summary of the application with the characteristics of the ranked ICs  ,ICA ,  = 1, . . ., 5, by decreasing self-NEs.They explain, respectively, 7.1%, 5.5%, 4.3%, 4.1%, and 3.7% of the total variance.
Most of the NE components come from the self-NEs (an effect of ICA), through their high skewness and positive kurtosis or leptokurtosis (Table 5), leading to fat long pdf tails.Extreme values of about 3-5 standard deviations are often reached as seen by the IC time-series (Figure 10).Under those conditions, ICs are basically aggregative and cooperative combinations of single and joint PC extremes.Quasi-independence and noncorrelation between ICs means that occurrence of extremes among different ICs tends to be asynchronous (see Figure 10) and therefore extreme phenomena are independent.
Figure 11 shows the correlation maps between the ICs and SSTAs.An alternative way to interpret ICs is through the patterns displayed in the corresponding normalized loading maps (Figure 12) obtained from combinations of EOFs (Figure 3).Correlation and loading map values at the grid point  for the th IC are weighted by the PCs standard deviation and its inverse, respectively, ∑ =11 =1  , e  ()(RR ICA ) , where  , is λ1/2  /σ  (Figure 11) and λ−1/2  (Figure 12), respectively, and e  () is the th EOF at  (Figure 3).By definition, ICs are spatial inner products between correspondent normalized loading maps (Figure 12) and the SSTA field.Consequently, maximal positive (negative) ICs on a given month are favored by highly positive (negative) pattern correlations between their loading maps and the SSTA field.Note, however, that the patterns shown in Figures 11 and 12 are not orthogonal.On loading maps, the weighting gives rise to the impact on ICs of PCs of small explained variance.Because of this weighting, the correlation maps (Figure 11) often display much broader and large-scale patterns than loading maps (Figure 12) [15], though both share the same main features when the range of PC variances contributing to a given IC is not too wide, which is the case here.
The leading IC1 explains about 2/3 of the total NE with the remaining 4 being almost evenly distributed (Table 5).Further experiments show that IC1 is quite robust with the use of  ICA < 5 leading SVecs with the remaining ICs being slight mixtures of those obtained with  ICA = 5.The above mixture of PCs suggests the existence of a global nonseparable source thanks to the nonlinear interaction across scales.This strongly suggests that ICs, obtained as blind source separation, are convenient tools for identifying nonlinear variability.Dynamical constraints may be included explicitly, for example, via nonlinear dynamical modes [71,72].
The leading IC1 reaches values of +5 standard deviations in the years 1983 and 1997 of strong El Niños, during which PC1, PC3, PC4, and PC11 are synchronised (see also Figures 5(a) and 5(b) and Table 5).The coherence between those four PCs may sometimes be partial.
The correlation pattern for IC1 (Figure 11) has an El Niño pattern shape due to the high PC1 amplitude (Table 5) and PC1 variance whereas the loading map (Figure 12) is much correlated with EOF11, showing more local features.Therefore, from Figure 12 and Table 5, IC1 is maximal under El Niño conditions, positive NPGO (colder central North Pacific), slightly negative PDO, warmer (colder) mid-latitude (subtropical) Atlantic region, a dipolar structure in the South Pacific, and cold pools southward of 30 ∘ S in the Indian and Atlantic oceans.That is consistent with the oceanic influence of ENSO on decadal timescales [73].Moreover, that agrees with the dynamical interactions between ENSO, PDO, and NPGO which have been discussed by several authors [74][75][76].The quite relevant EOF11 pattern in IC1 is mostly correlated   with the El Niño SST composite (asymmetric with respect to La Niña) at mid-latitudes in all oceans, during the December-February season [68,77], which explains the above referred coherence of PC1 and PC11 during strong El Niños.The IC2 loading map (Figure 12) mainly coincides with the symmetrical EOF6, that is, with the positive Atlantic Niño phase, while the remaining dominant EOF contributors to IC2, that is, EOF1, EOF7, and EOF9 (Table 5), practically cancel each other, especially in tropical Pacific El Niño region, despite the presence of non-null correlation in that region (Figure 11).Negative IC2 extremes (e.g., in 1912 and 1951) coincide with the strongest Atlantic Niñas in agreement with the positive skewness of PC6 (Table 2).
The IC3 loading map pattern (Figure 12) is quite similar to EOF5 (Central Pacific (CP) El Niño), combined with the NPGO (colder North Pacific for positive IC3).That is also visible in the correlation map (Figure 11).The negative skewness of IC3 (Table 5) is compatible with the primarily negative SST skewness in Central Pacific [78].It supports the existence of strong Central Pacific La Niñas (strong negative values of IC3), as in 1943-1944 (see Figure 10), especially during the positive phase of AMO (see the negative loading factor of PC2 for IC3 in Table 5).The loading factors of PC4, PC5, and PC2 contributing to IC3 (Table 5) also reflect the tendency for CP-El Niño (CP-La Niña) during negative (positive) phases of AMO [79], which is visible through the symmetrical correlation signals of IC3 with SSTA (Figure 11), respectively, in CP and North Atlantic areas.The imprint of the characteristic South Pacific dipole shown in EOF10 is also visible in the loading map (Figure 12).
The IC4 loading map (Figure 12) is a mix of several EOFs.The inverse proportionality of IC4 to the PCs standard deviation makes EOFs 4, 5, 9, 10, and 11 the strongest contributors (Table 5 and Figure 2(a)) to the loading map of IC4 (Figure 12), showing that IC4 is highly positive when the Indian and Atlantic oceans are warm, while the eastern (western) North and South Pacific are warm (cold), including the tropical El Niño region.That is also reflected in the correlation map (Figure 11).
The last IC5 loading map pattern (Figure 12) is nearly symmetrical to EOF10 (Table 5), especially over the Austral oceans with a characteristic sequence of three dipoles, located over the Indian, Pacific, and Atlantic subtropical oceans.Negative extremes of IC5 (e.g., 1947, 1992/93) are also associated with a strong negative AMO and NPGO.The NE of IC5 result essentially from the highly positive kurtosis and long pdf tails  of PC10 and PC2 (Table 2).Both the loading and correlation maps are quite similar.
The results of ISA are summarized in Figure 13.They suggest 5 ICs with the emergence of one dyadic link (IC3, IC5) with ITI = 0.015 and a triadic link (IC1, IC4, and IC5) with ITI = 0.024 and finally the strongest dyad (IC1, IC5) with an ITI = 0.045.This interaction is even larger than the self-NEs of ICs 3, 4, and 5.This interaction is mainly due to the nonlinear correlation: cor(IC1, IC5 2 ) = 0.23, that is, extremes of IC5 occurring preferentially under PC1 > 0 (e.g., strong El Niños).A possible explanation is that PC4 (simulating NPGO) influence both IC1 and IC5.From the ensemble of interactions, only IC2 appears to be significantly independent of the remaining ICs.All the remaining ITIs remain at the noise level (Figure 13).
The above analysis extracts highly negentropic ICs resulting primarily from independent combinations and coherency between PC extremes.An alternative ICA and ISA analysis is obtained with Gaussianized (followed by whitening) PCs.Further experiments (not shown) indicate that the total NE  comes from cross cumulants only and ICs emerge exclusively from combinations of joint PC extremes.The two leading ICs are practically unchanged as compared to previous ICs; the third and fourth ICs are slightly mixed while the fifth IC is not significant, that is, remaining in the Gaussian manifold.In summary, a prior Gaussianization of PCs leads in general to a reduction, both of the dimension and of the strength of the non-Gaussian subspace.

Sensitivity to the PC Spanning Space.
Here, we study the behavior of the non-Gaussian subspace and ICA/ISA outputs when we sequentially increase the metric space dimension  max using the leading whitened PCs.In the discussion below, it is worth mentioning that, given the direct sum:  + =  − ⊕   between two orthogonal subspaces  − ,   , the relationships between the associated non-Gaussian subspaces and their dimensions hold: which leads to the identity Hereafter, we focus on the case where  − ,   are spanned, respectively, by the leading  max −   PCs and by the remaining   ≥ 1 PCs up to rank  max .In (19), the equality (19) holds when  − ,   are statistically independent or equivalent when spaces  − ,   have separable pdfs.Moreover, under statistical independence, the ISA/ICA may be done independently.However, the pdf separability is hardly verified on multiscale nonlinear natural systems as studied here which means that  ng ( + ) intersects the Gaussian subspaces  g ( − ),  g (  ) of the parcel spaces while ICA/ISA is also changing when passing from  − to  + .An equivalent relationship to (19) for  ng-Ed , obtained with finite samples, is much harder to obtain since  ng-Ed may be undersized (oversized) by errors of type I (II) when considering the Gaussianity null hypothesis   .This means that  g-Ed ( + ) ⋅  ng-Ed ( − ) ≥ 0; that is, there may be some "leakage" of the "prior" non-Gaussian subspace ( ng-Ed ( − )), especially through the less non-Gaussian ICs, to the "posterior" Gaussian subspace  g-Ed ( + ).
Next, we analyze how the ICA/ISA applied to SSTAs varies with  max .To this end, we have considered the sequence  max = 3, 9, 11, 15, 17 in which the number of ICs  ng-Ed , obtained by the method of Section 2.2.2 using a confidence level  sig = 90% (see, e.g., Figure 6), is  ng-Ed = 1, 4, 5, 6, 7, respectively.The above  max are those for which  ng-Ed have shown a jump (of one or more dimensions) with respect to that obtained for  max − 1, where more substantial changes are expected to occur in the ICA/ISA sources.For example,  ng-Ed = 5 for 11 ≤  max ≤ 14.Let us consider the corresponding sequence of embedding spaces:  − ,  + , denoting by r − 1 , . . ., r −  − and r + 1 , . . ., r +  + the IC weighting vectors (sorted by decreasing self non-Gaussianity), spanning  ng-Ed ( − ) and  ng-Ed ( + ), respectively.
To assess the impact on ICA/ISA, we have computed some inner-product-based diagnostics, plotted in Figure 14.First, the measure while the difference  −+ −  − ≥ 0 yields how the non-Gaussian subspaces are preserved throughout the space sequence, that is, how closely (20) is preserved.Then, we identify which terms 0 ≤ r −  ⋅ r +  ≤ 1 are mostly contributing to that, typically above a threshold (e.g., 0.08).For dimensions within the range of the selected  max where  ng-Ed is kept constant, every r −  is very well projected onto a certain r + () , leading to well tracked and robust ICs independently of the space dimension.For instance, in the sequence  max = 11, 12, 13, 14 ( ng-Ed = 5), we get  −+ = 4.7, 4.6, 4.8, respectively, (quite close to the ideal value of 5) with very low leakage of non-Gaussianity.We also get the tracks of the five ICs: [IC1 → .We conclude that ICA and ISA are quite robust in the range  max = 11-14.Only the ranks of IC2 and IC3 were swapped in the step  max = 11 → 12. Now, in Figure 14, we analyze what happens when a highly non-Gaussian or highly nonlinearly interacting PC comes into the analysis space, corresponding, for instance, to PC3, PC9, PC11, PC15, and PC17.EOF15 and EOF17 are both dominated by subtropical modes in the South Pacific and Indian oceans (not shown), nonlinearly interacting with precedent modes.
In every sequence  − →  + , the ICs may experience three situations: (1) the emergence of new ICs; (2) the "leakage" of non-Gaussianity to the forthcoming Gaussian subspace; and (3) the decoupling of sets of ICs (one or more) into sets of ICs of larger dimensions.In our case, the emergence of 3 new ICs appears when  max changes from 3 to 9 (see Figure 14).The "leakage" is more relevant with weaker ICs (low negentropy).mainly at higher dimensions, being the main mechanism of IC generation.For instance IC3 at  max = 9 decouples into (IC4, IC5) with  max = 11.Multiple interactivities, like the dyadic ones (expressed by red segments in Figure 14), may emerge on incoming ICs or transfer to the tracked ICs (e.g., the dyad (IC1, IC6) with  max = 15 transfers to the dyad (IC1, IC7) with  max = 17).There are some ICs that remain robust without any decoupling like the case of IC1 and IC3 with  max = 11.In general, as  max increases, ICs project progressively onto low-ranked PCs dominated by smaller spatial scales and complex sharper features.In this situation, a hybrid criterion for the choice of the truncation  max (not explored here) may be favored by four factors: the negentropy and explained variance by ICs, the low dispersion of IC weighting loadings (either on PC or original basis), and a low  ng-Ed .That justifies the choice  max = 11 as being the lowest dimension for which  ng-Ed = 4.

Summary and Conclusion
We present in this paper an approximation of the multivariate negentropy (NE) motivated by the Edgeworth expansion of the joint probability distribution, using a linear combination of Frobenius norms of the joint cumulant tensors of order  ≥ 3.This form is especially sensitive to the extent and frequency of extremes.The NE approximation is based on the trace of a positive definite matrix M resulting from the high-order singular value decomposition of an appropriate contracted form of the coskewness and cokurtosis joint tensors in consistency with the exact negentropy.First, it is a tensor invariant, that is, invariant under orthogonal rotations.Second, when variables are whitened, a version of the "Separation Source Theorem" holds; that is, the approximated NE may be explicitly decomposed into positive terms measuring self-NEs (due to skewness/kurtosis of marginals) and NEs resulting from nonlinear associations between variables.This permits the extraction of statistically independent components (ICs) from observations through the maximization of the sum of marginal NEs.The decomposition is further generalized to multivariate sources under the Independent Subspace Analysis (ISA), through the formation of groups of disjoint ICs, associated with positive NE contributors (or interactivities).The jointly Gaussian ICs span the Gaussian subspace, being simply approximated by the kernel subspace of M.
When cumulants are estimated from finite or serially correlated samples, the NE estimator bias is positive and grows with the dimension (Curse of Dimensionality).Therefore, the estimated NE of a Gaussian distributed finite multivariate sample is prone to produce "false non-Gaussianity," hence the importance of Gaussian subspace identification.This subspace is estimated from the trailing singular vectors of M for which the null hypothesis of Gaussianity cannot be rejected.The search of non-Gaussian ICs or multiple sources is sought in the non-Gaussian subspace.
The method is then applied to real data consisting of 11 leading PCs obtained from 102 years of the SST anomaly (SSTA) field over the global ocean, equatorward of 65 ∘ .To test the effectiveness of the method, prior to using the SSTA PCs, we have used synthetic data with a prescribed amount non-Gaussianity and sharing the same dimensions, sample size, lagged-one autocorrelation, and the level of negentropy as the SSTA PCs.In order to assure the recovery of the "true non-Gaussianity," the experiment suggests a minimum of variance of the true non-Gaussian sources, which increases as the sample gets smaller and the dimension of the embedding space gets larger.
Application of the method to SSTA data based on the leading 12 PCs (the last one going practically to the Gaussian subspace), explaining more than 50% of the monthly variance, yields 5 ICs.The most prominent or most negentropic IC (or IC1) represents El Niño conditions combined with negative PDO, positive NPGO, and composite impacts of El Niño (asymmetric with respect to La Niña) on oceanic mid-latitudes, poleward of 30 ∘ .The second component, IC2, reveals a loading map mostly projecting onto the Atlantic Niño and the AMO pattern.The third component, IC3, reflects the tendency of Central Pacific El Niños during the negative phase of AMO.The fourth component, IC4, combines several EOFs and the last component, IC5, is dominated by EOF10, being positive when the SSTA field in the South Pacific exhibits a zonally oriented dipole.All ICs are basically independent combinations of PC extremes and suggest independent responses of the SST field to different forcings.Only IC2 appears to be significantly isolated from the other components.The remaining components have some statistical links dominated primarily by the dyad (IC1, IC5), possibly associated with the sharing of the NPGO effect, via PC4, among those ICs.
The identification of possible independent physical forcings, responsible for the independent sources may bring some benefits, concerning, namely, the forecasting and modelling of the SSTA variability and its impacts and teleconnections on relevant surface variables (e.g., precipitation, temperature, moisture, and wind).That opens the possibility of building phenomenological, dynamical, and stochastic models for the different ICs and independent subspaces.

Figure 1 :
Figure 1: (a) Contour plot of  Ed as a function of   , ĉ1 for a fixed dimension  = 4.(b) Ratio  Ed- sig for the confidence level  sig = 0.95, as a function of the dimension  and the lag-one autocorrelation ĉ1 for a sample size   = 1000.The ratio increases with decreasing  or increasing ĉ1 .
(b)  shows the fitted lag-one autocorrelation ĉ1 based on an AR(1) model.

Figure 2 :
Figure 2: (a) % of explained variance confidence interval and (b) lag-one autocorrelation of the AR(1) fitting model of the leading 12 PCs.
(a) and the total NE JEd [y () ] = Tr( M[y () ]), Figure 4(b).Note that, for a given , the smaller values of M [y () ] are either the least non-Gaussian PCs or the less interactive PCs with the remaining PCs ( ̸

Table 1 :
EOF description with the main SSTA modes, along with corresponding available indices from literature and references.EOF rank Description with the main SSTA modes

Figure 3 :Figure 4 :
Figure 3: Maps of the 12 variance-leading, spatially normalized EOFs of the monthly SSTAs field, computed in the period 1910-2011.Ranked EOFs are read from left to right and top to bottom.

Figure 6
Figure 6 shows the NE components (or SVs) M (z) ( = 1, . . ., ), along with the threshold quantiles 90% and 95% of   sig [ M (b  )] obtained from an ensemble of 1000 Gaussian surrogates.For comparison, we also show the SVs M (z  ) (open squares) of surrogate realizations z  of z.The SVs

Figure 7 :
Figure 7: (a) Monte-Carlo average of the sample negentropy estimator JEd (y) as a function of  2 and  g .(b) Contour plot of the fraction  of times for which the null hypothesis   is rejected at  sig = 99% (1% significance level) as a function of  2 and  g .The contour  = 0.99 can mark  2 min at the 99% confidence level.

Table 5 :
Sorted ICs  = 1, . . ., 5, for SSTA data showing self-NEs contributing to the ICA contrast function (2nd column), NE components (3rd column), and loading factors (RR ICA ) , ( = 1, . . ., 5), of the leading ICs on the PC basis.Loading factors, sorted by absolute value, explaining up to 85% of the unitary norm are in bold font.The last two columns give the skewness and kurtosis of ICs.IC rank

Figure 9 : 3 Figure 10 :
Figure 9: Sorted scaled-interactivities by decreasing order (solid circles), ITIs (solid squares) with cumulated ITIs (open crossed circles) and unbiased total NE (horizontal dashed line).The sorted IC groups are labeled (e.g., 32 means group of IC2 and IC3).The 95% confidence levels of the selected ITIs are those for which the cumulated ITIs are below the dashed line.(a) Case (a): high noise and (b) case (b): intermediate noise.

Figure 12 :
Figure 12: Spatially normalized loading maps of the 5 sorted ICs.

Figure 13 :
Figure 13: As in Figure 9 but for the ISA summary of SSTA data.

Figure 14 :
Figure 14: Schematic of the ICs (rank and negentropy shown in rectangles), aligned vertically for each  max = 3, 9, 11, 15, 17 (circles) with values of  −+ (bulk arrows) between the sequential PC spanning spaces.Values with the dominant r −  ⋅r +  are written next to the corresponding thin arrows.The red straight vertical lines indicate the significant dyadic links with corresponding NE interactivities.

Table 2 :
% of explained variance, skewness, and kurtosis of the 12 leading PCs and the squared projections of  ng-Ed onto the individual whitened PCs for the analysis over  = 11 PCs (see the end of Section 3.2).

Table 3 :
(18)ributions to the ICA contrast function (2nd column).NE components (3rd column) and IC vector loadings on the basis of original variables(18)for the high noise case.Components in which ICs are most projected are in bold font.

Table 4 :
Table 3 but for the case of intermediate noise (b).