Modifying Covariance Localization to Mitigate Sampling Errors from the Ensemble Data Assimilation

The ensemble-based Kalman filter requires a very large ensemble (e.g., 10,000 members) to identify relevant error covariances at great distances in multidimensional geophysical systems. In practice, however, ensemble sizes are limited, and small ensembles give rise to sampling errors. This study proposes a covariance localization (CL) scheme based on a modified Cholesky decomposition, namely the covariance localization scheme with modified Cholesky decomposition (CL-MC). Our main idea is to use the modified Cholesky (MC) decomposition to estimate the background error covariance matrix, while a tunable singular value decomposition of the background error covariance is employed to improve the ensemble increment and avoid imbalance in the system. To verify whether the proposed method can effectively mitigate sampling errors, numerical experiments are conducted on the Lorenz-96 model and on a large-scale model (the SPEEDY model). The results show that the CL-MC method outperforms the CL method across different data assimilation parameters (ensemble sizes and localization radii). Furthermore, in one year of assimilation experiments on the SPEEDY model, the 1-day forecast RMSEs obtained by CL are approximately equal to the 5-day forecast RMSEs of CL-MC, so the CL-MC method has potential advantages for long-term forecasting. The proposed CL-MC method therefore appears to have good prospects for widespread application in atmospheric general circulation models.


Introduction
Data assimilation (DA) is the combination of observations with "prior knowledge" (e.g., numerical models and model outputs) to achieve an estimate of the true state of a system (cf. Katzfuss et al. [1]). In sequential data assimilation, the traditional Kalman filter (KF; Kalman and Bucy [2]) is used for the estimation and prediction of linear problems. For complex multidimensional systems, the classical ensemble Kalman filter (EnKF; Evensen [3]), derived from merging the linear estimation theory of the Kalman filter with nonlinear Monte Carlo estimation, has been proposed and is widely applied in large-scale models (cf. Evensen and Van Leeuwen [4]). However, the number of ensemble members is always much smaller than the number of model components, so sampling errors cannot be avoided. These sampling errors introduce numerous other problems (e.g., inbreeding and filter divergence) into the ensemble filter. Inbreeding is a term used to describe the underestimation of the analysis error covariance (cf. Furrer and Bengtsson [5]), and filter divergence occurs when the analysis error distribution moves away from the truth, which can be caused by underestimating the background error covariance. Such problems can be addressed with the covariance inflation method, which multiplies the background error covariance by an inflation factor and prevents underestimation of the covariance (cf. Anderson and Anderson [6]). However, the size of the inflation factor depends on many factors, including the type of filter and the dynamical system used (cf. Hamill et al. [7]). Thus, covariance inflation does not address all problems, especially the spurious long-range correlations between distant observations and model components that arise in the EnKF. It is common to use localization schemes to remove such spurious correlations.
Localization of the covariance matrix in the EnKF is necessary for large-scale models because traditional ensembles are limited to only 10-100 states, considerably less than the degrees of freedom of the model. Localization methods are typically implemented by taking the Schur product with a correlation matrix or a localization function, as in covariance localization (CL; Hamill et al. [7]) and observation localization (OL; Hunt et al. [8]). Alternatively, the domain is decomposed, as in domain localization (DL; Houtekamer and Mitchell [9]), and a separate analysis is conducted for each subdomain. The CL method is commonly employed as an ad hoc treatment of the spurious correlation problem. The OL and DL methods are frequently combined to assign better observation weights according to distance. Additionally, other localization techniques have been devised, such as local analysis (LA; Sakov and Bertino [10]), background error CL (BL; Greybush et al. [11]), and observation error CL (RL; Greybush et al. [11]). Although such localization can be effective in avoiding spurious correlations and limiting the negative effects of sampling errors, it simultaneously introduces imbalances into the system (cf. Kepert [12]); covariance localization in particular is a major driver of potential imbalance in the EnKF. Efforts are being made to find approaches that avoid imbalance in the system. The imbalance is primarily produced by the computation of analysis increments. The incremental analysis update (IAU) procedure is used to prevent imbalance generation during the computation of the local increments (cf. Kleist and Ide [13]). In addition, it is possible to avoid imbalances by using a shorter assimilation window (Houtekamer and Zhang [14]).
In the localization scheme framework, the localization parameters (i.e., the localization length and the weighting function), key indicators of the localization scheme, are often manually configured. These parameters are unknown a priori and are usually found by trial and error. Kirchgessner et al. [15] showed that the optimal localization radii of DL and OL are linearly related to the local observation dimension, and that OL performs better than DL. With higher-resolution models, the use of a narrower localization radius reduces the sampling error, but it limits the use of observations. Miyoshi and Kondo [16] proposed a multiscale localization (500 km and 900 km) approach for the EnKF to find analysis increments in independent sub-observation spaces. However, it incurs a large computational cost when the number of available observations is O(10^7). In addition, to preserve larger-scale structures from a limited ensemble size, Kondo et al. [17] employed a dual-localization approach to analyze small-scale and large-scale analysis increments separately using spatial smoothing. For complex geophysical models, although the impact of the observations decreases with longer localization ranges, some issues of localization require brief discussion. First, distance-based localization methods employ unimodal segmentation functions (e.g., GC functions) to construct the corresponding matrices, which have difficulty "localizing" nonlocal observations (Bai et al. [18]); such observations exhibit multimodal correlations with model components (cf. Fertig et al. [19]; De La Chevrotière and Harlim [20]). Furthermore, in some cases, the model components or the corresponding observations may not have correlated physical locations, which may violate the principles of distance-based methods (cf. Bai et al. [21]).
Fuzzy control functions have been used to yield more precise observation weights in place of localization functions (e.g., GC functions) (cf. Mingheng et al. [22]; Mingheng et al. [23]). For an idealized intermediate atmospheric general circulation model (AGCM), the negative effects of localization could be avoided with a sufficiently large ensemble size. Miyoshi et al. [24] showed that large ensemble sizes (up to 10,240 members) can capture long-range error correlations in the implementation of the local ensemble transform Kalman filter (LETKF; Miyoshi et al. [25]), although running such large ensembles requires measures to prevent memory overflow. Kondo and Miyoshi [26] modified the EnKF code to remove the covariance localization prerequisite and found that using 10,240 ensemble members also reveals the long-range structural features of the background error covariance matrix. The aforementioned studies mainly considered the effect of horizontal scales on assimilation. Farchi and Bocquet [27] considered a realistic extension of the LEnSRF using covariance localization in the vertical direction. Wang et al. [28] introduced a multiscale local gain form ensemble transform Kalman filter (MLGETKF) to allow simultaneous updates of multiple scales for the forecast ensemble mean and perturbations by assimilating all observations at once. Recently, the Cholesky decomposition method has been applied to the estimation of the forecast error covariance matrix (cf. Kang and Deng [29]).
In the abovementioned studies, multiscale localization schemes or large ensembles (e.g., 10,000 members) have so far been necessary to mitigate sampling errors, but there are few investigations of the estimation of the forecast error covariance matrix itself. Here, a covariance localization scheme with modified Cholesky decomposition (CL-MC) for estimating the background error covariance is proposed. The effectiveness of the proposed CL-MC method is examined with the Lorenz-96 and SPEEDY models.
The article is organized as follows: Section 2 reviews the ensemble Kalman filter, covariance localization, and the modified Cholesky decomposition, and presents the proposed covariance localization with modified Cholesky decomposition. Section 3 describes the configuration of the twin models and analyzes the experimental results. Section 4 provides the discussion and conclusions.

Ensemble Kalman Filter (EnKF).
The EnKF is an ensemble-based Kalman filtering scheme proposed by Evensen [3]; it has since evolved into many variants (cf. Tippett et al. [30]; Houtekamer and Mitchell [9]; Evensen and Van Leeuwen [4]; Evensen [31]; Burgers et al. [32]; Anderson [33]). These methods can be roughly classified into two categories: deterministic filters (cf. Burgers et al. [32]) and stochastic filters (cf. Whitaker and Hamill [34]); the main difference between them is whether perturbed observations are introduced to maintain the correct statistics of the filter. The EnKF updates the a priori estimate x^f(k) of the atmosphere valid at time k with the information in the new observation y^o to reach the latest estimate x^a(k) of the atmosphere (equation (1)). For this purpose, the Kalman gain matrix K assigns appropriate weights to the observation error covariance R and the background error covariance P^f (equation (2)). In all equations, the superscripts a and f refer to the analysis and forecast variables, respectively. The observation operator H maps from model space to observation space, and the model operator M transfers the estimate x^a(k) to the next analysis time (equation (3)):

x^a(k) = x^f(k) + K [y^o − H x^f(k)],   (1)
K = P^f H^T (H P^f H^T + R)^{-1},   (2)
x^f(k + 1) = M x^a(k),   (3)
where n denotes the ensemble size and k represents the analysis time. For the Kalman gain K in equation (2), the ensemble is usually used to approximate the terms P^f H^T and H P^f H^T (cf. Houtekamer and Mitchell [35]; Ghil and Malanotte-Rizzoli [36]). In the subsequent equations, we omit the superscript of the time index k for ease of notation. For the stochastic EnKF, small spurious correlations between the ensemble background and the observations cause the analysis to move away from the truth. Whitaker and Hamill [34] introduced the deterministic ensemble square root filter (EnSRF). However, when the EnSRF is used to compute the analysis error covariance with a relatively small ensemble size, the Kalman gain is susceptible to contamination by sampling error, making it likely that the true analysis error is much larger than estimated; the ensemble may become too concentrated, to the detriment of the assimilation results one would expect. It is common to use covariance inflation or covariance localization to compensate for this problem. Covariance inflation alone does not address all problems associated with undersampling, so covariance localization (CL) is essential for data assimilation.
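The analysis step described above can be sketched in a few lines. The following is a minimal, hedged illustration of a stochastic EnKF update with perturbed observations, not the paper's implementation; the function and variable names are our own, and a dense H with an explicit matrix inverse is assumed for readability only.

```python
import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """One stochastic EnKF analysis step (perturbed observations).

    Xf : (m, n) forecast ensemble; y : (p,) observation vector;
    H  : (p, m) linear observation operator; R : (p, p) obs-error covariance.
    Illustrative sketch only; names are not from the paper's code.
    """
    m, n = Xf.shape
    Xp = Xf - Xf.mean(axis=1, keepdims=True)        # ensemble perturbations
    # Ensemble estimates of P^f H^T and H P^f H^T (no explicit m x m P^f).
    PfHt = Xp @ (H @ Xp).T / (n - 1)                # (m, p)
    HPfHt = (H @ Xp) @ (H @ Xp).T / (n - 1)         # (p, p)
    K = PfHt @ np.linalg.inv(HPfHt + R)             # Kalman gain, eq. (2)
    # Perturbed observations keep the analysis spread statistically correct.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n).T
    return Xf + K @ (Y - H @ Xf)                    # analysis ensemble, eq. (1)
```

With a tight observation error covariance, the analysis mean of the observed components is pulled strongly toward the observations, as expected from equation (1).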
There are two reasons usually listed in the literature for using covariance localization in the EnKF: "spurious covariance" and insufficient ensemble rank (cf. Sakov and Bertino [10]). "Spurious covariance" means that the limited ensemble size produces nonzero sample covariances between elements of the state vector that are remote from each other and should be uncorrelated. It can lead to excessive reduction of the state error covariance in the analysis, resulting in ensemble collapse and filter divergence. Insufficient ensemble rank means that the ensemble size is less than the number of model components. Therefore, the proper treatment of the ensemble size is one of the most basic rules in the design of EnKF systems.

Covariance Localization (CL).
The CL method modifies and updates equation (2) by multiplying each element of the background error covariance by the corresponding element of a distance-based correlation coefficient matrix ρ (the Schur product). It is one of the localization methods applied to large-scale systems. The core idea behind this operation is to modify each element of the background error covariance so as to increase the rank of the covariance matrix.
However, the localized background error covariance matrix is hardly ever computed explicitly except for small-scale systems. Instead, given enough observations, we can approximately compute the terms (ρ°P^f)H^T and H(ρ°P^f)H^T in equation (2) from the ensemble. Algorithm 1 summarizes the analysis step of the resulting generic CL method in detail.
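As a small-scale illustration of the Schur-product localization (practical only when m is modest, since ρ°P^f is formed explicitly here, unlike in the ensemble approximation above), the localized gain might be written as follows; the names and shapes are illustrative assumptions.

```python
import numpy as np

def localized_gain(Xp, H, R, rho):
    """Kalman gain with covariance localization via the Schur product.

    Xp  : (m, n) centered forecast perturbations
    H   : (p, m) observation operator; R : (p, p) obs-error covariance
    rho : (m, m) distance-based correlation matrix (1 on the diagonal,
          decaying to 0 with distance). Hedged sketch, not the paper's code.
    """
    n = Xp.shape[1]
    Pf = Xp @ Xp.T / (n - 1)          # sample background error covariance
    Pf_loc = rho * Pf                 # Schur product rho ° P^f, elementwise
    return Pf_loc @ H.T @ np.linalg.inv(H @ Pf_loc @ H.T + R)
```

Taking ρ as the all-ones matrix recovers the unlocalized gain, while a compactly supported ρ zeroes the gain rows of state components far from every observation.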
Although the introduction of covariance inflation and localization techniques can overcome these problems in the EnKF, they are applied only to the background error covariance matrix B ≈ P^f = (n − 1)^{-1} · [X^f · (X^f)^T], and the ensemble size remains finite. Therefore, the local sample covariance matrix may be rank-deficient. To avoid this problem, the background error covariance matrix B is estimated directly by a modified Cholesky decomposition, which replaces B with a sparse approximation. The aim is to avoid limiting the ensemble size as well as to prevent memory overflow.

Modified Cholesky Decomposition.
The accuracy of the background error covariance estimate is the core problem for covariance localization schemes when using limited ensemble sizes. Recently, several researchers have employed the Cholesky decomposition to improve the accuracy of background error covariance estimates (cf. Ueno and Tsuchiya [37]; Nino-Ruiz et al. [38]; Kang and Deng [29]; Boyles and Katzfuss [39]). In this regard, we estimate the inverse of a sparse background error covariance matrix by using a modified Cholesky decomposition. The Cholesky decomposition is also known as the square root decomposition; its core principle is to represent a symmetric positive definite matrix as the product of a lower triangular matrix and its transpose. Consider n samples of Gaussian random vectors X = [X_1, . . . , X_j, . . . , X_n] ∈ R^{m×n} with X_j ~ N(0_m, B) for 1 ≤ j ≤ n, and let x_i ∈ R^{n×1} denote the i-th model component across all samples. The modified Cholesky decomposition is derived by regressing each model component on its predecessors (Nino-Ruiz [40]).
Then, the estimate of B can be computed as in equations (5) and (6), where the eigenstructure of (ρ°B) depends on the structure of U. Based on the structural features of U, sparse estimates of (ρ°B) are obtained by setting elements of U to zero. The main zero components of U are as follows: if two model components are conditionally independent for a given localization radius r, the corresponding elements in (ρ°B) are zero. Furthermore, following the CL approach, conditional independence of the background errors between different model components can be achieved by using local neighborhoods (cf. Ueno and Tsuchiya [37]). For a given model component, its neighborhood is formed by the components within a localization radius r; if a model component lies beyond distance r, the correlation between the corresponding background errors is set to zero.

Covariance Localization with the Modified Cholesky Decomposition.
This section presents the estimation of the covariance matrix with the modified Cholesky decomposition for covariance localization (CL-MC). First, according to equation (5), the structure of B depends on the structure of U. If we assume that the correlation between model components is local and that there is no correlation outside the localization radius r, we obtain a lower triangular sparse estimate of U; that is, U is sparse and localized. Following the literature (cf. Nino-Ruiz et al. [38]; Boyles and Katzfuss [39]), the regression for each model component can only be performed on its predecessors, so the model components must be sorted (labeled) before computing U. For a grid model, we mainly consider column order. The estimation procedure for (ρ°B)^{-1} is as follows: 1. For the i-th model component, we use the truncated singular value decomposition method to estimate the regression coefficients η_{i,j} for 1 ≤ i ≤ m and j ∈ P(i, r), where P(i, r) represents the predecessor indexes of model component i for localization radius r, and x_j denotes the j-th model component preceding x_i with 1 ≤ j ≤ i ≤ m (equation (7)). Figure 1(a) shows the local predecessors and neighborhood of model element 28 when column order is employed. ε_i represents the regression error of the i-th model component, and the coefficients η are calculated by solving the optimization problem in equation (8). The residuals in equation (8) are never exactly zero, which implicitly provides covariance inflation, so explicit covariance inflation may not be required in the CL-MC assimilation scheme.
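The regression-based procedure above can be sketched as follows. This is a hedged sketch, not the paper's code: each component is regressed on its local predecessors, with the rcond cutoff of NumPy's SVD-based `lstsq` standing in for the truncated SVD of equation (7), and the factors are assembled into a sparse precision estimate.

```python
import numpy as np

def mc_precision(X, r):
    """Sparse estimate of B^{-1} via a modified Cholesky decomposition.

    X : (m, n) matrix of centered ensemble perturbations (rows = components).
    Each component x_i is regressed on its predecessors within radius r
    (column order assumed); the estimate is B^{-1} ~ L^T D^{-1} L with L
    unit lower triangular and D the diagonal of residual variances.
    """
    m, n = X.shape
    L = np.eye(m)                  # unit lower-triangular factor
    d = np.empty(m)                # residual (regression error) variances
    d[0] = X[0].var(ddof=1)
    for i in range(1, m):
        pred = list(range(max(0, i - r), i))     # local predecessors P(i, r)
        # SVD-based least squares; rcond truncates small singular values.
        eta, *_ = np.linalg.lstsq(X[pred].T, X[i], rcond=1e-8)
        L[i, pred] = -eta
        d[i] = (X[i] - eta @ X[pred]).var(ddof=1)
    # The result is banded: entries vanish beyond the localization radius r.
    return L.T @ np.diag(1.0 / d) @ L
```

By construction the estimate is symmetric positive definite, and its sparsity pattern mirrors the conditional-independence assumption: entries with |i − j| > r are exactly zero.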
Once (ρ°B) is estimated, the next section presents how to avoid the imbalance effect when working with (ρ°B). Here, however, the analysis formulation is given by equations (9) and (10).

ALGORITHM 1: Analysis step for CL. Inputs: observation vector y ∈ R^{N_y}; multiplicative inflation factor λ (parameter); 1 denotes the vector with all elements equal to 1, I the identity matrix, and ρ the localization function (0 ≤ ρ_{ij} ≤ 1). The algorithm returns the m × n analysis ensemble E.

Equation (9) is obtained by using basic matrix decomposition identities, and equation (10) involves the perturbed observations. Equation (10) is well known as the dual formulation of the EnKF, and equation (9) is the incremental form of the primal formulation. Using equation (10), we conclude that the computational effort of the method is bounded by the following equation:

Balance and Truncated Singular Value Decomposition (SVD).
In the previous section, we introduced the CL method with the modified Cholesky decomposition. However, covariance localization is a major source of potential imbalance in the EnKF, especially for short localization distances. Thus, how to effectively reduce sources of imbalance has become a key issue for data assimilation. There are several ways to reduce sources of imbalance, such as modifying the initial spin-up procedure and the incremental analysis update (IAU) (cf. Bloom et al. [41]; Kleist and Ide [13]); in practical operations, modifying the initial spin-up procedure or applying IAU has so far been essential to mitigate imbalance in the system. In our localization scheme, we use the sparse background error covariance (ρ°B) as an alternative to B. The distribution of the model components and the analyzed point in the spatial dimensions is shown in Figures 1(b) and 1(c), schematic diagrams of the observations used in the localization scheme from the i-th grid point to the (i + 1)-th grid point. For efficiency, we compute distances only for observations within the coordinate limits limits_x and limits_y. Valid local observations reside within the circle of local radius l, which is determined according to this distance. The physical distance between the analyzed point and each observation position is calculated before the state estimate is updated: the closer an observation is, the greater its weight, and vice versa.
Although we use the modified Cholesky decomposition to estimate (ρ°B), (ρ°B) and B remain strongly correlated: the correlation is retained within the defined distance and eliminated beyond it. However, for high-dimensional systems the localized covariance becomes full rank, and we would have to handle a very large covariance matrix (even exploiting symmetry, it has m · (m + 1)/2 independent elements). One does not want to represent such a matrix in memory, and it is preferable to formulate the filter in some matrix-free manner. Most matrix-free formulations rely on one of two devices: domain decomposition or the Schur product. Here, we instead employ the truncated SVD to avoid representing such a matrix in memory.
Suppose that we have a truncated SVD of B given by equation (12), B ≈ Q · D · Q^T, where Q ∈ R^{m×ñ} has orthonormal columns and D is diagonal and positive definite. In the truncated SVD scheme, if the ensemble size is smaller than the number of local observations, the local ensemble perturbations can be written as in equation (13). Note that X^{f′} ∈ R^{m×ñ} represents ñ ensemble perturbations, called the "augmented ensemble" for simplicity, where ñ denotes the number of ensemble increments; X^{f′} is then a solution to equation (12) with ñ = n + 1 perturbations. For equation (13), the question is how to efficiently construct X^{f′} from Q · D^{1/2}. Let X^{f′} ∈ R^{m×(n−1)}. We want to construct a matrix that has the same empirical covariance matrix and is centered. Since X^{f′} has rank at most n − 1, we need to find a matrix Z ∈ R^{m×n} satisfying equations (15) and (16). Define the multiplicative inflation factor in terms of ξ, and let Q_ξ be the n × n matrix whose coefficients are given in equation (17). It can easily be checked that Q_ξ · Q_ξ^T = I (i.e., Q_ξ is an orthogonal matrix) and that Q_ξ maps the ones vector 1 onto the first basis vector e_1 (up to normalization).
Let W ∈ R^{m×n} be the matrix whose first column is zero and whose remaining columns are those of X^{f′} (equation (18)). By construction, Z = W · Q_ξ is a solution of equations (15) and (16).
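Under the assumption that Q_ξ in equation (17) is an orthogonal matrix sending the ones vector onto the first basis direction, the construction of a centered matrix Z with the same empirical covariance can be sketched with a Householder reflector; this reflector is our stand-in for the paper's explicit coefficients.

```python
import numpy as np

def center_perturbations(Xp):
    """Build a centered Z with the same empirical covariance as Xp.

    Xp : (m, n-1) perturbation matrix (e.g., columns of Q D^{1/2} from the
    truncated SVD). W is Xp with a zero first column prepended (eq. (18));
    Q_xi is an orthogonal n x n matrix with Q_xi @ 1 = sqrt(n) e_1, so
    Z = W @ Q_xi satisfies Z @ 1 = 0 and Z @ Z.T = Xp @ Xp.T.
    Hedged sketch: a Householder reflector replaces the paper's eq. (17).
    """
    m, k = Xp.shape
    n = k + 1
    W = np.hstack([np.zeros((m, 1)), Xp])          # zero first column
    # Householder reflector mapping the unit vector 1/sqrt(n) onto e_1.
    v = np.ones(n) / np.sqrt(n)
    v[0] -= 1.0
    Q_xi = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)
    return W @ Q_xi
```

Since Q_ξ · 1 is proportional to e_1 and the first column of W is zero, the rows of Z sum to zero (centering), while orthogonality of Q_ξ preserves W · W^T = X^{f′} · (X^{f′})^T.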
In studies of covariance localization, each observation is mainly limited to influencing the analysis within a fixed horizontal distance L and vertical distance V, while within the background covariance matrix B each model component is correlated with the others. Therefore, in the context of the sequential assimilation scheme, the elements of the Kalman gain matrix K are set to zero beyond these distances (cf. Houtekamer and Zhang [14]; equation (20)). The observations at a grid point are given different weights based on their physical distance from that grid point; these weights are mainly implemented through the inverse of the Schur product of the observation covariance matrix R and the correlation coefficient matrix (cf. Hunt et al. [8]). The traditional localization function is an adjustable function whose shape resembles the probability density function of a normal distribution, and efforts have been made to find a Gaussian function that fits the widely used compactly supported Gaspari-Cohn function (cf. Gaspari and Cohn [42]): observations inside the localization region receive different observation weights, and the Gaspari-Cohn function is zero outside it. The step function is given in equation (19), where l denotes the adjustable cutoff radius. For observation weighting, there are weights inside the observation region and weights outside it (usually set to zero); therefore, equation (19) can also be considered domain localization (DL) (cf. Kirchgessner et al. [15]).
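For concreteness, the widely used Gaspari-Cohn fifth-order piecewise rational function (Gaspari and Cohn [42]) can be coded directly; this is the standard textbook form, with c the localization half-width, so the weight reaches zero at distance 2c.

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn compactly supported fifth-order correlation function.

    d : array of distances; c : localization half-width. Weights are 1 at
    d = 0, decrease monotonically, and vanish for d >= 2c.
    """
    z = np.abs(np.asarray(d, dtype=float)) / c
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    w[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                - (5.0 / 3.0) * zi**2 + 1.0)
    zo = z[outer]
    w[outer] = ((1.0 / 12.0) * zo**5 - 0.5 * zo**4 + 0.625 * zo**3
                + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - (2.0 / 3.0) / zo)
    return w
```

A Schur-product localization matrix ρ is then obtained by evaluating this function on the pairwise distances between grid points.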

Twin Models and Experimental Settings
Experimental Settings for the Lorenz-96 Model.
For the Lorenz-96 model, the number of model components is m = 40, and 20 of them are observed (i.e., p = 20). The input parameters (e.g., inflation factor, localization radius, and ensemble size) are also considered. The initial conditions are generated from the filter's ensemble members.
The model was first integrated forward over the equivalent of several years of real-world time without assimilating any observations. The initial forecast ensemble members are identical except for added Gaussian noise. The true values are produced by random small perturbations that grow through the model instability; in this case, the initial ensemble members move away from the model attractor.
There is no clear relationship between the spread and the error. A spin-up period of one and a half years is sufficient to eliminate any unfavorable effects (cf. Hoteit et al. [44]). A constant multiplicative inflation factor (cf. Anderson and Anderson [6]) is used to enlarge the ensemble spread and prevent filter divergence. The default forcing strength is F = 8. A set of assimilation experiments is performed for each data assimilation method for 365 days, using the default settings and different assimilation parameters (inflation and localization). The manually tuned optimal parameters are shown in Table 1.
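For reference, the Lorenz-96 dynamics dx_i/dt = (x_{i+1} − x_{i−2}) x_{i−1} − x_i + F with F = 8 can be integrated as follows; the fourth-order Runge-Kutta scheme and the step size are our assumptions, since the paper does not state its integrator.

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """Lorenz-96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with cyclic boundary conditions handled by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One classical fourth-order Runge-Kutta step (an assumed integrator)."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

A typical spin-up starts from the unstable equilibrium x = F·1 plus a small perturbation and integrates until the trajectory settles on the chaotic attractor.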

Experimental Settings for the SPEEDY Model.
The moderately complex SPEEDY model (cf. Ritchie [45]; Miyoshi [46]) reproduces realistic weather dynamics. Its assimilation variables comprise temperature, the zonal wind component, the meridional wind component, specific humidity, surface pressure, and precipitation (cf. Bourke [47]). It has eight vertical levels at 96 × 48 resolution in the sigma coordinate system and creates an output file every 6 hours for weather prediction (cf. Greybush et al. [11]); a more detailed description of the SPEEDY model is available in Molteni [48]. The initial ensemble of states is generated randomly by the model with a spin-up period of 1 year (cf. Kondo and Miyoshi [26]). The nature run starts from a stationary standard atmosphere (i.e., no wind) at 0000 UTC 1 January. The total number of model components is m = 133,632; for all filter cases, the model state space far exceeds the ensemble size (m ≫ n).
For the SPEEDY model, identical-twin observations are generated by adding Gaussian-distributed random numbers to the nature run every 6 h at radiosonde-like locations on 8 levels (100, 200, 360, 520, 800, 850, 880, and 925 hPa). The standard deviations of the observation errors for the zonal wind, temperature, specific humidity, and surface pressure are fixed at 1.0 m s^{-1}, 1.0 K, 1.0 g kg^{-1}, and 1.0 hPa, respectively.

Validation Statistics.
If there are m model components, the root-mean-square error (RMSE; cf. equation (18) in Bai et al. [21]) and the ensemble spread (cf. equation (22) in Hoteit et al. [44]) are applied to evaluate the assimilation performance, where the subscript i denotes the index of the state components and the subscript j the index of the ensemble members; x^a_i and x^t_i denote the i-th analyzed value and true value, respectively, and σ_{x_i} denotes the ensemble variance of the i-th component. The analysis quality of the assimilation algorithm can be measured more comprehensively by averaging these computations in space; better assimilation results generally correspond to smaller RMSEs.
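The two verification statistics can be written compactly as below. This follows the standard definitions (spatially averaged RMSE of the analysis mean and the root of the spatially averaged ensemble variance), which we assume match the cited equations.

```python
import numpy as np

def rmse(xa_mean, xt):
    """Spatially averaged root-mean-square error between the analysis mean
    and the truth (standard form of the RMSE used for verification)."""
    return np.sqrt(np.mean((xa_mean - xt) ** 2))

def ensemble_spread(Xa):
    """Ensemble spread: square root of the spatially averaged ensemble
    variance; Xa is an (m, n) analysis ensemble (rows = components)."""
    return np.sqrt(np.mean(Xa.var(axis=1, ddof=1)))
```

When the filter is well calibrated, the spread and the RMSE are of comparable magnitude; a spread much smaller than the RMSE is a symptom of the underestimation discussed above.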

Experimental Results for the Lorenz-96 Model.
This section analyzes the assimilation results of the CL and CL-MC algorithms for the Lorenz-96 model. Sensitivity experiments are conducted for the CL and CL-MC algorithms by configuring different localization scales with the default ensemble size, fixed observation members, forcing strength, and the manually tuned optimal inflation value (Table 1). To improve the reliability of the experiments, each experiment was repeated twice, and the mean value is used as the evaluation criterion for each algorithm. The power spectral density (PSD; cf. Sakov and Bertino [10]) is used as the evaluation criterion to test the performance of the CL and CL-MC algorithms. Figure 2 shows how the PSD of the CL and CL-MC algorithms changes with different localization scales. The blue curve "Forecast" represents the Fourier spectrum of the ensemble singular values of the forecast values, and the green curve "CL" and the red curve "CL-MC" represent the Fourier spectra of the ensemble singular values of the analysis values. The vertical coordinate indicates the PSD of the ensemble mean, and the area under each curve represents the variance. Figure 2 shows the influence of the CL and CL-MC algorithms on the PSD for different localization radii, using the default ensemble size, observation fraction, observation error variance, forcing strength, and optimized inflation factor. The results show that the areas under the curves gradually become smaller, with PSD values below the forecast values for both the CL and CL-MC algorithms as the spectral component increases. In addition, the PSD values for the CL and CL-MC algorithms differ as the spectral component increases: (1) the area under the curve for the CL-MC algorithm is always smaller than that for the CL algorithm, and as the spectral component increases, the PSD values for both the CL and CL-MC methods fall far below the forecast values.
(2) The modified Cholesky decomposition leads to PSD magnitudes for the CL-MC algorithm between 0.1 and 0.2, whereas in the CL scheme the magnitudes lie between 0.10 and 0.30 when the localization radius is 10 (Figure 2(a)). When the localization radius is increased to 15, the PSD magnitudes of the CL-MC algorithm lie between 0.05 and 0.15 and those of the CL algorithm between 0.13 and 0.25 (Figure 2(b)). It is worth noting that the PSD values of both the CL and CL-MC methods become smaller when the localization radius is 20; however, the PSD value of CL-MC is still smaller than that of CL. Overall, the CL-MC algorithm performs better than the CL algorithm. In the local analysis process, equation (1) shows that the update of the Kalman gain plays an essential role in the analysis value and thus affects the final analysis accuracy. Here, the default ensemble size, observation fraction, observation error variance, forcing strength, and the optimized inflation factor and localization radius are used (Table 1). Figure 3 shows the impact of the CL and CL-MC methods on the Kalman gain for different localization radii; in this case, the Kalman gain is a 40 × 20 matrix. The horizontal axes in Figure 3 indicate the index numbers of the corresponding observations, and the vertical axes indicate the index numbers of the state variables. The Kalman gain values were obtained with the CL and CL-MC methods, as well as without any localization scheme; the figure is similar to Figure 3 of Sakov and Bertino [10]. The results show that the Kalman gain without localization retains the spurious correlations from long-range observations, whereas the CL and CL-MC methods eliminate these long-range spurious correlations in the gain matrix. Furthermore, we observed that the Kalman gain matrix exhibits significant diagonality and symmetry due to the use of the localization schemes (both CL and CL-MC).
However, the superior performance of the CL-MC method relative to the CL scheme may be more apparent from the colorbar scale. In terms of the gain obtained for each observation (Kii), increasing the localization radius helps the CL and CL-MC methods obtain a smaller Kalman gain, whereas the CL, CL-MC, and no-localization schemes all have larger Kalman gain values when the observation index approaches 20 (fourth column of Figure 3).
Figure 5: Time series of the RMSEs for zonal wind (m s−1) at the fourth model level (∼510 hPa) with different ensemble sizes: (a) ensemble size = 75, (b) ensemble size = 100, (c) ensemble size = 125, and (d) ensemble size = 200. Here, 14,600 assimilation steps correspond to 10 years, from 0000 UTC 1 January 1982 to 0000 UTC 1 January 1992.

Figure 3 caption (continued): (b) Localization radius = 15. (c) Localization radius = 20. "K(none)" denotes the case without a localization scheme; "K(CL)" and "K(CL-MC)" denote the CL and CL-MC schemes, respectively. "Kii" denotes the Kalman gain from each observation for CL, CL-MC, and the case without localization.

The localization method helps the ensemble Kalman filter eliminate spurious long-range correlations. However, the optimal localization parameters are unknown a priori, and most researchers obtain inflation factor and localization parameters sufficient for the assimilation problem at hand through continuous iterative experiments. In this study, the inflation factor and localization radius are manually tuned to their optimal values. Here, the default ensemble size, number of observations, observation error variance, and forcing strength, together with the optimized inflation factor, are used (Table 1). Figure 4 shows the time series of the RMSEs for the CL and CL-MC algorithms with different localization radii. In the operational setup, the assimilation cycle is 1460 steps; one assimilation cycle corresponds to approximately 6 hours of "weather," so the 1460 assimilation cycles span one year (365 days). The results show that the CL and CL-MC methods achieve better filter performance with increasing localization radius. However, both methods produce large RMSEs at the beginning because of the instability of the initial system state. The RMSEs of the CL and CL-MC methods then show convergent

results as the assimilation step increases. After a period of assimilation, the RMSEs no longer decrease and settle into a convergent trend; this may be due to the optimized inflation factor and the default ensemble size of 25, since small inflation and a large localization radius are desired. In addition, it is worth noting that after about 50 days of assimilation, the CL method exhibits inferior balance (Figure 4(c)); this may be due to the missing part of the analysis increment when updating the local state variables, which degrades the assimilation results. In contrast, the CL-MC method shows better assimilation results and remains well balanced throughout the whole assimilation process.
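The RMSE diagnostic plotted in these time series is the root-mean-square error of the analysis ensemble mean against the truth at each assimilation step. A minimal sketch of this diagnostic follows; the function name and the toy data are assumptions made for illustration:

```python
import numpy as np

def analysis_rmse(truth, ens):
    """RMSE of the ensemble mean against the truth for one assimilation step.

    truth: (n,) true state; ens: (n, m) analysis ensemble.
    """
    mean = ens.mean(axis=1)
    return np.sqrt(np.mean((mean - truth) ** 2))

# toy check: an unbiased 25-member ensemble scattered closely around the truth
rng = np.random.default_rng(1)
truth = rng.standard_normal(40)
ens = truth[:, None] + 0.1 * rng.standard_normal((40, 25))
print(round(analysis_rmse(truth, ens), 3))   # small, since the ensemble brackets the truth
```

Tracking this quantity over all 1460 cycles yields curves like those in Figure 4.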

Experimental Results for the SPEEDY Model.
In the abovementioned experiments, the effectiveness of the CL and CL-MC methods was checked on the toy Lorenz-96 model with different assimilation parameters. To further investigate their application to large-scale models, the performance of the CL and CL-MC methods is examined in this section on the SPEEDY model. All experiments are initialized with an initial ensemble randomly chosen from the nature run at 0000 UTC 1 January (cf. Miyoshi [46] and Kang et al. [49]). Here, to avoid manual optimization of the inflation parameters, the adaptive covariance inflation method is applied (cf. Miyoshi [50]). To investigate the assimilation results of the CL and CL-MC methods in a large-scale model, an experiment is conducted with different ensemble sizes, since this choice plays an essential part in the DA process. In practical applications, the feasible ensemble size is usually smaller than the degrees of freedom of the system (or the dimension of the unstable subspace or error subspace). The other assimilation parameters, such as the localization scale, use the default settings (Table 2), and the adaptive covariance inflation method is again employed to enlarge the ensemble spread and prevent filter divergence. Figure 5 shows the time series of the RMSEs of the CL-MC and CL methods for the zonal wind at the fourth model level (∼510 hPa). The results show that increasing the ensemble size improves the assimilation results of both methods. When the ensemble size is 75 or 125, the RMSE profiles of the CL-MC and CL methods show similar assimilation behavior (Figures 5(b) and 5(c)). However, as the assimilation step increases, in particular when the ensemble size is 100, 125, or 200, the RMSEs of the CL-MC and CL methods exhibit peaks at the same assimilation steps (Figures 5(b) to 5(d)).
This may be caused by the adaptive inflation at larger ensemble sizes, since small inflation and broad localization scales are what we expect as the ensemble size increases. The RMSEs of the CL-MC method are much smaller than those of CL when the ensemble size is 200, which implies that a larger ensemble size benefits the CL-MC method (Figure 5(d)). Overall, the CL-MC method outperforms CL, especially when the ensemble size is large.
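The link between ensemble size and sampling error can be illustrated directly: with a known covariance whose long-range entries are essentially zero, the spurious long-range covariances in a sample estimate shrink roughly as 1/√m. The following toy demonstration (the covariance shape and distance threshold are assumptions, not the SPEEDY configuration) mirrors why larger ensembles favor both methods:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
# "true" covariance on a ring: correlations decay to ~0 beyond a few grid points
d = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
d = np.minimum(d, n - d)
P_true = np.exp(-(d / 3.0) ** 2)

Lchol = np.linalg.cholesky(P_true + 1e-10 * np.eye(n))
errs = []
for m in (25, 100, 10_000):
    ens = Lchol @ rng.standard_normal((n, m))   # m samples with covariance P_true
    P_hat = np.cov(ens)                         # sample covariance estimate
    far = d >= 15                               # pairs where the true covariance is ~0
    err = np.abs(P_hat[far]).mean()             # mean spurious long-range covariance
    errs.append(err)
    print(m, round(err, 3))
```

The printed error falls steadily as m grows, which is the sampling-error mechanism that localization must compensate for at small, affordable ensemble sizes.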
To better understand the mechanisms behind the CL and CL-MC methods and to test their applicability to real atmospheric systems, the spatial distribution of the time-averaged analysis RMSEs of the zonal wind for the CL-MC and CL methods is examined from 0000 UTC 1 March 1982 […]. (1) […] the time-averaged analysis RMSEs are approximately 0.120 m s−1 (Figure 6(e)). (2) In most regions, the CL-MC method obtains smaller analysis RMSEs (with a maximum of 0.112 m s−1) (Figure 6(f)). This is probably because the CL-MC scheme retains the main characteristics of the background error covariance for larger localization radii, whereas the CL scheme neglects some short- to medium-range correlations. When the ensemble size is 200, over the Arctic, especially near 60°W-60°E where observations are relatively sparse, the time-averaged analysis RMSEs of the CL-MC method reach almost 0.04 m s−1 (Figure 6(h)), while the CL scheme reaches about 0.10 m s−1 (Figure 6(g)); as a reference, observations are available only at the positions shown at the intersections, with an error standard deviation of 1.0 m s−1. In general, these results demonstrate that the CL and CL-MC techniques tested here produce a well-constrained analysis, and the error is significantly reduced with increasing ensemble size; however, when the ensemble size increases from 75 to 125, the error of the CL scheme shows no significant reduction in the few areas where observations are sparsely distributed. This may be due to the broader localization scale and the larger inflation factor used by the CL scheme, since less inflation and a wider localization scale are what one would expect as the ensemble size increases (Figures 6(a), 6(c), and 6(e)). Therefore, the superior behavior of the CL-MC method may be even more significant in practical applications.
Finally, the stability of the CL and CL-MC methods under the adaptive covariance inflation factor is explored by using the time-averaged analysis ensemble spread instead of the time-averaged analysis RMSEs (cf. Kondo and Miyoshi [26]). The results show that in regions with few observations, the effect of the CL and CL-MC methods on the time-averaged analysis ensemble spread grows with increasing ensemble size. In contrast, in regions with many observations, the time-averaged analysis ensemble spread values are surprisingly small (Figures 7(a) to 7(h)). Thus, the adaptive covariance inflation factors exhibit strong robustness for different ensemble sizes, even where relatively large adaptive covariance inflation factors are used.
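The spread diagnostic referenced here is conventionally the square root of the mean analysis-ensemble variance over the state components; in a well-calibrated filter it should be comparable to the RMSE of the ensemble mean, which is what adaptive inflation tries to maintain. A minimal sketch follows, with a synthetic, statistically consistent ensemble assumed for illustration:

```python
import numpy as np

def ensemble_spread(ens):
    """Spread = square root of the mean ensemble variance over state components."""
    return np.sqrt(ens.var(axis=1, ddof=1).mean())

# synthetic check: truth and members are drawn from the same distribution,
# so the ensemble spread should be close to the ensemble-mean RMSE
rng = np.random.default_rng(3)
truth = 0.5 * rng.standard_normal(40)
ens = 0.5 * rng.standard_normal((40, 200))
rmse = np.sqrt(np.mean((ens.mean(axis=1) - truth) ** 2))
print(round(ensemble_spread(ens), 2), round(rmse, 2))   # both near 0.5
```

A spread persistently below the RMSE signals the underdispersion that adaptive inflation corrects; comparing the two maps, as done with Figures 6 and 7, is therefore a standard consistency check.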
To assess this behavior further, the short-term forecast RMSEs of the CL and CL-MC methods are examined with the default observations, observation error covariance, localization scale, and adaptive covariance inflation factors. Here, the ensemble size is 200 members, and the analysis ensemble mean is used as the initial condition of the forecast. For all variables (zonal wind, temperature, specific humidity, and surface pressure), the average 6-day forecast RMSEs over 245 forecast cases from 0000 UTC 1 March to 0000 UTC 1 May are investigated (Figure 8). The results show that for all variables except specific humidity, the profiles of the average 6-day forecast RMSEs never cross: the average forecast RMSEs of the CL-MC method are always smaller than those of CL throughout the 6-day forecast. By calculating the average 6-day forecast RMSEs over the 245 forecast cases, we find that the 1-day forecast RMSEs of CL are approximately equal to the 5-day forecast RMSEs of CL-MC. This suggests that the CL-MC method has potential advantages for relatively large ensemble sizes and when the correlation of short-term forecast errors is considered. However, the results may be overly optimistic, since the experimental settings assume fully known observation error statistics and use a simplified low-resolution model. In general, the CL-MC method clearly outperforms the CL method in short-term forecasting.

Conclusion
In ensemble DA, a limited ensemble size leads to sampling errors and imbalance in the DA system, so this paper proposes a covariance localization method with modified Cholesky decomposition (CL-MC) to mitigate sampling errors and estimate the inverse of the background covariance matrix. Several advantages of this new approach are listed below. (1) Typically, the ensemble covariance is a rank-deficient matrix, and most localization schemes rely on the Schur product. Since the matrix has large dimensionality, it is unfavorable for storage. Here, a predefined sparse structure is imposed on the inverse of the covariance through the Cholesky decomposition: if two distant model components are uncorrelated, the corresponding entries in the inverse covariance matrix are zero, thus avoiding excessive memory use. (2) Since the only nonzero entries in the Cholesky factors correspond to model components close to each other, imposing this sparse structure on the inverse background covariance matrix is itself a form of covariance localization. (3) To facilitate the storage of the background error covariance matrix, an adjustable singular value decomposition method is applied to the background error matrix.
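The sparse inverse-covariance estimate at the heart of points (1) and (2) can be sketched with a Bickel-Levina-style modified Cholesky decomposition: each state component is regressed on a fixed number of nearby preceding components, which prescribes the sparsity of the unit lower-triangular factor and hence of B⁻¹. The following is a minimal illustration of that standard construction, not the authors' code; the function name and radius are assumptions:

```python
import numpy as np

def mc_inverse_covariance(X, radius):
    """Estimate B^{-1} via the modified Cholesky decomposition.

    X: (n, m) ensemble perturbations (mean removed). Only the `radius`
    nearest preceding components serve as regressors, which fixes the
    sparsity pattern of the unit lower-triangular factor T.
    """
    n, m = X.shape
    T = np.eye(n)                      # unit lower-triangular factor
    d = np.empty(n)                    # innovation (residual) variances
    d[0] = X[0].var(ddof=1)
    for i in range(1, n):
        j0 = max(0, i - radius)
        Z = X[j0:i]                    # local predecessors of component i
        # least-squares regression of component i on its local predecessors
        beta, *_ = np.linalg.lstsq(Z.T, X[i], rcond=None)
        T[i, j0:i] = -beta
        resid = X[i] - beta @ Z
        d[i] = resid.var(ddof=1)
    return T.T @ np.diag(1.0 / d) @ T  # B^{-1} = T^T D^{-1} T

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 25))      # 25-member ensemble, 40 components
Binv = mc_inverse_covariance(X - X.mean(axis=1, keepdims=True), radius=5)
print(Binv[0, 20])                     # components more than `radius` apart decouple
```

Because the factor T is banded, B⁻¹ is banded as well, so only O(n · radius) entries need to be stored, which is the memory advantage claimed in point (1).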
To verify the effectiveness of the proposed method, a series of experiments is conducted on the simple Lorenz-96 model and the SPEEDY model. The Lorenz-96 results show that the RMSEs obtained by CL-MC are smaller than those of the CL method with sparse observations (p = 20). In addition, increasing the localization radius helps exploit more observation information and improves the assimilation results of the proposed algorithm. The SPEEDY results show that increasing the ensemble size helps both the CL and CL-MC methods obtain better assimilation results. However, the differences between the CL and CL-MC methods in the spatial distributions of the analysis ensemble spread and the analysis RMSEs are more obvious. This may be because the adaptive covariance inflation is susceptible to large inflation factors during filtering and reflects the changing characteristics of the background field of the chaotic model, which is not ideal for the DA system. Thus, investigating adaptive covariance localization methods is a topic for future work. In addition, by calculating the average 6-day forecast RMSEs over 245 forecast cases, we find that the 1-day forecast RMSEs of CL are approximately equal to the 5-day forecast RMSEs of CL-MC. This suggests that the CL-MC method has potential advantages for relatively large ensemble sizes when the correlation of short-term forecast errors is considered.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Mingheng Chang performed conceptualization, methodology, software, validation, original draft preparation, and reviewing. Hongchao Zuo supervised the conceptualization and methodology. Jikai Duan performed investigation and editing.