A New Method for TSVD Regularization Truncated Parameter Selection

The truncated singular value decomposition (TSVD) regularization applied in ill-posed problem is studied.Throughmathematical analysis, a newmethod for truncated parameter selection which is applied in TSVD regularization is proposed. In the newmethod, all the local optimal truncated parameters are selected first by taking into account the interval estimation of the observation noises; then the optimal truncated parameter is selected from the local optimal ones.While comparing the newmethodwith the traditional generalized cross-validation (GCV) and L curve methods, a random ill-posed matrices simulation approach is developed in order to make the comparison as statistically meaningful as possible. Simulation experiments have shown that the solutions applied with the new method have the smallest mean square errors, and the computational cost of the new algorithm is the least.


Introduction
Ill-posed problem is widespread in the field of geophysical survey, such as GNSS rapid positioning, precise orbit solution of spacecraft, and downward continuation of airborne gravity [1][2][3][4][5][6][7].The least square estimation is not stable in the ill-posed problem and other approaches have to be utilized.Tikhonov [8] proposed the Tikhonov regularization theory, Hanke [9] utilized iteration while Walker and Birch [10] and Hoerl and Kennard [11] applied ridge regression method.Hańćkowiak [12] and Lin et al. [13] also studied this problem.Among those approaches, methods based on singular value decomposition (SVD) have drawn extensive attention, thanks to the numerical stability of their solutions [7].On that basis, Wiggins [14] and Xu [15] intensively studied the truncated singular value decomposition (TSVD) regularization method.
The standard form of the observation equation is (nonstandard form can be transformed to a standard one) as follows: where  is the  ×  coefficient matrix,  > , rank() = ,  is the  × 1 parameter vector,  is the  × 1 observation vector,  is the  × 1 residual vector, () = 0, and (  ) =  2 0 .The condition number of normal matrix ( =   ) is generally used as the measurement of the ill-posed severity as follows: where  1 and   are the maximum and minimum singular values of , respectively.According to application experiences, if lg(cond) > 4, the system is considered as ill-posed [16].
In ill-posed problem, some singular values of the coefficient matrix approximate to 0, the least square estimation will enormously amplify the observation noises and degrade the precision.In TSVD regularization, items containing these small singular values are discarded to maintain the stability of the solution.Setting a small positive number  as the threshold value, items containing singular values smaller than  are discarded; thus, the amplification to the observation Mathematical Problems in Engineering noises is restrained.If there are  singular values bigger than , the TSVD solution is given as Normally, the mean square error (MSE) is used to evaluate the quality of ill-posed problem solution.The key point of the TSVD is how to select a proper truncated parameter to get the smallest mean square error of the solution.Many scholars have extensively studied this problem.Golub et al. [17] proposed GCV method, Hansen [18,19] and Hansen and O'Leary [20] made use of the property of the  curve to select truncated parameter, and these methods will be introduced in Section 2. A new truncated parameter selecting method will be proposed in Section 3. In Section 4, a random ill-posed matrices simulation approach for simulation experiment will be developed, and the performances of GCV,  curve, and the new methods will be compared Section 2.

Traditional Truncated Parameter Selecting Method
2.1.GCV Method.Based on statistical point of view, Golub et al. [17] proposed GCV method for truncated parameter selection.In linear system  = , assuming that  # is the matrix mapping observation vector  to regularized solution X ,  #  = X .GCV method is equal to selecting a truncated parameter  to minimize the GCV function, which is defined as where "trace(⋅)" denotes the trace of a matrix.The disadvantage of GCV method is that the GCV function converges very slowly in some cases.It may lead to the GCV function minimization while  is equal to , in which situation, the truncated parameter cannot be selected, and the GCV method fails.

𝐿 Curve Method.
Hansen [18,19] extensively studied the application of  curve method in regularization.With the variation of , the discreet points (ln(‖ X ‖ 2 ) and ln(‖ −  X ‖ 2 )) ( = 1, 2 . . ., ) can be fitted with a cubic spline curve  = Ω().He defined the maximum curvature point of this curve by (5) [18] as follows: where  and  are the independent and dependent variables of the cubic spline function, respectively.The original discreet point which lies closest to the maximum curvature point from the left side is selected.Its truncated serial number  is the truncated parameter asked for.
Further descriptions of the  curve method can be seen in Hansen [18,19].Hansen [19] indicated that the truncated parameter selected by the  curve method is not the optimum parameter but the asymptotically optimum one.

A New Truncated Parameter Selecting Method
In this section, we develop a new approach to selecting the truncated parameter.We focus on the MSE of the solution, making some proper estimation, and finally derive the conditions that the optimal truncated parameter should meet.In this section and the subsequent sections Taking the singular values decomposition into consideration, the least square solution of the observation equation can be written as According to (1) and (3), we rewrite the TSVD solution as Inserting ( 8) into (6) yields As mentioned above, if  is the local optimal value, the following inequalities are satisfied: Denoting  =    = [ 1 ,  2 . . .,   ]  , inequalities (10) can be rewritten as As  2 0 and  are unknown, we should estimate them first.Premultiplying matrix   to both sides of (7), we obtain a new vector denoted as Ŷ as follows: where ŷ is the th element of Ŷ .
In order to overcome this shortcoming, the interval estimation of  2 0 is used instead of σ2 0 in this algorithm.Assuming that [ 1 ,  2 ] is the confidence interval of  2 0 under the confidence level 1 − , we replace  2 0 with  1 and  2 in the first and second inequality in inequalities (15), respectively.By this means, a fault-tolerance interval between the judgment thresholds is given.As seen in Figure 1, by properly selecting  1 and  2 , there are  1 < ( + 1) < () <  2 , and so,  is judged as the local optimal value.
To get a proper interval estimation of  2 0 , analysis goes as follows.
The estimation of the observation noises is where ε ∼ (0,  2 0 ) (note that among the  elements, only  −  of them are independent).
Thus, according to the definition of  2 distribution, 0 under the confidence level 1 − , we yield Considering the  2 distribution, (18) can be transformed as follows: Selecting Equation ( 20) is the estimator of  1 and  2 , the final local optimal truncated parameter selecting inequalities reads as follows: There is an undetermined parameter in inequalities ( 21): the confidence level is 1 − .This parameter adjusts the size of the fault-tolerance interval.Enlarging  will decrease the fault-tolerance interval and the risk of leaking local optimal value increases.While decreasing  will increase the fault-tolerance interval, it will cause the judgment efficiency to decline.In order to balance the fault-tolerance and the judgment efficiency of inequalities (21), we first set  = 0.5 in this algorithm.If no truncated parameter matches inequalities (21), the value of  will be decreased until one or more truncated parameters can match it.

Optimal Truncated Parameter Selection.
If there is only one local optimal value, we regard it as the optimal value.If not, assuming that there are  local optimal values  1 ,  2 . . .,   , It is essential to select the optimal value from them.Analysis goes as follows.
The mean square error function can be transformed as where "cov(⋅)" denotes the covariance matrix.Inserting (1) and ( 3) into (22), the first and second items of which can be simplified as follows: ( Assuming that   and   are both local optimal values,   <   , MSE( X  ) minus MSE( X  ) is denoted as Δ(  ,   ), we derive According to inequalities (23), to determine the sign of Δ(  ,   ), we take proper estimations of ∑ simultaneously.If we regard  as an unknown random vector, the elements of which are independent and have the same statistics weight.That is, (  ) =  2  .Then, it yields As   and   are both local optimal values, the regularization solutions X  and X  are much more stable than the least square solution.We estimate  2   by X  and  2 0 by its unbiased estimator σ2 0 , which yields According to (26), if 1/ 2  < 0, we regard   as the better truncated parameter and vice versa.After comparing all the local optimal values  1 ,  2 , . . .,   utilizing (26), we can get the minimized MSE of all the solutions, and the optimal truncated parameter is determined.

Algorithm Flow.
In Sections 3.1 and 3.2, we have discussed the conditions that the optimal truncated parameter should meet.And then, the equations to select the local optimal and the optimal truncated parameter are deduced in detail.Although the derivation process is somehow complicated and trivial, the execution of this algorithm is quite simple.In this subsection, we summarize the steps of the algorithm flow to make it clear and more readable.

Begin
Step 1. Set the confidence level of interval estimation 1 −  = 0.5.
Step 3.For each truncated parameters  from 1 to −1, calculate inequalities (21).If the inequalities are satisfied, select the corresponding truncated parameter as a local optimal truncated parameter.
Step 6.If there is only one local optimal parameter, it is the optimal truncated parameter simultaneously; else, make pairwise comparisons of the local optimal truncated parameters according to (26), until getting the one minimizing the MSE as the optimal truncated parameter.

The Approach of Random Simulation.
Although we can use a particular coefficient matrix  to compare different truncated parameter selecting methods, such comparisons are of limited value [21].Because the doubt to the conclusions drawn from them will be threefold: (i) whether the conclusions are still valid if the dimensions of the coefficient matrix have changed; (ii) whether the conclusions are still valid if the condition number and singular values distribution of the coefficient matrix have changed; (iii) whether the conclusions are valid if they are applied to another coefficient matrix which has the same dimensions and condition number with .
In order to make the comparison of the three methods as persuasive as possible, we have to simulate the coefficient matrices by taking into account the three issues mentioned above.According to (2),  = Λ  ; thus, the simulation problem has turned into how to design the matrices , Λ, and  and their dimensions.
Theoretically, the dimensions  and  can be set any arbitrary positive integer which satisfy:  >  > 1. Considering the computing burden, we set 5 ≤  ≤ 25 and  <  ≤ 40, each of them is randomly generated in their interval.As it is mentioned in Section 1, a coefficient matrix is regarded as ill-posed if lg(cond) > 4. But the condition number cannot be infinite; otherwise, the ill-posed matrix will be turned into rank defect matrix.In the simulation, the mathematical expectation of lg(cond) is uniformly distributed over [4,7].Furthermore, about the singular values, we assume that the mathematical expectation of the sum of the logarithms of all the singular values is 0. Finally, the equations set can be written as follows: where the random variables  and  conform to uniform distribution,  ∼  [4,7] ;  = /( − 1) is also a random variable,  ∼  [−,] .
Utilizing the Gram-Schmidt method, unitary matrix  (or ) can be derived from an -dimension (or -dimension) random full rank matrix.For instance, assume that  is an -dimension random full rank matrix and  1 ,  2 , . . .,   are column vectors of ; thus, the column vectors of  are as follows: To make the simulation meaningful, we sample 100 random generated coefficient matrices for our simulation experiment.The dimensions and condition numbers of these matrices are shown in Figures 2(a) and 2(b), respectively.

Comparisons of Truncated Parameter Selecting Methods.
In this section, we will compare the performances of TSVD solutions applied with the GCV,  curve, and the new methods proposed in this paper.The criterion is the mean square errors of the solutions.In the simulation, we set  as -dimension vector, all elements of which are 1.The elements of residual vector  are Gaussian white noises, generated by randomizer,  ∼ (0,  2 0 ).To enrich our comparison, two observation noise levels are used in our simulation, the variances of which are   /(100) and   /(1000).In consideration of the randomness of , objective comparisons cannot be obtained if we only do the experiment for once.In our simulation, 1000 times independent repeated experiments have been done for each coefficient matrix and each observation level.The mean square errors which are used in our comparisons are the mean value of those in the 1000 experiments.

(i) Results Comparison between GCV and the New Methods.
The comparisons between GCV and the new methods under the observation noise levels  2 0 =   /(1000) and  2 0 =   /(100) are shown in Figures 3(a) and 3(b), respectively.In order to gain further details of the comparisons, we draw the -value curves of their mean square errors.The value curves of Figures 3(a (iii) Time Consuming Comparisons.Besides the quality of results, computational cost is another important aspect to evaluate the effectiveness of an algorithm.In the simulations, we record the time consuming of the algorithms for each coefficient matrix.The average time consuming of these three algorithms are shown in Figure 5 the cost of the new method is obviously smaller than that of the other two by dozens of times.With the increasing of the matrices' dimension or the count of times, this advantage will be more and more obvious.Another conclusion we can get from Figure 5(b) is that the observation noises only have a little influences with the computational costs.

Conclusions
A new truncated parameter selecting method is proposed in this paper.Focusing on how to decrease the MSE of the solution, we divide the truncated parameter selection into two parts: local optimal and optimal truncated parameter selection.Detailed analyses are made and finally the equations to select the local optimal and the optimal truncated parameter are obtained.Although the derivation of the new algorithm is somehow trivial, its execution is quite simple and efficient.
The key point of local optimal truncated parameter selection is the estimation of the observation noises.To overcome the inaccuracy of point estimation, we apply the interval estimation with the confidence level 1 − .The parameter  balances the fault-tolerance and the judgment efficiency of local truncated parameter selection.In this algorithm,  is firstly set as 0.5, it can adjust automatically according to the situation of local truncated parameter selection.The selection of optimal truncated parameter is based on the local optimal truncated parameters.We regard the parameter vector as a random variant vector, making use of the stability of its regularization solutions, and finally derive the equations of optimal truncated parameter selection.
A random ill-posed matrices simulation approach is developed and a great deal of experiments is made in Section 4. The results have shown that the solutions applied with the new method have the smallest mean square errors.Meanwhile, the computational cost of the new method is the least of the three.

Figure 2 :
Figure 2: Dimensions and condition numbers of the 100 coefficient matrices randomly generated, respectively.

Figure 3 :
Figure 3: Results comparison between GCV and the new methods in two observation noise levels.

Figure 4 :
Figure 4: Results comparison between  curve and the new methods in two observation noise levels.

Figure 5 :
Figure 5: (a) time consuming comparisons of the three methods; (b) time consuming of the new algorithm in two noise levels.