TEC Data Forecasting Using a Novel Nonlinear Model

A novel nonlinear TEC forecasting model is proposed in this paper. The main procedures of the model are as follows: first, the EOF decomposition of the TEC data is performed; then, the genetic algorithm is used to establish the nonlinear model of the time field; finally, the decomposed space field and the predicted time field are reconstructed to forecast the TEC data. Experiments indicate that the proposed model is effective and superior to the direct forecasting and linear forecasting models.


Introduction
The ionosphere is the main region of spaceflight activities, and processes of energy transfer and severe disturbance occur there during these activities [1,2]. These ionospheric activities have considerable influence on military activities and daily life. TEC (Total Electron Content) is the key parameter for describing ionospheric characteristics, so measuring and forecasting TEC is significant [3][4][5][6], and a number of TEC forecasting methods have been proposed in the literature [7][8][9][10]. However, the forecasting accuracy of those methods is not high enough and can be improved further. Hence, it is essential to propose novel methods that further improve the accuracy of TEC forecasting.
In the experiment, the global TEC data of 2007 are obtained from the Shanghai observatory of the Chinese Academy of Sciences. The data are observed every two hours, giving 4380 observations in a year. The observation area extends from 180W to 180E and from 87.5S to 87.5N, the data size is 5183 × 31 × 12, and the resolution is 5° in longitude and 2.5° in latitude.
In actual practice, the global TEC data are obtained from the Shanghai observatory of the Chinese Academy of Sciences, and the TEC of the area from 180W to 180E and from 87.5S to 87.5N is observed every two hours; the resolution is 5° in longitude and 2.5° in latitude, so there are 5183 (73 × 71) grid points in total. It is feasible in theory to forecast the TEC of each of the 5183 grid points directly; however, the forecasting error is very large when the changing physical features of the TEC at the grid points are not considered. Therefore, to overcome this defect and improve the forecasting accuracy, a new forecasting model is proposed. The main processes are as follows: first, the TEC values of the 5183 points in the Earth-fixed coordinate system are converted to values in the sun-fixed coordinate system to reflect the physical features of the TEC distribution; then, the data in the sun-fixed coordinate system are decomposed by the EOF method [11,12]. Taking the time field as a set of nonlinear time series, the genetic algorithm [13][14][15] is then used to establish the best functional relation of the time series in order to forecast the time field. The results confirm that the method is better than traditional forecasting models.
The paper is organized as follows. The principle and detailed processes of the proposed forecasting model are described in Section 2. The results and analysis are elaborated in Section 3. A conclusion of the work is given in Section 4.

Principle of Nonlinear Forecasting Model
The proposed forecasting model consists of four steps: the first step transforms the original Earth-fixed coordinate system to the sun-fixed coordinate system, the second step performs the EOF decomposition of the TEC data, the third step uses the genetic algorithm to establish the nonlinear forecasting equation that forecasts the time field, and the final step reconstructs the decomposed space field and the predicted time field. After these steps are finished, the purpose of forecasting TEC and reducing noise is achieved.
The general procedure of the forecasting model is illustrated in Figure 1, and the detailed processes of the above steps are described in the following subsections.

Coordinate Transform.
The TEC data are recorded in the Earth-fixed coordinate system; in order to reflect the physical features of the TEC distribution, the original Earth-fixed coordinate system needs to be converted to the sun-fixed coordinate system.
The transformation from the Earth-fixed coordinate system to the sun-fixed coordinate system is described in (1), where TEC(Lat, Lon, UT) denotes the three-dimensional distribution of the TEC data in the Earth-fixed coordinate system and TEC(Lat, LT) denotes the two-dimensional distribution of the TEC data in the sun-fixed coordinate system, Lat and Lon are the geographical latitude and longitude, respectively (in degrees), and UT and LT are the universal time and local time, respectively (in hours). The main advantages of the sun-fixed coordinate system are as follows: first, the calculation is simplified and its stability is improved by reducing the dimension; second, the estimation precision is improved by making full use of the high temporal resolution of GPS observation:

$$\mathrm{TEC}(\mathrm{Lat}, \mathrm{Lon}, \mathrm{UT}) \Rightarrow \mathrm{TEC}(\mathrm{Lat}, \mathrm{LT}), \qquad \mathrm{LT} = \mathrm{UT} + \frac{\mathrm{Lon}}{15}. \tag{1}$$
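As an illustration of the coordinate transform, the following minimal sketch relabels the longitude axis of a single TEC map by local time. It assumes the usual relation LT = UT + Lon/15 with east-positive longitude in degrees; the function names `local_time` and `sun_fixed_map` and the placeholder zero-valued map are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_time(lon_deg, ut_hours):
    """Local time in hours, wrapped to [0, 24), for east-positive longitude in degrees."""
    lon = np.asarray(lon_deg, dtype=float)
    return (ut_hours + lon / 15.0) % 24.0

def sun_fixed_map(tec_map, lon_deg, ut_hours):
    """Relabel the longitude axis of one TEC map (n_lat x n_lon) by local time.

    Returns the sorted local times and the map with its columns in that order.
    """
    lt = local_time(lon_deg, ut_hours)
    order = np.argsort(lt, kind="stable")
    return lt[order], np.asarray(tec_map)[:, order]

# Grid described in the text: 5 degrees in longitude, 2.5 degrees in latitude.
lons = np.arange(-180.0, 185.0, 5.0)        # 73 longitudes
lats = np.arange(-87.5, 90.0, 2.5)          # 71 latitudes
tec_map = np.zeros((lats.size, lons.size))  # placeholder values, not real data
lt, tec_lt = sun_fixed_map(tec_map, lons, ut_hours=2.0)
```

How maps from different UT epochs are combined on the (Lat, LT) grid is not specified in the text, so the sketch stops at the relabeling of a single epoch.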

EOF Decomposition and Reconstruction.
Assume that matrix $X$ is the original observation data and that the dimension of $X$ is $m \times n$ ($m > n$), where $m$ is the spatial dimension and $n$ is the temporal dimension. Each column of $X$ corresponds to a dataset of size $m$, and there are $n$ observations. The EOF decomposition of $X$ can be written as $X = U \Lambda V^{T}$, where $U$ represents the spatial decomposition field with dimension $m \times r$, $V$ represents the temporal decomposition field with dimension $n \times r$, and $\Lambda$ is an $r \times r$ ($r \le \min(m, n)$) diagonal matrix filled with eigenvalues. Once the matrices $U$, $\Lambda$, and $V$ are determined, the matrix $X$ can be recovered via the reconstruction formula

$$X = U \Lambda V^{T} \approx \sum_{i=1}^{k} u_i \lambda_i v_i^{T},$$

where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$, $\lambda_i$ is the $i$th-order eigenvalue, and $k$ ($k \le r$) is the order; the reconstruction is exact when $k = r$. The optimal $k$ is a key point in the EOF data reconstruction. On one hand, if the value of $k$ is too small, the reconstructed data cannot reflect the inner physical distribution. On the other hand, if the value of $k$ is too large, the errors are large and the calculation time is long. The general criterion for determining the optimal $k$ is as follows: the variance contribution of the first $k$ eigenvalues should reach 90% of the variance contribution of all eigenvalues, a threshold established by the general approach of sensitivity analysis [16]. This criterion lets the reconstructed data reflect the inner physical distribution while avoiding large errors.
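The decomposition, the 90% criterion, and the truncated reconstruction can be sketched with a singular value decomposition. This is a minimal sketch under stated assumptions: the data matrix is space by time, the variance contribution of each mode is taken as its squared singular value, and the function names `eof_decompose`, `choose_order`, and `reconstruct` are hypothetical.

```python
import numpy as np

def eof_decompose(X):
    """Return spatial modes U, singular values s, and temporal modes Vt of X (space x time)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U, s, Vt

def choose_order(s, threshold=0.90):
    """Smallest order k whose cumulative variance contribution reaches the threshold."""
    variance = s ** 2                       # assumption: variance share ~ squared singular value
    ratio = np.cumsum(variance) / np.sum(variance)
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(s))

def reconstruct(U, s, Vt, k):
    """Rebuild the data from the first k modes only."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```

In the forecasting model, the rows of `Vt[:k, :]` would play the role of the time field that is forecast, while `U[:, :k]` is the space field that is kept fixed for the final reconstruction.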

Nonlinear Forecasting of Time Field by Genetic Algorithm.
The time field can be taken as a set of nonlinear time series, each of which is described in delay-coordinate form as

$$\{x(t-\tau),\; x(t-2\cdot\tau),\; \ldots,\; x(t-d\cdot\tau)\}, \tag{2}$$

where $t$ is a scalar index for the time series, $\tau$ is the time delay, and $d$ is the embedding dimension. We need to determine the dependence of the state value $x(t)$ on its previous state values. The Takens embedding theorem guarantees that the system's state information can be recovered from a sufficiently long observation of the output time series. According to the theorem, there exists a smooth map $F: \mathbb{R}^{d} \to \mathbb{R}$ satisfying

$$x(t) = F\bigl(x(t-\tau),\, x(t-2\cdot\tau),\, \ldots,\, x(t-d\cdot\tau)\bigr). \tag{3}$$

Thus, the time series can be forecast by establishing the functional relation $F(\cdot)$ with the genetic algorithm.
The genetic algorithm (GA) is an evolutionary method that mimics the process of natural selection. GA generates solutions to optimization problems using techniques inspired by natural evolution, such as selection, crossover, and mutation. During each generation, a proportion of the existing population is selected through a fitness-based process to breed a new generation. Crossover takes two or more parent solutions and produces a child solution from them; it is analogous to reproduction and biological crossover. Mutation alters one or more gene values in a chromosome from its initial state; it is analogous to biological mutation. Its purpose is to preserve and introduce genetic diversity from one generation of chromosomes to the next, which helps GA reach better solutions. The detailed procedures of using the genetic algorithm to establish the best functional relation are described as follows.
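The delay-coordinate relation can be made concrete by building, for every admissible time index, the vector of lagged values together with the value to be predicted. The sketch below is illustrative only; the function name `delay_embed` and the argument names `tau` and `d` (time delay and embedding dimension) are choices made here.

```python
import numpy as np

def delay_embed(x, tau, d):
    """Build lagged inputs and targets for x(t) = F(x(t - tau), ..., x(t - d * tau)).

    Returns (inputs, targets) where inputs[i, j - 1] = x[t_i - j * tau]
    and targets[i] = x[t_i], for t_i = d * tau, ..., len(x) - 1.
    """
    x = np.asarray(x, dtype=float)
    start = d * tau
    if start >= len(x):
        raise ValueError("series too short for the requested tau and d")
    targets = x[start:]
    inputs = np.column_stack(
        [x[start - j * tau: len(x) - j * tau] for j in range(1, d + 1)]
    )
    return inputs, targets
```

Each row of `inputs` is one argument list for the candidate function, and the matching entry of `targets` is the value against which that candidate is scored.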
Step 1 (set the encoding rules). In order to encode the symbolic form of the equation strings into a numerical structure, the encoding uses coordinate pairs; the detailed rules are listed in Table 1. The second coordinate indicates whether the pair is an argument or an operator of the equation. If the second coordinate is 1, the pair represents an operator. If the second coordinate is 2, the pair represents a constant term of the equation. If the second coordinate is 3, the pair represents a variable of the equation.
Step 2 (initialize). The first process is the generation of an initial population of individuals as a basis for future generations. Because of the special encoding rule of the equation, the initial individuals must satisfy the following rules so that each decoded individual forms a valid equation.
The first two elements of the individual must be constant or variable terms; the last one must be an operator.
At any location in the individual, the number of nonoperators (constant or variable terms) on the left must be greater than the number of operators.
The total number of nonoperators in the individual must be the total number of operators plus one.
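The three rules above are the conditions under which a string of binary operators and operands is a well-formed postfix expression, so an individual can be checked and decoded with a stack. The sketch below is an assumption-laden illustration: the meaning of the first coordinate (operator index, constant value, or lag index of the variable) and the operator table `OPS` are hypothetical, since only the meaning of the second coordinate is stated in the text.

```python
import operator

OPERATOR, CONSTANT, VARIABLE = 1, 2, 3               # second coordinate, as in Step 1
OPS = [operator.add, operator.sub, operator.mul]     # hypothetical operator table

def is_valid(individual):
    """Check the three construction rules for a list of (first, second) pairs."""
    if len(individual) < 3:
        return False
    kinds = [kind for _, kind in individual]
    if kinds[0] == OPERATOR or kinds[1] == OPERATOR or kinds[-1] != OPERATOR:
        return False
    operands = operators = 0
    for kind in kinds:
        if kind == OPERATOR:
            operators += 1
        else:
            operands += 1
        if operands <= operators:                    # rule 2 violated at this location
            return False
    return operands == operators + 1                 # rule 3

def evaluate(individual, lagged):
    """Evaluate a valid individual; lagged[j - 1] holds x(t - j * tau)."""
    stack = []
    for value, kind in individual:
        if kind == CONSTANT:
            stack.append(float(value))
        elif kind == VARIABLE:
            stack.append(lagged[value - 1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[value](left, right))
    return stack[0]

# Encodes x(t - tau) * 2.0 + x(t - 2 * tau) under the assumptions above.
example = [(1, VARIABLE), (2.0, CONSTANT), (2, OPERATOR), (2, VARIABLE), (0, OPERATOR)]
```

Division is left out of the hypothetical operator table so that the sketch does not need a guard against division by zero.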
Step 3 (calculate the fitness value). The fitness value of each individual is computed over the training set, where $N$ is the total length of the training set.
Step 4 (perform the selection, crossover, and mutation operations). The selection operator chooses the individuals that are used for crossover and mutation based on their fitness. A proportional selection operator is used, so the probability of selecting the $i$th individual is proportional to its fitness value. The probability is computed as

$$p_i = \frac{f_i}{\sum_{j=1}^{M} f_j},$$

where $f_i$ is the fitness value of the $i$th individual and $M$ is the population size. Once the mates are selected according to their strength, the crossover operation between the two parent individuals is carried out to generate two new offspring. The crossover procedure starts by randomly choosing one of the arguments, constant terms, or variable terms of the time series in the first individual. A mutation operation is then applied to yield solutions with new information. In order to preserve the information of the top-ranked individual, the mutation is not applied to the best individual, that is, the one with the minimal fitness value in the current iteration. Each element of a selected string is changed by the mutation process with some probability. An element is randomly selected, and the mutation depends on its type: if the element is an operator, it is self-mutated; if it is a number or a variable, it is either self-mutated or mutually mutated.
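Proportional selection can be sketched as follows. The sketch assumes non-negative fitness values for which a larger value makes selection more likely, and the function names `selection_probabilities` and `select_parents` are hypothetical; crossover and mutation are not shown.

```python
import random

def selection_probabilities(fitness):
    """Probability of each individual, proportional to its fitness value."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    return [f / total for f in fitness]

def select_parents(population, fitness, n, rng=random):
    """Draw n parents with replacement using proportional (roulette-wheel) selection."""
    probabilities = selection_probabilities(fitness)
    return rng.choices(population, weights=probabilities, k=n)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing runs of the algorithm.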
Step 5 (check the stopping criterion). Determine the best individual of the current iteration and check whether the stopping criterion is met. If it is met, decode the optimum individual and obtain the final equation expression; otherwise, continue the iteration until the criterion is met or the maximum iteration number is reached.

Results and Analysis
In the numerical experiment, 365 data samples covering the whole year 2007 are selected; the first 280 samples are used to establish the forecasting model, and the remaining 85 samples are used for testing and verification. The distribution of forecasting errors is shown in Figure 2, where Figures 2(a), 2(b), 2(c), and 2(d) correspond to forecast spans of one day, two days, four days, and seven days ahead, respectively. The detailed forecasting precisions for the different time scales are listed in Table 2, where Δ is the error between the real and the predicted value; the percentages of the values of Δ falling in the different intervals (Δ < 1, 1 ≤ Δ < 2, 2 ≤ Δ < 3, and Δ ≥ 3) are also listed in the table. The results show that the forecasting performance is good overall and that the average error is lowest for the one-day-ahead forecast; as the forecasting span increases, the errors increase overall. Besides, Figure 2 shows that the areas with large forecasting errors are concentrated in the equatorial region, and in time they are concentrated between 10 LT and 17 LT; the main reason is that asymmetric ionospheric structures always exist in the equatorial region.
In order to demonstrate the advantage of the nonlinear forecasting model based on the genetic algorithm, traditional forecasting models, namely direct forecasting and linear forecasting, are also applied under the same conditions. The detailed results and analysis are given in the following.
The model used for direct forecasting is the ARIMA (Auto Regressive Integrated Moving Average) model [8], which is adequate for nonstationary time series. The distribution of forecasting errors is shown in Figure 3, where Figures 3(a), 3(b), 3(c), and 3(d) correspond to forecast spans of one day, two days, four days, and seven days ahead, respectively. The detailed forecasting precisions for the different time scales are listed in Table 3. Compared with the proposed nonlinear forecasting model, the forecasting errors are much larger, and the performance for long forecasting spans is obviously poor. The main reason is that the grid TEC data are simply taken as independent time series, and the inner correlations between the TEC data of different grid points are not considered.
The linear time series forecasting model is also an ARIMA model, whose coefficients are estimated by the least squares method. The distribution of forecasting errors is shown in Figure 4, where Figures 4(a), 4(b), 4(c), and 4(d) correspond to forecast spans of one day, two days, four days, and seven days ahead, respectively. The detailed forecasting precisions for the different time scales are listed in Table 4. The results show that the forecasting performance is better than that of the direct forecasting model; the main reason is that the EOF decomposition and reconstruction are used in the linear forecasting model, so the inner correlations between the TEC data of different grid points are considered. However, the performance is worse than that of the nonlinear forecasting model. The main reason is that the linear forecasting model expresses the time field with the linear ARIMA model, whereas the nonlinear forecasting model expresses it with the nonlinear model established by the genetic algorithm; in fact, the distribution characteristics of the time field are nonlinear and chaotic, so the nonlinear model estimated by the genetic algorithm is better than the linear ARIMA model.
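Interval statistics of the kind reported in Tables 2 to 4 can be computed as in the following minimal sketch. It assumes that Δ is the absolute difference between the real and the predicted value; the function name `error_statistics` and the dictionary keys are hypothetical.

```python
import numpy as np

def error_statistics(observed, predicted):
    """Mean error and percentage of errors in each interval used in the tables."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    delta = np.abs(observed - predicted).ravel()   # assumption: absolute error
    return {
        "mean": float(delta.mean()),
        "<1": 100.0 * float(np.mean(delta < 1.0)),
        "1-2": 100.0 * float(np.mean((delta >= 1.0) & (delta < 2.0))),
        "2-3": 100.0 * float(np.mean((delta >= 2.0) & (delta < 3.0))),
        ">=3": 100.0 * float(np.mean(delta >= 3.0)),
    }
```

Applying the same function to the outputs of the nonlinear, direct, and linear models for each forecast span yields directly comparable rows.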

Conclusion
In this paper, a novel forecasting model is proposed. The model has three main features: first, the Earth-fixed coordinate system is converted to the sun-fixed coordinate system to reflect the physical features of the TEC data distribution; second, the data in the sun-fixed coordinate system are decomposed by the EOF method to reflect the inner correlations between the TEC data of different grid points; lastly, the time field is taken as a set of nonlinear time series, and the genetic algorithm is used to establish the best functional relation of the time series to forecast the time field.

Figure 1: General graphical procedures of forecasting model.

Figure 2: Error distributions by nonlinear forecasting based on genetic algorithm.

Table 1: Encoding rule of equation strings.

Table 2: Statistics of forecasting precisions of different time scales by nonlinear forecasting.

Table 3: Statistics of forecasting precisions of different time scales by direct forecasting.

Table 4: Statistics of forecasting precisions of different time scales by linear forecasting.