Real-Time Modeling of Regional Tropospheric Delay Based on Multicore Support Vector Machine

Real-time modeling of the regional troposphere has attracted considerable research attention in the GNSS field, and its modeling products play an important role in global navigation satellite system (GNSS) real-time precise positioning and real-time inversion of atmospheric water vapor. In this study, a multicore support vector machine (MS) based on a genetic optimization algorithm, a single-core support vector machine (SVM), the four-parameter method (FP), a neural network method (BP), and a root mean square fusion method (SUM) are used for real-time and final zenith tropospheric delay (ZTD) modeling of the Hong Kong CORS network. Real-time ZTD modeling experiments over five consecutive days showed that, compared with FP, BP, SVM, and SUM, MS reduced the average deviation (bias) by 48.25%, 54.46%, 41.82%, and 51.82% and the root mean square (RMS) by 43.16%, 48.46%, 30.09%, and 33.86%, respectively. The final ZTD modeling experiments showed that, compared with FP, BP, SVM, and SUM, MS reduced the bias by 3.80%, 49.78%, 25.71%, and 49.35% and the RMS by 43.16%, 48.46%, 30.09%, and 33.86%, respectively. The accuracy of the five methods generally reaches the millimeter level in most time periods. MS demonstrates higher precision and stability than the other models at stations whose elevation is at or above the average elevation of the survey area. MS, SVM, and SUM exhibit higher precision and stability than FP at the station whose elevation is at the average level of the survey area. Meanwhile, the real-time modeling error distribution of the five methods is significantly better than that of the final modeling: the standard deviation and average of the real-time modeling errors improved by 43.19% and 24.04%, respectively.


Introduction
Zenith tropospheric delay (ZTD) correction methods can generally be divided into model correction [1], parameter estimation [2], and external correction methods [3]. The parameter estimation method is generally used in precise point positioning (PPP) and long-baseline solutions, and other methods are needed to obtain an a priori ZTD value before the parameters are estimated. The external correction method requires expensive equipment and has difficulty obtaining accurate results within a short time. The model correction method is the focus of this study. According to whether measured meteorological parameters are needed, ZTD models can be divided into two types [4]: measured meteorological models (such as the Black, Saastamoinen, and Hopfield models) and empirical models (such as the GPT series, GZTD, IGGtrop, EGNOS, and UNB models). Measured meteorological models suffer from difficult-to-obtain meteorological parameters and low applicability. Empirical models have a poor regional correction effect and depend on a large amount of external grid data. Establishing a regional ZTD model from nonmeteorological parameters, without inputting a large amount of external initial data, is therefore important in practical applications [5,6]. Time series analysis techniques have been used in previous studies to explore this problem. Dai et al. [7] established a precise tropospheric delay model for the Hong Kong region using many years of tropospheric delays calculated from the CORS network. The model takes into account the influence of station latitude and elevation and obtains satisfactory results with a curve fitting method. In [8], ZTD was modeled from historical ZTD values, and a tropospheric correction prediction model without measured meteorological parameters, based on spectrum analysis and autoregressive (AR) model compensation, was proposed.
In recent years, machine learning (ML) algorithms have been widely used and have demonstrated their modeling and predictive accuracy in data mining, pattern recognition, regression, classification, and spatial interpolation. Regional ZTD modeling based on ML methods can better describe the nonlinear variation of ZTD, achieve high-precision and high-stability modeling over a large area without measured meteorological parameters, and make better use of meteorological big data through data mining. ML has therefore been successfully employed in ZTD modeling to overcome the shortcomings of the two types of ZTD models described above. A GPS tropospheric delay interpolation model based on a radial basis function (RBF) neural network was constructed, and its interpolation accuracy reached the millimeter level with CORS station data from the Anhui power system [9]. A fusion model incorporating a backpropagation neural network (BPNN) was proposed to compensate for errors in ZTD prediction; with the true ZTD as reference, it improved ZTD prediction accuracy by more than 90% compared with the Hopfield model [10]. The traditional BPNN model was initialized with rich long-term regional continuously operating reference station (CORS) information to reduce the number of model parameters, and experiments with National Center for Atmospheric Research (NCAR) troposphere data showed that the improved BPNN model achieved a remarkable improvement in fitting and prediction accuracy [11]. A regional fusion model combining a BPNN with a genetic algorithm was established on the basis of the EGNOS model; it achieves high accuracy (57% higher than EGNOS) and does not depend on large amounts of external initial data [12]. A tropospheric delay interpolation method for long-baseline network real-time kinematic (RTK) positioning based on support vector machine (SVM) theory was also proposed.
The experimental results showed that the accuracy of the interpolated tropospheric delay was better than 2 cm and that the estimation error was generally stable [13]. The advantages and feasibility of a regional ZTD model based on the least squares support vector machine algorithm, using global navigation satellite system (GNSS) and ERA5 data, were investigated in comparison with the traditional BPNN algorithm in [14]. Regional ZTD modeling based on ML methods has important applications in real-time precise positioning, real-time ZTD vertical profile inversion, atmospheric water vapor parameter inversion, extreme weather forecasting, regional InSAR deformation monitoring, and so on. A method based on a relevance vector machine was developed for retrieving atmospheric refractivity profiles in real time from the slant-path tropospheric delay at low elevation angles (<5°) of a single ground-based GPS receiver; simulation and experimental results showed that the retrieved profiles were better than those obtained with Lowry's method and the Hopfield model [15]. Sangiorgio et al. [16] implemented advanced deep learning predictors, such as deep neural nets (DNNs), to forecast the occurrence of extreme rainfall.
The experimental results showed that the DNN using ZTD improved accuracy by 10% compared with traditional logistic regression without ZTD. Shamshiri et al. [17] proposed a technique based on ML Gaussian process (GP) regression that combines small-baseline interferograms with GNSS-derived ZTD values to mitigate the troposphere-induced phase delay in interferometric observations, exploiting roughly 200 permanent GNSS stations (CPOS) provided by the Norwegian Mapping Authority (NMA).
Regional ZTD modeling products can improve precise point positioning (PPP) performance, which is of great significance to PPP application research. Yao et al. [18] established a global nonmeteorological-parameter ZTD model with an external accuracy of about 4 cm. Using the ZTD obtained from such a model as a virtual observation in the PPP model can improve the PPP convergence time by up to 15% [19]. Li [20] showed that regional ZTD interpolation (40 km scale) with an accuracy better than 1 cm could meet the demand of PPP for instantaneous ambiguity resolution. Dai [21] showed that for regional tropospheric modeling with the four-parameter method, the accuracy was 1 cm for a small region (40 km scale) and 2 cm for a large region (300 km scale). Single-frequency and dual-frequency PPP performance could be significantly improved by using the atmospheric correction products. In this study, the real-time ZTD of each CORS station is calculated with real-time PPP based on the variance component method in [22] so that a model without measured meteorological parameters can be applied in practice, taking the Hong Kong regional CORS network (15 km scale) as an example. Real-time ZTD modeling of the Hong Kong region is carried out with a multicore support vector machine (MS) based on a genetic optimization algorithm, a single-core support vector machine (SVM), the four-parameter method (FP), a neural network method (BP), and a root mean square fusion method (SUM). The model accuracy is evaluated, and real-time, high-precision regional ZTD modeling with artificial intelligence techniques is further studied.

… statistics and was originally used to solve the problem of computer pattern recognition [23].
SVM is a "small sample" learning method that avoids limitations of other machine learning methods, such as underlearning, high dimensionality, and a tendency to fall into local optima. When applied to regression analysis, it is called support vector regression (SVR).
SVR first defines a loss function and establishes a corresponding kernel function, maps the original data space to a high-dimensional space through a nonlinear transformation, converts the regression problem into a convex quadratic optimization problem, introduces slack variables and a penalty coefficient to form a dual optimization problem, and finally obtains the optimal solution of that problem [24].
The SVR principle is expressed as follows.
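Assume the training sample set is $\{(x_i, y_i),\ i = 1, 2, \ldots, l\}$, where $l$ is the number of samples and $x_i \in \mathbb{R}^N$ and $y_i \in \mathbb{R}$ are the input and target values, respectively. The $\varepsilon$-insensitive loss function can be expressed as follows:

$$
L_\varepsilon\bigl(y, f(x)\bigr) =
\begin{cases}
0, & \lvert y - f(x) \rvert \le \varepsilon,\\
\lvert y - f(x) \rvert - \varepsilon, & \text{otherwise}.
\end{cases}
\tag{1}
$$

After learning from the sample set, the regression estimation function $f(x)$ in formula (1) can be constructed in the following form:

$$
f(x) = w \cdot \psi(x) + b, \tag{2}
$$

where $\psi(x)$ is the nonlinear function that maps $x$ into a high-dimensional (possibly infinite-dimensional) linear feature space, $w$ is the weight vector, and $b$ is the bias term. Fitting the samples $(x_i, y_i)$ with the estimation function then becomes the following convex quadratic programming problem:

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert w \rVert^2 + c\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right)\\
\text{s.t.}\quad & y_i - w \cdot \psi(x_i) - b \le \varepsilon + \xi_i,\\
& w \cdot \psi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i \ge 0,\ \xi_i^* \ge 0,\quad i = 1, 2, \ldots, l,
\end{aligned}
\tag{3}
$$

where $\xi_i$ and $\xi_i^*$ are slack variables and $c$ is the penalty factor. Solving formula (3) with the Lagrange multiplier method yields the following regression estimation function:

$$
\begin{aligned}
f(x) &= \sum_{x_i \in SV}\left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b,\\
b &= \frac{1}{N_{NSV}}\Biggl\{\sum_{0<\alpha_i<c}\Bigl[y_i - \sum_{x_j \in SV}\left(\alpha_j - \alpha_j^*\right)K(x_j, x_i) - \varepsilon\Bigr] + \sum_{0<\alpha_i^*<c}\Bigl[y_i - \sum_{x_j \in SV}\left(\alpha_j - \alpha_j^*\right)K(x_j, x_i) + \varepsilon\Bigr]\Biggr\},
\end{aligned}
\tag{4}
$$

where $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers. The samples $x_i$ with nonzero $\alpha_i$ or $\alpha_i^*$ obtained from the optimization are called support vectors; only the support vectors contribute to $w$ and hence to the estimation function $f(x)$. $SV$ and $N_{NSV}$ are the support vector set and the number of standard support vectors, respectively, and $K(x_i, x_j)$ is the kernel function.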
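To make formulas (1) and (4) concrete, the following minimal sketch evaluates the $\varepsilon$-insensitive loss and the SVR prediction for given support vectors and multipliers. It assumes the multipliers $\alpha_i$, $\alpha_i^*$ and the bias $b$ have already been obtained by solving the dual of formula (3) (for example with LIBSVM); the function names and the Gaussian kernel used as a default are illustrative choices, not the authors' code.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def eps_insensitive_loss(y: float, f_x: float, eps: float) -> float:
    """Formula (1): zero inside the eps-tube, linear in the residual outside it."""
    return max(0.0, abs(y - f_x) - eps)


def rbf(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel, used here only as an example choice of K."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x: Vector,
                support_vectors: Sequence[Vector],
                alpha: Sequence[float],
                alpha_star: Sequence[float],
                b: float,
                kernel: Kernel = rbf) -> float:
    """Formula (4): f(x) = sum over SVs of (alpha_i - alpha_i*) K(x_i, x) + b.

    The multipliers and b are assumed to come from an external QP solver.
    """
    return sum((a - a_s) * kernel(sv, x)
               for sv, a, a_s in zip(support_vectors, alpha, alpha_star)) + b


# Two 1-D support vectors with hand-picked (hypothetical) multipliers.
print(svr_predict([0.0], [[0.0], [1.0]], [0.5, 0.0], [0.0, 0.2], b=2.0))
```

Only samples with a nonzero $\alpha_i - \alpha_i^*$ affect the prediction, which is why the sum in formula (4) runs over the support vector set alone.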

Selection of Multicore Functions and Their Parameters.
The performance of SVM depends on the selection of the kernel function and its parameters. However, many types of kernel functions exist, each with its own characteristics, advantages, and disadvantages. Therefore, if a single kernel function is used to solve a practical problem, the performance of SVM is often suboptimal and has certain limitations. Moreover, in addition to common single kernel functions (such as the polynomial, Gaussian, sigmoid, and linear kernels), it is often necessary to construct kernel functions tailored to the actual problem. Therefore, combining different kernel functions for learning can be considered as a way to obtain a suitable kernel. The multicore function $K_{mk}(x_i, x_j)$ can be expressed as follows [25]:

$$
K_{mk}(x_i, x_j) = \sum_{r=1}^{m} p_r K_{\mathrm{local}}^{r}(x_i, x_j) + \sum_{t=1}^{n} p_t K_{\mathrm{global}}^{t}(x_i, x_j), \tag{5}
$$

where $K_{\mathrm{local}}^{r}(x_i, x_j)$ and $K_{\mathrm{global}}^{t}(x_i, x_j)$ represent the $r$-th local and $t$-th global kernel functions, respectively, $m$ and $n$ are the numbers of local and global kernel functions, respectively, and the fusion coefficients satisfy $0 \le p_r \le 1$ ($r = 1, \ldots, m$), $0 \le p_t \le 1$ ($t = 1, \ldots, n$), and $\sum_{r=1}^{m} p_r + \sum_{t=1}^{n} p_t = 1$. A local kernel function has strong learning but weak generalization ability, whereas a global kernel function has strong generalization but weak learning ability. A hybrid kernel function has both local and global characteristics and thus combines a certain learning ability with a certain generalization ability. The commonly used local and global kernel functions selected in this study are the radial basis and polynomial kernel functions, respectively, which are expressed as follows:

$$
K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \tag{6}
$$

$$
K(x_i, x_j) = \left[a\,(x_i \cdot x_j) + c\right]^{q}, \tag{7}
$$

where $\sigma$, $a$, $c$, and $q$ are kernel parameters satisfying $\sigma > 0$, $a > 0$, $c \ge 0$, and $q \in \mathbb{N}$. Formulas (5)-(7) show that, when selecting kernel parameters, the extrapolation ability of the radial basis kernel function decreases as the parameter $\sigma$ increases.
A large parameter q implies a high mapping dimension and a heavy computational workload; when q is too large, the learning complexity becomes excessive and overfitting is likely to occur. Clearly, the multikernel function degenerates into a single kernel function when the fusion coefficient of one kernel function is 1 and those of all other kernel functions are 0. Parameter selection for an SVR machine based on a single kernel function involves only the internal parameters of that kernel function. For an SVR machine built on a multicore function, the internal parameters of both the local and the global kernel functions must be selected and the fusion coefficients of all kernel functions determined at the same time to ensure that the performance of the multikernel SVR machine is optimal [26,27].
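The multikernel construction of formulas (5)-(7) with one local and one global kernel ($m = n = 1$) can be sketched as below. This is an illustrative implementation under the stated kernel forms, not the authors' LIBSVM configuration; the argument names (`p_local`, `sigma`, `a`, `c`, `q`) simply mirror the symbols in the text, and the global coefficient is taken as `1 - p_local` so the fusion coefficients sum to 1.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, sigma: float) -> float:
    """Local kernel, formula (6): exp(-||x - z||^2 / (2 sigma^2)), sigma > 0."""
    sq = sum((u - v) ** 2 for u, v in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def poly_kernel(x: Vector, z: Vector, a: float, c: float, q: int) -> float:
    """Global kernel, formula (7): (a <x, z> + c)^q, a > 0, c >= 0, q natural."""
    dot = sum(u * v for u, v in zip(x, z))
    return (a * dot + c) ** q


def multi_kernel(x: Vector, z: Vector, p_local: float,
                 sigma: float, a: float, c: float, q: int) -> float:
    """Formula (5) with m = n = 1: p_local * K_rbf + (1 - p_local) * K_poly.

    p_local = 1 (or 0) degenerates to a single RBF (or polynomial) kernel.
    """
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("fusion coefficient must lie in [0, 1]")
    return (p_local * rbf_kernel(x, z, sigma)
            + (1.0 - p_local) * poly_kernel(x, z, a, c, q))


x1, x2 = [1.0, 2.0], [1.5, 1.0]
print(multi_kernel(x1, x2, p_local=0.7, sigma=1.0, a=1.0, c=1.0, q=2))
```

A function of this shape could be passed to a kernel-based SVR solver as a custom kernel (or used to build a precomputed Gram matrix); because a convex combination of valid kernels is itself a valid kernel, the resulting optimization problem stays convex.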
Manually selecting these parameters is evidently time consuming and laborious, and optimal results cannot be guaranteed. Hence, choosing a suitable optimization algorithm is very important. Commonly used parameter optimization algorithms include grid search, the genetic algorithm, and the particle swarm algorithm. The genetic optimization algorithm [28], a probabilistic optimization method, is selected in this study. This algorithm automatically acquires and guides the search space, adaptively adjusts the search direction without requiring definite rules, and features inherent implicit parallelism and strong global optimization ability. The main calculation steps of the genetic algorithm are initialization, selection, crossover, mutation, and convergence to the global optimum.
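The five steps listed above can be illustrated with a minimal real-coded genetic algorithm. This is a generic sketch, not the authors' MATLAB implementation: binary tournament selection, arithmetic crossover, Gaussian mutation, elitism, the population size, the rates, and the toy objective are all assumptions. In the MS model, each individual would hypothetically encode the penalty factor, the kernel parameters $\sigma$, $a$, $c$, $q$ (with $q$ rounded to an integer), and the fusion coefficient, and the objective would be a validation RMSE; here a quadratic with a known minimum stands in for that objective.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def genetic_minimize(objective: Callable[[List[float]], float],
                     bounds: Bounds,
                     pop_size: int = 30,
                     generations: int = 60,
                     crossover_rate: float = 0.8,
                     mutation_rate: float = 0.2,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)

    # 1. Initialization: individuals drawn uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=objective)

    for _ in range(generations):
        scores = [objective(ind) for ind in pop]

        def tournament() -> List[float]:
            # 2. Selection: binary tournament (the better of two random picks).
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] <= scores[j] else pop[j]

        children = [list(best)]  # elitism keeps the best-so-far individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # 3. Crossover: arithmetic blend of the two parents.
                w = rng.random()
                child = [w * g1 + (1.0 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            # 4. Mutation: Gaussian jitter (10% of the range), clipped to bounds.
            child = [min(max(g + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
                     if rng.random() < mutation_rate else g
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = children

        # 5. Convergence: keep track of the global best individual.
        candidate = min(pop, key=objective)
        if objective(candidate) < objective(best):
            best = candidate

    return best, objective(best)


# Toy stand-in for a cross-validation error surface with its minimum at (3, 7).
toy = lambda v: (v[0] - 3.0) ** 2 + (v[1] - 7.0) ** 2
print(genetic_minimize(toy, [(0.0, 10.0), (0.0, 10.0)]))
```

Elitism makes the best objective value non-increasing from one generation to the next, while mutation keeps injecting diversity so the search is less likely to stall in a local optimum.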

Other Modeling Methods.
Other widely used modeling methods, such as FP [29], BP neural network [11], SVM with radial basis kernel function [24], and SUM, which is a simple fusion of the first three methods, are also used to compare and analyze the effect of multicore SVM modeling.
The basic principle of SUM modeling can be summarized as follows: search for the three stations closest to the user location, obtain the ZTD of these stations with the FP, BP, and SVM algorithms, and analyze the root mean square error (RMS) of each algorithm. The weights used to fuse the results of the three algorithms are obtained with the RMS weighting method, so that the weight of each algorithm decreases as its RMS increases. The specific modeling steps are as follows:
(1) Search for the nearest three stations as verification points according to the coordinates input by the user.
(2) Delete each verification point's data from the original data in turn, and use the above three models to calculate the ZTD of that verification point from the remaining data.
(3) Compare the calculated results with the original data, and calculate the deviation and RMS as follows:

$$
\mathrm{bias}_{i,k} = \mathrm{ZTD}_{i,k}^{\mathrm{calc}} - \mathrm{ZTD}_{k}^{\mathrm{true}},\qquad
\mathrm{RMS}_i = \sqrt{\frac{1}{n}\sum_{k=1}^{n} \mathrm{bias}_{i,k}^{2}}, \tag{8}
$$

where $\mathrm{RMS}_i$ is the RMS error (RMSE) of the $i$-th modeling method, $\mathrm{bias}_{i,k}$ is the deviation between the calculated and actual ZTD at the $k$-th verification point, and $n = 3$ is the number of verification points.
(4) Calculate the weight of each method from its RMSE as follows:

$$
\omega_i = \frac{1/\mathrm{RMS}_i^{\,p}}{\sum_{j=1}^{3} 1/\mathrm{RMS}_j^{\,p}}, \tag{9}
$$

where $\omega_i$ is the weight of the $i$-th algorithm and $p$ is the power value (usually 1 or 2; this article uses 2). A larger power value gives a larger weight to results with a small $\mathrm{RMS}_i$.
(5) Obtain the ZTD at the user's input coordinates with the three algorithms, and take their weighted average as the optimal ZTD:

$$
\mathrm{ZTD} = \sum_{i=1}^{3} \omega_i\,\mathrm{ZTD}_i, \tag{10}
$$

where $\mathrm{ZTD}_i$ is the ZTD calculated with the $i$-th modeling method.
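Steps (3)-(5) reduce to a few lines of arithmetic. The sketch below assumes the leave-one-out deviations at the verification points have already been computed for each method; the dictionary keys and the ZTD values in the example are hypothetical, chosen only to show how a smaller RMS translates into a larger weight with $p = 2$.

```python
import math
from typing import Dict, Sequence


def rms(biases: Sequence[float]) -> float:
    """Formula (8): RMS of one method's deviations at the verification points."""
    return math.sqrt(sum(b * b for b in biases) / len(biases))


def rms_weights(rms_by_method: Dict[str, float], p: int = 2) -> Dict[str, float]:
    """Formula (9): w_i = RMS_i^-p / sum_j RMS_j^-p (smaller RMS -> larger weight).

    A zero RMS would divide by zero; such a method would simply dominate.
    """
    inv = {name: 1.0 / r ** p for name, r in rms_by_method.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def fuse_ztd(ztd_by_method: Dict[str, float], weights: Dict[str, float]) -> float:
    """Formula (10): weighted mean of the per-method ZTD at the user position."""
    return sum(weights[name] * ztd for name, ztd in ztd_by_method.items())


# Hypothetical example (ZTD in mm): FP fits the verification points best.
w = rms_weights({"FP": 1.0, "BP": 2.0, "SVM": 2.0})
print(w)
print(fuse_ztd({"FP": 2400.0, "BP": 2406.0, "SVM": 2406.0}, w))
```

With $p = 2$, halving a method's RMS quadruples its unnormalized weight, which is the behavior described in step (4).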

Elevation Naturalization.
Tropospheric wet delay is strongly related to altitude, so the station elevation must be considered in tropospheric modeling to avoid deviations of several centimeters [30]. The exponential function used in [18] to establish the GZTD global tropospheric model is adopted in this study to describe how ZTD changes with elevation:

$$
\mathrm{ZTD} = \mathrm{ZTD}_i \exp\!\left(-\frac{h - h_i}{H}\right), \tag{11}
$$

where $h$ is the altitude of the point to be determined, $\mathrm{ZTD}_i$ is the ZTD value of the known point, $h_i$ is the corresponding altitude, and $H$ is the height scale factor. $H$ is the only unknown parameter when transmitting tropospheric correction information from the known point to the point to be determined. Letting $A = 1/H$, $A$ can be solved from the station ZTD data with the least squares algorithm, after which the ZTD of the point to be determined is calculated. The process of real-time or final ZTD modeling in this article is shown in Figure 1.
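Taking logarithms of formula (11) gives $\ln(\mathrm{ZTD}_k/\mathrm{ZTD}_i) = -A\,(h_k - h_i)$, which is linear in $A$, so the least squares estimate has the closed form $A = -\sum_k d_k y_k / \sum_k d_k^2$ with $d_k = h_k - h_i$ and $y_k = \ln(\mathrm{ZTD}_k/\mathrm{ZTD}_i)$. The sketch below assumes every station is paired with a single reference station; that pairing, the function names, and the synthetic heights and scale height in the example are illustrative assumptions rather than the authors' procedure.

```python
import math
from typing import Sequence, Tuple


def estimate_inverse_scale_height(ref: Tuple[float, float],
                                  stations: Sequence[Tuple[float, float]]) -> float:
    """Least-squares A = 1/H from ln(ZTD_k / ZTD_ref) = -A (h_k - h_ref).

    `ref` and `stations` are (height, ztd) pairs; all heights equal would
    make the denominator zero, so a real height spread is required.
    """
    h0, z0 = ref
    num = den = 0.0
    for h, z in stations:
        d = h - h0
        num += d * math.log(z / z0)
        den += d * d
    return -num / den


def reduce_ztd(ztd_i: float, h_i: float, h: float, A: float) -> float:
    """Formula (11): ZTD(h) = ZTD_i * exp(-A (h - h_i)), with A = 1/H."""
    return ztd_i * math.exp(-A * (h - h_i))


# Synthetic, noise-free stations generated with H = 8000 m (heights in m, ZTD in m).
ref = (10.0, 2.5)
stations = [(h, 2.5 * math.exp(-(h - 10.0) / 8000.0)) for h in (100.0, 300.0, 650.0)]
A = estimate_inverse_scale_height(ref, stations)
print(1.0 / A, reduce_ztd(2.5, 10.0, 500.0, A))
```

With noise-free synthetic data the estimate recovers the generating scale height; with real ZTD values the residuals of the log-linear fit indicate how well a single exponential describes the height dependence in the network.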
In this paper, ZTD modeling is performed in MATLAB 2017a. The MATLAB toolboxes used for the MS, SVM, and BP models are LIBSVM, SVM, and Neural Network, respectively; the SUM model fuses the FP, BP, and SVM results. The ZTD models can be delivered to real-time PPP users by embedding the ZTD modeling process and its products as a submodule of a real-time PPP service system [31] and establishing data communication between the service system and the submodule. The implementation is as follows. First, the data needed for modeling are obtained in real time from the service system through the Internet. Then, ZTD modeling is carried out, and the real-time ZTD modeling products are transmitted back to the service system through the Internet. Finally, the service system broadcasts real-time precise products, including satellite orbits, clock offsets, and atmospheric error corrections, to real-time PPP users through the Internet or navigation messages.

Mathematical Problems in Engineering

… [22]. The orbit product (5 min sampling interval) and the satellite clock offset product (30 s sampling interval) used in the final PPP solution are the final multi-GNSS precise products "GBM" from the German Research Centre for Geosciences (GFZ). The orbit and satellite clock offset products used in the real-time PPP solution, both with a 5 s sampling interval, are the real-time multi-GNSS precise products "CLK93" from Centre National d'Etudes Spatiales (CNES). Missing parts of the real-time orbit products are filled with GFZ ultrafast predicted orbits using Lagrangian interpolation, and missing parts of the real-time clock offsets are supplemented with the real-time clock offset prediction method in [32]. The PPP solution software was developed independently; its parameter configuration and processing strategy are shown in Table 1. As shown in Figure 2, 16 stations marked by yellow triangles are used for modeling, 3 stations marked by red triangles are used for accuracy verification, and 3 stations marked in green text serve as internal accuracy check stations among the 16 modeling stations. Real-time and final ZTD values of the 19 stations for five consecutive days are calculated at a 30 s sampling interval before modeling, and ZTD fitting and forecasting of the regional stations are performed at a 5 min sampling interval during modeling. To reduce the influence of PPP convergence on the ZTD solutions, the modeling results of the first two hours of each day are removed when computing the model fitting and forecast accuracy statistics. That is, for each of the 19 stations, 264 fitting or forecast epochs per day participate in the statistical analysis of model accuracy. Three indicators, namely, standard deviation (STD), RMSE, and average deviation (bias), are used to evaluate the modeling effect.
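The count of 264 epochs per station and day follows directly from the 5 min modeling interval and the removal of the first 2 h of each day:

$$
N_{\text{day}} = \frac{24 \times 60\ \text{min}}{5\ \text{min}} - \frac{2 \times 60\ \text{min}}{5\ \text{min}} = 288 - 24 = 264.
$$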
The experimental results of [22] showed that the accuracy of the real-time ZTD solution (10 mm) and of the final ZTD solution (6 mm) reaches the average level obtained in many studies. Therefore, the calculated ZTD values can be used as true values to evaluate the accuracy of real-time and final ZTD modeling in this study. The accuracy indicators bias, RMSE, and STD are calculated as follows [11]:
where $N_1$ and $N_2$ are the numbers of fitting and forecast samples, respectively, and ZTD model …
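The three indicators can be sketched with their standard definitions: bias as the mean error, RMSE as the root of the mean squared error, and STD as the spread of the errors about the bias. Whether formulas (12)-(15) use exactly these forms (for example $N$ rather than $N-1$ in the STD denominator) is an assumption of this sketch; the function name and the sample values are illustrative.

```python
import math
from typing import Dict, Sequence


def accuracy_indicators(ztd_model: Sequence[float],
                        ztd_true: Sequence[float]) -> Dict[str, float]:
    """Bias, RMSE and STD of modeled ZTD against reference ZTD (same units)."""
    if len(ztd_model) != len(ztd_true) or not ztd_model:
        raise ValueError("need two non-empty series of equal length")
    errors = [m - t for m, t in zip(ztd_model, ztd_true)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Population form (divides by n), so that rmse**2 == bias**2 + std**2.
    std = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    return {"bias": bias, "rmse": rmse, "std": std}


# Hypothetical fitted ZTD (mm) at four epochs against a constant reference.
print(accuracy_indicators([2401.0, 2403.0, 2399.0, 2405.0], [2400.0] * 4))
```

With the population form of the STD, the identity $\mathrm{RMSE}^2 = \mathrm{bias}^2 + \mathrm{STD}^2$ holds exactly, which is a convenient consistency check on tabulated accuracy statistics.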

Final ZTD Modeling Analysis.
The ZTD of each station calculated using the final PPP solution is taken as the true value, and the accuracy of the various models is evaluated. Figure 4 shows the daily fitting and forecast accuracy statistics of the FP, BP, SVM, SUM, and MS models, and the accuracy statistics of the five models over five consecutive days are listed in Table 2. The following can be deduced from Figure 4 and Table 2: (a) The fitting and prediction accuracies of all five models show a certain relationship with station elevation. The low-elevation HKCL station shows the lowest prediction accuracy, and the HKQT station, which has the lowest elevation, exhibits lower fitting accuracy, though still higher than that of HKCL. This is mainly because the tropospheric wet delay at low-elevation stations changes significantly, and the water vapor around the HKCL station varies considerably because the area is near the sea. The fitting and prediction accuracies of the four other stations show no significant difference, and the decrease in modeling accuracy at the high-elevation station is insignificant.

Figure 4: ZTD daily fitting and prediction accuracy statistics of the five models. Bias in the figure refers to Bias_true in formula (12) or Bias_observ in formula (13), and RMS refers to RMSE in formula (14) or STD in formula (15); the same meanings apply in the following figures.

Real-Time ZTD Modeling Analysis.
After real-time ZTD modeling of the six stations, the ZTD of each station calculated using the final PPP solution is taken as the true value to evaluate the accuracy of the various models. The accuracy statistics of the five models over five consecutive days are listed in Table 3. The following can be deduced from Table 3: (a) The modeling effects at the HKCL and HKQT stations are relatively poor, whereas those at the four remaining stations are nearly the same, indicating that the five modeling methods behave similarly in real-time and final ZTD modeling.
(b) Compared with the final ZTD modeling results in Table 2, the absolute deviation, average deviation, and RMS of real-time ZTD modeling are generally larger. This indicates a certain difference in accuracy between real-time and final ZTD modeling results, which is consistent with previous studies. (c) Compared with the four other models, MS shows certain improvements in fitting and prediction accuracy. The differences in modeling effect among the four other models are insignificant, although the RMS of the SVM model is slightly better than those of the other three. Excluding the HKCL station, the average deviation and RMS of the five models are 1.32 and 9.10 mm, respectively.

Comparative Analysis of Real-Time and Final ZTD Modeling.
The ZTD of each station calculated using real-time PPP is taken as the true value, and the accuracy of the various models is reevaluated to further analyze the characteristics of real-time ZTD modeling. The daily fitting and forecast accuracy statistics of the FP, BP, SVM, SUM, and MS models are illustrated in Figure 5, and the accuracy statistics of the five models over five consecutive days are presented in Table 4. Real-time and final ZTD modeling error distributions are then compared: the real-time and final ZTD error series of six stations over five consecutive days are shown in Figure 6, and their error distributions are illustrated in Figure 7. The following can be deduced from Figure 5 and …

The modeling error analysis of the six stations over five consecutive days showed that the standard deviation and average value of the real-time modeling errors improved by 43.19% and 24.04%, respectively, compared with those of the final modeling. This is primarily because the real-time ZTD of the stations is more stable before real-time ZTD modeling. This real-time ZTD is obtained from the real-time PPP solution with high-frequency real-time orbit and clock offset products at a 5 s sampling interval.

PPP Accuracy Verification.
The data of the HKQT, HKSL, HKNP, HKCL, HKWS, and HKST stations on the 20th day of 2019, with a 30 s sampling interval, are selected, and the data of the first 2 h are not used. To validate the effectiveness of the MS modeling method, a static PPP solution is carried out with GPS and GLONASS observations, using the same precise orbit and clock offset products as in Section 2.1. Four experimental schemes are designed: Scheme 1, PPP solution based on the GBM final precise products; Scheme 2, Scheme 1 plus a precise a priori constraint [19] (initial values of the final ZTD derived from MS modeling); Scheme 3, PPP solution based on the CLK93 real-time precise products; and Scheme 4, Scheme 3 plus a precise a priori constraint (initial values of the real-time ZTD derived from MS modeling). Figure 8 shows the positioning results of station HKST in the N, E, and U components under the four schemes. Table 5 lists the positioning results, convergence times, and comparisons under the four schemes for the six stations. The convergence criterion is defined as the moment when the positioning error is less than 0.1 m in each of the N, E, and U components. The following can be deduced from Figure 8 and Table 5:

Note. P and T are the positioning precision and the convergence time in the N, E, and U components, respectively. The improvement rate describes the improvement of the PPP solution with the ZTD constraint over the solution without the constraint, in %.
… from that of final PPP. The main reason is that there is little difference between the accuracy of the high-frequency CLK93 real-time products and that of the GBM final products used in this paper.
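The convergence criterion and the improvement rate used in Table 5 can be sketched as follows. The text defines convergence as the moment when the N, E, and U errors are all below 0.1 m; this sketch additionally assumes the errors must stay below the threshold for the rest of the session, a common convention that the source does not state explicitly. The function names and the example error series are hypothetical; convergence time would be the returned epoch index multiplied by the 30 s sampling interval.

```python
from typing import Optional, Sequence, Tuple


def convergence_epoch(errors_neu: Sequence[Tuple[float, float, float]],
                      threshold: float = 0.1) -> Optional[int]:
    """First epoch after which |N|, |E| and |U| all stay below threshold (m).

    Returns None if the solution never converges under this criterion.
    """
    converged_at = None
    for k, (n, e, u) in enumerate(errors_neu):
        if max(abs(n), abs(e), abs(u)) < threshold:
            if converged_at is None:
                converged_at = k
        else:
            converged_at = None  # left the tolerance band again: reset
    return converged_at


def improvement_rate(without: float, with_constraint: float) -> float:
    """Relative improvement in %, e.g. of convergence time or positioning error."""
    return (without - with_constraint) / without * 100.0


# Hypothetical N/E/U errors (m) at successive epochs.
series = [(0.5, 0.3, 0.9), (0.05, 0.02, 0.2), (0.05, 0.02, 0.08), (0.04, 0.01, 0.06)]
print(convergence_epoch(series), improvement_rate(40.0, 30.0))
```

Requiring the errors to remain inside the tolerance band avoids counting a brief early dip below 0.1 m as convergence, which would otherwise make the constrained and unconstrained schemes harder to compare.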

Summary and Discussion
In this study, real-time and final ZTD values of 19 stations in the Hong Kong CORS network for five consecutive days are first obtained with the PPP solution. Five ZTD modeling methods (FP, BP, SVM, SUM, and MS) are then used for regional real-time and final ZTD modeling. Finally, the real-time and final ZTD fitting and prediction accuracies are analyzed. The following conclusions can be drawn from this study: (1) MS based on the genetic optimization algorithm demonstrates better learning and generalization abilities than the other models. The high-accuracy real-time modeling of the regional tropospheric delay in this study can provide a theoretical reference for real-time PPP research based on regional atmospheric augmentation products. Meanwhile, a neural network model based on the genetic algorithm could be used to optimize the BP model in this study and reduce the chance of the network falling into a local optimum. Time could also be used as an additional input for training the neural network or SVM to further improve the modeling accuracy, and a more accurate elevation naturalization model could be introduced to improve the modeling accuracy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
Xu Yang and Xinyuan Jiang conceived and defined the research scheme. Xu Yang and Chuang Jiang verified the feasibility of the method and implemented the software algorithm. Xu Yang and Lei Xu checked the data processing results and wrote the manuscript. Chuang Jiang and Lei Xu helped to revise the manuscript and modified some figures and tables.