A Unified Framework for GPS Code and Carrier-Phase Multipath Mitigation Using Support Vector Regression

Multipath mitigation is a long-standing problem in global positioning system (GPS) research and is essential for improving the accuracy and precision of positioning solutions. In this work, we consider multipath error estimation as a regression problem and propose a unified framework for both code and carrier-phasemultipathmitigation for ground fixedGPS stations.We use the kernel support vector machine to predict multipath errors, since it is known to potentially offer better-performance traditional models, such as neural networks. The predicted multipath error is then used to correct GPS measurements. We empirically show that the proposed method can reduce the code multipath error standard deviation up to 79% on average, which significantly outperforms other approaches in the literature. A comparative analysis of reduction of double-differential carrier-phase multipath error reveals that a 57% reduction is also achieved. Furthermore, by simulation, we also show that this method is robust to coexisting signals of phenomena (e.g., seismic signals) we wish to preserve.


Introduction
Multipath is defined as one or more indirect replicas of the line-of-sight (LOS) signal from satellites arriving at a receiver's antenna from a satellite. It normally occurs due to reflection from objects in the vicinity of the receiver and constitutes a major error source that contaminates receivers' measurements, resulting in performance degradation of GPS positioning solutions. The errors induced by multipath are typically up to 15 meters for C/A code [1] and a few centimeters for carrier-phase measurements [2]. Multipath mitigation is hence important for a variety of applications which utilize this data, such as ionospheric monitoring [3], geodesy [4,5], and navigation [6].
On the one hand, multipath mitigation is a very challenging task. As multipath is site-dependent, differencing measurements among multiple short-baseline receivers (DGPS) are unlikely to help. Furthermore, aggressively removing multipath error may harm wanted coexisting information and perturbations such as seismic signals induced by an earthquake, as their frequency spectra likely overlap with that of the contaminating multipath error.
Various mitigation approaches have been proposed in the literature, classified into either frequency-domain or timedomain processing. The former is based on spectral analysis of multipath error in the frequency domain using fast fourier transform (FFT) [7], or wavelet decomposition [8,9]. However, they unintentionally rule out other coexisting signals. To overcome this issue, signal-to-noise (SNR) measurements [10,11] can be used as alternative for analysis. Unfortunately, this suffers from unavailability and inconsistency in units from different types of receivers. Time-domain methods range from the popular carrier smoothing filter (CSF) [12,13], band-pass finite impulse response (FIR) filter [14,15] to stacking [16][17][18]. These methods also tend to filter out coexisting signals of interest and require high-rate data to boost their performance [16].
In previous works [19,20], we propose a regression model, which integrates kernel support vector regression (SVR) with geometrical features to deal with the code multipath error prediction on ground fixed GPS stations. To the best of the authors' knowledge, this is the first work using machine learning to address GPS multipath error estimation. This paper extends our previous work to define a unified framework for both code and carrier-phase multipath mitigation. The contribution of this paper is threefold: (1) deriving geometric models for code and carrier-phase multipath errors, (2) formulating multipath error estimation as a regression problem with geometrical features, and (3) unifying the framework for code and carrier-phase multipath estimation using support vector regression.
The rest of this paper is organized as follows. Section 2 briefly reviews the mathematical models of GPS measurements and derives the geometrical models of multipath errors. By posing multipath estimation as a nonlinear regression problem, Section 3 defines the framework for multipath mitigation. Experimental results and discussions for code and carrier-phase multipath mitigation will be presented in Sections 4 and 5, respectively. The conclusion will follow in Section 6.

GPS Measurements and Code Multipath Extraction
In this section, we briefly review the GPS measurement data as generated by GPS receivers. Following this, geometrical models will be derived for both code and carrier-phase multipath errors.

GPS Measurements.
The code measurement 1 and carrier-phase measurement 1 for channel L1 are given as in (1) and (2), respectively [13,21]. The measurements for L2 are similar, where represents the true range from a satellite to a receiver, is clock bias, the subscripts and refer to the user (receiver) and the satellite, respectively, is the speed of light, , , , , 1 , and 1 denote tropospheric delay, ionospheric delay, code multipath error, random receiver noise on code, carrier-phase multipath error, and random receiver noise on carrier-phase, respectively, and the symbols 1 denote wavelengths of L1. The term 1 is the ambiguous integer of L1. The opposite signs of the ionospheric delays in (1) and (2) are due to the fact that the ionosphere affects code and carrier measurements equally but in opposite directions when the signals travel through the dispersive ionospheric layer in the atmosphere [2].

Geometrical Model of Code Multipath.
Ideally, in a multipath-free environment, only one direct signal is received by the antenna from each satellite. However, no environment is completely multipath-free in practice. A receiver antenna receives one or more replicas of the direct signal reflected from objects near the LOS path, particularly those in the vicinity of the receiver. As a result, the receiver will track a composite signal that is a combination of the direct path and the multipath replicas. For clarity, let us consider the simplified case of one multipath signal. Let and denote the amplitudes of the direct signal and the multipath signal, respectively, the path delay, the multipath relative phase in radians, = / ≤ 1 the ratio of the multipath and direct amplitudes, and and the azimuth and elevation angles, respectively. The position of any reflecting object is described as a planar surface tilted relative to the local level with a tilt angle at a distance ℎ from the antenna centre as illustrated in Figure 1.
Let denote the reflection angle relative to the reflecting surface; the induced code multipath error is given as [16]: In order to obtain the geometrical model, we need to relate multipath error with geometrical parameters, that is, azimuth and elevation angles. Assuming that the satellite, antenna, and normal vector to the reflecting surface are coplanar, multipath reflections fall into two categories: forward-scatter and backscatter [22] as in Figure 2.

Geometrical Model of Carrier-Phase
Multipath. The effect of multipath on carrier-phase measurement can be demonstrated by the phasor diagram in Figure 3. Additionally, let denote the amplitude of the composite signal which is the combination of the multipath signal and the direct signal. , , and are the direct signal's phase, the composite signal's phase and the multipath relative phase with respect to the direct signal, respectively. denotes the phase error due to multipath.
The phase error due to multipath is easily derived in terms of multipath parameters as follows: where 0 is a possible phase offset at time 0. Therefore, ) .

Multipath Mitigation Framework Using Nonlinear Support Vector Regression
In this section, we will define the unified framework for code and carrier-phase multipath mitigation followed by the formulation to solve multipath estimation as nonlinear regression using support vector regression algorithm.

Multipath Mitigation Framework.
Under assumption that the multipath environment around a ground fixed receiver is held fixed, then quantities , ℎ , in (8) and (12) will remain constant. As a result, code multipath error and carrier-phase multipath error are complicated functions of satellite-relative elevation angle and azimuth angle in the general case which characterizes the geometry of the satellite with respect to the receiver. Give these functions, in order to compute multipath errors, it is necessary to somehow evaluate the functions in (8) and (12) given as an azimuth/elevation pair. Viewed as a regression problem, these functions will be approximated by learning from the historical multipath data and satellite orbital information. Specifically, estimating code and carrier-phase multipath errors is a 2-dimensional regression setting: multipath = (azimuth, elevation) .
The rationale behind this approach is the observation that a fixed receiver experiences highly sidereal day-to-day correlation of satellite-receiver geometry and multipath error. It is well known that the GPS satellite orbits were selected to have a period of half a sidereal day (23 hours 56 minutes 4 seconds) with a daily repeating ground track [23,24]. Because of this, satellite visibility from any point on earth is the same from day to day, with the satellites appearing in their positions approximately 4 minutes or 236 seconds earlier each day due to the time difference between the sidereal and solar day. Furthermore, for a fixed station, its surroundings and the antenna usually do not change significantly across consecutive days. Therefore, GPS multipath signals are expected to largely repeat over the same time period. Our proposed approach leverages this observation with a regression approach to approximate the function of the repeatable multipath error. Code and carrier-phase multipath errors can be estimated independently, but the procedure for training the estimators is the same. Satellite-specific multipath estimator will be trained using -SVR [25], a well-known support vector regression algorithm. After being trained, the code and carrier-phase multipath estimators will be used to provide estimate of multipath errors. Afterwards, these estimates will be used to correct code and carrier-phase measurements to achieve multipath mitigation.

Nonlinear Regression for Multipath Estimation.
Among various kinds of regression models, SVRs [25,26] are well known to have more potential advantages than traditional models, such as neural networks [27]. They are based on the strong statistical learning theory and have been shown to be less prone to overfitting as well as being independent of the dimensionality of the input space. We employ -SVR [25], which is most commonly used for regression problem. In the hard margin loss setting for -SVR, the estimate has at most deviation from the actual target for all training data, meaning that one does not care about errors as long as they are less than , but will not accept any deviation larger than this. Nevertheless, owing to noise in data, we usually use soft margin loss setting to allow for some errors by introducing slack variables in the formulation of -SVR.
We denote input vector as ∈ R and denote a value of multipath error as ∈ R. Given the multipath model in (8) and (12), our goal is to learn a function : R → R mapping from an observation vector to an estimate of multipath error̂. Formally, this can be accomplished by first choosing a set of training samples {( 1 , 1 ), . . . , ( , )} ∈ R × R. Due to noise in the training data, it is unlikely that ( ) will be equal to for all , so a loss function ( ( ), ) must also be chosen to quantify the penalty for ( ) differing from . The estimator can be found by minimizing the total loss over the training data.
For each satellite, the multipath estimator is trained using -SVR [25]. We denote the regression function ( ) = ⟨ , ( )⟩ + , where is the weight vector in the kernel feature space, ∈ R is a bias term, ⟨⋅, ⋅⟩ denotes the dot product, and is the kernel feature map of data point . The -insensitive loss function given by (14) is chosen so that the function ( ) is found to have at most deviation from the targets for all training samples: ( ) can be solved through the following optimization problem [25]: where > 0 is the parameter of the -insensitive loss function that controls the accuracy of the regressor. The constant > 0 adjusts the tradeoff between the regression error and the regularization on . = { 1 , . . . , } ∈ R and * = { * 1 , . . . , * } ∈ R are slack variables allowing errors around the regression function. After solving the optimization problem, the form of the estimator is where 1 , . . . , SV are scalar coefficients, 1 , . . . , SV are support vectors, and : R × R → R is a kernel function.
( ) depends only on the training samples having nonzero coefficients (support vectors) through the representation of the kernel function . The Gaussian kernel given by (17) is reasonably chosen due to its ability to handle nonlinearity: where is kernel bandwidth. Solving the regression problem by -SVR, learning from training data, code multipath error can be estimated for each visible satellite.

Experiment I: Code Multipath Mitigation
In this section, we will describe experiments conducted to train multipath estimators and subsequently use them for multipath mitigation. We demonstrate that our approach outperforms state-of-the-art results in code multipath mitigation in terms of standard deviation. The advantages of the exploited methods will be also discussed.
Advances in Artificial Neural Systems   [13]. Rooftops are usually bad multipath environments since there are often many vents and other reflective objects within the GPS antenna field of view. In the photographs shown in Figure 4, it can be seen that the observation site is surrounded by many buildings and reflectors which make multipath potentially more severe. During the evaluation period there were 31 visible satellites ranging from PRN 2 to PRN 32 observed at this site.
To illustrate the repeatability of GPS satellite geometries, Figure 5 plots the geometries of 4 visible satellites with respect to the observation site during 4 consecutive days from Day 306 to Day 309. For the sake of simplicity and clarity, only 4 of 31 in-view satellites whose full arcs completely fall in each day period are plotted. As seen from these plots, the footprints of the day-to-day repeated geometries of the satellites are obviously exposed.
Likewise, in order to illustrate the day-to-day repeatability of code multipath error, Figures 6(a) and 6(b) plotted the code multipath sequences extracted from the recorded data set during four observation days from Day 306 to Day 309. The code multipath error can be extracted in batch mode from code measurements using code-minus-carrier (CmC) combination on a whole arc [9,13]. The correlation is more clearly revealed after we smoothed the sequences with a CSF [9] having a 50-second window to largely remove high-frequency noise. The original multipath sequences are plotted in blue and CSF-smoothed multipath sequences are plotted in red. Day-to-day correlation of the two multipath sequences is numerically evaluated by their normalized cross-correlation. Pair-wise normalized cross-correlation values of the multipath sequences are tabulated in Table 1 for PRN 12 with blue and red values representing the original and smoothed multipath sequences, respectively. The day-today correlation is around 89% between two consecutive days and slowly degrades with time. This is understandable since cumulative environmental changes become noticeable as the time span increases.

Training Code Multipath Estimators.
For each satellite, the training data is prepared using 4-day data from Day 306 to Day 309 extracted from the experimental data set, which we have found to be redundant enough to capture distribution of multipath sequences. Azimuth and elevation angles of the satellites in degrees with respect to the receiver, which are inputs for training, are computed from the broadcast navigation data [13]. For the desired multipath outputs, after being detached from observation data, the CmC sequences containing multipath errors are filtered with CSF to remove high-frequency noise. The effect of this smoothing on the multipath error is negligible as long as the smoothing window is shorter than the highest rate multipath. The smoothing window is set to 50 seconds (equivalent to 5 epochs) which is only a fraction of the shortest anticipated multipath fading period of 200 seconds [9,28]. Thus, the receiver noise was significantly reduced without removing the multipath which was to be quantified. This smoothing operation helps to  clearly expose multipath patterns and, as a result, to enhance the estimators' generalization.
Scaling is applied to the training data before use. Azimuth angles, elevation angles, and code multipath values are scaled to the range [−1; +1]. The main advantage of scaling is not only to avoid numerical difficulties during the calculation but also to prevent domination of values in greater numeric ranges over those in smaller numeric ranges.
The libSVM package (http://www.csie.ntu.edu.tw/∼cjlin/ libsvm), which implements -SVR, was used to find the support vectors and coefficients of each satellite-specific code multipath estimator. , the kernel parameter and the penalty parameter of the error term , must be chosen as a priori. It is not known beforehand which parameter values are best for a given problem; consequently, some kind of model selection (parameter search) must be done. For and , grid search and cross-validation were applied for parameter search. Following the recommended libSVM approach, a coarse grid search was firstly performed on exponentially growing sequences of = 2 −15 , 2 −13 , . . . , 2 3 and = 2 −5 , 2 −3 , . . . , 2 15 followed by cross-validation for each pair of ( , ) with a fixed = 0.01. The result with the best 6-fold crossvalidation accuracy was picked. With this strategy, better region on the grid can be identified and finer grid-search Advances in Artificial Neural Systems Learning from the training data set, the support vectors and coefficients in (16) are found for multipath estimation of each satellite. Each separately trained estimator should estimate multipath error when presented with a new observation of azimuth/elevation angles thereafter.

Experimental Results.
In order to evaluate the performance of code multipath estimators, the proper multipath correction is directly applied to code measurements for Day 310. Note that the inputs must be scaled as they were during the training phase, and the estimated multipath values subsequently need to be descaled.
For the sake of demonstration, Figure 7 presents pseudorange multipath errors and responses of the multipath estimator corresponding to PRN 12 in the Day. As observed, standard deviation of the multipath sequence is significantly reduced after being corrected with the responses of the SVR estimator.
Performance of all multipath estimators corresponding to the visible satellites is tabulated in Table 2 where multipath reduction is measured in terms of the reduced standard deviation of multipath errors. The percentages of reduction range from 68% to 91%. On average, calibrating the data with CSF followed by the SVR estimators gains improvement from 36.68% to 78.99%. With the assumption about unchanged surroundings of this approach, performance of the multipath estimators would depend upon how fast the reflecting surfaces along the propagation direction change. The environmental changes are expected to be different for different propagation directions of the GPS satellites. Therefore, the variation in performance of the estimators as seen in Table 2 is expected.
The goodness of the corrections can also be illustrated in the positional domain. Figure 8 shows the variation of the solved positions from the nominal position of the receiver. Weighted least mean square single-point positioning [13] with broadcast navigation data was applied to the data of Day 310. The measurements were only multipath corrected while other noises and biases (e.g., atmosphere delays, etc.) were not calibrated. The plot reveals noticeably higher centralization of the solution on the data corrected with SVR estimators over those obtained from the original data and the 50-second CSF-smoothed data. The reduction of standard deviation of coordinate time series North, East, and Up is tabulated separately in Table 3.

Discussions
For comparison, Table 4 tabulates numerical performance of different methods in terms of percentage of reduced multipath error. The performance of CSF is reported with a 100-second smoothing window [13]. The performance range of the frequency analysis method using fast fourier transform (FFT) [7] is reported with block sizes of 256 and 512, respectively, whilst the block size of the frequency analysis method using wavelet analysis [9] is 100 seconds. In particular, the  performance of the FIR filter method [14] is unreliable as only one satellite (PRN 9) was used for analysis. It is clear that SVR estimators significantly outperform the other reported methods. The state-of-the-art performance of SVR estimators on pseudorange multipath reduction emphasizes the efficiency of the proposed method. However, as the accumulative environmental changes become more and more severe over time, the performance of the estimators would temporally degrade. Therefore, the multipath estimators need to be equipped with adaptability, which has not been addressed so far.
Advances in Artificial Neural Systems Unlike the multipath stacking-based approaches [16,17,29], modeling multipath errors as functions of continuous variables (i.e., azimuth and elevation angles) does not experience the difficulty in the determination of the timeshifting period. In addition, the interpolation ability of the trained estimators makes them applicable for different data rates provided that training data is adequate to capture the underlying distribution of multipath errors. Furthermore, with the nature of sparsity, the multipath estimators just count on a subset of training data, being simpler while requiring less storage. All of these imply better scalability.
Since signals of phenomena are usually low frequency [14,23], the frequencies of the simulated event were chosen to exhibit diminishing effects of CSF, which is a low-pass filter [9,12]. The PRN 12's pseudorange multipath sequence was smoothed by CSF with a 50-second smoothing window and then corrected by the trained PRN 12's SVR estimator. As shown in Figure 9, the corrected multipath sequence aligns very well with the event signal; that is, the event signal is not affected significantly.
Although carrier-phase multipath error is more sensitive to environmental variation, it also saw correlation on a sidereal daily basis [15,29]. Therefore, the proposed method could be extended to apply to carrier-phase multipath mitigation.

Experiment 2: Carrier-Phase
Multipath Mitigation 6.1. Double-Differential Carrier-Phase Multipath. Unlike code multipath error, carrier-phase multipath error cannot be isolated using combinations of code and carrier-phase measurements of one receiver alone. In order to do that, double differential (DD) combination of two short-baseline receivers (<10 km) is used. In practice, high-precision applications rely on this combination for relative positioning. The DD combination of two satellites and observed at two receivers and is given by In (19), clock error terms are eliminated through the differencing process. Furthermore, for a short baseline, atmospheric effects are approximately equal, leading to Δ ≈ 0 and Δ ≈ 0. The integer ambiguity term Δ can be computed by debiasing over the entire DD sequence as long as there are no cycle slips. The multipath effect, hence, is dominant in the DD measurements which can now be written as The DD multipath error is the composition of four multipath errors of the four carrier-phase components. Denote and the elevation and azimuth angles of the satellite with respect to the receiver , DD multipath error actually depends on eight geometrical variables: Fortunately, at each epoch, eight variable parameters can be calculated given the orbital information of the satellites broadcast to the receivers and nominal positions of the receivers which are known beforehand. Therefore, the DD multipath function can be learned from historical data.  Figure 10 shows photographs of the locations and surroundings of the two stations. As seen in Figure 10, the stations' vicinity is covered by snow; therefore, the multipath environment is expected to change quickly even in a short time span as the weather changes. If so, the performance of the multipath estimator would degrade. The station KIRU is selected as the reference station. With predetermined nominal positions of the stations, the geometrical parameters of the satellites with respect to the stations can be computed easily at every epoch given the broadcast navigation data. In order to extract DD multipath as described earlier, a satellite with high elevation is usually selected as a reference to form the DD combination [1,13]. The pair of satellites PRN 8 and PRN 18 with the longest DD sequences are thus selected for analysis. PRN 8, with higher elevation peak, is used as the reference satellite. Their geometries with respect to the two stations are illustrated in Figure 11.

Training Multipath Estimator.
In order to train a DD multipath estimator for the satellite pair, the steps that have been done previously for code multipath estimation will again be performed. The geometrical data and multipath errors in Day 050 are scaled to [−1; +1] before feeding to the training program. The libSVM package that implements -SVR algorithm is employed to train the multipath estimators. The values of , the Gaussian kernel parameter , and the penalty parameter of the error term are again chosen using grid search and cross-validation with the same strategy as described previously for code multipath mitigation.
Learning from the training data set, the support vectors and coefficients of the DD multipath estimator are found. Finally, the multipath estimator should estimate DD multipath error when presented with a new observation of azimuth/elevation angles thereafter. The estimated multipath error will be used to correct carrier-phase DD of the satellite pair in the successive days.

Experimental Results.
In order to demonstrate the ability of the proposed method, the trained estimator is used to estimate the carrier-phase multipath error of the following Day 051. From top to bottom, Figure 12 shows the original multipath sequence, the response of the trained estimator, and the corrected multipath sequence with the response. The standard deviations of the original multipath sequence and the corrected multipath sequence are 2.15 cm and 0.92 cm, respectively. In other words, the multipath error has been reduced by 57.2% using the trained multipath estimator. Although only two data sets for one satellite pair have been presented here, this preliminary result provides a strong indication that the proposed technique can be applied to the carrier-phase multipath mitigation problem.

Discussion
As expected, the response of the multipath estimator can capture the dominant trend of the multipath sequence which is the dominant component of the multipath error [8,9]. However, the result of the proposed method when applied to carrier-phase multipath mitigation seems to be lower than what was achieved for code multipath mitigation. It is explainable as there is no equivalent smoothing algorithm like CSF to smooth the multipath sequence to attenuate highfrequency noise during training and correction. Although it is unwise to conclude firmly since the experimental result is only for one pair of satellites, for the sake of comparison, the proposed method outperforms the FIR filter method (39.8%-56.1%) [15], and the frequency-domain processing methods relying on analysis of SNR measurements (20%) [10,22]. It provides a motivation for further exploration of the proposed method on the carrier-phase multipath mitigation problem. However, this result is incomparable to that of the direct frequency-domain processing methods using wavelets such as those of Elhabiby et al. [8], which can reduce carrier-phase DD multipath up to 84%. It is because these methods can attenuate both errors from low to high frequencies to achieve better mitigation effects at the cost of filtering out other signals if their frequency content overlaps with multipath's frequency content. Although it has not been proven and more comprehensive experiments need to be conducted in future work, the proposed method should share similar advantages of code multipath estimators that it, intuitively, does not affect other phenomena signals. This could be fulfilled if data without displacement is used for training before estimating multipath error for data with displacement. Therefore, for applications where additional important signals such as seismic signals exist, the proposed method is promising.
As carrier-phase multipath is much more sensitive to environmental changes than pseudorange multipath, the data training over a short time span is more suitable for training SVR estimators under the constant environment assumption. It leads to higher data rates being required for training in order to retain data density. However, it does not strictly require high-rate data (≥1 Hz) as stacking-based approaches do [17,29]; only data rates that can capture the underlying distribution of the multipath error are required. Fortunately, the proposed method is also scalable with data rate due to its inherent sparsity. In other words, only a subset of training data which are support vectors are kept and involved in the computation. However, if the number of support vectors is significantly large, computational complexity may still be a concern for real-time operation. This issue needs further exploration.

Conclusion
This paper has presented a unified framework based on nonlinear support vector regression to address the GPS code and carrier-phase multipath mitigation problems for ground fixed GPS stations. Based on analysis of the geometry of multipath signal reflections, geometrical models of multipath errors are developed. More specifically, multipath errors corresponding to a satellite are mathematically formulated as functions of the satellite's geometry with respect to a receiver, which is parameterized by azimuth and elevation angles. As a result, the problem of multipath error estimation amounts to regression problem where the multipath functions are approximated by learning from training data using -SVM algorithm. Finally, the trained multipath estimators are employed to correct measurements for successive days. The proposed method demonstrates good performance and is scalable to data rate. Furthermore, the multipath estimators do not affect the simulated signal of phenomena.