Mitigating GNSS Multipath Effects Using XGBoost Integrated Classifier Based on Consistency Checks



Introduction
With the completion of global navigation satellite system (GNSS) networking and the further development of the BeiDou navigation system, the accuracy requirements of location-based services keep rising. Although more positioning satellites with wider coverage are now available, ionospheric and tropospheric delays in the atmosphere still produce serious errors, which in turn affect the reception of satellite signals [1].
In urban environments, the reflection of direct satellite signals by the surfaces of buildings and obstacles can seriously disrupt the signals available in the target area. Reflected and non-line-of-sight (NLOS) signals arise when the signal is reflected or blocked during transmission. The reception delay caused by reflected and diffracted signals degrades the position accuracy of the receiver to varying degrees; this is known as the multipath propagation phenomenon [2]. The presence of multipath prevents studies such as urban traffic path planning and pedestrian obstacle detection from obtaining reliable, real-time information. Therefore, accurately discriminating direct signals, reflected multipath signals, and NLOS signals, and effectively excluding the harmful ones, is key to mitigating multipath effects on GNSS positioning errors.
So far, four classical types of NLOS and multipath detection techniques have been applied: antenna-based techniques, receiver-internal improvement techniques, navigation-processor-internal techniques, and machine-learning-based techniques [3]. Among antenna array technologies, the dual-polarized antenna implementation [4] is relatively mature, but it is difficult to adopt given the experimental cost and equipment limitations in universities [5]. Among the research results based on receiver signal processing, correcting the code loop distortion and separating the direct from the reflected signals can only reduce the multipath (MP) effect and improve the success rate of NLOS detection [6]. Therefore, [7,8] proposed a method for NLOS discrimination using the GNSS receiver output, but it is difficult to guarantee the discrimination accuracy when the signal-to-noise ratio (SNR) is the only criterion. To increase the multipath detection probability, the use of triple-frequency receivers was investigated in [9]. The difference in SNR across three frequencies was modeled as a threshold for determining multipath and assessing the robustness of the system. However, the method loses its discriminating power in open areas with high SNR. Secondly, among the research results on the internal improvement of navigation processors, the application of consistency checks (CC) has been a breakthrough in recent years [10,11]. These methods use empirical thresholds on test statistics of pseudorange residuals to determine, from the perspective of the measurements, whether NLOS or multipath signals may be present. The receiver autonomous integrity monitoring (RAIM) method for intermediate applications is more mature; once the fault detection and exclusion (FDE) framework has been built, it detects and excludes "faulty" signals by exhaustive or greedy search [10].
In deep urban environments, exhaustive FDE is able to reduce the localization error by 8%. However, a single-layer consistency check can only find one consistent set of measurements, which leaves the overall positioning error large [3]. Further, many papers have proposed using additional instruments, such as visible-light, fisheye, and infrared laser scanning devices [12–15], to detect NLOS signals and determine the approximate satellite position by combining sensors with the navigation processor. However, these special sensors are easily affected by light and weather, and the limited measurement range of the scanners means they cannot be fully applied to special buildings in urban living areas. Therefore, methods that detect and correct NLOS errors in cities with the help of 3D map matching have been proposed [16–19]. The ability of 3D models to predict satellite visibility was investigated in [20]. In most implementations, three-dimensional (3D) mapping was used to improve positioning accuracy through shadow matching (SDM), terrain height assistance, or LOS detection [21]. Although these methods can target multipath phenomena near intersections and streets in static environments, acquiring the map-aided information is impractical and too costly. Given that publicly available map resources are limited and that locations change in dynamic scenes, these methods become ineffective for detecting and mitigating building occlusion.
Based on the above analysis and comparison, an improvement based on the receiver's internal output is more appropriate, because it requires no additional reference information. From that perspective, machine learning techniques have made great progress in satellite navigation and positioning research in the last two years [2,22–27]. The first simple classifiers that applied machine learning to binary classification were decision trees (DT) and support vector machines (SVM). Reference [22] used DT to classify the received L1 signals into LOS and MP signals, with a prediction accuracy of about 98%. References [25,26] applied SVM to classify the receiver's correlated signal output. However, using signal strength alone as the feature leads to overfitting and feature bias in the training process. To improve the accuracy of feature extraction and training, [2,26] simulated an indoor virtual multipath environment, applied a deep learning network to classify the correlated signal output after dimensional expansion, and compared the classification results with those of SVM, reaching an average classification accuracy of 94%. However, this approach is limited to simulated indoor occlusion environments and is not practical outdoors.
In dense urban living areas with moving pedestrians or vehicles, the additional delay distance caused by multipath and NLOS effects is an important cause of degradation of the on-board receiver [28]; it also generates pseudorange and carrier phase biases that are more difficult to compensate. For this reason, some studies have described detection, identification, and compensation techniques for NLOS signals [29,30]. In contrast to the static scenarios above, for pedestrians or vehicles the relative differential dynamic positioning technology [31] can be used to effectively detect and eliminate NLOS and MP effects. Therefore, the CC method was carried over to this field. It determines whether the measured values are consistent with empirical limits set in advance and then retains the useful data. In particular, the GNSS pseudorange measurement CC method proposed by [32] is relatively mature: it helps receivers detect and exclude faulty measurements autonomously by computing the pseudorange residuals and applying a statistical threshold test, excludes the detected NLOS and other multipath signals, and then estimates the location of the target receiver.
In summary, applying machine learning algorithms to classify the correlated output signals is currently the most feasible approach and still has room for improvement. First, previous studies applied only a single classifier, the dimensionality of the feature vector selected before training was very limited, and statistical distributions such as angular features and pseudorange similarity features associated with the signal were not addressed. Therefore, we propose to apply an encapsulated set of multiple classification models to process and train on the target features. Second, most receiver-internal multipath suppression methods have been validated only in static scenes, whereas for slow-moving pedestrians and vehicles the received satellite signal dataset contains more outliers. This requires an integrated classifier that can handle such special samples effectively. Third, in the application area of the consistency checking method mentioned above, [33] used this method together with other error mitigation methods to detect and exclude NLOS and multipath signals. However, their validation data were only screened by statistical methods, without preprocessing and feature conversion, which leaves redundant and interfering data in the available measurement set and in turn degrades the final positioning accuracy. Therefore, it is necessary to combine the principle of consistency checking with a better integrated classification method to jointly detect and exclude inconsistent NLOS signals from the receiver's internal output measurement set, so as to improve the robustness and generalization ability of the machine learning algorithm in the cooperative detection model.
To address the problems analyzed above, we apply a CC statistical method to improve the multiclass learner by adding NLOS and multipath classification categories, so that reflected and other interfering signals in the target output can be effectively filtered out. According to the training results, the proposed method improves the generalization of the training process, prevents overfitting of the parameters, and narrows the range of accuracy deviation of the whole model. The innovative points of this paper are as follows.
(1) We integrate the software receiver loop output and its decomposition into an overall consistency-checks framework, combined with the DPDT, to collaboratively assist positioning.
(2) To address overfitting, missing sample values, and serial-only processing in single-classifier machine learning methods, this paper preprocesses the GNSS-related output feature set, applies effective feature transformation, and uses a mature integrated classification method from supervised machine learning to classify NLOS and multipath signals.
(3) Unlike previous work, which acquired fixed satellite signals in different static scenes, this paper applies the proposed overall consistency-checks method to a slow dynamic real environment.
The remainder of this paper is organized as follows. Section 2 introduces the traditional GNSS positioning principles and the more mature positioning techniques. Section 3 presents the architecture and implementation of the two-layer consistency-checks positioning model based on the XGBoost classification method. Section 4 implements the algorithmic framework of Section 3 in a practical context. Finally, Section 5 presents the discussion and analysis.

GNSS Positioning Methods
Given the research background of this paper, the current bottleneck is the negative interference of multipath and NLOS signals in deep urban reception environments with tall buildings and vegetation occlusion, as shown in Figure 1. Slow-moving vehicles are disturbed by a variety of signals, such as NLOS signals and reflected and diffracted signals in LOS. This can seriously distort the receiver's internal loop, delay the signal, and in turn degrade the accuracy of the output position.
Before addressing the above problems, we recall that the traditional GNSS receiver positioning process solves for 3D coordinates at a static position. We use the relevant parameters to evaluate the impact on position accuracy in the presence of ideal noise, based on the principle of the least squares (LS) method. The pseudorange measurement equation between the satellite and the receiver is given by (1), where n is the number of observed satellites, i is the satellite index, c is the speed of light, $\delta t_{sV}$ denotes the satellite clock offset, $\delta t_r$ is the receiver clock offset, I is the ionospheric delay distance, and T is the tropospheric delay distance. The most critical term, $\varepsilon_{\mathrm{ref}(i)}$, contains the multipath error and the noise error. In this paper, the noise error is treated as zero-mean. Furthermore, $R^{(i)}$ denotes the geometric distance between the ith visible satellite and the ground receiver; it is calculated by (2), where (x, y, z) is the position of the receiver. The satellite position $(x_k, y_k, z_k)$ is obtained with the help of the ephemeris and the satellite clock offsets. Conventionally, the unknowns in (1) are solved by minimizing the residuals with the LS method. The estimate is iterated step by step, starting from the initial position solution. To solve (2), it is transformed into a linearized form containing $(x + \Delta x, y + \Delta y, z + \Delta z)$ based on a first-order Taylor expansion. The pseudorange observation equation can then be solved to obtain the final positioning solution [33].

Weighted Least Squares Algorithm.
Positioning based on the satellite signal propagation time requires at least four satellites to obtain the target distance. The satellite coordinates are known and are combined with Newton iteration to predict the reference point and smooth the residual. In the iterative process, the solution vector of the initial estimated position is (x, y, z), and the nonlinear objective equation f is given by (3).
International Journal of Antennas and Propagation
When the mth iteration from the initial position is carried out, the linearized Taylor expansion at this point is given by (4). After the linear processing of (4), the linear equation solution for the next update is obtained. The iteration is repeated between steps (m − 1) and m until the accuracy of the current step meets the criterion, at which point the update stops. For the linearized solution of (3), the LS solution $\Delta v_{LS}$ is computed with the help of the Jacobian matrix A, as given by (7).
In (7), A is the geometric unit-vector matrix between the user and the observed satellites, $\Delta v_{LS}$ is the state vector of the estimated user location, $A^T$ is the transpose of A, and b is the pseudorange difference between the measured and predicted values.
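As an illustration, the iterative LS solution described above can be sketched in Python with NumPy. This is a minimal sketch rather than the implementation used in this paper: the function name `ls_position`, the state layout [x, y, z, c·δt_r], the convergence tolerance, and the assumption that the pseudoranges are already corrected for satellite clock and atmospheric delays are all choices made here for illustration.

```python
import numpy as np

def ls_position(sat_pos, pseudoranges, x0=None, tol=1e-4, max_iter=20):
    """Gauss-Newton least squares for receiver position and clock bias.

    sat_pos: (n, 3) satellite positions in metres, n >= 4.
    pseudoranges: (n,) pseudoranges in metres (assumed pre-corrected).
    Returns the state [x, y, z, c*dt_r] and the final residual vector.
    """
    sat_pos = np.asarray(sat_pos, dtype=float)
    pseudoranges = np.asarray(pseudoranges, dtype=float)
    state = np.zeros(4) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        diff = sat_pos - state[:3]              # receiver-to-satellite vectors
        ranges = np.linalg.norm(diff, axis=1)   # geometric distances
        b = pseudoranges - (ranges + state[3])  # measured minus predicted
        # Jacobian A: negative unit line-of-sight vectors plus a clock column
        A = np.hstack([-diff / ranges[:, None], np.ones((len(ranges), 1))])
        delta, *_ = np.linalg.lstsq(A, b, rcond=None)  # LS step for A d = b
        state += delta
        if np.linalg.norm(delta[:3]) < tol:     # stop when the update is small
            break
    predicted = np.linalg.norm(sat_pos - state[:3], axis=1) + state[3]
    return state, pseudoranges - predicted
```

The returned residual vector is the quantity that the consistency checks of Section 3 operate on; with noise-free, consistent pseudoranges it is numerically zero.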
Because the errors in the raw pseudorange estimates influence the iteration to different degrees, each output measurement needs to be assigned a pseudoweight to control the iterative error, so that the negative effect of low-elevation-angle measurements between buildings is reduced.
Therefore, after setting the diagonal weight matrix Y in (8), we multiply (7) by it. This captures the error correlation between the n different measurements and effectively reduces the dominant measurement error in weak-signal environments. The whole process is called weighted least squares (WLS), and its principle formula is given by (9).
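A single WLS step can be sketched as follows. The entries of the weight matrix are not recoverable from the text here, so the sin²(elevation) weighting in `elevation_weights` is an assumed, commonly used heuristic, and both function names are hypothetical; the sketch only shows how a diagonal weight matrix enters the normal equations.

```python
import numpy as np

def elevation_weights(elev_deg):
    """Assumed heuristic: diagonal weights proportional to sin^2(elevation)."""
    s = np.sin(np.radians(np.asarray(elev_deg, dtype=float)))
    return np.diag(s ** 2)

def wls_step(A, b, W):
    """Solve the weighted normal equations (A^T W A) dv = A^T W b."""
    AtW = A.T @ W
    return np.linalg.solve(AtW @ A, AtW @ b)
```

With this weighting, a satellite at 15° elevation contributes roughly one fourteenth of the weight of a satellite near the zenith, which is the intended down-weighting of low-elevation measurements between buildings.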

Double-Differenced Relative Kinematic Positioning Principle.
In a dynamic urban mobile environment, the relative change in position between the mobile receiver and the base station enables real-time differential dynamic positioning. The phase correction values between different epochs are sent in real time, and the other, dynamic receiver performs the same deviation measurement and output processing for the observed satellites. The difference between the two stations is then used to send the correction values to the user receiver in the dynamic environment in time. Differential positioning is used to weaken or eliminate noise because traditional static single-point positioning cannot cope with signal redundancy and interference in complex environments. With multiple reference base stations, the ionospheric delay, clock delay, and clock error can be eliminated as far as possible. Equation (1) is the basic pseudorange observation equation, and the parameter $\varepsilon_{\mathrm{ref}(i)}$ is regarded as the multipath error, which is the target studied in this paper. Figure 2 illustrates the process of pseudorange differential positioning.
Suppose that GNSS receiver n in the figure is the dynamic mobile receiver and m is a fixed base station on a nearby rooftop. According to the signal frequency and time delay of satellites p and q, $D^{pq}_{nm}$ denotes the observed double-differenced (DD) distance. It is calculated by (10), where $\vec{e}^{\,p}$ and $\vec{e}^{\,q}$ are the unit vectors to the visible satellites p and q, respectively, and $\Delta\vec{r}_{nm}$ is the relative position vector between the reference base station m and the receiver n. Because the relative distance between the two receivers is much smaller than the measured pseudoranges, the pseudoranges measured between the satellites and n and m can be treated as the same. The errors common to satellite p, the base station, and the receiver are eliminated by the difference calculation in (10). In particular, the common ephemeris error and clock deviation cancel in the DD calculation, leaving the errors that are more difficult to deal with. The position vector $\Delta\vec{r}_{nm}$ is then solved with the help of (7).
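The cancellation of common errors in the DD observable can be sketched as follows. The function name `double_difference` and the sign convention (non-reference satellite minus reference satellite) are assumptions of this sketch and may differ from the convention in (10).

```python
import numpy as np

def double_difference(rho_rover, rho_base, ref):
    """Double-differenced pseudoranges against reference satellite `ref`.

    rho_rover, rho_base: (n,) pseudoranges of the same n satellites seen by
    the moving receiver n and the base station m at the same epoch.
    Returns an (n-1,) array, one value per non-reference satellite.
    """
    # Single difference between receivers: satellite-side errors cancel
    sd = np.asarray(rho_rover, dtype=float) - np.asarray(rho_base, dtype=float)
    others = np.arange(len(sd)) != ref
    # Difference between satellites: receiver clock terms cancel
    return sd[others] - sd[ref]
```

Any bias shared by both receivers for one satellite, or shared by all satellites at one receiver, drops out of the result; multipath and NLOS errors are local to one receiver-satellite path and therefore remain.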
In the experimental environment studied in this paper, the MP effect in the real environment greatly affects pseudorange differential relative positioning; it must be detected and eliminated first, and only then combined with DPDT for the final receiver position estimation. Therefore, Section 3 introduces the XGBoost integrated classifier, which classifies satellite signals to detect usable signals and eliminate interfering ones, and shows how to apply it in a CC model to achieve the final pseudorange positioning.

Proposed Method
In the real scene of Figure 1, multipath and NLOS signals have the most significant influence on the receiver's positioning results, and the absolute positioning solution can no longer be obtained by pseudorange single-point differencing and the basic LS method alone. Following [6], the WLS method is applied to a real environment containing MP and NLOS signals, and healthy satellite signals are selected for iterative weighting to improve the prediction accuracy. In addition, the 3D shadow matching (SDM) method uses a matching function to determine the reference threshold, narrow the candidate range of the target receiver, and exclude unavailable signals as far as possible [16]. However, SDM is suited to cross-street scenarios at crossroads and does not achieve the ideal effect for positioning along the street.
Building a specific 3D model requires precision instruments and detailed databases, yet we still want to mitigate the multipath effect while positioning precisely. After comparing and analyzing different methods, this paper adopts a feasible solution: a two-layer consistency check of the pseudorange measurements before WLS. We also weight the evaluation of the measurement residuals between fixed scenes and continuous epochs. Figure 3 shows the method framework built in this paper. After the GNSS receiver obtains the original measurements, the first layer of preprocessing screens the effective data and detects signal interference with the help of an adaptive-threshold LS residual test and the carrier-to-noise ratio (CNR). During uniform dynamic acquisition, the DD measurement residuals of the preceding and following epochs are retained as a reference feature for the second layer of the consistency-checks process (CC2).
This paper feeds the measurements remaining after the first layer into the XGBoost classifier in the second layer. After feature training and testing, the classifier outputs its results: NLOS signals and obvious outliers are eliminated, while LOS signals and part of the corrected multipath reflection signal set are retained. The retained signal measurements are combined with the residual to perform a fit test and obtain the processed pseudorange positioning measurements. Finally, to further improve the accuracy of position estimation, the WLS method is used to calculate and fit the smoothed pseudorange residual.

The First-Layer Consistency Check Based on LS.
The purpose of the LS-based first-layer consistency check (CC1) is to evaluate and detect the presence of signal interference against specific parameter thresholds at the current real-time epoch t. To observe the consistency between the pseudorange measurements of the target, we use the pseudorange residuals to evaluate the goodness of fit. Once inconsistent interference is detected, least squares fitting is performed to help reduce the positioning error.
The principle of evaluation is given by (12), in which $\vec{D}_{nm}$ denotes the pseudorange error variation between the measured and predicted values and $\Delta\vec{r}_{nm}$ represents the state vector of the current mobile user receiver, so the residual relationship is constructed with the help of the unit vector matrix $\vec{A}$. Because data samples in the deep urban environment are complex and highly disturbed, a single pseudorange residual cannot accurately mitigate the fitting error. Therefore, the normalized form of (12) is used to perform the consistency assessment, and the resulting pseudorange residual statistic $\mathrm{SSE}_{LS}$ is given by (13), where $Y^{-1}$ is the inverse of the diagonal matrix in (8). In this paper, we set ε as the variance under the assumption that it follows a normal distribution. The consistency checking algorithm is described in [10]: a large $\mathrm{SSE}_{LS}$ indicates that the pseudorange measurements of the paths in the region are inconsistent and may be disturbed by anomalous signals such as multipath and NLOS, so further signal detection and classification are needed. A low $\mathrm{SSE}_{LS}$ means that the signals collected at this point form a healthy set and do not need to be screened out. During repeated CC, each partial elimination improves the next evaluation, and invalid measurements are screened out and excluded until the set meets the fitting test standard [10] and can be used as the input reference for the next layer. The exclusion criterion uses a classical statistical method, the chi-square goodness-of-fit test, to determine the threshold value $\mathrm{Loss}_X$ [32]. If $\mathrm{SSE}_{LS} < \mathrm{Loss}_X$ at a confidence level of nearly 99.99%, the next level of classification and exclusion can proceed.
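The repeated check-and-exclude loop of CC1 can be sketched as a greedy procedure. This is an illustrative sketch under stated assumptions, not the exact algorithm of [10] or [32]: the `threshold` argument stands in for the chi-square limit, `W` stands in for the inverse weighting matrix of (13), `min_sats` is an assumed floor that preserves redundancy, and excluding the measurement with the largest weighted residual is one common greedy rule.

```python
import numpy as np

def weighted_sse(A, b, W):
    """Weighted sum of squared LS residuals and the residual vector."""
    AtW = A.T @ W
    dv = np.linalg.solve(AtW @ A, AtW @ b)
    r = b - A @ dv
    return float(r @ W @ r), r

def greedy_exclusion(A, b, W, threshold, min_sats=5):
    """Exclude one measurement at a time until the SSE passes the test.

    Returns the indices of the retained measurements and the final SSE.
    """
    keep = np.arange(len(b))
    while True:
        sse, r = weighted_sse(A[keep], b[keep], W[np.ix_(keep, keep)])
        if sse < threshold or len(keep) <= min_sats:
            return keep, sse
        # Drop the measurement with the largest weighted residual magnitude
        worst = np.argmax(np.abs(r) * np.sqrt(np.diag(W)[keep]))
        keep = np.delete(keep, worst)
```

The greedy rule is cheaper than exhaustive subset testing, but it can exclude a healthy measurement when several faulty ones are present at once, which is one motivation for adding a second, classifier-based layer.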

The Second-Layer Consistency Check Based on XGBoost Classifier.
Because the types of multipath and interference in urban environments are complex, iterative residual exclusion with only one layer cannot detect and exclude invalid signals in low-CNR occluded environments. Considering the difficulty previous studies faced in obtaining coordinate data through ray tracing, we avoid the map limitations of city modeling and the complex database computation. We therefore apply a second layer of consistency checking with weighted assessment. Before repeating the check in CC2, the retained signal data need to be classified effectively to obtain a high-precision signal source, so that NLOS signals and direct signals in LOS can be distinguished in the low-CNR environment. This then assists the receiver's output solution and improves the positioning accuracy by correcting the existing pseudorange residual.

The Principle of Applying XGBoost to Multiclassification.
For the multiclassification problem on the measurement set remaining after the first layer, the training structure built for this paper is shown in Figure 4. The overall structure is divided into three parts: the initially screened dataset is used as the input of the XGBoost classifier, forward-backward gradient boosting is performed, and the target signal labels and the final classification results are output.
XGBoost is an engineered implementation of a strong classifier in machine learning, based on algorithmic enhancements and optimisations of the gradient boosting decision tree (GBDT). The underlying framework is an ensemble model formed by forward gradient iterations over multiple DTs [34]. Before training and predicting on the sample features, we need to understand the core parts of XGBoost, which are as follows.
(1) Prenormalization Process
The software receiver receives the acquired real data source and prepares it for output processing. The distorted mixed signals need to be correlated for feature preprocessing and simple denoising to obtain the remaining measurements as the input to XGBoost. The correlation set $I_k$ of k signals under n features at each 1 s epoch is given by (14), where $Q_{n,k}$ denotes the data of the kth signal for the nth feature dimension. We also consider the specific form of the satellite correlation data. For example, the received pseudorange measurements are of order $10^7$, whereas the carrier phase observations are of order $10^3$. To unify the orders of magnitude, the ith correlation feature variable $m_i$ needs to be normalized into a new reference variable $f_i$ of equal order of magnitude before input, as given by (15).
The normalization takes the ratio of the difference from the mean, $m_i - \mathrm{mean}(m_i)$, to the standard deviation $\mathrm{std}(m_i)$, which improves the iterative convergence speed of the objective function.
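The normalization described above is the usual z-score, and can be sketched column by column as follows. The function name `standardize` and the handling of constant columns are assumptions of this sketch.

```python
import numpy as np

def standardize(features):
    """Column-wise z-score: f_i = (m_i - mean(m_i)) / std(m_i).

    features: (samples, n_features) array, e.g. pseudorange in one column
    and carrier phase in another, with very different magnitudes.
    """
    m = np.asarray(features, dtype=float)
    std = m.std(axis=0)
    std[std == 0] = 1.0  # constant columns map to zero instead of dividing by 0
    return (m - m.mean(axis=0)) / std
```

After this step every feature has zero mean and unit standard deviation, so a pseudorange column of order 10^7 no longer dominates a carrier phase column of order 10^3.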
(2) Target Function
Before establishing the objective function, the number of classified output labels under supervised learning is fixed at 3: NLOS, the direct signal in LOS (DS(LOS)), and the MP signals, including the reflected signals. Based on the principle of XGBoost, the predicted value $\hat{y}_{m,i}$ of a signal sample collected in this paper is given by (16), where m is the signal class index (1, 2, 3) and $\|I_k\|$ is the characteristic parameter output of the correlator. The prediction for each type of signal is the weighted sum of the residuals of K weak classifiers $f_k$. The loss function over n signal samples, given by (17), is expressed by the relationship between the predicted value $\hat{y}_i$ and the true value $y_{m,i}$.
After the loss function is determined, it is added to the objective function for the next step. The selected loss function (17) represents the bias, while the variance needs to be controlled by the regularization term, which is the right half of (18). At training step t, the prediction of the previous step (t − 1) is updated with the current learner $f_t(x_{m,i})$. It is clear from (18) that the optimisation objective is to solve for $f_t(x_{m,i})$. To simplify the objective function, we identify the constant term and the regularization term that affects the model complexity by means of a second-order Taylor expansion. The expansion of (18) is given by (19), where $g_{m,i}$ is the first-order derivative of the loss with respect to the prediction at the previous step and $h_{m,i}$ is the second-order derivative. Because $y_{m,i}$ at step t is known, the first term of (19) can be regarded as a constant term C, so the objective function simplifies to (20). According to (20), it is only necessary to compute the first- and second-order derivatives of the loss function and iteratively calculate the optimal objective value to obtain the GNSS signal objective function f(x) for predictive classification, and finally to output the feature labels 0, 1, 2.
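To make the roles of the first- and second-order derivatives concrete, the following sketch computes them for a three-class problem. The loss is assumed here to be the softmax cross-entropy, which is a common choice for multiclass boosting but is not confirmed by the text; the function names and the default value of the regularization parameter `lam` are likewise illustrative.

```python
import numpy as np

def softmax_grad_hess(scores, labels):
    """Per-sample, per-class derivatives of the softmax cross-entropy.

    scores: (n, 3) raw scores from the previous boosting step.
    labels: (n,) integer class labels in {0, 1, 2}.
    """
    scores = np.asarray(scores, dtype=float)
    z = scores - scores.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    onehot = np.eye(scores.shape[1])[np.asarray(labels)]
    g = p - onehot       # first-order derivative
    h = p * (1.0 - p)    # second-order derivative (diagonal of the Hessian)
    return g, h

def leaf_weight(g, h, lam=1.0):
    """Leaf value -G / (H + lambda) minimising the second-order objective."""
    return -np.sum(g) / (np.sum(h) + lam)
```

For one class and one leaf, summing g and h over the samples that fall in the leaf gives G and H, and `leaf_weight` returns the value that the tree adds to the score of that class.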

(3) Feature Splitting and Filtering
For the data in this paper, most of the final features identified among the multidimensional variables are closely related to the signal type, but a few redundant features are also present. Before applying this classifier, it is crucial to determine the appropriate features. To distinguish valid signals, the statistical contribution calculation method of [35] is used to select the target features. The following feature set is used as the final reference: elevation angle, azimuth angle, SNR, carrier phase error, pseudorange deviation amplitude, multipath amplitude ratio, mean time delay variance, mean delay distance, root mean square error, and so on.
To reduce the impact of the selected features on the training time, we evaluate the importance of the features after the model is trained. Each XGBoost submodel is a DT, which is grown by recursive node splitting. To find the optimal node for the next branching step, the split gain needs to be calculated. As shown in the core tree-splitting part of Figure 4, the gain compares the scores of the left (L) and right (R) nodes after the split with the score of the node before the split.
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma \quad (21)$$
In this process, we define the total GNSS signal sample set as S(i), where the weighted sums of the first-order and second-order derivatives over the total sample set are G and H, respectively. $G_L$ and $G_R$ are the weighted sums of the first-order derivatives in the left and right subtrees, and $H_L$ and $H_R$ are the weighted sums of the second-order derivatives in the left and right subtrees. λ and γ are hyperparameters of the training process that act as control factors.
After repeated computation of (21), the current split feature is determined to be significant once the gain reaches the optimisation threshold, and it is then used as the splitting node of the tree. The weights evaluated from the sample set S(i) are shown in Figure 5, which makes clear the importance of features such as elevation angle, azimuth angle, SNR, carrier phase error, and pseudorange deviation amplitude for signal classification.
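The split search can be sketched for a single feature as follows. The gain expression is the standard XGBoost split gain, with λ and γ as the regularization hyperparameters; the function names, the midpoint threshold rule, and the default parameter values are assumptions of this sketch rather than settings taken from the experiments.

```python
import numpy as np

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left and right children."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

def best_split(feature, g, h, lam=1.0, gamma=0.0):
    """Scan the sorted values of one feature; return (best gain, threshold)."""
    feature = np.asarray(feature, dtype=float)
    order = np.argsort(feature)
    f = feature[order]
    g = np.asarray(g, dtype=float)[order]
    h = np.asarray(h, dtype=float)[order]
    G, H = g.sum(), h.sum()
    best_gain, best_thr = -np.inf, None
    GL = HL = 0.0
    for i in range(len(f) - 1):
        GL += g[i]
        HL += h[i]
        if f[i] == f[i + 1]:
            continue  # no threshold can separate identical values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (f[i] + f[i + 1])
    return best_gain, best_thr
```

Summing the gains obtained by each feature over all splits in the trained trees gives one common measure of feature importance of the kind plotted in Figure 5.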

Consistency Checking under Dynamic Relative Positioning.
After the classification process in the previous subsection, NLOS signals are excluded and the reflected signals in the multipath phenomenon are corrected. The chi-square goodness-of-fit threshold test is then repeated: after high-precision classification, the data undergo the dynamic relative check a second time, and the weighted residuals are used for iterative calculation to finally obtain a reference trajectory position with higher precision.
In the measured environment selected in this paper, route D in Figure 6 has considerable vegetation shielding and tall buildings on both sides, which makes it a typical urban environment. We obtain the reference trajectory using a combined INS/GNSS navigation unit during slow, uniform dynamic acquisition. During dynamic relative positioning acquisition, abnormal NLOS measurements that exceed the empirical thresholds are excluded by CC1 after the initial processing and calculation of the LS method.
For the next step of reference and comparison, the remaining set of measurements after CC1 is used as the training target data for the XGBoost classifier during the second layer of checks. is enables the dynamic differential positioning process to remove inconsistent measurements and achieve a refinement of the multipath signal species in the LOS, which in turn helps to smooth errors introduced by the positioning process.
Based on the above process, this paper combines the relative positioning process with the consistency checking principle, and the specific processing process is as follows.
Assume that the comparison position set matrix of the uniform dynamic moving process is shown as follows:

Among them, $V_{a-b}$ is the matrix vector in which the results of two position errors are compared under the same path, and its value is used to measure the positioning effect of the consistency checking model under dynamic relative positioning. $\vec{r}_a$ represents the predicted location set after the first CC, and $\vec{r}_b$ represents the final set of valid locations after optimal classification by the XGBoost classifier and CC2. To verify the accuracy of the offset of $V_{a-b}$, $T_S$ is used as the corresponding statistic, and $\omega_{a-b}$ represents the normalized weighting process after the CC2 optimisation. The process is as follows:

where $SSE_{LS-a}$ is the sum of the squared errors in (13) and $n_a$ is the size of the data set remaining after the first layer of filtering. $T_S$ is the average of the measured pseudorange residuals, as shown by (23). The lower the result, the smaller the deviation of CC2 from the real trajectory and the more accurate the positioning results obtained later.
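As a rough illustration of the quantities named above, the sketch below compares two position sets epoch by epoch and averages squared residuals. The exact forms of the paper's equations are not reproduced here: the difference $\vec{r}_a - \vec{r}_b$, the ratio SSE/n, and the inverse-squared-residual weighting are all assumptions made for this example, and every function name is hypothetical.

```python
def position_differences(r_a, r_b):
    """Per-epoch difference vectors between two position sets.

    Assumed form: V = r_a - r_b, with r_a the positions after the first
    check and r_b the positions after classification and the second check.
    """
    if len(r_a) != len(r_b):
        raise ValueError("both position sets must cover the same epochs")
    return [tuple(a - b for a, b in zip(pa, pb)) for pa, pb in zip(r_a, r_b)]


def mean_squared_residual(residuals):
    """Sum of squared residuals divided by the number of measurements.

    Assumed stand-in for the statistic T_S built from SSE and n_a.
    """
    if not residuals:
        raise ValueError("at least one residual is required")
    return sum(v * v for v in residuals) / len(residuals)


def normalized_weights(residuals, eps=1e-9):
    """Weights that sum to 1 and shrink as the residual grows.

    Hypothetical weighting; eps avoids division by zero for a zero residual.
    """
    raw = [1.0 / (v * v + eps) for v in residuals]
    total = sum(raw)
    return [w / total for w in raw]
```

Under these assumptions a smaller `mean_squared_residual` corresponds to a smaller deviation from the reference trajectory, which matches the interpretation given in the text.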

Experimental Results and Analysis
In this part, we introduce the parameter settings and the experimental environment. The validation is divided into two parts. Firstly, we compare the classification results of XGBoost integrated classification learning with other supervised learning classification methods to evaluate their performance. Secondly, the system with the two-layer consistency check is compared with the previously proposed CC model with the DT method to verify the performance of the method proposed in this paper.

Data Collection.
We used a laptop computer and a centric microchip for B1 IF signal acquisition in BDS. An electric vehicle serves as the carrier platform and records information at a speed of 5 m/s and at 1 s intervals.
According to the number and height of buildings in the nearby living area, we collect 5 min of data at intervals of 2 hours in each area, with a sampling frequency of 20 MHz. Figure 7 shows the relevant experimental scenarios and satellite distribution. Figure 7(a) shows the overhead trajectory of the experimental area near the selected school, divided into four routes A, B, C, and D. In addition, in order to fully verify the positioning performance of the proposed method, the E region is selected as a new test scenario. Figure 7(b) shows the distribution of all satellites over the whole area during a single 5 min sampling period. As Figure 7(b) shows, the number of visible satellites reaches a maximum of 28 when driving in the open area. However, only a few visible satellites can transmit signals in the environment obscured by high buildings.

Classifier Parameter Settings.
During the training of the classifier, the fixed important parameter settings are shown in Table 1. Before training, we divide the preprocessed sample data into a training set, a validation set, and a test set in the ratio 8:1:1, and update the gradient weight parameters with the help of the SAM optimizer.
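The 8:1:1 division can be sketched as follows. This is a minimal example that assumes the samples are shuffled before splitting; the paper does not state whether shuffling is used, and the function name `split_8_1_1` and the `seed` parameter are hypothetical.

```python
import random


def split_8_1_1(samples, seed=0):
    """Split samples into training, validation, and test sets at 8:1:1.

    Shuffling with a fixed seed is an assumption made for reproducibility;
    any remainder from integer division goes to the test set.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

For time-ordered GNSS epochs, a contiguous split without shuffling may be preferable to avoid leakage between neighbouring epochs; that choice is left open here.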

Classification Results.
In this part, we analyse the classification performance of XGBoost and compare it with several commonly used supervised learning classifiers. Considering the differences in multipath effects across the selected experimental scenarios, training data are collected under four scenarios in order to make the method more robust. The data collected in five scenarios are used to enhance the reliability of the evaluation. The datasets of the four low-multipath regions A, B, C, and D are grouped and repeatedly evaluated, and the remaining, more complex E-region datasets are used to verify the average classification rates. Table 2 compares the proposed classification method with existing supervised machine learning classification algorithms in terms of classification accuracy, recall rate, and F1 score. The measurements of satellite B09 remaining after CC1 and exclusion are selected as the training set. The average classification accuracy values of the reflected signals (RS) in the presence of NLOS and LOS are used to measure the level of multipath (i.e., MP in the table). The results in the table show that the classification accuracy of the Decision Tree method for NLOS and DS (LOS) is better than that of the other methods, with an F1 score that also reaches 94.2%, while the classification accuracy of the KNN method is approximately the same for the three types of signals. However, for the multilayer perceptron (MLP) method, the classification accuracy of DS (LOS) is as high as 94.2%, which is the best among the three types of signals.
Due to the strong interference from NLOS and MP, the only available signals are DS (LOS) and RS. The results of the analysis show that the proposed XGBoost classification method achieves the best classification rate among the compared methods for the three main classes. The F1 score of the NLOS signal is as high as 96.1%, the classification accuracy of the direct signal in the LOS signal is as high as 98.4%, and the classification accuracy of the MP signal is 91.6%. The F1 index is also higher, reaching 0.956, which demonstrates the high performance of the classifier.
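For reference, the precision, recall, and F1 score quoted for each signal class can be computed from true and predicted labels as in the sketch below. The label strings and the function name `per_class_metrics` are illustrative assumptions, and the toy labels in the usage note are not data from the paper.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return precision, recall, and F1 score for every class label."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

For example, calling `per_class_metrics(["LOS", "MP", "NLOS"], ["LOS", "MP", "MP"], ["LOS", "MP", "NLOS"])` reports a recall of 0 for NLOS because the single NLOS sample was labelled MP.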
In the field of GNSS positioning, the impact of MP interference signals in an urban environment on the accuracy of a specific solution can be directly shown with the help of LS estimation. Figure 8 shows the average positioning error of the receiver after interference in the original environment.

Performance Evaluation of Multipath Mitigation.
The data in Figure 9 show that the positioning error of the interfered E original measurement is close to 25 m, and the multipath effect error is obviously lower than that of the four surrounding routes. In order to mitigate the effects of the interference above, a double-layer consistency check is performed in conjunction with the signal classification method from the first half of the paper. This paper verifies the multipath mitigation performance from two perspectives.
(1) The Effect of DD Pseudorange Observations of MP Signals
Figure 9 shows the variation of the pseudorange error under interference from MP signals for the five scenes. Because the selected multipath environment is not typical, most of the delays are short, about 1.2 m, whereas in the more complex E scene the DD pseudorange delay is larger than in the other scenarios and fluctuates within a range of 42-52 m. For the experimental scenarios with strong multipath effects, Figure 10 compares the position errors in both the horizontal and altitude dimensions on the D and E routes using the conventional LS method and the improved CC method of this paper.
In the left figure, there is an obvious jump in the signal fluctuation around 100-200 s. Even with the improved model, the position error still reaches an anomalous value of about 5 m, so it can be inferred that the distortion of the receiver output signal is most severe during this period. The trend of the orange curve for route D demonstrates the feasibility of this method. Although the signals lose lock in some epoch segments, after optimisation by the proposed method the error in the latter half of the epochs converges to about 2-4 m, roughly 5 m less than the original error shown by the green curve.

The error trend in the right figure shows the E route, which is more deeply affected by the MP effect, with the horizontal error optimised by the consistency checking method. Many signal jumps remain, such as the fluctuations around 60 s, 125 s, and 200 s, but the error gradually becomes stable in the later stage, converging within 5-10 m. Secondly, according to the trend of the altitude error, the positioning error before consistency checking fluctuates around 22 m. After the consistency checking operation, the error variation is obviously smaller than that of the green curve, with a stable fluctuation of about 5 m.

On the other hand, we assess the performance of the proposed method through ablation studies. For each of the five experimental scenes, we compare the average error values of the three methods using LS iterations and give the percentage improvement in accuracy relative to the preceding method. The results in Table 3 indicate that the optimisation for A and B is not significant, while the more complex route D shows a 43.1% reduction in positioning error after CC2 optimisation.
After optimisation by the proposed method and WLS fitting, the positioning error of the supplementary E scene is controlled at 4.17 m, an accuracy improvement of about 33% over that before fitting. This validates the feasibility of the proposed XGBoost-based integrated classification method with the CC model, which can be considered for dynamic driving scenarios to improve target positioning accuracy.

(3) Comparison with Research Methods in the Same Field
In earlier work, some scholars have used 3D maps to assist in obtaining datasets and applied a CC method to DT-classified signal data [33]. This method is a newer approach in the direction of navigation processors in the field of multipath mitigation. We refer to this method as CC + DT(3DMA). In this paper, we obtain better results through changes in the experimental scenario and improvements to the internal algorithm. In this section, two evaluation metrics, root mean square error (RMSE) and coefficient of determination (R-square), are introduced to reflect the superiority of our proposed method. Figure 11 shows the error trend of the fit performance for the original data before improvements and for the two current research methods. As can be seen from the blue dashboard trend, the performance of CC + DT(3DMA) has improved considerably compared to the previous error profile of the ground truth. After the addition of the XGBoost classification method, the RMSE value is approximately 1.8 m, and the final RMSE value after WLS smoothing is improved by nearly 0.14 m over the previous one.
Secondly, the orange dashboard trend shows that R-square values are fitted around 0.5 after the optimisation of the model in this paper, which clearly reduces the level of multipath interference within the receiver. Therefore, based on the fitting trends of the two types of evaluation metrics, it can be concluded that the consistent measurement set obtained with the aid of the XGBoost classification method improves the receiver position accuracy more than CC + DT(3DMA). After the second-layer checking, the final positioning enhancement is achieved by applying WLS, which improves the positioning accuracy by almost 17% compared to CC + DT(3DMA).
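The two evaluation metrics used in this comparison follow their usual definitions, which can be written as a short sketch. The function names and the sample values in the usage note are illustrative only and do not come from the paper's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and estimated values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-square is undefined for constant reference values")
    return 1.0 - ss_res / ss_tot
```

With reference values `[1, 2, 3, 4]` and estimates `[1, 2, 3, 6]`, `rmse` gives 1.0 and `r_squared` gives 0.2, so a single 2-unit outlier is enough to pull the fit quality down sharply.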

Conclusion
The double-layer consistency checking model based on the XGBoost algorithm proposed in this paper effectively mitigates the strong interference of multipath effects in urban environments. In this method, the cardinality fit test and traditional LS estimation of position are used in the first layer to exclude measurements that fail the consistency check. The remaining valid measurement dataset is used as the input for the second layer of consistency checking, which applies the preprocessed dataset and the tuned XGBoost classifier to classify signals into three typical classes. The classification results show that the XGBoost classifier is able to improve the classification accuracy of NLOS to 93.6% compared to several different supervised learning classification methods. After the classification results are obtained, it is equally important to combine DPDT to solve for the pseudorange measurements between satellite and receiver in order to eliminate systematic errors. Finally, two results are fitted based on the WLS smoothing method, and the results show that the RMSE value after iteration reduces to 1.668 m, an improvement of about 0.34 m compared to the CC + DT(3DMA) method for mitigating multipath.
Although the method proposed in this paper yields a clear improvement, problems remain. For example, the level of multipath interference obtained by using an electric vehicle to simulate a dynamic reception scenario is not typical. Therefore, in the next step we could select a more complex, densely built-up environment to verify the feasibility of the method. In addition, because the electric vehicle moves slowly and the amount of training data collected is limited, we suggest converting typical features of GNSS signals into two-dimensional or multidimensional segmentation for more accurate evaluation with the help of deep learning methods.