A Prediction Method of Electromagnetic Environment Effects for UAV LiDAR Detection System

With the rapid development of science and technology, UAVs (Unmanned Aerial Vehicles) have become a new type of weapon on the informatized battlefield owing to their low loss and zero casualty rate. In recent years, UAV crashes caused by navigation electromagnetic decoys and electromagnetic interference have attracted widespread international attention. The UAV LiDAR detection system is susceptible to electromagnetic interference in a complex electromagnetic environment, which results in inaccurate detection and causes the mission to fail. Therefore, it is necessary to predict the effects of the electromagnetic environment. Traditional electromagnetic environment effect prediction methods mostly use a mathematical model or a single machine learning model, but these methods handle nonlinear problems poorly and generalize weakly. Therefore, this paper proposes a Stacking fusion model based on machine learning to predict electromagnetic environment effects. The method consists of the Extreme Gradient Boosting algorithm (XGB), the Gradient Boosting Decision Tree algorithm (GBDT), the K Nearest Neighbor algorithm (KNN), and the Decision Tree algorithm (DT). Experimental results show that, compared with the other seven machine learning algorithms, the Stacking fusion model has a better classification prediction accuracy of 0.9762, a lower Hamming distance of 0.0336, and a higher Kappa coefficient of 0.955. The fusion model proposed in this paper has a better predictive effect on electromagnetic environment effects and is of great significance for improving the accuracy and safety of UAV LiDAR detection systems under the complex electromagnetic environment on the battlefield.


Introduction
Modern warfare is information and electronic warfare. Many enemy and friendly radars are deployed on the battlefield, coupled with natural and man-made electromagnetic radiation interference, making the electromagnetic environment of the battlefield more complicated [1]. The UAV LiDAR detection system plays an important role in informatized electronic warfare operations. The complex electromagnetic environment causes serious interference to the UAV LiDAR detection system, threatening the safety and combat effectiveness of the UAV [2]. The LiDAR detection system plays an important role in the flight safety of the UAV. It is easily affected by the complex electromagnetic environment of the battlefield, which leads to detection errors, affects the construction of point cloud maps, and causes inaccurate target detection. When the UAV LiDAR detection system is subjected to electromagnetic interference during flight, measures such as leaving the interference zone and returning home are generally taken to ensure safety, but these measures have a great impact on the completion of the mission. This paper studies a prediction method for the complex electromagnetic environment effect of the battlefield, so that the UAV LiDAR detection system can intelligently predict electromagnetic risk areas and make intelligent decisions to avoid them, thereby improving the detection accuracy and safety of the UAV LiDAR detection system.
At present, the more popular learning methods are machine learning and deep learning. The research content of deep learning mainly involves methods such as convolutional neural networks, recurrent neural networks, and self-encoding neural networks, which usually mimic the mechanisms of the human brain to interpret data such as images, time series, and text. Electromagnetic environment effect prediction is an artificial intelligence process that completes machine decision-making with the help of a large amount of experimental data, so we chose machine learning algorithms rather than deep learning to solve this problem. The traditional prediction methods of electromagnetic environment effects are mainly mathematical models and single machine learning models, such as the method of moments and the Support Vector Machine algorithm (SVM) [3]. Traditional prediction methods have relatively simple models and a weaker ability to deal with nonlinear problems, so errors occur in the process of prediction. Because these models have many prerequisites and conditional restrictions, they are not universal.
Machine learning is a multidisciplinary subject, involving statistics, probability, etc. Machine learning algorithms can handle nonlinear problems better and have the advantages of fast calculation and automatic learning. In this paper, machine learning algorithms are used to predict the effects of the electromagnetic environment. In the course of the research, the theory of fusion algorithms is introduced, and an electromagnetic environment effect prediction model based on the Stacking model fusion algorithm is constructed.
In this paper, experiments are used to demonstrate the effectiveness of the electromagnetic environment effect analysis and prediction model based on machine learning. In this experiment, the models with better prediction effect are selected from seven algorithms, namely, the adaptive boosting algorithm (ADB), SVM, the Random Forest algorithm (RF), DT, XGB, GBDT, and KNN, to form the Stacking fusion model that predicts the electromagnetic environment effect. In the rest of this paper, we focus on the details of the method. The main contributions of this paper are as follows: (1) By analyzing the experimental data, a Stacking fusion model based on machine learning (composed of XGB, GBDT, KNN, and DT algorithms) is proposed to predict the electromagnetic environment effects of the UAV LiDAR detection system. (2) The effectiveness of this method is demonstrated by comparing it with seven other classification prediction methods of electromagnetic environment effects. Experimental results show that this method is more suitable for predicting the electromagnetic environment effects of UAV LiDAR detection systems.

Related Work
Due to the wide application of high-tech electronic technology in the military field, any military activity takes place in a certain electromagnetic environment. The electromagnetic radiation power of current navigation, radar, and communication equipment is increasing, and the frequency spectrum is constantly widening, making the electromagnetic environment of the battlefield increasingly complex. The emergence of electronic pulse weapons, the application of electronic warfare systems, and electromagnetic sources such as lightning and natural electromagnetic fields have made the electromagnetic environment of the battlefield worse [4]. During their missions on the battlefield, UAVs may encounter interference from radiation systems such as communications equipment, electronic interference, electronic deception, lightning, antiradiation weapons, radar, high-power microwave pulses, and nuclear battery pulses. The electromagnetic environment facing the drone is shown in Figure 1.
The traditional electromagnetic environment effect prediction methods use manual mathematical modeling and single algorithm models. In 1999, Antonini et al. used numerical calculation methods to predict the electromagnetic interference of the electric drive system [5]. In 2009, Coco et al. used GRID-based methods to predict the electromagnetic field of the urban environment [6]. In 2010, Chen et al. used the entropy principle to predict complex electromagnetic signals in the battlefield [7]. In 2013, Ying et al. used statistical model methods to predict the electromagnetic environment [8]. In 2015, Alligier et al. used ridge regression and multiple linear regression methods to predict the climb of aircraft on the ground [9]. In 2016, Zhang et al. used the SVM algorithm to predict the UAV data link interference in the complex electromagnetic environment. Experiments showed that the SVM algorithm had advantages in nonlinear data prediction, but the accuracy of the prediction results needed to be improved [10]. In 2017, Yuan et al. used Bayesian networks to predict and evaluate the complex electromagnetic environment [11]. In 2019, Shu et al. used an Artificial Neural Network (ANN) to predict electromagnetic interference [12]. In 2021, Zhang et al. used the GPR algorithm to predict the electromagnetic interference of the UAV dynamic data link [13]. In 2021, Kogut and Slowik used the Multilayer Perceptron (MLP) method to classify airborne laser sounding data. Compared with algorithms such as SVM and K-means, the classification accuracy was improved to a certain extent [14].
From this, it can be seen that most current predictions of electromagnetic environment effects use traditional manual mathematical models and single algorithm models. However, complex electromagnetic environment effects are nonlinear, ambiguous, and uncertain, so the traditional prediction methods are not effective in predicting the effects of a complex electromagnetic environment.

Stacking Integrated Learning Algorithm.
Ensemble learning combines different algorithm models and uses certain rules to merge them to obtain better results. Ensemble learning algorithms can solve problems such as classification and regression. In this experiment, the Stacking ensemble learning algorithm is used for classification prediction. The Stacking ensemble learning algorithm is a hierarchical heterogeneous fusion model. The individual learner is called the primary learner, and the learner that combines the results is called the secondary learner. The training data used by the secondary learner is called the secondary training set, and the secondary training set comes from the outputs of the primary learners. The XGB, GBDT, and KNN algorithms, which have better prediction effects among the seven algorithms ADB, SVC, RF, DT, XGB, GBDT, and KNN, are chosen as the primary learners, and the DT algorithm is chosen as the secondary learner. In this experiment, the data set is divided into a training set and a test set. The training set is used to train the XGB, GBDT, and KNN models to obtain three primary learners, which then predict the test set. Their output values are used as the input values of the next stage, and the final labels are used as the training output values. The DT secondary learner is trained in this way, and the trained secondary learner is used for prediction. Since the data sets used in the two stages are different, overfitting can be prevented to a certain extent.
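The two-stage procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the experiment: the two primary learners (`NearestNeighbour`, `NearestCentroid`) and the lookup-table secondary learner (`LookupMeta`) are hypothetical stand-ins for XGB, GBDT, KNN, and DT, and the toy data and the split into a base part and a meta part are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class NearestNeighbour:
    """1-NN classifier, a stand-in for one primary learner."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [
            self.y[min(range(len(self.X)), key=lambda i: euclidean(self.X[i], x))]
            for x in X
        ]


class NearestCentroid:
    """Nearest-class-mean classifier, a stand-in for another primary learner."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {
            label: tuple(sum(col) / len(pts) for col in zip(*pts))
            for label, pts in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda c: euclidean(self.centroids[c], x))
            for x in X
        ]


class LookupMeta:
    """Secondary learner: maps each tuple of primary outputs to its most common label."""

    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def fit_stacking(primaries, meta, X_base, y_base, X_meta, y_meta):
    # Stage 1: train the primary learners on the first part of the data.
    for model in primaries:
        model.fit(X_base, y_base)
    # Stage 2: primary outputs on the second part form the secondary training set.
    Z = list(zip(*(m.predict(X_meta) for m in primaries)))
    meta.fit(Z, y_meta)
    return primaries, meta


def predict_stacking(primaries, meta, X):
    Z = list(zip(*(m.predict(X) for m in primaries)))
    return meta.predict(Z)


# Toy data (assumed for illustration only).
X_base = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y_base = [0, 0, 0, 1, 1, 1]
X_meta = [(0.5, 0.5), (1.0, 1.0), (5.5, 5.5), (6.0, 6.0)]
y_meta = [0, 0, 1, 1]

primaries, meta = fit_stacking(
    [NearestNeighbour(), NearestCentroid()], LookupMeta(), X_base, y_base, X_meta, y_meta
)
predictions = predict_stacking(primaries, meta, [(0.2, 0.1), (5.8, 5.9)])
```

Training the secondary learner on data that the primary learners did not see during fitting mirrors the point made above: the two stages use different data, which limits overfitting.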

Decision Tree Algorithm.
A decision tree classification algorithm is a supervised machine learning algorithm, which trains a tree-type classification model from a given out-of-order training sample. In the process of classification training, a classification decision tree is established according to the principle of minimizing the loss function. In classification prediction, the decision tree model is used to predict the test set data. The CART algorithm is used in this experiment.
The CART algorithm consists of feature selection, tree generation, and pruning.
CART decision tree feature selection. The Gini coefficient is used as the basis for splitting nodes in the CART algorithm [15]. The Gini coefficient measures the impurity of the model. The larger the coefficient, the higher the impurity and the worse the feature; conversely, the smaller the coefficient, the lower the impurity and the better the feature, as shown in

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2. \tag{1}$$

K is the number of categories; p_k is the probability that the sample point belongs to the k-th category.
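As a small illustration of the Gini coefficient, the following sketch computes it from a list of class labels, estimating each p_k as the relative frequency of category k. The function name `gini_impurity` and the example labels are assumptions made for this sketch.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini coefficient 1 - sum_k p_k^2, with p_k estimated from label frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])   # a single category: no impurity
mixed = gini_impurity(["a", "a", "b", "b"])  # two equally likely categories
```

A node containing one category gives 0, and the value grows as the categories become more evenly mixed, which is why a smaller coefficient indicates a better split.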
CART decision tree generation. Input the training data set D and the stopping conditions, and output the CART decision tree.
(1) Suppose the training set is D, and calculate the Gini coefficients of all features on D. Suppose a possible value of feature A is a; according to whether the test A = a holds, divide the training data into D_1 and D_2, and calculate the Gini coefficient of D_1 and D_2 when A = a.
(2) From all features A and all possible cut points a, choose the feature and cut point with the smallest Gini coefficient as the best feature and cut point. Then produce two child nodes based on the best feature and cut point, split the data set D, and assign it to the two child nodes.
(3) Recursively apply steps (1) and (2) to the two generated child nodes until the number of samples in the node is less than the threshold or the Gini coefficient is less than the threshold.
(4) Generate the CART decision tree.
CART decision tree pruning. The pruning algorithm [16] cuts some subtrees from the bottom of the decision tree to make the model simpler, which can improve the accuracy of predicting unknown data. Decision tree pruning is a dynamic process. Starting from the leaf nodes, the prediction error within the node and the prediction error after pruning are calculated from the bottom up. If the prediction error after pruning becomes smaller, then pruning is performed; otherwise, no pruning is performed. After pruning, the original nonleaf node becomes a leaf node. The category of the new leaf node is determined by the Decision Tree algorithm, and the above steps are repeated until the minimum prediction error is found. The loss function is shown in

$$C_{\alpha}(T_t) = C(T_t) + \alpha |T_t|. \tag{2}$$

Note: α is the regularization parameter, C(T_t) is the prediction error of the training set, and |T_t| is the number of leaf nodes of the subtree.
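The split search in steps (1) and (2) and the pruning loss can be sketched as follows. This is a minimal sketch under stated assumptions: features are numeric and are split with a threshold test (value <= a) instead of the equality test A = a, the score of a split is the size-weighted Gini coefficient of the two parts, and the names `best_split` and `pruning_loss` as well as the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, cut_point, weighted_gini) of the best binary split."""
    n = len(rows)
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for a in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= a]
            right = [y for r, y in zip(rows, labels) if r[f] > a]
            if not left or not right:
                continue  # a split must produce two non-empty child nodes
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (f, a, score)
    return best


def pruning_loss(prediction_error, n_leaves, alpha):
    """Cost-complexity loss: prediction error plus alpha times the number of leaves."""
    return prediction_error + alpha * n_leaves


rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
feature, cut_point, score = best_split(rows, labels)
```

On this toy data the first feature separates the two categories perfectly at the cut point 2.0, so the weighted Gini coefficient of the best split is 0. A full tree would call `best_split` recursively on the two child nodes until a stopping condition is met.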

KNN Algorithm.
The KNN algorithm, also known as the K nearest neighbor algorithm, is a machine learning algorithm that can solve classification and regression problems, and it is also a relatively mature algorithm in theory [17]. This experiment uses the KNN classification algorithm to classify according to the distance between different feature values [18]. The main idea of the algorithm is that when predicting a new value x, the category of x is judged according to the categories of the nearest K points. In KNN, the dissimilarity between sample objects is determined by calculating the distance between objects. Generally, Manhattan distance or Euclidean distance is used to calculate the distance between sample objects [19], as shown in formulas (3) and (4):

$$d(x, y) = \sum_{j=1}^{n} |x_j - y_j|, \tag{3}$$

$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}, \tag{4}$$

where x and y are two samples with n features each.
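A minimal sketch of the KNN classification idea, assuming a majority vote among the K nearest training points; the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Predict the category of `query` by majority vote among its k nearest points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["low", "low", "low", "high", "high", "high"]
prediction = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

Passing `distance=manhattan` switches the dissimilarity measure without changing the voting step.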

GBDT Algorithm.
The full name of GBDT is Gradient Boosting Decision Tree. The main idea of the algorithm is to use an additive model to classify or regress data by continuously reducing the residuals generated during the training process [20]. This experiment uses the GBDT classification algorithm and uses the difference between the predicted probability value and the true probability value to fit the loss. The flow of the GBDT classification algorithm is as follows.
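(1) Suppose the number of categories is K; the log-likelihood loss function is shown in

$$L(y, f(x)) = -\sum_{k=1}^{K} y_k \log p_k(x). \tag{5}$$

(2) If the sample output category is k, then y_k = 1; the expression of the probability p_k(x) is shown in

$$p_k(x) = \frac{\exp(f_k(x))}{\sum_{l=1}^{K} \exp(f_l(x))}. \tag{6}$$

(3) According to formulas (5) and (6), the negative gradient error of category l corresponding to the i-th sample in the t-th round can be calculated; the negative gradient error formula is shown in

$$r_{til} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f_l(x_i)}\right]_{f = f_{t-1}} = y_{il} - p_{l,t-1}(x_i). \tag{7}$$

(4) Generate a decision tree; with R_{tjl} denoting the j-th leaf region of the tree fitted for category l in round t, the best negative gradient fitting value of each leaf node is usually approximated as shown in

$$c_{tjl} = \frac{K-1}{K} \cdot \frac{\sum_{x_i \in R_{tjl}} r_{til}}{\sum_{x_i \in R_{tjl}} |r_{til}| (1 - |r_{til}|)}. \tag{8}$$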
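The quantities in steps (1) to (3) can be illustrated with a short sketch. It assumes that the class probabilities are obtained from the per-class scores with the usual softmax form and that the true label is one-hot encoded; the function names and the example values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Class probabilities p_k from per-class scores f_k (numerically stabilised)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(one_hot, probs):
    """Log-likelihood loss -sum_k y_k log p_k for a single sample."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


def negative_gradient(one_hot, scores):
    """Per-class residual y_k - p_k that the next round of trees is fitted to."""
    probs = softmax(scores)
    return [y - p for y, p in zip(one_hot, probs)]


scores = [0.0, 0.0, 0.0]   # initial scores of a 3-class problem
y = [1, 0, 0]              # the sample belongs to the first category
residuals = negative_gradient(y, scores)
```

With equal initial scores every category has probability 1/3, so the residual is 2/3 for the true category and -1/3 for the others; each boosting round fits a tree per category to these residuals.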

XGBoost Algorithm.
The full name of the XGB algorithm is Extreme Gradient Boosting. It is a gradient boosting tree algorithm based on decision trees. The algorithm runs for multiple iterations, and each iteration produces a weak classifier. Each classifier is trained on the residuals of the previous round of classifiers. Weak classifiers need to meet the basic requirements of high bias and low variance, because the process of algorithm training is to continuously reduce the bias, thereby improving the accuracy of the final classifier. In general, the weak classifier uses the CART decision tree. Due to the simplicity and high bias requirements, each classification tree is kept shallow. The final classifier is obtained by the weighted summation of the weak classifiers obtained in each round of training. The objective function is shown in

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C. \tag{9}$$

Note: l is the loss function, and C is a constant; Ω(f_t) is shown in

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2, \tag{10}$$

where T is the number of leaf nodes of the tree f_t, w_j is the weight of the j-th leaf node, and γ and λ are regularization coefficients.
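The structure of the XGB objective, a training loss plus a complexity penalty on the newly added tree, can be sketched as follows. The sketch assumes the commonly used penalty γT + ½λΣw_j² and, purely for illustration, a squared-error loss; the function names, parameter names, and values are hypothetical and do not reproduce the XGBoost library API.

```python
def tree_complexity(leaf_weights, gamma=1.0, lam=1.0):
    """Penalty gamma * T + 0.5 * lam * sum(w_j^2) for a tree with T leaves."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_prev, tree_output, leaf_weights, gamma=1.0, lam=1.0):
    """Loss of (previous prediction + new tree output) plus the tree penalty."""
    loss = sum((y - (p + f)) ** 2 for y, p, f in zip(y_true, y_prev, tree_output))
    return loss + tree_complexity(leaf_weights, gamma, lam)


# A two-leaf tree that corrects the previous round's predictions exactly.
value = objective(
    y_true=[1.0, 0.0],
    y_prev=[0.5, 0.5],
    tree_output=[0.5, -0.5],
    leaf_weights=[0.5, -0.5],
    gamma=0.1,
    lam=1.0,
)
```

Even when the new tree removes the training error completely, the objective stays positive because of the penalty, which is what keeps each weak classifier shallow.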

Data Sources.
The experimental data in this paper comes from the radiation interference experiment of the UAV LiDAR detection system. The experiment consists of two parts, the electromagnetic interference radiation emission system and the UAV working system. In the electromagnetic interference radiation emission system, a signal generator generates electromagnetic signals, a power amplifier amplifies the power, and the directional coupler then feeds the radiating antenna. The power meter measures the power of the power amplifier through the directional coupler, which can accurately measure the forward output power and the backward reflected power and monitor the working status of the experimental system. The intensity of the radiated electric field can be adjusted by adjusting the gain of the power amplifier and the output level of the signal generator. The radiation interference experiment of the UAV LiDAR detection system is shown in Figure 2. The radar technical indicators are shown in Table 1. In the radiation interference experiment, the laser detection radar in the drone is used as the equipment under test, and strong electromagnetic fields are used to conduct radiation interference experiments. Through experiments, the interference of the equipment under test in different electromagnetic field environments is verified. A total of 135,658 pieces of data are obtained through experiments. The data that need to be collected in real time during the experiment include error, angle, frequency, and field strength.
The range of distance is 0-150 m, and the range of angle is 0°-360°. The frequency range is 1.

Data Preprocessing.
The data preprocessing in this experiment mainly includes three aspects: abnormal point processing [21], sample equalization, and data standardization.
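Of the three aspects, data standardization is the simplest to illustrate. The sketch below assumes the Z-score form, in which each value has the feature mean subtracted and is divided by the feature standard deviation; the use of the population standard deviation, the function name, and the example values are assumptions made for illustration.

```python
import math


def z_score(values):
    """Standardise one feature column to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - mean) / std for v in values]


scaled = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Standardization matters most for distance-based steps such as KNN and K-means, where a feature with a large numeric range would otherwise dominate the distance.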

Handling of Abnormal Points.
The LiDAR detection system on the UAV encounters problems such as gaps and nonsmooth surfaces during the detection and scanning process, so abnormal points inevitably appear. We use the K-means algorithm to deal with the abnormal points. The main idea is to use the elbow method to determine the number of clusters. According to the clustering results, the distance from each point to its cluster center is calculated and compared with a threshold. A point whose distance is greater than the threshold is treated as an abnormal point and is deleted. The SSE formula is shown in formula (11). The Euclidean distance formula is shown in formula (12):

$$SSE = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2, \tag{11}$$

$$d(x, \mu_i) = \sqrt{\sum_{j=1}^{n} (x_j - \mu_{ij})^2}, \tag{12}$$

where k is the number of clusters, C_i is the i-th cluster, μ_i is its center, and n is the number of features.
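A minimal sketch of this outlier handling, assuming a plain K-means loop with given initial centers and a fixed distance threshold; the function names, the initial centers, the threshold of 5.0, and the toy points are all assumptions made for illustration.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, centers, iterations=10):
    """Plain K-means: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(euclidean(p, c) for c in centers) ** 2 for p in points)


def remove_outliers(points, centers, threshold):
    """Keep only the points whose distance to the nearest center is within the threshold."""
    return [p for p in points if min(euclidean(p, c) for c in centers) <= threshold]


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (5.0, 30.0),  # an abnormal point far away from both groups
]
centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
kept = remove_outliers(points, centers, threshold=5.0)
```

For the elbow method, `sse` would be evaluated for several cluster counts and the count at which the decrease in SSE levels off would be chosen.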

Sample Equalization.
Unbalanced sample categories result in fewer features for the categories with a small sample size, which makes it difficult to find regular patterns. After training, the model tends to rely on a small number of data samples and overfit, so the accuracy obtained when it predicts new data is poor. Therefore, the data set needs to be equalized. In this experiment, the SMOTE algorithm is used to solve the problem of unbalanced data set samples [22]. The SMOTE algorithm analyzes and simulates the minority category samples and then adds the simulated data to the data set to balance it. The simulation process of the SMOTE algorithm draws on the KNN algorithm. Select a sample x in a minority category, use Euclidean distance to calculate the distance from this sample to all samples in the minority category sample data set, and get its K nearest neighbors. The Euclidean distance formula is shown in

$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}. \tag{13}$$

The sampling ratio is set according to the sample imbalance ratio, the sampling magnification n is then determined, and several samples are randomly selected from the K nearest neighbors of each minority category sample. Randomly select a number from [0, 1], multiply it by the difference between the randomly selected neighbor and x, and add the result to x. The formula is shown in

$$x_{new} = x + \mathrm{rand}(0, 1) \times (\tilde{x} - x), \tag{14}$$

where \tilde{x} is the randomly selected neighbor. The SMOTE algorithm does not use simple random oversampling, which effectively prevents the problem of overfitting and gives the model better generalization [23]. The sample before sampling is shown in Figure 3. Figure 4 shows the sample after sampling. It can be seen from Figure 3 that the data set has a sample imbalance. It can be seen from Figure 4 that the sample data set has reached equilibrium after sample equalization using the SMOTE algorithm.
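The interpolation step of SMOTE can be sketched as follows. The sketch assumes the standard form in which a synthetic sample lies on the line segment between a minority sample and one of its K nearest minority neighbors; the function name `smote`, the parameter names, the seed, and the toy points are assumptions made for illustration.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(p, x))[:k]
        x_tilde = rng.choice(neighbours)
        gap = rng.random()  # random number in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_tilde)))
    return synthetic


minority = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
new_points = smote(minority, n_new=5)
```

Because each synthetic point is an interpolation between two real minority samples, it stays inside the region occupied by the minority category instead of duplicating an existing sample.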

Evaluation Index.
This paper uses accuracy, Kappa coefficient, and Hamming distance to evaluate the prediction effect of electromagnetic environment effects.
Accuracy is the proportion of samples that the classifier identifies correctly:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}. \tag{15}$$

TP means that the classifier identified the sample correctly and considered it positive. TN means that the classifier identified the sample correctly and considered it negative. FP means that the classifier identified the sample incorrectly and considered it positive; the sample is actually negative. FN means that the classifier identified the sample incorrectly and considered it negative; the sample is actually positive.
The Kappa coefficient is calculated as

$$Kappa = \frac{P_0 - P_e}{1 - P_e}. \tag{16}$$

P_0 represents the total classification accuracy; P_e represents the sum over all categories of (the number of real samples of the i-th category multiplied by the number of predicted samples of the i-th category), divided by the square of the total number of samples.
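The two indices can be computed directly from the true and predicted labels. The following is a small sketch in which P_e is computed as described above; the function names and the example labels are assumptions made for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def kappa(y_true, y_pred):
    """Kappa coefficient (P0 - Pe) / (1 - Pe); undefined when Pe equals 1."""
    n = len(y_true)
    p0 = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p0 - pe) / (1 - pe)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc = accuracy(y_true, y_pred)
kap = kappa(y_true, y_pred)
```

In this example five of six samples are correct, and the agreement expected by chance is 1/3, so the Kappa coefficient is lower than the accuracy: it only credits agreement beyond chance.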

Hamming Distance.
The Hamming distance is used to measure the distance between the predicted label and the real label, and the value range is [0, 1]. A distance of 0 indicates that the real result is the same as the predicted result. A distance of 1 means that the actual result is completely different from the predicted result. The formula is shown in

$$D = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L} \sum_{j=1}^{L} \mathrm{XOR}(Y_{i,j}, P_{i,j}). \tag{17}$$

Note: N represents the number of samples, L represents the number of tags, Y_{i,j} represents the true value of the j-th component of the i-th sample, P_{i,j} represents the predicted value of the j-th component in the i-th prediction result, and XOR represents exclusive OR.
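A short sketch of the Hamming distance, assuming that the true and predicted labels are given as rows of 0/1 tags; the function name and the example rows are assumptions made for illustration.

```python
def hamming_distance(y_true, y_pred):
    """Average share of tags on which the true and predicted rows disagree."""
    n = len(y_true)
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        mismatches = sum(t ^ p for t, p in zip(true_row, pred_row))  # XOR per tag
        total += mismatches / len(true_row)
    return total / n


Y = [[1, 0, 1], [0, 1, 0]]
P = [[1, 1, 1], [0, 1, 1]]
distance = hamming_distance(Y, P)
```

Each row here differs in one of three tags, so the distance is 1/3; identical rows give 0 and fully inverted rows give 1.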

Model Flow Chart.
The main voting algorithms in machine learning are the bagging algorithm and the boosting algorithm. Both simply average or vote on the results of the basic models, which may lead to large learning errors. Therefore, this article uses another learning method, the Stacking model fusion algorithm. The Stacking model fusion algorithm does not perform simple logic processing on the results of the models but adds a layer on top of them, so there are two layers of models in total. The first-layer models are trained on the training set, their prediction results are then used as input, and a new second-layer model is trained on this input to obtain the final result. The Stacking model fusion algorithm can reduce the bias of the bagging algorithm or the boosting algorithm.
From the model prediction results of the ADB, SVC, RF, DT, XGB, GBDT, and KNN algorithms, we can see that the DT, XGB, GBDT, and KNN algorithms have better prediction results, so these four models are used as the base models of Stacking. The algorithm and input of the metamodel have an important impact on Stacking. The input features of the metamodel are the combination of all the prediction results of the base models; the method splices all the features of the base models without omission and makes full use of all the data. The metamodel is usually the base model with the best prediction result, so the KNN algorithm is chosen as the metamodel in this experiment to ensure the accuracy of Stacking model prediction.
The flowchart of the seven models is shown in Figure 5. The main process is as follows:
(1) Data preprocessing. The data set contains an unbalanced sample category distribution and abnormal points. The SMOTE algorithm is used for sample category equalization, the K-means clustering algorithm is used to find outliers and delete them, and the Z-score algorithm is used to standardize the data set.
(2) Model selection, training, and prediction. Choose each of the seven machine learning models in turn, use the training set to train the model, and then use the test set to test the model and make predictions.
The Stacking (DXGK) fusion model flowchart is shown in Figure 6. The main process is as follows:
(1) Data preprocessing. The data set contains an unbalanced sample category distribution and abnormal points. The SMOTE algorithm is used for sample category equalization, the K-means clustering algorithm is used to find outliers and delete them, and the Z-score algorithm is used to standardize the data set.
(2) Model building and training. In this experiment, the ADB, SVC, RF, DT, XGB, GBDT, and KNN algorithms are used to predict the electromagnetic environment effect. Through the comparison of the prediction results of each model, it can be seen that the DT, XGB, GBDT, and KNN algorithms have good prediction results. In the experiment, a two-layer Stacking fusion algorithm model is constructed. The first layer is composed of multiple basic learners, and the second-layer metamodel takes the outputs of the first-layer basic learners as features, which are added to the training set for retraining, thereby obtaining the complete Stacking model.
(1) Electromagnetic interference comparison chart of the UAV LiDAR detection system. The complex electromagnetic environment causes electromagnetic interference to the UAV LiDAR detection system. The electromagnetic interference comparison diagram of the UAV LiDAR detection system is shown in Figure 7. The red line in Figure 7 represents the data before electromagnetic interference, and the blue line represents the data after electromagnetic interference. It can be seen that electromagnetic interference strongly disturbs the UAV LiDAR detection system.

Complexity
(2) Use the grid search method to optimize each parameter; the parameter optimization is shown in Table 3. From Figures 8-15, it can be seen that, among the eight prediction methods, the true value and predicted value fit     Table 4. The comparison results of the evaluation indicators of the eight algorithm models are shown in Figure 16.
It can be seen from Table 4 and Figure 16 that the eight algorithm models, ranked by performance from low to high, are ADB, SVC, RF, DT, XGB, GBDT, KNN, and Stacking (DXGK). For the ADB model, the accuracy is 0.7360, the Hamming distance is 0.2639, and the Kappa coefficient is 0.6480; its evaluation indicators are the worst among the eight algorithm models. For the Stacking (DXGK) fusion model, the accuracy is 0.9762, the Hamming distance is 0.0336, and the Kappa coefficient is 0.9552. Compared with the other seven models, the Stacking (DXGK) model performs best on all evaluation indicators. Therefore, when a machine learning algorithm is chosen to predict electromagnetic environment effects and to improve the detection accuracy and safety of the UAV LiDAR detection system in the complex battlefield electromagnetic environment, the Stacking (DXGK) model is the more suitable choice.

Conclusions
In this paper, a Stacking (DXGK) fusion model based on machine learning is proposed to predict electromagnetic environment effects. Compared with traditional mathematical modeling methods and single models, the fusion model has a higher accuracy rate, better robustness, and better generalization ability. The experimental results show that the Stacking (DXGK) fusion model has a better prediction effect than the ADB, SVC, RF, DT, XGB, GBDT, and KNN models. It can be seen that the classification prediction accuracy of a single model is lower and that a fusion of multiple models can effectively improve the accuracy of classification prediction. Therefore, the Stacking (DXGK) fusion model is more suitable for predicting electromagnetic environment effects and can provide a reference for improving the detection accuracy and safety of the UAV LiDAR detection system. Future research will mainly focus on how to adjust the model parameter values to further improve the prediction accuracy. It is also necessary to further study the resistance of the UAV LiDAR detection system to electromagnetic interference in the complex electromagnetic environment.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.