Multispectral Remote Sensing Data Analysis Based on KNNLC Algorithm and Multimedia Image

Preprocessing is a necessary step when multimedia imagery and multispectral remote sensing data are combined for information analysis. The K-nearest neighbor (KNN) algorithm is one of the classic data mining algorithms. As one of the most important methods in data analysis, it is widely used for classification, regression, missing-value imputation, and machine learning. As a lazy learner, it requires no prior statistical knowledge, does not need additional data to train description rules, and is easy to implement. However, the algorithm has several problems: an appropriate K value is hard to determine, it processes data with some special distributions poorly, and its computational cost on high-dimensional data can be prohibitive. To address these shortcomings, this paper proposes the KNNLC algorithm, an improved nearest neighbor selection strategy for the traditional KNN algorithm that draws on the theory of sparse coding and locality-constrained linear coding. Taking classification as an example, a comparison of experimental results on different data sets shows that the average classification performance of KNNLC is better than that of the classic KNN classifier: in most cases its accuracy is 2 to 5 percentage points higher.


Introduction
Remote sensing image fusion combines multisource remote sensing images by means of advanced image processing. It exploits the complementary characteristics of different data so that the fused image has high spectral and high spatial resolution at the same time, which improves the visual quality of the image as well as the effectiveness and accuracy of feature recognition and classification, as shown in Figure 1 [1]. Image fusion has been a hot research topic in the international remote sensing community in recent years. Classic fusion algorithms include the HIS transformation, COS transformation, and HSV transformation methods. Since the wavelet transform was introduced into image processing, fusion methods based on it have attracted considerable attention. Fusion of SPOT panchromatic and multispectral images based on dyadic (2-band) and triadic (3-band) wavelets has been studied. However, these two algorithms simply replace the low-frequency components of the low-resolution image with those of the high-resolution image after wavelet decomposition, without considering the loss of image features. Feature-based binary wavelet fusion has also been studied, but because it does not take the resolution of the images to be fused into account, its fusion effect is not very good [2]. Based on an in-depth study of wavelet transform fusion, a new feature-based multiband wavelet fusion method was proposed, and its fusion results for SPOT and TM 5, 4, 3 images and for SPOT panchromatic and SPOT multispectral band images were reported and compared with other fusion methods [3]. The experimental results show that this method has obvious advantages over the other fusion methods.
Although the KNN algorithm performs well in classification and prediction on many data sets, it has several problems that remain to be solved: its time and space complexity are high, the K value is difficult to choose, its processing effect is unsatisfactory for data with some special distributions, and its computational cost on high-dimensional data can be prohibitive. A mature algorithm must overcome these shortcomings, and researchers interested in this direction have therefore carried out extensive work and proposed many optimized variants.

Literature Review
Jin, R. et al. systematically analyzed the performance of traditional spectral parameters and of two-band normalized difference and ratio vegetation indices under different observation angles for estimating wheat leaf nitrogen content (LNC), and established a multiangle quantitative monitoring model for wheat canopy LNC [4]. Du, J. H. et al. found that the coefficients of determination between LNC and both canopy reflectance and 40 conventional spectral vegetation indices decreased as the observation angle increased, in both the forward and the backward observation directions, and reached their maximum at -20° in the backward direction [5]. RI-1dB and EVI-1 have the closest relationship with LNC at -20° backward and at the vertical angle, respectively. The two-band combinations of original spectral reflectance whose ND and SR parameters correlate well with LNC are concentrated mainly in the blue-red, green-red, and red-edge/red band combination ranges, and this sensitive region shifts with the observation angle. The new multiangle vegetation index (MAVI), which combines sensitive spectral parameters with the observation angle, estimates LNC better. In independent tests on data from different years, the MAVIsR model was the most sensitive to leaf nitrogen content. By systematically analyzing the angular sensitivity of different wavebands and spectral parameters, Fang, X. et al. studied the quantitative relationship between leaf nitrogen content and suitable characteristic parameters extracted by different spectral analysis techniques. The results show that the correlation between spectral vegetation indices and leaf nitrogen content is better at backward observation angles than at vertical and forward angles. The red-edge parameters mND705, GND (750, 550), NDRE, and RI-1dB have the closest relationship with LNC, but the relationship differs considerably across experimental factors, and when leaf nitrogen content is high (>4.5%) the spectral parameters tend to saturate [6]. Yang, F. C. et al. found that a newly constructed angle-insensitive vegetation index (AIVI) reduces the influence of different experimental factors. Within the -10° to 40° observation angle range, AIVI supports a unified and stable monitoring model; in independent data tests, the AIVI-based model was the best for monitoring wheat canopy leaf nitrogen content and showed strong angular adaptability [7]. Ren, J. et al. found that the accuracy of wheat LNC inversion based on FA-BPNN analysis was significantly higher than that of conventional spectral parameters under different observation angles [8]. Therefore, both the new vegetation index AIVI and FA-BPNN can reliably monitor wheat leaf nitrogen content under different experimental conditions. By comparing the relationship between LAI and various spectral analysis methods under different observation angles, the bands and observation angles sensitive to changes in LAI can be identified, and a quantitative monitoring model for wheat LAI (leaf area index) can thus be established. Spectral analysis methods are more suitable for monitoring LAI near the vertical angle. The spectral reflectance and spectral parameters correlate more strongly with LAI in the backward observation direction than in the forward direction. Li, J. et al.
found that neither the two-band ratio index (SR) nor the normalized difference index (ND) showed outstanding monitoring advantages under different observation angles, although SR performed better than ND [9]. Factor analysis showed that the loading of the green band decreases with increasing observation angle in the first factor and increases with it in the second factor. In independent tests on data from different years, the wheat LAI monitoring model using the spectral parameter VIopt as the variable performed well and can be used for accurate estimation of wheat LAI. Torra, V. et al. analyzed and compared the saturation, angular sensitivity, and varietal sensitivity of vegetation indices commonly used to estimate LAI. The wide-angle adaptability results show that spectral parameters estimate LAI more accurately for erect varieties than for spreading varieties, and that nonperpendicular observation angles did not significantly improve the ability of spectral parameters to estimate LAI [10]. Except for EVI and TVI, the spectral parameters NDVI, SAVI, OSAVI, MSAVI, WDRVI, MTVI, and mND705 all tend to saturate when LAI exceeds 4. Zweig, K. A. et al. found that an angle reduction coefficient Kf constructed from the green and near-infrared bands is closely related to LAI. Multiplying the vegetation indices (VIs) by Kf effectively alleviates saturation and varietal sensitivity in LAI estimation at different observation angles and significantly improves the accuracy and adaptability of LAI monitoring [11]. Subramaniyaswam, V. et al. found that, among the spectral parameters above, all except WDRVI, EVI, and TVI yield a unified monitoring model across all observation angles when multiplied by Kf.
Prediction models based on the mND705 and OSAVI spectral parameters are more accurate and reliable [12]. By integrating spectral data from different observation angles, the effects of various spectral processing methods on chlorophyll density estimation were analyzed, and a multiangle remote sensing monitoring model for wheat leaf pigment density was established. The results show that the spectral reflectance correlating well with chlorophyll density is concentrated mainly in the red-edge and near-infrared region (720-900 nm). The spectral parameters VOG1, RI-1dB, NDRE, SDr/SDb, and DD are closely related to chlorophyll density. The sensitive bands of the two-band normalized difference and ratio vegetation indices are concentrated mainly in the red region in the backward observation direction and in the blue and red regions in the forward observation direction.
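The two-band ND and SR indices recur throughout these studies, as does the practice of scanning band pairs for the combination that correlates best with a crop variable such as LNC or LAI. The sketch below assumes the usual definitions ND = (R_i - R_j) / (R_i + R_j) and SR = R_i / R_j and ranks band pairs by the squared Pearson correlation r²; the function names and the toy spectra are hypothetical and are not taken from the cited studies.

```python
import math

def nd(r_i, r_j):
    """Two-band normalized difference index."""
    return (r_i - r_j) / (r_i + r_j)

def sr(r_i, r_j):
    """Two-band simple ratio index."""
    return r_i / r_j

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_band_pair(spectra, target, index=nd):
    """Scan all ordered band pairs (i, j), i != j, and return (r2, i, j) for the
    pair whose index values correlate most strongly with `target`.

    spectra: one reflectance vector per sample (same band order for all samples)
    target:  measured variable per sample, e.g. LNC or LAI
    """
    n_bands = len(spectra[0])
    best = (-1.0, None, None)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue
            values = [index(s[i], s[j]) for s in spectra]
            r2 = pearson_r(values, target) ** 2
            if r2 > best[0]:
                best = (r2, i, j)
    return best

# Hypothetical reflectance of four samples at three bands; the synthetic target
# is driven by bands 0 and 1, so that pair should come out on top.
spectra = [[0.50, 0.10, 0.30],
           [0.60, 0.10, 0.20],
           [0.70, 0.10, 0.40],
           [0.40, 0.10, 0.35]]
target = [nd(s[0], s[1]) for s in spectra]
print(best_band_pair(spectra, target))
print(best_band_pair(spectra, target, index=sr))
```

With real hyperspectral data the same double loop produces the r² matrix that is usually drawn as a contour map to locate the sensitive band regions.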
Research by Hu, J. et al. found that, under different observation angles, the first factor of the FA-BPNN model is concentrated mainly in the blue and red bands and the second factor mainly in the near-infrared region. Backward observation angles close to vertical help improve the prediction accuracy of chlorophyll density. The spectral parameters SDr/SDb, DD, ND (720, 760), and ND (732, 738) are the most effective for monitoring wheat chlorophyll density [13].
K nearest neighbors (KNN) is an extension of the nearest neighbor method and a lazy, instance-based statistical classification method. The KNN classifier uses a similarity measure to compare the attributes (features) of a test sample with the corresponding attributes of every sample in the training set and sorts the training samples by similarity in descending order. The first K most similar (nearest) training samples, the K nearest neighbors, are then taken from the training set; K is generally chosen as an integer no greater than 20. Finally, the occurrences of the class labels among these K neighbors are counted, and the most frequent label is assigned to the test sample [14,15]. Figure 2 is a classic example of the KNN classification process on a data set in which every training sample has a class label. In Figure 2, the training samples belong to two classes, triangles (Angle) and squares (Square); for simplicity, T and S denote the numbers of triangles and squares among the neighbors. The dot in the figure is the test sample, and K is the number of training samples closest to it, that is, the number of nearest neighbors. When K = 3, the small dotted circle contains T = 2 and S = 1, so T > S.
According to the rule described above, the test sample is then assigned to the triangle class. When K is increased to 5, the large dashed circle contains T = 2 and S = 3, so T < S and the test sample is labeled a square. Figure 3 is a flowchart of KNN classification; the similarity calculation, nearest neighbor selection, and classification steps are described in detail below. Suppose the training set contains m samples, each with n attributes. The training set can then be written as $X = \{X_i = (x_{i1}, x_{i2}, \cdots, x_{in}) \mid i = 1, 2, \cdots, m\}$, the set of class labels of the training samples as $L = \{C_i \mid i = 1, 2, \cdots, m\}$, and the test sample set as $S = \{S_i = (s_{i1}, s_{i2}, \cdots, s_{in}) \mid i = 1, 2, \cdots, a\}$, where a is the number of test samples [16,17]. The KNN classification algorithm can be described as in Figure 2.
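The procedure above can be sketched in a few lines of Python. This is a minimal illustration assuming Euclidean distance and plain majority voting; the toy coordinates are hypothetical and were chosen only so that K = 3 and K = 5 reproduce the two outcomes described for Figure 2.

```python
from collections import Counter
import math

def euclidean(x, s):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - si) ** 2 for xi, si in zip(x, s)))

def knn_classify(train_X, train_y, sample, k):
    """Label `sample` with the most frequent label among its k nearest training samples."""
    # Distance from the test sample to every training sample.
    scored = [(euclidean(x, sample), label) for x, label in zip(train_X, train_y)]
    # Smallest distance = highest similarity; keep the first k.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Count labels; Counter breaks ties in favour of the label seen first,
    # i.e. the label of the nearer neighbour.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical layout imitating Figure 2: the test sample sits at the origin.
train_X = [(1.0, 0.0), (0.0, 1.1), (-1.2, 0.0), (0.0, -2.0), (2.1, 0.0), (3.0, 3.0)]
train_y = ["triangle", "triangle", "square", "square", "square", "triangle"]

print(knn_classify(train_X, train_y, (0.0, 0.0), 3))  # triangle (T = 2, S = 1)
print(knn_classify(train_X, train_y, (0.0, 0.0), 5))  # square   (T = 2, S = 3)
```

Sorting all m distances costs O(m log m) per test sample; this brute-force search is the source of the high time complexity mentioned in the introduction.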
Put simply, the KNN algorithm measures the similarity between the test sample and each training sample by computing the distance between them. Choosing a distance metric appropriate to the data is a prerequisite for good processing results. The distance metrics commonly used with KNN are the Euclidean, Manhattan, Minkowski, and Hamming distances. Given a training sample $X = (x_1, x_2, \cdots, x_n)$ and a test sample $S = (s_1, s_2, \cdots, s_n)$, the distance $\mathrm{dist}(X, S)$ between them is calculated as follows.
The Euclidean distance is given by formula (1):

$$\mathrm{dist}(X, S) = \sqrt{\sum_{i=1}^{n} (x_i - s_i)^2} \tag{1}$$

The Manhattan distance is given by formula (2):

$$\mathrm{dist}(X, S) = \sum_{i=1}^{n} \lvert x_i - s_i \rvert \tag{2}$$

The Minkowski distance, with order $p \ge 1$, is given by formula (3):

$$\mathrm{dist}(X, S) = \left( \sum_{i=1}^{n} \lvert x_i - s_i \rvert^{p} \right)^{1/p} \tag{3}$$

The Hamming distance, the number of attributes on which the two samples differ, is given by formula (4), where $I(\cdot)$ is the indicator function:

$$\mathrm{dist}(X, S) = \sum_{i=1}^{n} I(x_i \neq s_i) \tag{4}$$

Min-max normalization is the most common data normalization method. It uses a linear mapping to project each attribute (feature) value into the interval [0, 1], as in formula (5):

$$a' = \frac{a - \min F}{\max F - \min F} \tag{5}$$

where a is the original value, a′ is the value mapped into [0, 1], and min F and max F are the lower and upper bounds of the values of the same attribute or feature F. Many other normalization methods are in common use, but their basic principles are similar; another common form is given by formula (6). This study also uses the majority voting rule, under which the test sample receives the most frequent label among its K nearest neighbors. This most-frequent rule is expressed in Equation (7), where $C_{(i)}$ is the class label of the i-th nearest neighbor:

$$y = \arg\max_{c} \sum_{i=1}^{K} I\left(C_{(i)} = c\right) \tag{7}$$

Because attributes naturally differ in their distribution characteristics, weights are generally assigned in the algorithm so that the processing results better reflect the objective facts. In the KNN algorithm, the weighted labeling rule can be written as formula (8), where $w_i$ is the weight of the i-th nearest neighbor:

$$y = \arg\max_{c} \sum_{i=1}^{K} w_i \, I\left(C_{(i)} = c\right) \tag{8}$$

[Figure 3: KNN classification flowchart. Input test sample data; calculate the distance measure; obtain the K nearest neighbors from the similarity measure; assign the most frequent label among the nearest neighbors to the test sample.]
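Formulas (1) to (5) and the voting rules (7) and (8) translate directly into code. The sketch below is a minimal pure-Python version; the reciprocal-distance weights follow the classic weighting with power exponent P = 2 that the paper discusses, while the small eps term that guards against division by zero, and all function names, are assumptions of this sketch.

```python
from collections import defaultdict
import math

def euclidean(x, s):                 # formula (1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, s)))

def manhattan(x, s):                 # formula (2)
    return sum(abs(a - b) for a, b in zip(x, s))

def minkowski(x, s, p):              # formula (3): p = 1 Manhattan, p = 2 Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, s)) ** (1.0 / p)

def hamming(x, s):                   # formula (4): number of differing positions
    return sum(a != b for a, b in zip(x, s))

def min_max_normalize(column):       # formula (5), applied to one attribute F
    lo, hi = min(column), max(column)
    if hi == lo:                     # constant attribute: avoid division by zero
        return [0.0] * len(column)
    return [(a - lo) / (hi - lo) for a in column]

def vote(neighbours, weights=None):
    """neighbours: (distance, label) pairs for the K nearest samples.
    weights=None gives the majority rule (7); otherwise the weighted rule (8)."""
    scores = defaultdict(float)
    for i, (_, label) in enumerate(neighbours):
        scores[label] += 1.0 if weights is None else weights[i]
    return max(scores, key=scores.get)

def reciprocal_weights(neighbours, p=2, eps=1e-12):
    """Assumed reciprocal-distance weights 1 / d**p; eps guards against d = 0."""
    return [1.0 / (d ** p + eps) for d, _ in neighbours]

neighbours = [(1.0, "a"), (2.0, "b"), (3.0, "b")]
print(vote(neighbours))                                   # b (two votes to one)
print(vote(neighbours, reciprocal_weights(neighbours)))   # a (1 > 1/4 + 1/9)
```

The usage lines show why weighting matters: the majority rule picks the label held by more neighbors, while reciprocal weighting lets a single very close neighbor outvote two distant ones.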

Journal of Sensors
Among the many weighting methods, weighting by the reciprocal of the distance (similarity) is the most classic and is the one consistently used with the KNN algorithm. It is expressed in Equation (9), where $X_{(i)}$ is the i-th nearest neighbor of the test sample S:

$$w_i = \frac{1}{\mathrm{dist}\left(X_{(i)}, S\right)^{P}} \tag{9}$$

Here P is the weighting power exponent, usually taken as P = 2, which gives formula (10):

$$w_i = \frac{1}{\mathrm{dist}\left(X_{(i)}, S\right)^{2}} \tag{10}$$

3.2. Data Analysis and Utilization

Based on related research, locality-constrained linear coding (LLC) is used to improve the classic KNN algorithm. Multiangle remote sensing data are information-rich and voluminous. Selecting spectral absorption feature parameters, sensitive angles, and optimal calculation methods are important issues in hyperspectral remote sensing research. In addition to the data processing methods commonly used in conventional vertical remote sensing, this study also adopted normalized difference (ND), ratio (SR), and neural network (BP) analysis methods, as follows. Figure 4 shows a typical vegetation reflectance spectrum, with absorption valleys at 455 nm (blue light), 680 nm (red light), 980, 1200, and 1468 nm and reflection peaks at 550 nm (green light), 1090, 1285, 1685, and 2200 nm [18,19]. Factor analysis is a statistical method that extracts common factors from multiple variables in order to reduce dimensionality; rotation can make the factors more interpretable, as shown in Figure 5. Factor analysis was performed on the standardized spectral data in SPSS, the number of factors whose cumulative contribution rate exceeded 99% was selected, and the factor data were output. The BPNN model was built with MATLAB's Neural Network Toolbox. The network consists of an input layer, a hidden layer, and an output layer; the input vector is I and the learning target is T. The inputs are the composite factors with large contribution rates obtained from factor analysis. The number of hidden-layer neurons equals the number of composite factors, and the hidden layer uses the tansig activation function; the output layer has one neuron with the purelin activation function, and the network is trained with the trainlm function [20,21].
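Two parts of this pipeline can be illustrated compactly. The sketch assumes principal-component extraction on the correlation matrix of the standardized spectra (SPSS offers several extraction methods, and the text does not say which was used), so the contribution rate of a factor is taken to be its share of the eigenvalue sum. The forward pass mirrors the stated network structure: tansig is mathematically equal to tanh and purelin is the identity. The Levenberg-Marquardt training performed by trainlm is not reproduced, and the data and weights in the usage lines are hypothetical.

```python
import numpy as np

def n_factors_for_contribution(data, threshold=0.99):
    """Smallest number of factors whose cumulative contribution rate reaches `threshold`.

    data: (samples, variables) array, e.g. reflectance at selected bands.
    The covariance of standardized columns is their correlation matrix,
    so np.corrcoef is applied to the raw data directly.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()     # cumulative contribution rate
    reached = np.nonzero(cumulative >= threshold)[0]
    return int(reached[0]) + 1 if reached.size else len(eigvals)

def tansig(n):
    # MATLAB's tansig(n) = 2 / (1 + exp(-2n)) - 1, which equals tanh(n)
    return np.tanh(n)

def purelin(n):
    # Linear transfer function: output equals input
    return n

def bp_forward(factors, w1, b1, w2, b2):
    """Forward pass: tansig hidden layer, one purelin output neuron."""
    hidden = tansig(w1 @ factors + b1)
    return purelin(w2 @ hidden + b2)

# Hypothetical data: column 2 duplicates column 1 (scaled), column 3 is uncorrelated.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, -1.0, -1.0, 1.0])
data = np.column_stack([x, 2 * x, y])
print(n_factors_for_contribution(data))  # 2
```

Because the second column carries no information beyond the first, two factors already account for the full variance, which is the kind of redundancy among neighboring bands that factor analysis removes before the BPNN stage.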

Results and Analysis
In the classification experiments, four typical sample sets from the UCI repository were selected as experimental material; their basic characteristics are listed in Table 1. Australian and Magic each have only two labels, so the experiments on them are binary classification, whereas the experiments on Mpgdata and Iris are multiclass classification [21-23]. Among the four data sets, Iris is a common data set that is prone to overfitting. It is included only to show the feasibility of the algorithm and its degree of improvement over the classic algorithm; such classification performance is clearly difficult to reach in practice. The Magic data set, by contrast, is relatively large. Although it does not reach big-data scale, it is far from trivial in size, so it indicates the potential of the improved algorithm over the traditional one on high-capacity data sets.
For evaluating classification performance, regardless of the sample size and the number of label classes, building the corresponding confusion matrix is a common and objective method [24,25]. A confusion matrix cross-tabulates the true class of each test sample against its predicted class, so correctly classified samples lie on its diagonal. Table 2 shows a quantitative comparison of the two algorithms on the four data sets, reported as mean ± standard deviation to make the results more reliable. Accuracy is often the most closely watched evaluation criterion for classification algorithms. Compared with the classic KNN classifier, KNNLC performs better in most cases, with accuracy 2 to 5 percentage points higher [26]. Admittedly, a classic algorithm is classic because of its broad effectiveness, and for data with different distributions its adaptability may be more general. On the Mpgdata data set the improved algorithm is sometimes unsatisfactory, which shows that KNNLC is sensitive to certain data distributions and needs further improvement. The results on the high-capacity data set show that, to a certain extent, KNNLC may be better suited to high-dimensional data, with classification accuracy and stability significantly better than those of classic KNN. Overall, in classification problems KNNLC outperforms classic KNN, and the higher the data dimension, the more obvious the advantage. This is because KNNLC uses local coding to obtain neighbor samples, which agrees with the theoretical expectations, as shown in Table 3. Table 3 shows the quantitative comparison of the two algorithms on the four data sets, again reported as mean ± standard deviation.
As Table 3 shows, KNNLC again outperforms the classic KNN classifier in most cases, with accuracy 2 to 5 percentage points higher.
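The evaluation described above can be sketched as follows: a confusion matrix with rows as true classes and columns as predicted classes, accuracy as the share of samples on the diagonal, and mean ± standard deviation over repeated runs. The text does not state how many runs were used or whether the sample or population standard deviation was reported; this sketch assumes the sample standard deviation, and the labels and scores are hypothetical.

```python
from statistics import mean, stdev

def confusion_matrix(true_labels, pred_labels, classes):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def mean_std(scores):
    """Mean and sample standard deviation over repeated runs (needs >= 2 runs)."""
    return mean(scores), stdev(scores)

cm = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
print(cm)            # [[1, 1], [0, 2]]
print(accuracy(cm))  # 0.75
m, s = mean_std([0.80, 0.90, 1.00])
print(f"{m:.3f} ± {s:.3f}")
```

The same matrix works unchanged for the binary sets (Australian, Magic) and the multiclass sets (Mpgdata, Iris); only the `classes` list changes.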

Conclusion
This article studies improvements to the KNN algorithm and their application in image processing. It mainly discusses and improves the traditional KNN algorithm and the mean filtering algorithm from the perspective of the nearest neighbor selection strategy, so that neighbors with higher similarity are captured when neighbors are obtained. Based on the same idea, the template selection of the mean filter is treated as neighbor selection and combined with a membership function to obtain a more effective filter template, and experiments have demonstrated the advantages of the improved algorithm [27]. To address the sensitivity of the traditional KNN algorithm to data distribution, the KNNLC algorithm was proposed by combining sparse coding and LLC theory with a local-coding nearest neighbor selection strategy, which improves on the classic KNN algorithm. Experiments on several representative data sets show that KNNLC has clear advantages and potential over classic KNN in classification performance: in most cases its accuracy is 2 to 5 percentage points higher.
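The local-coding idea behind KNNLC's neighbor selection can be made concrete with the widely used approximated LLC encoding: a sample is coded over its k nearest bases (here, training samples) by solving a small regularized linear system whose solution is normalized to sum to one. The sketch below assumes that formulation, with a trace-scaled regularizer `beta`; it is not the authors' KNNLC, whose rule for turning the codes into a class decision is not given in this section, and the codebook in the usage lines is hypothetical.

```python
import numpy as np

def llc_code(x, bases, k=5, beta=1e-4):
    """Approximated locality-constrained linear coding of `x`.

    x:     (d,) sample to encode
    bases: (M, d) codebook; for a KNN-style use, the training samples
    Returns the indices of the k nearest bases and their coding weights,
    which sum to one.
    """
    d2 = np.sum((bases - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]                     # locality: only the k nearest bases
    z = bases[idx] - x                           # shift the local bases so x is the origin
    c = z @ z.T                                  # local covariance, shape (k, k)
    c += np.eye(len(idx)) * beta * np.trace(c)   # trace-scaled regularizer keeps c invertible
    w = np.linalg.solve(c, np.ones(len(idx)))
    w /= w.sum()                                 # sum-to-one (shift-invariance) constraint
    return idx, w

# Hypothetical codebook: x lies inside the triangle formed by the first three bases.
bases = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
x = np.array([0.5, 0.5])
idx, w = llc_code(x, bases, k=3)
print(idx, np.round(w, 3))
print(w @ bases[idx])   # close to x: the local bases reconstruct the sample
```

Unlike plain KNN, which treats the k neighbors as equal votes, the coding weights reflect how much each neighbor is needed to reconstruct the sample, which is one way a local-coding strategy can favor neighbors of higher similarity.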

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.