Parameter Estimation for the Field Strength of Radio Environment Maps

The parameters of a radio environment map play an important role in radio management and cognitive radio. In this paper, a method for estimating the parameters of the radio environment map based on the sensing data of monitoring nodes is presented. According to the principles of radio signal intensity loss during transmission, a theoretical variogram model based on a propagation model is proposed, and the improved theoretical variogram is more in line with the attenuation of radio signal propagation. Furthermore, a weighted variogram fitting method is proposed based on the characteristics of field strength parameter estimation. In contrast to the traditional method, this method is more closely related to the physical characteristics of the electromagnetic environment parameters, and the design of the variogram and fitting method is more in line with the spatial distribution of electromagnetic environment parameters. Experiments on real and simulation data show that the proposed method performs better than the state-of-the-art method.


Introduction
The radio environment map, which was first proposed by Zhao et al. [1], is mainly used in cognitive radio [2,3]. A radio environment map is an integrated database that is used to describe the electromagnetic environment. Because of its wide range of applications for radio [4][5][6][7][8], it has been further extended. The radio environment map is a comprehensive database that contains many fields of information, such as the available spectrum profile, geographical features, rules, relevant laws and regulations, radio equipment situation, and expert experience [9]. The radio environment map is fundamental to the construction of a communication network, improving operation efficiency, and managing radio resources. Ojaniemi et al. [10] believe that the core content of the radio environment map is the field strength estimation, so accurately estimating the field strength of the radio signal in geographic space at a certain granularity is key. Pesko et al. [11] called this problem the construction of the radio frequency layer. In this paper, we call it radio environment map parameter estimation. This problem has been studied in increasing depth, especially since 2012 [11][12][13][14][15].
Methods for estimating the parameters of the radio environment map can be divided into three categories [11]. The first is based on direct spatial interpolation, the second is based on the propagation model, and the third is a hybrid combination of the first two. Propagation-based methods require a large amount of information, including the signal transmission source, latitude and longitude coordinates, antenna height, transmitting power, and even geographical and climate information on the propagation path, which greatly limits the scope of application of this method. At the same time, because most propagation models are empirical models based on radio transmission, their universality is not strong. Ojaniemi et al. [10] showed that, under certain conditions, this method has lower prediction accuracy than spatial interpolation methods.
In recent years, the focus of research on radio environment map parameter estimation has shifted to spatial interpolation-based methods, especially methods based on geostatistics. In this kind of method, the measured values of ground truth are obtained by radio monitoring sensors, and then the spatial environment parameters of the remaining locations are obtained using spatial interpolation estimation. Comparative studies of spatial interpolation methods were made in [15][16][17]. Comparative studies of the inverse distance weighted (IDW) method, Spline interpolation method, and Kriging interpolation method were presented in [15,16,18], and the IDW, gradient plus inverse distance squared (GIDS), and Kriging methods were compared in [17]. Prediction experiments on indoor and outdoor electromagnetic environments showed that the IDW method is more robust, while the Kriging method is the most accurate method. Reference [19] presented an approach that uses the spatial dependence of ground truth data and constructs the signal intensity map using Kriging. In [20], several spatial interpolation methods based on IDW were analyzed and used to estimate the spatial distribution of radio field strength. Reference [21] proposed a geostatistical method for the radio environment map and demonstrated through an actual case study that the method is superior to the method based on a path loss model and data fitting. At the same time, this kind of method relies on the data collected by the monitoring sensors, so the distribution and quantity of the monitoring sensors affect the prediction accuracy of the radio environment map parameters. In [22], the relationship between the number of sensors and the construction error of the radio environment map is analyzed in detail.
Existing studies show that the Kriging method is the best way to estimate the parameters of the radio environment map. However, the radio transmission process is affected by various factors such as the number of transmitting stations, geographical environment, and weather. In practice, the number of monitoring sensors is limited, so the data sampling points are sparsely distributed, which increases the difficulty of estimating the spatial distribution of the parameters. At the same time, because the Kriging algorithm is based on a variogram, its linear quadratic optimization is based on the assumption that the data set conforms to the normal distribution and meets the second-order stationary hypothesis or quasi-second-order stationary assumption. Therefore, a non-normal distribution will affect the stability of the data and cause the variogram to produce a proportional effect. That is, it will raise the sill and nugget values and increase the estimation error [23]. To solve this problem, we propose a method to estimate the parameters of the radio environment map based on the radio propagation model and the Kriging method. This method retains the advantages of both the propagation model and the Kriging method and hence obtains better spatial prediction precision than either method alone. The main contributions of this paper are as follows: (1) Using the radio propagation model to improve the variogram of the Kriging algorithm, a new theoretical variogram model for radio environment map parameter estimation is proposed. (2) Based on the characteristics of radio signal propagation and data acquisition, a weighted optimization of the variogram is proposed, and particle swarm optimization (PSO) is applied to fit the modified variogram. The modified Kriging algorithm can hence be better adapted to the spatial distribution of the radio environment parameters.
The rest of the paper is organized as follows. Section 2 describes the related research on interpolation-based prediction of the radio environment. Section 3 introduces the improved Kriging method based on the electromagnetic propagation model and PSO-based weighted variogram fitting. Section 4 presents the results of comparative experiments on real and simulation data to examine the effectiveness of the proposed method. Finally, Section 5 concludes this work.

Related Works
The IDW method has been considered for radio environment map parameter estimation in many studies [14-18, 20, 21, 24]. The estimated value $Z^*(x_0)$ of the parameter at the prediction point $x_0$ is calculated as the weighted sum of the actual observation values at nearby observation points. In this method, observation points closer to the prediction point contribute more and more distant points contribute less, which can be expressed as follows:

$$Z^*(x_0) = \frac{\sum_{i=1}^{n} Z(x_i)\, d_i^{-p}}{\sum_{i=1}^{n} d_i^{-p}}, \tag{1}$$

where $Z(x_i)$ is the sampled value of the actual parameter at the $i$th observation point, $d_i$ is the Euclidean distance between the $i$th observation point and the prediction point, and $p$ is a strength parameter that defines the decrease in weight as the distance increases. When $p$ equals one, the method is called IDW, and when it equals two, the method is called the inverse distance squared weight. Spline interpolation is another widely used method for estimating the parameters of the radio environment map [16,20,25]. In these methods, the Spline is generated using the actual measured values of all observation points to guarantee global smoothness, and then the parameter values of the predicted points are calculated using polynomial fitting.
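As an illustration of the inverse-distance weighting described above, the following is a minimal sketch rather than the implementation used in the cited studies. The function name `idw_estimate`, the `(coordinates, value)` data layout, and the handling of a zero distance are assumptions made for this example.

```python
import math


def idw_estimate(samples, target, p=2.0):
    """Inverse-distance-weighted estimate at `target`.

    samples: list of ((x, y), value) pairs (assumed layout).
    target:  (x, y) coordinates of the prediction point.
    p:       strength parameter controlling how fast weights decay.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # Assumption: a prediction point that coincides with an
            # observation point simply takes the observed value.
            return value
        weight = d ** (-p)
        numerator += weight * value
        denominator += weight
    return numerator / denominator


samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
# The target is equidistant from both points, so the weights are equal.
print(idw_estimate(samples, (1.0, 0.0)))
```

With `p=1.0` the same function gives the plain inverse-distance weighting; larger `p` makes the estimate more local.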
The Kriging method is based on the spatial analysis of a variogram. It gives an unbiased optimal estimate of regionalized variables over a finite area and is considered to be the best method for estimating the parameters of the radio environment map [11][12][13][14][15][16][17][18][19][20][21]. The Kriging method is divided into ordinary Kriging and universal Kriging depending on the existence of spatial field drift. Ordinary Kriging is more commonly used than universal Kriging [11]. The following is a description of the ordinary Kriging method.
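For a regionalized variable $Z(x)$, the sample values at a series of observation points $x_1, x_2, \ldots, x_n$ are $Z(x_1), Z(x_2), \ldots, Z(x_n)$. Then, the value $Z^*(x_0)$ at grid point $x_0$ in a region can be estimated by a linear combination; that is,

$$Z^*(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \tag{2}$$

where $\lambda_i$ is the $i$th weighting coefficient. According to the principle of optimal unbiased estimation, the values of $\lambda_i$ should satisfy the following conditions:

$$E\left[Z^*(x_0) - Z(x_0)\right] = 0, \qquad \operatorname{Var}\left[Z^*(x_0) - Z(x_0)\right] = \min, \tag{3}$$

where $Z(x_0)$ is the real sample value. Assuming that $Z(x)$ satisfies the intrinsic hypothesis, then according to the Lagrange theorem, the ordinary Kriging equations can be expressed as follows:

$$\sum_{j=1}^{n} \lambda_j \gamma(x_i, x_j) + \mu = \gamma(x_i, x_0), \quad i = 1, \ldots, n, \qquad \sum_{i=1}^{n} \lambda_i = 1, \tag{4}$$

where $\gamma(x_i, x_j)$ is the value of the variogram between sampling points $x_i$ and $x_j$ and $\mu$ is the Lagrange constant.
The weighting coefficients $\lambda_i$ can be calculated by (4). When $\lambda_i$ is substituted into (2), the estimated value $Z^*(x_0)$ at grid point $x_0$ can be obtained. The process of solving for $Z^*(x_0)$ shows that the key to Kriging interpolation is obtaining the best estimate of the variogram $\gamma(h)$.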
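The ordinary Kriging procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the variogram is passed in as a function of distance only, the linear system is solved with a small hand-written Gaussian elimination, and the names `solve_linear` and `ordinary_kriging` as well as the linear variogram in the usage lines are hypothetical choices for this example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, weights) at `target` for a variogram function gamma(h)."""
    n = len(points)
    # Kriging matrix: variogram values bordered by the unbiasedness constraint.
    k = [[gamma(_dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    k.append([1.0] * n + [0.0])
    d = [gamma(_dist(p, target)) for p in points] + [1.0]
    solution = solve_linear(k, d)      # last entry is the Lagrange constant
    weights = solution[:n]
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
values = [30.0, 36.0, 33.0]
# Hypothetical linear variogram, used only to make the example runnable.
estimate, weights = ordinary_kriging(points, values, (1.0, 0.5), lambda h: h)
print(estimate, weights)
```

Because the unbiasedness constraint is part of the system, the weights sum to one, and an estimate requested exactly at a sampling point reproduces the sampled value.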

Proposed Method
The experimental variogram is calculated from the observed data as

$$\gamma^*(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[Z(x_i) - Z(x_i + h)\right]^2, \tag{5}$$

where $N(h)$ is the number of pairs of observation data points with lag distance $h$, $Z(x_i)$ is the value of the regionalized variable at position $x_i$, and $Z(x_i + h)$ is the value of the regionalized variable at a distance $h$ from $x_i$. When the data distribution is relatively uniform, the basic lag distance can be equal to or slightly larger than the minimum distance between the observed data points. Alternatively, the basic lag distance can be obtained by comparing and analyzing the variability and stability of the experimental variograms for several candidate basic lag distances.
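To make the role of the basic lag distance concrete, the sketch below computes the classical experimental variogram (half the mean squared difference per lag bin), not the modified definition introduced later in this section. The binning rule (each pair is assigned to the nearest multiple of the basic lag) and the function name are assumptions; `lag` and `lag_max` follow the names used in the algorithm description.

```python
import math


def experimental_variogram(points, values, lag, lag_max):
    """Classical experimental variogram, binned by multiples of `lag`.

    Returns a list of (lag distance, variogram value, number of pairs),
    skipping bins that contain no pairs.
    """
    sums = [0.0] * lag_max
    counts = [0] * lag_max
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            # Assumed binning: nearest multiple of the basic lag distance.
            k = int(round(h / lag)) - 1
            if 0 <= k < lag_max:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [(lag * (k + 1), sums[k] / (2 * counts[k]), counts[k])
            for k in range(lag_max) if counts[k] > 0]


points = [(0, 0), (1, 0), (2, 0)]
values = [0.0, 1.0, 3.0]
for h, gamma, pairs in experimental_variogram(points, values, 1.0, 2):
    print(h, gamma, pairs)
```

The pair counts are returned alongside the variogram values because a weighted fit can use them to account for unevenly populated lag bins.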
In practice, the most important parameter of the radio environment map is the signal radiation level in units of decibels (dB). If the Kriging algorithm is used directly, $\gamma(h)$ obtained by (5) has units of dB$^2$. However, this is not consistent with the dB units of the propagation loss calculated by the radio propagation model, so the variogram is not dimensionally consistent with the propagation model. We believe that the transmission loss of the propagation model represents the correlation between two radio environment parameters. Therefore, to combine the variogram with the propagation model, the definition of the variogram used in traditional geostatistics is modified as in (6). The newly defined variogram is called the parameter estimation variogram of the radio environment map, and the dimensions of its value are consistent with the dimensions of the transmission loss obtained by the propagation model.

Theoretical Variogram Model Based on Propagation Model.
It is necessary to use a theoretical variogram model to fit the actual variogram. The commonly used theoretical models for a variogram are the Gaussian, exponential, and spherical models. In practice, the most commonly used model is the spherical model, as used by Pesko et al. [11].
In this paper, two new theoretical variogram models are proposed based on the Longley-Rice model: one uses the Longley-Rice model to model the theoretical variogram directly, and the other introduces the free space transmission loss into the first model. The Longley-Rice model, also called the irregular terrain model [7], is mainly used to predict the median path loss over irregular terrain. The median value of the propagation loss in free space for different path lengths is calculated as follows:

$$A_{\mathrm{ref}} = \begin{cases} \max\left(0,\; A_{el} + K_1 d + K_2 \ln(d / d_{Ls})\right), & d_{\min} \le d < d_{Ls}, \\ A_{ed} + m_d d, & d_{Ls} \le d < d_x, \\ A_{es} + m_s d, & d \ge d_x, \end{cases} \tag{7}$$

where $d_{\min} \le d < d_{Ls}$ is the line-of-sight propagation distance, $d_{Ls} \le d < d_x$ is the diffraction propagation distance, and $d \ge d_x$ is the scattering propagation distance. In addition, $A_{el}$, $A_{ed}$, and $A_{es}$ are the propagation losses for line of sight, diffraction, and scattering in free space, respectively, $K_1$ and $K_2$ are propagation loss coefficients, and $m_d$ and $m_s$ are the loss coefficients for diffraction and scattering, respectively. If the influence of the free space transmission loss is not taken into account, the loss on the whole transmission path can be expressed by (7). In this paper, the loss prediction function for line-of-sight propagation is used, and hence the propagation loss can be expressed as follows:

$$A_{\mathrm{ref}} = \max\left(0,\; A_{el} + K_1 d + K_2 \ln(d / d_{Ls})\right), \tag{8}$$

where all variables are defined as in (7).
For given parameters such as the heights of the transmitting and receiving antennae, the value of this function depends only on the distance. Hence, the loss prediction can be rewritten as the theoretical variogram model $\gamma_{itm}(h)$ in (9), where $h$ is the distance between two data points and a very small constant is added to prevent division by zero. Note that, in practice, two data sampling points may share the same coordinate position, in which case $h$ is equal to 0.
The model has three coefficients to be determined. The line-of-sight distance parameter is set to 14,000, which is used to simulate the distance of sight.
If the effect of free space propagation loss is taken into account, the overall loss across the propagation path is

$$A = 32.45 + 20 \lg d + 20 \lg f + A_{\mathrm{ref}}, \tag{10}$$

where $32.45 + 20 \lg d + 20 \lg f$ is the loss of free space propagation, $d$ is the propagation distance, $f$ is the emission frequency, and $A_{\mathrm{ref}}$ is the line-of-sight loss in (8). Following the same idea as above, (10) can be rewritten as the theoretical variogram model $\gamma_{itmf}(h)$ in (11), which has one additional coefficient to be determined; the other variables are defined as in (9). Equations (9) and (11) are the two theoretical variogram models proposed in this paper. The new models are more consistent with the behavior of parameter changes in the radio environment map and more accurately reflect the spatial relationship between parameter changes.
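The two model families can be sketched as plain functions of the lag distance. The exact forms of (9) and (11) are not reproduced in this text, so the expressions below are assumptions that simply mirror the line-of-sight loss and the free space term: the coefficient names `c1` to `c4`, the value of `EPS`, the placement of the small constant inside the logarithms, and the absorption of the constant free space terms into `c1` are all hypothetical choices for illustration.

```python
import math

D_LS = 14000.0  # line-of-sight distance parameter quoted in the text
EPS = 1e-6      # small constant guarding h = 0; the value is an assumption


def gamma_itm(h, c1, c2, c3, d_ls=D_LS, eps=EPS):
    """Assumed line-of-sight form: max(0, c1 + c2*h + c3*ln((h + eps)/d_ls))."""
    return max(0.0, c1 + c2 * h + c3 * math.log((h + eps) / d_ls))


def gamma_itmf(h, c1, c2, c3, c4, d_ls=D_LS, eps=EPS):
    """Assumed variant with a free-space-like term c4 * 20*lg(h + eps).

    For a fixed frequency, the constant part of the free space loss is
    taken to be absorbed into c1 (an assumption of this sketch).
    """
    value = (c1 + c2 * h + c3 * math.log((h + eps) / d_ls)
             + c4 * 20.0 * math.log10(h + eps))
    return max(0.0, value)


# Coincident sampling points (h = 0) stay finite thanks to EPS.
print(gamma_itm(0.0, 1.0, 0.001, 0.5))
print(gamma_itmf(2000.0, 1.0, 0.001, 0.5, 0.05))
```

Either function can be handed to a fitting routine that searches for the coefficients; only the number of free coefficients differs between the two.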

Weighted Fitting Algorithm for the Theoretical Variogram.
Using the ground truth data, the theoretical variogram is fitted and the undetermined coefficients in the model are obtained. In traditional methods, the least-squares method is mainly used to fit the function. Its fitness function is

$$f(j) = \sum_{i} \left[\gamma^*(h_{i,j}) - \gamma(h_{i,j})\right]^2, \tag{12}$$

where $f(j)$ is the fitness function value of the $j$th variable, $h_{i,j}$ is the $i$th lag of the $j$th variable, $\gamma^*(h_{i,j})$ is the estimated value of the variogram at position $h_{i,j}$, and $\gamma(h_{i,j})$ is the real value of the variogram at position $h_{i,j}$. The disadvantage of this method is that it treats the contribution of all data as equal, without considering outliers, specific data points, or the specificity of the radio environment parameters.
In practice, because of building occlusion and the effects of an uneven distribution of sampling nodes, abnormal noise exists. To overcome this problem, the method proposed in this paper adds weight coefficients to the fitness function to strengthen or reduce the influence of some environmental factors or to match the distribution characteristics of the variogram. To address the problem of unevenly distributed sampling points, the first weight coefficient $w_1 = N / N_i$ is introduced, where $N_i$ is the number of sample point pairs that correspond to a certain lag distance and $N$ is the total number of sample point pairs. The second weight addresses the inconsistencies and inaccuracies in the sampled point data, which are caused by the electromagnetic shadowing of buildings and by reflection, multipath, and diffraction in radio propagation. For example, there could be some abnormally large or unusually small sampled values. To reduce the impact of unreasonable sample points on the fitness function, the weight coefficient is $w_2 = \bar{\gamma}(h) / \gamma(h_i)$, where $\bar{\gamma}(h)$ is the mean value of the variogram and $\gamma(h_i)$ is the value of the variogram at lag distance $h_i$. The third weight is added because point pairs with smaller lag distances often better reflect the degree of variability of regionalized variables. To increase the contribution of data point pairs with small lag distances, the proposed method adds the weight coefficient $w_3 = \bar{h} / h_i$, where $\bar{h}$ is the mean value of the lag distance and $h_i$ is the corresponding lag distance.
The final weight coefficient is the product $w = w_1 \cdot w_2 \cdot w_3$. Then, $w_i$ can be computed by

$$w_i = \frac{N}{N_i} \cdot \frac{\bar{\gamma}(h)}{\gamma(h_i)} \cdot \frac{\bar{h}}{h_i}. \tag{13}$$

Hence, a new fitness function is obtained, expressed as follows:

$$f(j) = \sum_{i} w_i \left[\gamma^*(h_{i,j}) - \gamma(h_{i,j})\right]^2. \tag{14}$$

For the fitness function defined by (14), the PSO algorithm is used to fit the weighted variogram. In the particle swarm, the position of the $i$th particle can be expressed as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, $i = 1, \ldots, m$, where $D$ is the dimension of the solution space and $m$ is the number of particles. The best previous position of the $i$th particle is denoted as $p_{best} = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the best position of the swarm is denoted as $g_{best} = (g_1, g_2, \ldots, g_D)$. Each particle has a moving speed, and the moving speed of the $i$th particle is $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. At each iteration, the particle velocities and positions are updated by (15), where $k$ is the number of iterations and $C_1$ and $C_2$ are learning factors (or acceleration coefficients) that determine the learning ability of each iteration of the algorithm.
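The weighted fitting step can be sketched as follows. The weights follow the three coefficients described above; the PSO loop is the common textbook form with an inertia weight and uniform random factors, which is an assumption because the exact update rule is not reproduced in this text. All function names, the default parameter values, and the linear stand-in model in the usage lines are hypothetical.

```python
import random


def lag_weights(lags, gammas, counts):
    """w_i = (N/N_i) * (mean gamma / gamma_i) * (mean lag / h_i).

    Assumes every gamma_i and h_i is strictly positive.
    """
    total = sum(counts)
    g_mean = sum(gammas) / len(gammas)
    h_mean = sum(lags) / len(lags)
    return [(total / n) * (g_mean / g) * (h_mean / h)
            for h, g, n in zip(lags, gammas, counts)]


def weighted_fitness(params, model, lags, gammas, weights):
    """Weighted sum of squared differences between model and data."""
    return sum(w * (model(h, *params) - g) ** 2
               for h, g, w in zip(lags, gammas, weights))


def pso_minimize(f, bounds, n_particles=30, n_iter=200,
                 inertia=0.729, c1=1.494, c2=1.494, seed=0):
    """Textbook PSO (assumed variant); returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_val = [f(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < p_val[i]:          # update the particle's own best
                p_best[i], p_val[i] = pos[i][:], val
                if val < g_val:         # update the swarm best
                    g_best, g_val = pos[i][:], val
    return g_best, g_val


lags, gammas, counts = [1.0, 2.0, 3.0], [2.0, 4.1, 5.9], [10, 8, 5]
w = lag_weights(lags, gammas, counts)
linear = lambda h, a, b: a * h + b  # hypothetical stand-in model
best, value = pso_minimize(
    lambda p: weighted_fitness(p, linear, lags, gammas, w),
    [(0.0, 5.0), (-2.0, 2.0)])
print(best, value)
```

The fitness function is kept separate from the optimizer so that any variogram model with the signature `model(h, *params)` can be fitted in the same way.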

Radio Environment Map Field Strength Estimation Algorithm.
In this paper, an improved Kriging estimation algorithm for radio environment map parameters is proposed using the new variogram in (6) and the modified theoretical variogram models in (9) and (11). The algorithm includes the following main steps: (i) calculating the value of the variogram from the sampling data; (ii) fitting the theoretical variogram curve equation using PSO; and (iii) calculating the test weight parameters using the theoretical variogram curve equation. The complete process is shown in Algorithm 1.
The PSO algorithm is used to fit the theoretical variogram models $\gamma_{itm}$ and $\gamma_{itmf}$ given in Equations (9) and (11):
(1) Initialization: the position and velocity of each particle in the $D$-dimensional problem space are randomly generated.
(2) Evaluation of particles: the fitness value of each particle is calculated using Equation (14).
(3) Updating $g_{best}$: the particle fitness values are compared with the population optimal value $g_{best}$, and if the current value is better than $g_{best}$, the position of $g_{best}$ is set to the current particle position.
(4) Updating the particles: the velocities and positions of all particles are updated using Equation (15).
Step 3
(1) $K$ and $D$ are calculated using the theoretical variogram model.
(3) The estimated value is calculated by Equation (2).
In the algorithm, lag is the basic lag distance, lag max is the maximum multiple of the lag distance, and the matrix $K$ and vectors $\lambda$ and $D$ are expressed, respectively, as follows:

$$K = \begin{bmatrix} \gamma_{11} & \cdots & \gamma_{1n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma_{n1} & \cdots & \gamma_{nn} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}, \qquad \lambda = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \mu \end{bmatrix}, \qquad D = \begin{bmatrix} \gamma_{10} \\ \vdots \\ \gamma_{n0} \\ 1 \end{bmatrix},$$

where $\gamma_{ij}$ is the value of the variogram between sampling points $x_i$ and $x_j$ and $\mu$ is the Lagrange constant.

Objective Evaluation Indexes.
Five kinds of objective evaluation indexes are used to compare and analyze the estimation results of the various algorithms for the parameters of radio environment map, which are the maximum error (MAX ERR), the average error (AVE ERR), the average estimation error percentage (PAEE), the relative mean square error (RMSE), and the root mean square error (RMSPE).
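The five indexes listed above could be computed as in the sketch below. The defining formulas are not reproduced in this text, so the normalizations are assumptions: PAEE is taken as the mean absolute error divided by the mean of the sample values (in percent), the relative mean square error as the mean squared error divided by the sample variance, and RMSPE as the square root of the mean squared error. The function name and dictionary keys are hypothetical.

```python
import math


def evaluation_indexes(predicted, measured):
    """Compute the five evaluation indexes under the assumed definitions."""
    n = len(measured)
    errors = [abs(p - z) for p, z in zip(predicted, measured)]
    mean_z = sum(measured) / n
    var_z = sum((z - mean_z) ** 2 for z in measured) / n
    mse = sum(e * e for e in errors) / n
    return {
        "MAX_ERR": max(errors),
        "AVE_ERR": sum(errors) / n,
        # Assumed: mean absolute error as a percentage of the sample mean.
        "PAEE": 100.0 * (sum(errors) / n) / abs(mean_z),
        # Assumed: mean squared error normalized by the sample variance.
        "RMSE": mse / var_z,
        # Assumed: square root of the mean squared error.
        "RMSPE": math.sqrt(mse),
    }


print(evaluation_indexes([1.0, 2.0, 5.0], [1.0, 3.0, 3.0]))
```

Under these assumed definitions, a relative mean square error below one means the predictor does better than always predicting the sample mean.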
The PAEE, RMSE, and RMSPE are defined in terms of the following quantities: $\bar{z}$ is the mean value of all the sampling points, $\hat{z}_i$ is the predicted value at position $i$, $z_i$ is the sample value at position $i$, and $\sigma^2$ is the variance of all sample data.

Experimental Data.
Two sets of data are used to validate all these methods. One is real measured data: the level data of the FM radio station at 99.8 MHz and the maximum level data of the 87-108 MHz and 1800-1900 MHz bands. The measuring terminal is a vehicle radio monitoring receiver, and the measurements were taken in Chengdu. The distribution of the sampling points is shown in Figure 1, and the specific parameters for the real measured data are shown in Table 1. The other is radio signal simulation data generated using the free space propagation model with a log-normal shadow model. The specific parameters for the simulation data are shown in Table 2.
The real data set contains a total of 256 sampling points, and the simulation data set contains 1024 level samples over a range of 100 square kilometers. In order to compare and analyze the estimation results of different algorithms at different sampling granularities, we used 1/2 and 1/4 of the total data as the training data and the remaining data as the validation data. That is, when 1/4 of the data was used as the training data, the remaining 3/4 was the validation data.
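The training and validation partition described above can be sketched as a simple index split. The text does not state how the training points were selected, so the random shuffle, the fixed seed, and the function name are assumptions for illustration.

```python
import random


def split_samples(n_points, train_fraction, seed=0):
    """Split sample indices into training and validation sets.

    Assumption: points are chosen uniformly at random; the selection
    rule used for the experiments is not stated in the text.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_train = int(n_points * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train, validation = split_samples(256, 0.25)
print(len(train), len(validation))
```

Using the same seed for every algorithm keeps the comparison on identical training and validation points.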

Analysis of Experiment Results
(1) 99.8 MHz Real Sampling Data. This is a comparative test on the actual acquired level data at 99.8 MHz. One-half of the data (128 sampling points in total) is used for training the model, and the remaining 1/2 of the data is used for testing and verification. The results of the five algorithms on the five evaluation indexes are shown in Table 3. A quarter of the data (64 samples in total) is then used to train the model, and the remaining 3/4 of the data is used for test validation. The results are shown in Table 4.
In order to show the estimation results of the various algorithms directly, all five algorithms use the same 1/4 training data, and their results are compared with the same measured data. The comparisons of the various algorithms are shown in Figure 2.
In the 1/2 training case, there are 128 sampling points in a range of about 784 square kilometers, so a sampling point covers about 6 square kilometers on average. As can be seen from Table 3, the prediction results of our two methods and the Kriging method are relatively close, and the worst method is IDW. Comparing Spline and IDW, the former is significantly better than the latter on all indexes except the maximum error. Our $\gamma_{itmf}$ method achieves the best results on all the evaluation indexes. Compared with the other methods, our methods have clear advantages in both prediction accuracy and prediction stability. The best average estimation error of our algorithms is 3.561 dB. Compared with the research results of [6], our methods are very competitive. In the 1/4 training case, a sampling point covers about 12 square kilometers. From the results in Table 4, the Kriging-based methods have better prediction and estimation performance. In particular, the prediction errors of the Kriging-based methods are all about 4 dB, indicating the effectiveness of these methods. Compared with the 1/2 training data, the increase in maximum error is more obvious. The second experiment shows that $\gamma_{itmf}$ still has the best prediction results among all algorithms. As can be seen from Figure 2, although IDW has an obvious smoothing effect that reflects the overall trend of the data distribution, its prediction accuracy is the lowest and its estimation errors are very large in the ranges of 40-60 and 160-180. The Spline prediction has two obvious outlier points, which indicates that the algorithm is sensitive to noisy data. Compared with the traditional Kriging method, our methods yield significant improvements in the range of 140-180.
(2) 87-108 MHz Maximum Level of Real Sampling Data. This is a comparative test on the actual acquired maximum level data of the 87-108 MHz band. One-half of the data (128 sampling points in total) is used for training the model, and the remainder is used for testing and verification. The results of the five algorithms on the five evaluation indexes are shown in Table 5.

[Figure panels: Kriging results, $\gamma_{itm}$ results, and $\gamma_{itmf}$ results, each shown together with the measurement results.]

A quarter of the data (64 samples in total) is used to train the model, and the remaining 3/4 of the data is used for test validation. The results are shown in Table 6.
In order to reflect the estimation results of various algorithms directly, all the five kinds of algorithms use the same 1/4 training data, the results of which are compared with the same measured data.The comparisons of various algorithms are shown in Figure 3.
The maximum signal strength in the frequency band is an important parameter of the radio environment map. Over a large-scale space, many signal sources contribute to the spatial distribution of the maximum signal strength, so it is difficult to estimate using the radio propagation model, and the spatial interpolation method is more advantageous. The experimental results for the 1/2 training data are shown in Table 5. As can be seen from the table, all the methods obtain good results. Surprisingly, compared with the single-source radiation estimation in experiment (1), the estimation of the maximum signal strength has better accuracy, which is about 10 dB. Furthermore, for the average prediction error, the results of our methods are close to 2.6 dB. Similarly, for all the evaluation indexes, the proposed $\gamma_{itmf}$ method achieves the best results, improving on Kriging by about 10%. For the estimation of the maximum signal strength in the frequency band, the prediction accuracy is not significantly reduced when the training data are reduced (see Table 6). For the maximum error, the $\gamma_{itm}$ algorithm obtains the best result, and the $\gamma_{itmf}$ algorithm achieves the best result on the remaining indexes. Spline and IDW have a large maximum error. As can be seen from Figure 3, Spline exhibits a significant error, which is consistent with the previous conclusion that the method has a weak ability to overcome noise. The proposed methods have the best prediction and estimation performance.
(3) 1800-1900 MHz Maximum Level of Real Sampling Data. This is a comparative test on the actual acquired maximum level data of the 1800-1900 MHz band. One-half of the data (128 sampling points in total) is used for training the model, and the remainder is used for testing and verification. The results of the five algorithms on the five evaluation indexes are shown in Table 7.
A quarter of the data (64 samples in total) is used to train the model, and the remaining 3/4 of the data is used for test validation. The results are shown in Table 8. As above, all five algorithms use the same 1/4 training data, and their results are compared with the same measured data. The comparisons of the various algorithms are shown in Figure 4.
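The main services in the 1800-1900 MHz band are mobile communications, so this band has a more complex electromagnetic environment. The parameter estimation results for the 1/2 training data are shown in Table 7. As can be seen from the table, all the methods have good results. The Kriging method has the best performance on the MAX ERR index. For the remaining evaluation indexes, our $\gamma_{itmf}$ method obtains the best results. In general, the results of the Kriging, $\gamma_{itm}$, and $\gamma_{itmf}$ methods are relatively close, and the improved methods based on the propagation model are superior to the traditional Kriging method. When the training set is reduced by half, the performance of all methods is significantly reduced (see Table 8). Among them, the prediction performance of the Kriging-based methods is reduced by nearly 20%. This is because the power of mobile communication stations is small and the transmission distance is short, which causes the correlation between the sparse sampling points to be weak. However, in this case, our approach is significantly better than the traditional Kriging approach, with all the indicators improving by nearly 15%. It is worth mentioning that the IDW algorithm also obtains good results, indicating that the algorithm has good stability. As can be seen from Figure 4, the data set is difficult to predict, and at some individual positions all methods have large errors. However, the proposed methods have the best prediction and estimation performance, especially in the vicinity of the test point numbered 140, where they are significantly better than the Kriging method.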
(4) 101.7 MHz Simulation Data. This is a comparative test on the simulated level data at 101.7 MHz. One-half of the data (512 sampling points in total) is used for training the model, and the remainder is used for testing and verification. The results of the five algorithms on the five evaluation indexes are shown in Table 9.
A quarter of the data (256 samples in total) is used to train the model, and the remaining (768 testing points) is used for testing and validation.The results are shown in Table 10.In order to reflect the estimation results of various algorithms directly, all the five kinds of algorithms use the same 1/4 training data, the results of which are compared with the same measured data.The comparisons of various algorithms are shown in Figure 5.

We use 1/2 of the simulation data for training the model in our experiment, and each sampling point covers about 0.5 square kilometers. As can be seen from Table 9, IDW has a very high maximum prediction error and Kriging obtains the best maximum error, while for the rest of the evaluation indexes, $\gamma_{itmf}$ is about 10% better than Kriging and about 20% better than Spline and IDW. As can be seen from Table 10, Kriging again yields the best maximum error, while, for the rest of the evaluation indexes, the $\gamma_{itmf}$ method obtains the best results. In the 1/4 training case, the predictive performance of Spline declines more severely. As can be seen from Figure 5, IDW has a significant prediction error, while the Kriging-based methods have better prediction accuracy. Among all compared methods, the proposed methods based on the propagation model achieve the highest accuracy. Because of the introduction of the shadow model, it is difficult to achieve high prediction and estimation accuracy on the simulated data. The prediction accuracy on the simulation data is comparable to that on the maximum signal strength of the frequency band.

Conclusion
In this paper, we proposed a spatial distribution prediction method for radio environment map parameters. It was shown that the IDW, Spline, and Kriging methods are the most effective methods to solve this problem, and, of these, Kriging is the best method. The main parameter of the radio environment map is the signal strength, and other parameters are affected by it. In this study, based on the Kriging approach, the definition of a variogram was improved based on the loss characteristics of radio propagation. A new theoretical variogram model was proposed in combination with a radio propagation model. Based on the characteristics of data sampling and signal propagation, a new weighted fitting method for variograms was also proposed. The new method is more suitable for the actual characteristics of radio environment map parameter prediction. Moreover, the proposed model is better adapted to the spatial correlation of radio environment parameters and has better prediction accuracy. Experiments on the signal strength data of a single frequency, the maximum signal strength data of a frequency band, and simulation data support these conclusions.

Algorithm 1: Radio environment map field strength estimation. The algorithm calculates the variogram from the sampling data, fits the theoretical variogram curve equation using PSO, and calculates the test weight parameters using the fitted curve equation. The inputs of the algorithm are the sample point coordinates $X_{n \times 1}$ and $Y_{n \times 1}$, the sampling values $Z_{n \times 1}$, and the coordinates $(x_0, y_0)$ of the point $x_0$ to be estimated. The output of the algorithm is the estimated value $Z_0^* = \sum_{i=1}^{n} \lambda_i \cdot Z_i$.

Figure 1: Data sampling position distribution map.

Table 1: Parameters for real measured data.

Table 2: Parameters for simulation data.

Table 7. As can be seen from the table, all the methods have good results. The Kriging method has the best performance on the MAX ERR index. For the remaining evaluation indexes, our $\gamma_{itmf}$ method obtains the best results. In general, the results of the Kriging, $\gamma_{itm}$, and $\gamma_{itmf}$ methods are relatively close, and the improved methods based on the propagation model are superior to the traditional Kriging method. When the training set is reduced by half, the performance of all methods is significantly reduced (see Table 8).