Selection Method for Kernel Function in Nonparametric Extrapolation Based on Multicriteria Decision-Making Technology

Selecting the most appropriate kernel function to extrapolate a load set is a critical step in compiling a load spectrum, because it strongly affects the results of nonparametric extrapolation. To address this issue, this paper provides a new approach to selecting the kernel function for nonparametric extrapolation. To handle the complexity and uncertainty of nonparametric extrapolation, the characteristics of four kernel functions and their effects on the results are explained using the “from-to” diagram obtained by rainflow counting. Multicriteria decision-making (MCDM) is then applied to the kernel function selection problem. To evaluate the dispersion degrees of the mean and amplitude of a load set accurately, their range, standard deviation, and interquartile range are selected as the evaluation criteria. The weight of each criterion, which represents its degree of impact on the selection of the kernel function, is calculated separately using the eigenvector and entropy methods. The comprehensive weights are obtained by applying optimization theory and Jaynes’ maximum entropy principle. Finally, the importance of each criterion is discussed according to its calculated comprehensive weight, and the selection method for kernel functions is obtained and illustrated by extrapolating the output torque of the power split device of hybrid electrical vehicles.


Introduction
Load extrapolation is a key issue in compiling a load spectrum; it attempts to obtain a full-life load spectrum from limited load time histories, as shown in Figure 1. Proper extrapolation can achieve an accurate estimation of the overall load, particularly the large loads that cannot be measured in a short test period. To compile a full-life load spectrum, load extrapolation has been applied in various areas, such as wind turbines, tractors, and gliders. Peeringa [1] estimated the extreme load of a wind turbine using parametric extrapolation, in which two different distribution functions are selected to fit the test load. The safety factor [2], a new extrapolation method, has been applied to a wind turbine; it avoids the subjectivity and empiricism of parametric extrapolation. Rodzewicz [3] predicted the long-term loads of a glider accurately by reintegrating tested data. To predict bridge life [4], load extrapolation is applied to estimate the test load and calculate the full life efficiently. Load extrapolation methods for fatigue life prediction are also used for other large equipment, such as wheel loaders [5] and mining dump trucks [6]. However, the above extrapolation processes also show the inherent complexity and uncertainty of load extrapolation in load spectrum compiling.
In parametric extrapolation methods, the mean and amplitude of loads are fitted by a distribution function. After testing their correlation, the joint probability density function (PDF) of loads can be calculated. However, subjective human factors are introduced when a distribution function is chosen to fit the load data. To address this issue, nonparametric estimation methods [7] are applied in the extrapolation of the load spectrum to eliminate the errors caused by subjective human factors. Drebler et al. [8] proposed a nonparametric extrapolation method and estimated the PDF of nonergodic loads in vehicles by introducing the kernel function and applying the adaptive bandwidth. Johannesson and Thomas [9] proposed a rainflow intensity algorithm to extrapolate the limit rainflow matrix, which is smoothed by a kernel function based on Miner theory. As nonparametric estimation estimates an unknown distribution without relying on assumptions about the type of distribution, it can avoid the subjectivity of parametric extrapolation. Therefore, this paper proposes that nonparametric extrapolation is an appropriate method to compile the load spectrum for loads of hybrid electrical vehicles (HEVs) [10]. However, nonparametric extrapolation results may vary greatly depending on which kernel function is selected. Multicriteria decision-making (MCDM) technology may be a good solution to this issue, as it has been widely used in evaluating multiple objectives [11]. The multi-property decision-making and multiobjective decision-making of MCDM play important roles in the comprehensive evaluation of objectives. Xiong et al. [12] achieved a scientific evaluation of the ecological environment by applying the eigenvector method of MCDM. Zhao et al. [13] evaluated air quality after calculating the weight values of indicators using the entropy method, which is commonly used to calculate objective weights. In other areas, Cristóbal [14] used MCDM technology combined with the analytic hierarchy process (AHP) to select a renewable energy project. Kannan et al. [15] applied MCDM technology to determine the best green suppliers in a green supply chain.
In view of the superiority of MCDM in rendering objective judgment, MCDM technology is used in this paper to evaluate the dispersion degree of the mean and amplitude of the load. In order to select an appropriate kernel function to extrapolate the test load, a new method for the selection of kernel functions in nonparametric extrapolation is developed based on MCDM technology. First, the basic principle of nonparametric extrapolation is introduced, and then the characteristics and application conditions of four kernel functions are analyzed. Meanwhile, three criteria to evaluate the dispersion degree of the mean and amplitude of loads are selected. Finally, the selection method of kernel functions is obtained, and an example illustrating the applicability of the proposed method is given.

Nonparametric Extrapolation Based on Kernel Functions
2.1. Superiority of Nonparametric Extrapolation. The rainflow counting method is an algorithm that maintains consistency between the counting results and the material stress-strain hysteresis loops. Each element in the rainflow matrix represents a stress-strain hysteresis loop. The parametric method, which usually requires counting the mean and amplitude of the load separately using the rainflow counting method, destroys the structure of the hysteresis loop. The extreme obtained by the parametric method sometimes differs considerably from the actual load. Nonparametric extrapolation can obtain the frequency of each cycle that may appear in the full life and ensures that each hysteresis loop is not broken. Therefore, nonparametric extrapolation is suitable for extrapolating HEV loads, which generally do not show large fluctuations, and it avoids large fitting errors [16].

Application of Kernel Function.
Kernel functions can be applied to the nonparametric estimation method [17]. Meanwhile, the rule-of-thumb algorithm, which is used for the bandwidth selection of kernel functions, can improve the accuracy of the nonparametric estimation results. The local likelihood method has also been introduced to nonparametric estimation [18], and this approach improved nonparametric estimation greatly. The specific algorithm of nonparametric extrapolation is expressed as follows [8]. Suppose that $y_i$ is the function value of data point $x_i$, $i = 1, 2, \ldots, n$, along with the measurement error $\varepsilon_i$; then the following equation is obtained:

$$y_i = f(x_i) + \varepsilon_i. \tag{1}$$

If $f(x)$ is a continuous smooth function, its estimation function $\hat{f}(x)$ in the neighborhood $U(x)$ of any point $x$ can be expressed as

$$\hat{f}(x) = \frac{1}{N_x} \sum_{i:\, x_i \in U(x)} y_i, \tag{2}$$

where $N_x$ is the cardinality of the set $\{i : x_i \in U(x)\}$. Taking the effect of the different neighboring points $x_i$ on $x$ into account, the estimation function $\hat{f}(x)$ can be weighted as

$$\hat{f}(x) = \sum_{i=1}^{n} w_i(x)\, y_i, \tag{3}$$

where

$$\sum_{i=1}^{n} w_i(x) = 1. \tag{4}$$

Then, a kernel function $K(x)$ is chosen. After selecting the appropriate bandwidth $h$, the kernel function can be scaled as follows:

$$K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right), \tag{5}$$

and transferred by the length $x_i$:

$$K_h(x - x_i) = \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right). \tag{6}$$

Then, the normalized weighting function $w_i(x; h)$ can be written as

$$w_i(x; h) = \frac{K_h(x - x_i)}{\sum_{j=1}^{n} K_h(x - x_j)}. \tag{7}$$

By inserting (7) into (3), the estimation function can be estimated as follows:

$$\hat{f}(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{j=1}^{n} K_h(x - x_j)}. \tag{8}$$

Using the above method, the full-life load can be estimated. Common kernel functions include the Gaussian and Epanechnikov kernels. They are divided into one-dimensional and two-dimensional forms, and their one-dimensional expressions are as follows.

Gaussian kernel function:

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right). \tag{9}$$

Epanechnikov kernel function:

$$K(x) = \begin{cases} \dfrac{3}{4}\left(1 - x^2\right), & |x| \le 1, \\ 0, & |x| > 1. \end{cases} \tag{10}$$

The density estimation principle of the two-dimensional kernel function is the same as that of the one-dimensional one, and the level sets of the Gaussian will always be ellipses. For illustrative purposes, the density estimation of the one-dimensional Gaussian kernel is shown in Figure 2. In the extrapolation process, a two-dimensional kernel function is needed to fit the mean and amplitude of the load. After selecting a suitable bandwidth and kernel function [19, 20], the PDF of the load data can be estimated effectively.
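The kernel-weighted estimate obtained by inserting (7) into (3) can be sketched in a few lines. This is a minimal one-dimensional illustration, not the authors' implementation: the function names and the fixed bandwidth `h` are assumptions made here, and the weighted-average form is the one often called the Nadaraya-Watson estimator.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard one-dimensional Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u: float) -> float:
    """Epanechnikov kernel; it is zero outside the interval [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_estimate(x, xs, ys, h, kernel=gaussian_kernel):
    """Kernel-weighted average of ys evaluated at the point x.

    The 1/h factor of the scaled kernel cancels between numerator and
    denominator, so it is omitted here.
    """
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Can happen with a compact kernel when no sample is within reach.
        raise ValueError("no data point lies within the kernel support")
    return sum(w * yi for w, yi in zip(weights, ys)) / total
```

With a compactly supported kernel such as the Epanechnikov kernel, the estimate is undefined wherever no sample lies within one bandwidth of `x`, which is why the sketch raises an error in that case rather than returning a value.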

Variation of the Load Data in Rainflow Matrix.
In nonparametric extrapolation, the obtained rainflow matrix can be expressed in the form of a "from-to" diagram, as shown in Figure 3. The actual variation trend of the load is derived from a large number of calculations. The number of load cycles is zero on the main diagonal of the "from-to" diagram. The variation trends of the mean and amplitude are indicated by separate arrows in Figure 4. The mean shifts parallel to the main diagonal. The amplitude shifts parallel to the minor diagonal and increases away from the main diagonal.
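The geometry of the "from-to" diagram can be checked with a small sketch. It assumes the common convention that the mean of a cycle is the midpoint of its two load levels and the amplitude is half the distance between them; the function name is hypothetical.

```python
def cycle_mean_amplitude(load_from: float, load_to: float) -> tuple[float, float]:
    """Return (mean, amplitude) of a cycle running between two load levels."""
    mean = 0.5 * (load_from + load_to)
    amplitude = 0.5 * abs(load_to - load_from)
    return mean, amplitude


# Moving both levels by the same amount (parallel to the main diagonal)
# changes only the mean.
base = cycle_mean_amplitude(10.0, 30.0)
shifted_along_main = cycle_mean_amplitude(15.0, 35.0)

# Moving the levels apart symmetrically (parallel to the minor diagonal)
# changes only the amplitude.
shifted_along_minor = cycle_mean_amplitude(5.0, 35.0)
```

A cell on the main diagonal has equal "from" and "to" levels, so its amplitude is zero, which is consistent with the cycle count being zero there.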

Characteristics of Four Kernel Functions.
Both the type of kernel function and its bandwidth affect the accuracy of the extrapolation. However, the effect of the bandwidth can be reduced by using an adaptive bandwidth [20]. Therefore, the type selection of the kernel function needs to be explored. To this end, common kernel functions, which mainly include the range-based ellipse, mean-based ellipse, circular, and Epanechnikov kernels, are studied to reduce the extrapolation errors. Their forms are shown in Figure 5.
The application condition of a kernel function in nonparametric extrapolation is based on the characteristics of the load data. The range-based ellipse is usually applied when the load data are distributed along the minor diagonal in the "from-to" diagram, as shown in Figure 5(a). Thus, the range-based ellipse can be used in the case of a large amplitude fluctuation under a fixed mean in one loop. By contrast, the mean-based ellipse is mainly used when the data are distributed along the main diagonal in the "from-to" diagram, as shown in Figure 5(b). The Epanechnikov kernel function is also a common kernel function. Due to the presence of boundary conditions, the Epanechnikov kernel function has a square shape, as shown in Figure 5(c). This function can be applied when the mean and amplitude of the load data are distributed in a balanced way in the rainflow diagram. The circular kernel function, shown in Figure 5(d), is similar to the Epanechnikov kernel function and can also be applied when the mean and amplitude of the load data are equally significant; that is, there is not a large difference between their extrapolation results.
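One way to picture the difference between the elliptical and circular kernels is a two-dimensional Gaussian weight whose axes are aligned with the two diagonals of the "from-to" diagram. The sketch below is only an illustration under that assumption; the function name and the bandwidth parameters `h_mean` and `h_amp` are hypothetical and are not taken from the paper.

```python
import math


def rotated_gaussian_kernel(d_from, d_to, h_mean, h_amp):
    """Unnormalized 2D Gaussian weight for an offset in the from-to plane.

    d_from, d_to: offset from the kernel centre along the two axes.
    h_mean: bandwidth along the main diagonal (mean direction).
    h_amp: bandwidth along the minor diagonal (amplitude direction).
    """
    # Rotate the offset by 45 degrees onto the two diagonal directions.
    u = (d_from + d_to) / math.sqrt(2.0)  # along the main diagonal
    v = (d_to - d_from) / math.sqrt(2.0)  # along the minor diagonal
    return math.exp(-0.5 * ((u / h_mean) ** 2 + (v / h_amp) ** 2))
```

In this picture, `h_mean > h_amp` stretches the kernel along the main diagonal (a mean-based ellipse), `h_amp > h_mean` stretches it along the minor diagonal (a range-based ellipse), and equal bandwidths give a circular kernel.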

Comparison of Extrapolation Results Based on Different Kernel Functions.
The results obtained by extrapolating a load set using the four kernel functions mentioned above are shown in Figure 6. As seen from Figure 6, the highest frequencies and the extremes of the load after extrapolation differ between kernels. In particular, the obtained maximums of the mean and amplitude of the load differ greatly. Because the maximum load has a very large effect on fatigue life, selecting an appropriate kernel function is of vital importance for improving the extrapolation results. When the circular and Epanechnikov kernel functions are used to extrapolate the load data, the mean and amplitude of the load data are usually treated with equal importance. However, when the mean-based ellipse and range-based ellipse are used to extrapolate a load set, the characteristics of the load data are fully taken into account. Meanwhile, the data variation characteristics are reflected more clearly by comparing the dispersion degrees of the mean and amplitude. Therefore, the mean-based ellipse and range-based ellipse are frequently applied in nonparametric extrapolation.

Criteria Selection to Evaluate the Dispersion Degree of the Load.
Currently, a new and systematic method is needed to select kernel functions. The dispersion degree of the mean and amplitude of the load in the "from-to" diagram can serve as the judgment condition for selecting the mean-based ellipse or the range-based ellipse. If the dispersion degree of the mean is larger, the mean-based ellipse should be selected. Similarly, when the dispersion degree of the amplitude is larger, the range-based ellipse should be selected. Hence, the kernel function can be determined by studying the dispersion degree of the mean and amplitude of the load according to its distribution characteristics in the "from-to" diagram. The dispersion degree reflects the distribution characteristics of the load data. Many indicators, such as range, variance, standard deviation, and quartile range, can be utilized to evaluate the dispersion degree. However, a single indicator is insufficient to keep judgment errors low. This paper proposes to use MCDM technology to evaluate the dispersion degree of the load data. The range, standard deviation, and quartile range of the mean and amplitude are selected as criteria to identify which of the mean and amplitude has the larger dispersion degree. The weight of each criterion is calculated by the eigenvector and entropy methods. Furthermore, the comprehensive weight values can be obtained using optimization theory and Jaynes' maximum entropy principle. After comparing the dispersion degrees of the mean and amplitude, the appropriate kernel function is selected; the flowchart of kernel function selection is shown in Figure 7.
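The three criteria named above (each is defined in the subsections that follow) can be computed with the Python standard library. The sketch is an illustration only: the quartile interpolation rule and the population form of the standard deviation are assumptions made here, since the text does not fix either choice.

```python
import statistics


def dispersion_criteria(values):
    """Return (range, quartile range, standard deviation) of a sample."""
    data = sorted(values)
    value_range = data[-1] - data[0]
    # Quartiles P25 and P75; "inclusive" interpolation is an assumption.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    quartile_range = q3 - q1
    # Population standard deviation; the sample form is equally plausible.
    std_dev = statistics.pstdev(data)
    return value_range, quartile_range, std_dev
```

The function would be applied twice per load set, once to the cycle means and once to the cycle amplitudes, so that the two groups of criteria can be compared.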
3.1.1. Range. Range ($R$) is the difference between the maximum and minimum values of a load set. For example, $R_1$ and $R_2$ are the ranges of two sets of loads, as shown in Figure 8. The larger the range is, the larger the variability will be, and vice versa. Using the range to reflect the dispersion degree of the load is acceptable for small samples.
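3.1.2. Quartile Range. The quartile range ($Q$) of the load is the difference between the upper quartile $Q_U$ (P75) and the lower quartile $Q_L$ (P25), for example, $Q_1$ and $Q_2$, as shown in Figure 8. The quartile range can be regarded as the range of the middle 50% of the load data. The larger the value is, the larger the variation degree will be, and vice versa. The quartile range is often preferred to the range, as it is not affected by the maximum or minimum values of a load set.

3.1.3. Standard Deviation. The range and quartile range do not take all load data into account. Hence, the standard deviation ($\sigma$), which requires calculating the differences between each load $x$ and the mean $\mu$ of the load data, is used to measure the dispersion degree of the load data, as shown in Figure 8. A large standard deviation indicates that the dispersion degree of the load is large, and vice versa. Standard deviation can be calculated as follows:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \mu\right)^2}. \tag{11}$$

The subjective weights of the three criteria are determined by the eigenvector method. The relative importance of the criteria is expressed by the matrix

$$\mathbf{A} = \left(a_{ij}\right)_{3 \times 3}, \tag{12}$$

where $a_{ij} = w_i / w_j$ and $a_{ii} = 1$. Matrix $\mathbf{A}$ is called the judgment matrix. The weight vector $\mathbf{w}$ is introduced, and the following equation is derived:

$$\mathbf{A}\mathbf{w} = \lambda \mathbf{w}. \tag{13}$$

In the previous formula, $\mathbf{w}$ is the eigenvector of matrix $\mathbf{A}$, and $\lambda$ is the eigenvalue of matrix $\mathbf{A}$. Then the largest nonzero eigenvalue $\lambda_{\max}$ is substituted into the homogeneous linear equation:

$$\left(\mathbf{A} - \lambda_{\max} \mathbf{I}\right)\mathbf{w} = \mathbf{0}. \tag{14}$$

The eigenvector $\mathbf{w} = [w_1, w_2, w_3]^T$ corresponding to $\lambda_{\max}$ can be obtained from the above equation.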
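If the consistency of the judgment matrix $\mathbf{A}$ is sufficient, the obtained eigenvector can represent the weight vector. To verify the consistency of the judgment matrix $\mathbf{A}$, the consistency index CI is defined as follows:

$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1}, \tag{15}$$

where $n$ is the number of criteria, here $n = 3$. The larger CI is, the worse the consistency of the judgment matrix $\mathbf{A}$. The consistency ratio CR is defined as follows:

$$\mathrm{CR} = \frac{\mathrm{CI}}{\mathrm{RI}}, \tag{16}$$

where RI is the random consistency index, which depends on the order of the judgment matrix $\mathbf{A}$. If the value of CR is less than 0.1 [12], the consistency of the judgment matrix $\mathbf{A}$ is acceptable. Otherwise, the judgment matrix $\mathbf{A}$ should be re-selected. Because the eigenvector is not unique (it is determined only up to a scale factor), the weights of the range, quartile range, and standard deviation should be normalized as follows:

$$\mathbf{w}_1 = \left[w_{11}, w_{12}, w_{13}\right]^T, \tag{17}$$

where

$$w_{1j} = \frac{w_j}{\sum_{k=1}^{3} w_k}, \quad j = 1, 2, 3. \tag{18}$$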
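The eigenvector step and the consistency check can be sketched without any numerical library by using power iteration, which is a substitute technique chosen here for brevity rather than the solution procedure of the paper. The function names are hypothetical, and the default random index of 0.58 for a third-order matrix is a commonly quoted tabulated value assumed for illustration.

```python
def principal_eigenpair(matrix, iterations=200):
    """Power iteration for a positive matrix.

    Returns the largest eigenvalue and its eigenvector scaled to sum to 1.
    """
    n = len(matrix)
    vec = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        # Because vec sums to 1, the sum of A*vec approaches the eigenvalue.
        eigenvalue = sum(product)
        vec = [p / eigenvalue for p in product]
    return eigenvalue, vec


def consistency_ratio(lambda_max, n, random_index=0.58):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1).

    random_index=0.58 is an assumed tabulated value for n = 3.
    """
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / random_index


# A fully consistent judgment matrix built from the ratios 1 : 2 : 3.
A = [[1.0, 1.0 / 2.0, 1.0 / 3.0],
     [2.0, 1.0, 2.0 / 3.0],
     [3.0, 3.0 / 2.0, 1.0]]
lambda_max, subjective_weights = principal_eigenpair(A)
cr = consistency_ratio(lambda_max, n=3)
```

Because the returned eigenvector is already scaled to sum to 1, it can be used directly as the normalized subjective weight vector.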

Objective Weights of Criteria.
The entropy method [23] can avoid subjectivity in choosing the judgment matrix A as the values of objects in judgment matrix are calculated by considering a variety of vehicle models, driving conditions, and driver characteristics.Then the objective weights of the above three criteria are calculated.
If the values of different objects under a criterion in the evaluation matrix differ only slightly, the criterion is not very important. Conversely, if the values differ greatly, the criterion is given more importance. In evaluating $m$ objects, where each object contains three evaluation criteria, the evaluation matrix is established as follows:

$$\mathbf{R} = \left(r_{ij}\right)_{m \times 3}, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, 3), \tag{19}$$

where $r_{ij}$ is the value of the $i$th object under the $j$th criterion.

The ratio of the $i$th evaluation object in the $j$th criterion can be expressed as follows:

$$p_{ij} = \frac{r_{ij}}{\sum_{i=1}^{m} r_{ij}}. \tag{20}$$

Entropy is calculated using the following formula:

$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}. \tag{21}$$

The weight of each criterion is then obtained:

$$w_{2j} = \frac{1 - e_j}{\sum_{j=1}^{3} \left(1 - e_j\right)}, \tag{22}$$

where

$$0 \le w_{2j} \le 1, \quad \sum_{j=1}^{3} w_{2j} = 1. \tag{23}$$

Therefore, the weight vector can be expressed as

$$\mathbf{w}_2 = \left[w_{21}, w_{22}, w_{23}\right]^T. \tag{24}$$

As the entropy method can eliminate subjective bias in the weight distribution problem, the objective weights of the three criteria can be calculated using the previous equations. To judge the dispersion degree of the load, a comprehensive weight is needed that synthesizes the subjective weight and the objective weight.
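The entropy weighting steps described above (column ratios, entropy per criterion, then weights proportional to one minus the entropy) can be sketched as follows. This is the standard entropy weight method written under the assumption that all entries of the evaluation matrix are positive; the function name is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Objective weights for the columns of a matrix of positive values.

    Rows are evaluated objects and columns are criteria. At least two rows
    are required, and at least one column must vary between rows.
    """
    m = len(matrix)
    n = len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        ratios = [value / total for value in column]
        entropy = -sum(p * math.log(p) for p in ratios if p > 0) / math.log(m)
        diversities.append(1.0 - entropy)
    scale = sum(diversities)
    return [d / scale for d in diversities]
```

A criterion whose values are identical for all objects has entropy 1 and therefore receives a weight of zero, which matches the statement that a criterion with nearly equal values is not very important.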

Comprehensive Weights of Different Criteria.
To obtain the comprehensive weight, two coefficients are assigned to the subjective and objective weights, respectively [24]. Thus, the comprehensive weights are calculated using the following formula:

$$w_j = k_1 w_{1j} + k_2 w_{2j}, \quad j = 1, 2, 3, \tag{25}$$

where $k_1 + k_2 = 1$ and $k_1, k_2 \ge 0$.
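The linear combination of subjective and objective weights can be illustrated with a short sketch. The function name and the numbers in the usage lines are hypothetical and are chosen only to show the mechanics.

```python
def combine_weights(subjective, objective, k1, k2):
    """Combine two weight vectors with coefficients k1 + k2 = 1, k1, k2 >= 0."""
    if k1 < 0 or k2 < 0 or abs(k1 + k2 - 1.0) > 1e-9:
        raise ValueError("coefficients must be nonnegative and sum to 1")
    if len(subjective) != len(objective):
        raise ValueError("weight vectors must have the same length")
    return [k1 * s + k2 * o for s, o in zip(subjective, objective)]


# Hypothetical weights for three criteria, combined with equal coefficients.
example = combine_weights([0.2, 0.3, 0.5], [0.4, 0.4, 0.2], 0.5, 0.5)
```

If both input vectors sum to 1, the combined vector also sums to 1, so no further normalization is needed.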
To calculate the parameters in formula (25), a preferred coefficient, which is used to synthesize the subjective and objective weights, can be selected directly based on optimization theory [25]. Furthermore, a linear combination assigning method, which can eliminate the bias of the decision maker toward the subjective and objective weights, is proposed based on optimization theory and Jaynes' maximum entropy principle [26]. To minimize the weighted generalized distance between the objective project and the ideal project, a single-objective optimization function (26) is constructed, in which the equilibrium coefficient of the two objects is $0 < \mu < 1$ and $l$ is the number of weight coefficients, here $l = 2$.
This optimization problem has a unique solution for the coefficients $k_1$ and $k_2$. The comprehensive weights of the three criteria are then calculated by (25). The dispersion degrees of the mean and amplitude of the load can be calculated as the sum of the three criteria multiplied by their weights. The appropriate kernel function for extrapolation can then be selected by comparing the two dispersion degrees.

Case Study and Discussions
In prior research, the project team developed a power split device (PSD) [27] for a series-parallel HEV based on the differential-speed principle of a differential. The output torque of the PSD in the HEV is simulated in ADVISOR software. The initial conditions of the ADVISOR simulation are shown in Table 1. To simulate real driving conditions and cover a wide speed range, a combined driving condition of WVUCITY (West Virginia University City), CSHVR (City Suburban Heavy Vehicle Route), and WVUINTER (West Virginia University Interstate) is selected in the proportion 55 : 28 : 17. These three driving conditions are verified and accepted by many research institutions, and their combination is adequate to simulate actual conditions. The speeds required under the three simulation conditions are shown in Figure 9. The simulated output torque of the PSD under the combined driving condition is shown in Figure 10.
To determine the subjective weight of each criterion according to their relative importance, the relative importance of the range, quartile range, and standard deviation is set so that $w_1 / w_2 = 1/2$ and $w_1 / w_3 = 1/3$. The judgment matrix $\mathbf{A}$ is then obtained as follows:

$$\mathbf{A} = \begin{bmatrix} 1 & 1/2 & 1/3 \\ 2 & 1 & 2/3 \\ 3 & 3/2 & 1 \end{bmatrix}.$$

The largest eigenvalue can be solved using (14): $\lambda_{\max} = 3$. The eigenvector is then obtained according to (14): $\mathbf{w} = [1, 2, 3]^T$. To determine the consistency of the matrix, CR is calculated according to (16): CR = 0 < 0.1. So the judgment matrix $\mathbf{A}$ is acceptable. The subjective weight vector is obtained according to (17) as follows:

$$\mathbf{w}_1 = \left[1/6,\ 1/3,\ 1/2\right]^T \approx [0.1667, 0.3333, 0.5000]^T.$$

To determine the objective weight of each criterion, the output torque of the PSD under the WVUCITY, CSHVR, and WVUINTER conditions is simulated separately in ADVISOR software. The range, quartile range, and standard deviation of the mean and amplitude of each load set are shown in Table 2. The evaluation matrix $\mathbf{R}$ is established according to (19), and the weight vector of each criterion is calculated using the entropy method according to (22). According to (25), the subjective weight and objective weight are synthesized. To optimize the weights by (26), $\mu = 0.5$ is assumed, and the optimization results show that $k_1 = 0.509$ and $k_2 = 0.491$. The comprehensive weights of the range, quartile range, and standard deviation are calculated as follows:

$$\mathbf{w} = [0.2366, 0.3377, 0.4257]^T. \tag{36}$$

As seen from the calculation results, the standard deviation, whose weight (0.4257) is the largest and close to 0.5, has the greatest effect on the dispersion degree of the mean and amplitude of the load. It can be concluded that the standard deviation affects the selection of the kernel function to a large extent. The criterion with the smallest effect is the range.
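As a plausibility check on the reported figures, the linear combination (25) can be inverted to see which objective weights are implied by the comprehensive weights in (36), the normalized eigenvector [1, 2, 3], and the two reported coefficients. The values produced here are derived for illustration and are not reported in the paper; the function name is hypothetical.

```python
def implied_objective_weights(comprehensive, subjective, k1, k2):
    """Invert w_j = k1 * w1_j + k2 * w2_j for the objective weights w2_j."""
    return [(w - k1 * s) / k2 for w, s in zip(comprehensive, subjective)]


subjective = [1.0 / 6.0, 1.0 / 3.0, 1.0 / 2.0]  # eigenvector [1, 2, 3] normalized
comprehensive = [0.2366, 0.3377, 0.4257]        # reported comprehensive weights
implied = implied_objective_weights(comprehensive, subjective, 0.509, 0.491)
```

The implied values are approximately 0.309, 0.342, and 0.349. They lie between 0 and 1 and sum to 1 within rounding, which is consistent with the comprehensive weights being a linear combination of two normalized weight vectors.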
The calculation results conform to the analysis of the degree of effect of the three criteria. Therefore, the calculated weights of the three criteria are reasonable. Once the range, quartile range, and standard deviation of the mean and amplitude of the load are obtained, the dispersion degree of each can be calculated as the sum of the three criteria multiplied by their weights. If the dispersion degree of the mean is larger, the mean-based ellipse kernel function should be selected. If the dispersion degree of the amplitude is larger, the range-based ellipse kernel function should be selected.
The load data of the HEV under the combined driving condition are extrapolated using the improved nonparametric extrapolation method. The rainflow matrix is obtained after rainflow counting, as shown in Figure 11(a). The dispersion degree of the mean, which shifts along the main diagonal, is calculated as 156.61 N·m, and the dispersion degree of the amplitude is 398.69 N·m. Comparing the two dispersion degrees shows that the fluctuation of the amplitude data is larger. Therefore, to be consistent with the characteristics of the load, the range-based ellipse kernel function is used to extrapolate the load. The result of the extrapolation is shown in Figure 11(b).
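The decision rule applied above can be sketched as a weighted sum followed by a comparison. The function names are hypothetical, and the handling of a tie, which the text does not address, is an assumption made here.

```python
def dispersion_degree(criteria, weights):
    """Weighted sum of (range, quartile range, standard deviation)."""
    if len(criteria) != len(weights):
        raise ValueError("criteria and weights must have the same length")
    return sum(c * w for c, w in zip(criteria, weights))


def select_kernel(mean_dispersion, amplitude_dispersion):
    """Choose the ellipse kernel that follows the more dispersed quantity.

    A tie falls through to the range-based ellipse; this is an assumption.
    """
    if mean_dispersion > amplitude_dispersion:
        return "mean-based ellipse"
    return "range-based ellipse"


# Dispersion degrees reported for the combined driving condition (N·m).
choice = select_kernel(156.61, 398.69)
```

With the reported dispersion degrees, the sketch returns the range-based ellipse, in agreement with the selection made in the text.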
After extrapolation, the dispersion degree of the amplitude of the overall load is still larger than that of the mean, which indicates that the extrapolated load characteristic is consistent with the initial characteristic before extrapolation. Selecting the kernel function for nonparametric extrapolation according to the load characteristics allows long-mileage load data to be predicted appropriately. As nonparametric extrapolation has inherent advantages in extrapolation and the proposed selection method of kernel functions is based on MCDM technology, the obtained extrapolation result is more reasonable.

Conclusions
To avoid subjectivity in load extrapolation, nonparametric extrapolation is introduced to compile the load spectrum. Exploring how to select the type of kernel function in nonparametric extrapolation is therefore a pressing topic. In this paper, the characteristics of the rainflow matrix are analyzed, and the data distribution of nonparametric extrapolation is studied in the form of a "from-to" diagram. The types and characteristics of four kernel functions are introduced in detail, and the effects of the kernel function on the extrapolation results are studied by comparing the results obtained by extrapolating the load with each of the four kernel functions. The range, standard deviation, and quartile range are then selected as criteria to judge the dispersion of the amplitude and mean of the load. The weights of each criterion are calculated using the eigenvector and entropy methods of MCDM, and the comprehensive weight of each criterion is calculated by optimization theory and Jaynes' maximum entropy principle. The results show that the criterion with the greatest effect on the dispersion degree of the mean and amplitude is the standard deviation, followed by the quartile range and then the range. The calculation results conform to the analysis of the importance of the three criteria, which verifies the objectivity of this method. The selection method between the two kernel functions is obtained. By comparing the dispersion degrees of the mean and amplitude, each calculated from the three criteria multiplied by their weights, the kernel function selection problem in nonparametric extrapolation is solved. The load data of the PSD in the HEV under the combined driving condition are extrapolated using the improved nonparametric extrapolation method. As the kernel function selection is based on MCDM technology and nonparametric extrapolation has its inherent advantages, the results of extrapolation are more reasonable.

Figure 1: Flowchart of different load extrapolation methods.

Figure 4: Shift trends of the mean and amplitude.

Figure 5: Forms of four kernel functions.

Figure 9: Speed variation curves under different driving conditions.

Figure 10: Simulated output torque of PSD under the combined driving condition.

Table 1: Initial conditions of ADVISOR simulation.

Table 2: Rainflow matrix characteristics of three driving conditions.