An Optimized Neural Network Classification Method Based on Kernel Holistic Learning and Division

An optimized neural network classification method based on kernel holistic learning and division (KHLD) is presented. The proposed method takes the learned radial basis function (RBF) kernel as its research object. Each kernel can be considered a subspace region consisting of training samples of the same pattern category. By extending the region of the sample space covered by the original instances, relevant information between instances can be obtained from the subspace, and the classifier's boundary can be kept far from the original instances; thus, the robustness and generalization performance of the classifier are enhanced. In the concrete implementation, new pattern vectors are generated within each RBF kernel according to an instance optimization and screening method to characterize KHLD. Experiments on artificial datasets and several UCI benchmark datasets show the effectiveness of our method.


Introduction
In the field of pattern recognition, set classification [1][2][3] is a common classification task. It is widely applied in text classification, speech recognition, image recognition, and many other fields. Taking image-set classification as an example, each image set is composed of a class of image frames sharing a certain number of similar features. By using the relevant information from adjacent frames, image changes can be explored effectively under actual conditions. The main challenge is how to integrate the information from all existing images to reach a reliable decision. A typical approach is to establish an optimized representation of different subsets of images and to achieve effective measurements between different subsets.
Different from the set classification methods mentioned above, almost all current neural network [4][5][6][7] optimization algorithms and models are based on the training and classification of individual instances rather than on learning and partitioning the subspace region containing those instances. Because the classification surface of a network classifier is essentially determined by the probability distribution of the training samples, if the training sample set is too small or the dimension of the classified dataset is too high, the final classification error will be relatively large, which reduces the generalization performance of the neural network classifier.
To address this problem effectively, inspired by the idea of set classification, this paper introduces set classification into the neural network and presents an optimized neural network classification method based on kernel holistic learning and division (KHLD), which can improve the performance of the neural network classifier under a given sample set. Different from set classification, KHLD is based on the effective coverage of a local region of the sample space, so the kernel proposed here can be considered a subspace region consisting of the same pattern category in the training sample space. Although it might not obtain the spatial distribution directly, relevant information between instances can be obtained from the subspace. The main reason is that instances of the same pattern category are relatively close to each other in the spatial distribution and can be considered to have some similarity. Compared with single-pattern vector classification, KHLD considers the similarity information of the local region in the sample space. Due to the expansion of the region represented by the original pattern vectors in the sample space, performance can be improved to a certain extent when the sample set is too small or the dimension of the sample space is too high. On the other hand, KHLD can push the classifier's boundary farther away from the original samples, which further strengthens the robustness and generalization ability of the classifier. The primary task in achieving KHLD is the establishment and representation of the kernel. In this paper, considering the local characteristics of a certain region covered by the sample space, we take the Gaussian distribution function under different parameters as the representative to establish the corresponding subspace set. Moreover, to integrate the subkernels with different parameters and their mapping effect into the original sample space, we first construct the corresponding RBF kernels by learning the original sample space to realize the local mapping of the different regions of the sample space. Then, RBF kernels with different parameters are further studied and divided. Thus, the KHLD presented in this work has two meanings: the establishment of different RBF kernel parameters and the holistic division of the coverage region.
Typical optimization algorithms for establishing RBF kernels with different parameters include K-means clustering [8], fuzzy clustering [9,10], orthogonal forward selection [11], evolutionary algorithms [12], particle swarm optimization [13], and other algorithms [14][15][16]. It is worth noting that although these methods for optimizing the RBF kernel parameters effectively combine the holistic information of the training sample space, the number of hidden nodes in the RBF network cannot be determined automatically, which may lead to poor adaptability to different sample sets. To automatically estimate the number of RBF kernel parameters, several sequential learning methods for RBF network kernel parameters, including the minimum resource allocation network (MRAN) [17], the sequential learning algorithm for growing and pruning the RBF (GAP-RBF) [18], and other incremental designs of radial basis function networks [19][20][21], can be used. However, the holistic information of the sample space is not taken into account in these methods, and the classification performance is affected to some extent.
To generate the optimal number and parameters of the RBF kernels, in our previous work, an incremental learning algorithm for the hybrid RBF-BP network (ILRBF-BP) [22] and a hybrid structure adaptive RBF-ELM network (HSARBF-ELM) [23] were presented. In ILRBF-BP, a potential density clustering method is used to generate RBF kernels automatically, which utilizes the global distribution information of the sample space. However, the local adaptability of each RBF kernel parameter in ILRBF-BP is not fully considered; this disadvantage is overcome by HSARBF-ELM. By combining potential density clustering with a center-oriented heterogeneous sample repulsive force, the density information of different regions of the sample space and the neighborhood information of the region covered by the initial RBF hidden nodes can be used effectively. The optimal number and parameters of the RBF kernels can be generated adaptively according to the distribution of the sample space. However, when the training sample set is too small or its dimension is too high, the distribution of the sample set will be very sparse, which causes the optimization algorithm to fail to some extent and reduces the generalization performance of the neural network classifier. To solve this problem, an optimized neural network based on KHLD is proposed. The premise of KHLD is the establishment of optimized kernel parameters. The geometry of these kernels is a regular hypersphere, and the optimization of the number and parameters of the RBF kernels in HSARBF-ELM is exactly in line with this requirement. Thus, the RBF kernel established in HSARBF-ELM is the research object of this study.
When the number and parameters of the optimized RBF kernels are established, the subsequent task is to realize KHLD. In practice, the training of the weights of network classifiers is carried out on single instances. When all the RBF kernels are established, according to the probability density distribution of the pattern vectors in each subkernel, we generate new pattern vectors within each RBF kernel, which is equivalent to extending the existing pattern vector subset in the current RBF kernel, to characterize KHLD. Intuitively, when the number of samples generated in the region covered by the kernel is sufficient, the covered region can be approximated. In this way, KHLD is transformed into training and dividing more pattern vectors. On the basis of generating a suitable sample set size, the existing network classifier is used for training and classification; thus, the final classification surface can be modified to improve the generalization performance of the network classifier.
To effectively expand the pattern vectors in the region covered by each RBF kernel, a suitable sample probability distribution model is first needed to generate new pattern vectors. For this problem, we consider that the effective region covered by an RBF subkernel contains a certain number of original pattern vectors. In the region near the center, the probability density is relatively dense, and near the boundary, it is relatively sparse; thus, these pattern vectors can be considered to approximately obey a multivariate Gaussian distribution with the current RBF kernel as the parameter. Moreover, the new pattern vectors should be constrained to the region covered by the current RBF kernel, and the initial filling of the RBF kernel can be accomplished in this way. Second, we need to measure the density of the region of the original pattern vectors in each RBF kernel. In a dense region of the sample space, the number of generated pattern instances should be relatively large; conversely, in a sparse region, it should be relatively small. When the generated instances fall in a mixed region covered by different pattern classes, the probability of preserving the sample is further reduced. In this way, by combining the density and location information of the region, the optimal selection of the generated pattern instances can be completed without changing the probability density distribution of the original sample space.
According to the above methods, we take the idea of KHLD as the prototype and approximate it by filling and screening the pattern vectors of each kernel. On the other hand, the KHLD of each RBF kernel is converted into the learning and division of more pattern vectors, which can improve the sparse sample spatial distribution caused by a sample size that is too small or a sample space dimension that is too high, and the classification accuracy of the classifier can be enhanced. Note that, due to the inhomogeneity of the sample distribution inside the kernel, the approximation of KHLD by filling and screening the pattern vectors of each kernel can be considered a soft partition; that is, the final classification surface can pass through the kernel to handle the overlap of different pattern subclasses effectively. Thus, it is more conducive to the adjustment of the actual classification surface parameters.
In summary, the main contributions of this work are as follows: (1) The idea of KHLD is introduced into the neural network classifier, and its characteristics are analyzed. (2) An internal sample generation and optimization screening mechanism for the RBF kernel is designed to achieve the approximation of KHLD. (3) KHLD is combined with existing classification algorithms and compared with these algorithms on two artificial datasets and several benchmark datasets, and the experimental results show the superiority of the proposed method.

The Establishment of KHLD.
Considering that the method of KHLD is based on the RBF kernels of HSARBF-ELM, we first describe the optimization of the RBF kernels in HSARBF-ELM, which prepares for the kernel holistic learning and division method.
For an input sample x, the output of the k-th RBF kernel can be expressed as

g_k(x) = exp(−‖x − μ_k‖² / (2σ_k²)),

where μ_k and σ_k are the center and width of the k-th RBF kernel.
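As a concrete illustration, the Gaussian RBF mapping above can be sketched in a few lines; the function and variable names are ours, not from the paper:

```python
import numpy as np

def rbf_output(x, centers, widths):
    """Gaussian RBF outputs g_k(x) = exp(-||x - mu_k||^2 / (2 * sigma_k^2)).

    centers has shape (K, d) and widths has shape (K,); the names are
    illustrative, chosen to mirror mu_k and sigma_k in the text.
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    sq_dist = np.sum((centers - x) ** 2, axis=1)  # ||x - mu_k||^2 per kernel
    return np.exp(-sq_dist / (2.0 * widths ** 2))
```

The output vector (g_1(x), ..., g_K(x)) is what later feeds the ELM stage of HSARBF-ELM.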
In HSARBF-ELM, by combining the methods of density clustering with a potential function and center-oriented unidirectional repulsive force, the numbers of RBF kernels and parameters can be effectively generated.
The main steps are as follows.
Given a training set S = ∪_{i=1}^{h} S_i, where S_i = {x_1^i, ..., x_{L_i}^i} is the i-th pattern category set, h is the number of pattern categories, and L_i is the number of samples in the i-th pattern category, perform the following for each pattern category set S_i: (1) Compute the potential value of x_v^i according to

ρ(x_v^i) = Σ_{u=1}^{L_i} exp(−d(x_u^i, x_v^i)² / T²),

where T is the distance weighting factor and d(x_u^i, x_v^i) is the distance measure between x_u^i and x_v^i.
(2) Determine the sample with the maximum potential as the center of the generated RBF hidden node; letting the sample with the maximum potential be x_p^i, the center is set as μ_k = x_p^i. (3) Adjust the center under the resultant heterogeneous repulsive force, where F(x_q^j) denotes the heterogeneous repulsive force from x_q^j to μ_k, and x_q^j is a heterogeneous sample covered by the current RBF hidden node: x_q^j ∉ S_i and ‖x_q^j − μ_k‖ < λ·σ, where λ is the width covering factor, α is the repulsive force control factor, and M is the iteration step. M_i and M_j denote the numbers of samples covered by the current RBF hidden node before updating, and M_i′ and M_j′ denote the numbers after updating. (4) The width is adjusted under a width constraint factor β and a constrained minimum width parameter σ_min, where σ is the initial width. This adjustment ensures the relative diversity of each generated RBF hidden node, which achieves a balance between the coverage effect and the generalization performance.
(5) Counteract the potential of each sample in the region covered by the current RBF hidden node, and find the sample with the maximum updated potential ρ′(x_n^i) to generate the next RBF hidden node. (6) Set the iteration termination condition: if the maximum updated potential still exceeds the termination threshold, return to step (2); else, the learning of the current pattern category is complete, and the algorithm proceeds to the other pattern categories. According to the above steps, the number of RBF hidden nodes and their centers and widths, denoted as {μ_k, σ_k}_{k=1}^{K}, can be generated optimally. For HSARBF-ELM, once the optimized RBF hidden nodes are generated, the output g(x) = (g_1(x), g_2(x), ..., g_j(x), ..., g_K(x)) serves as the input of the subsequent ELM network. The ELM network weights are updated with the existing ELM [24] learning algorithm.
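Steps (1), (2), (5), and (6) can be sketched as follows. This is a simplified illustration under our own assumptions: a Gaussian-type potential is assumed, the repulsive-force center adjustment of step (3) and the width adjustment of step (4) are omitted, and all names and the stopping rule are ours rather than the paper's exact formulation:

```python
import numpy as np

def potential_clustering(X, T=1.0, sigma=0.5, stop_ratio=0.1):
    """Sketch of potential-function density clustering for one category.

    Repeatedly picks the sample with the maximum potential as a kernel
    center, then counteracts the potential of the region it covers.
    """
    X = np.asarray(X, dtype=float)
    # pairwise squared distances within the current pattern category
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    rho = np.exp(-d2 / T ** 2).sum(axis=1)        # step (1): potentials
    threshold = stop_ratio * rho.max()            # step (6): stop condition
    centers = []
    while rho.max() > threshold:
        p = int(np.argmax(rho))                   # step (2): peak sample
        centers.append(X[p])
        # step (5): counteract the potential of the covered region
        rho = np.maximum(rho - rho[p] * np.exp(-d2[p] / sigma ** 2), 0.0)
    return np.array(centers)
```

On two well-separated clusters, this sketch places one center per cluster, which is the qualitative behavior the text describes.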

Main Idea.
To explain the characteristics and advantages of the method based on KHLD, Figure 1 compares KHLD with direct pattern vector classification. The method of KHLD is transformed into training and dividing more pattern vectors.
To realize KHLD, it is necessary to establish suitable RBF kernels to completely and effectively cover the different regions of the original sample space. Then, to ensure the validity of the generated samples, the newly generated samples in each kernel should be approximately consistent with the original pattern vector distribution, and the number of newly generated samples should be proportional to the distribution density of the original sample region. In addition, when the kernels of different pattern categories overlap, it is necessary to further screen the generated pattern vectors in the overlapping regions. To this end, the following steps need to be completed:
Step 1: the optimal coverage of the original sample space is completed by potential function density clustering and the center-oriented heterogeneous sample repulsive force; the appropriate RBF kernel parameters, including the number, centers, and widths of the RBF kernels, can be determined adaptively according to the distribution of the sample space.
Step 2: with the center and width of each RBF kernel as constraints, a probability distribution similar to that of the original samples is set up to generate new pattern vectors in the effective region covered by each RBF kernel.
Step 3: each newly generated pattern vector is judged to determine whether it is retained, and finally, a new pattern vector subset is formed.
Step 4: a new sample set is formed by combining the original samples with all the screened pattern vectors that are eventually retained to train the weights of the output classifier.
The difficulty in realizing the above steps lies in Step 3, that is, establishing an appropriate standard to measure the relationship between the newly generated pattern vector and the original sample density and determining whether the kernels of different pattern categories overlap each other, so as to complete the optimization screening of the newly generated pattern vectors.
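Steps 2 through 4 can be sketched as a compact pipeline. This is a simplification under our own assumptions: the Step 3 screening is reduced to a kernel-coverage test (the full Parzen-window screening is described in the implementation section of the paper), the coverage radius of one width is an assumption, and all helper names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility of the sketch

def fill_kernel(mu, sigma, n):
    """Step 2: draw candidates from N(mu, sigma^2 I) and keep those
    inside the region covered by the kernel (||z - mu|| <= sigma here;
    the exact coverage radius is an assumption)."""
    mu = np.asarray(mu, dtype=float)
    z = rng.normal(mu, sigma, size=(n, mu.size))
    return z[np.linalg.norm(z - mu, axis=1) <= sigma]

def khld_training_set(X, y, kernels):
    """Steps 2-4: fill every RBF kernel, then merge the retained
    vectors with the original samples. `kernels` is a list of
    (mu, sigma, label, n_candidates) tuples."""
    X_new, y_new = [np.asarray(X, dtype=float)], [np.asarray(y)]
    for mu, sigma, label, n in kernels:
        z = fill_kernel(mu, sigma, n)
        X_new.append(z)
        y_new.append(np.full(len(z), label))
    return np.vstack(X_new), np.concatenate(y_new)
```

The merged set returned by `khld_training_set` is what Step 4 hands to the output classifier for weight training.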

The Implementation.
In this section, we first give the definitions of KHLD, overlapping region samples, and nonoverlapping region samples to prepare for the description and implementation of subsequent algorithms.
According to the definitions, Figure 2 gives schematic diagrams of the overlapping region samples and the nonoverlapping region samples, which represent the valid regions covered by two different RBF kernels. In Figure 2, C is the overlapping region, samples 1 and 2 are overlapping region samples, and the other samples are nonoverlapping region samples.
To realize the classification method based on kernel holistic division and the selection of generated samples, it is necessary to take each established RBF kernel as the research object and randomly generate pattern category samples within each kernel for further optimization and screening. To this end, two factors need to be considered: (1) To facilitate the optimization of the subsequently generated pattern samples, the probability distribution of the initially generated pattern samples should be approximately the same as that of the original samples. (2) In the process of sample screening, the probability of a generated sample being retained should be proportional to the density of the original sample region. It is also necessary to consider whether the sample is an overlapping region sample and, if so, further reduce the probability that the sample is retained.
For case (1), since the establishment of each RBF kernel parameter is based on potential function density clustering, the probability density of the region near the center of the original samples is relatively large overall, and that of the region near the boundary is relatively small. It can therefore be considered that the probability density of the pattern vectors in these kernels approximately obeys a multivariate Gaussian distribution with the current RBF kernel as the parameter, and this distribution can be taken as the probability model for generating new pattern vectors.

For case (2), the key is to establish an appropriate measure to determine the density of the region where each generated sample is located and to determine whether the generated sample is retained. If the generated sample is retained, it is necessary to determine whether it lies in an overlapping region and, if so, to further complete a secondary optimization.
According to the above description, given a dataset {(x_i, y_i)}_{i=1}^{L}, where L is the number of training samples, y_i is the category label, x_i ∈ R^l, and y_i ∈ R^h, let S_i be the training sample set of the i-th pattern category. For each training sample category, the number and parameters of the RBF kernels are optimized by the potential function density and the repulsive force between heterogeneous samples, expressed as {μ_k^i, σ_k^i, t_k^i}_{k=1}^{K_i}, where μ_k^i and σ_k^i are the center and width of the k-th RBF kernel, respectively, t_k^i is the pattern category label of the kernel, K_i is the number of RBF kernels generated under the i-th pattern category, and K = Σ_{i=1}^{h} K_i is the total number of RBF kernels. When all the RBF kernels are built, the effective coverage of the different regions of the original sample space is completed. To achieve sample filling for each RBF kernel, it is necessary to establish a suitable sample probability distribution model to generate new pattern vectors. For the current k-th RBF kernel, the probability distribution f(z) of an arbitrarily generated pattern vector z obeys the Gaussian distribution with mean μ_k^i and covariance determined by σ_k^i. Moreover, the newly generated pattern vectors should lie in the effective region covered by the RBF kernel, that is, ‖z − μ_k^i‖ ≤ λ·σ_k^i. According to the above method, for the k-th RBF kernel in the i-th pattern category, let W_k^i = {z_1^i, ..., z_{N_k}^i} be the generated initial vector set in the kernel, where N_k is the number of generated samples in the k-th kernel. After the initial pattern vectors are generated, they need to be optimized and screened. During the screening process, in a dense region of the sample space, the number of retained pattern instances should be relatively large; conversely, in a sparse region, it should be relatively small. In this way, the probability distribution of the sample space can be combined with the density of the region where the pattern vector is generated, and the validity of the resulting pattern vectors can be enhanced. Let C_k^i be the initial sample set of the k-th RBF kernel in the current i-th pattern category and P_k be the number of samples in C_k^i. For each original pattern vector x with x ∈ S_i and ‖x − μ_k^i‖ ≤ λ·σ_k^i, the probability density of each newly generated pattern vector z_m^i can be estimated as

p(z_m^i) = (1/P_k) Σ_{x ∈ C_k^i} exp(−‖z_m^i − x‖² / (2θ_k²)),

where θ_k is the width of the corresponding Parzen window in the k-th RBF kernel.
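The Parzen-window density estimate and the retain-with-probability-p(z) rule can be sketched as follows; `parzen_density` and `screen` are our illustrative names, and the normalization is simplified to the average kernel response rather than the paper's exact constant:

```python
import numpy as np

def parzen_density(z, originals, theta):
    """Gaussian Parzen-window estimate of p(z) over the original
    pattern vectors contained in the current kernel (window width theta).
    Returns a value in (0, 1], largest where originals are dense."""
    originals = np.asarray(originals, dtype=float)
    sq = np.sum((originals - np.asarray(z, dtype=float)) ** 2, axis=1)
    return float(np.mean(np.exp(-sq / (2.0 * theta ** 2))))

def screen(candidates, originals, theta, rng):
    """Retain z when a uniform r in [0, 1] satisfies r <= p(z), so
    generated samples in dense regions are kept with high probability."""
    kept = [z for z in candidates
            if rng.uniform() <= parzen_density(z, originals, theta)]
    return np.array(kept)
```

Candidates near a dense cluster of originals are almost always kept, while candidates far from any original are almost always eliminated, matching the screening behavior described above.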
To apply this measure while preserving the randomness of sample generation, we generate a uniformly distributed random number r between 0 and 1, which is compared with the probability density of each newly generated pattern vector. If r ≤ p(z_m^i), z_m^i is retained; otherwise, z_m^i is eliminated. Therefore, in regions where the original samples are densely distributed, the probability that the newly generated samples are retained is relatively high.
Due to the complexity of different sample sets, heterogeneous samples are often mixed into the generated RBF kernels. Thus, it is necessary to further improve the sample screening in the overlapping regions. When a generated sample is in an overlapping region, two factors need to be considered: (1) The probability of the sample being retained should be reduced. (2) The sample spatial distribution densities under the current pattern category and the other pattern categories must be considered at the same time. According to the principle of inhibiting the probability density of heterogeneous samples, when the spatial distribution density of the sample in the current pattern category is higher than that in other pattern categories, the probability of the sample being retained is relatively large.
Combining the above two factors, for a sample z_m^i generated in the k-th RBF kernel, if there exists an RBF kernel of another pattern category, with center μ_n^j and width σ_n^j (j ≠ i), such that ‖z_m^i − μ_n^j‖ < σ_n^j, then z_m^i can be considered a sample in the overlapping region between the k-th and the n-th RBF kernels.
When the samples in the overlapping region are determined, it is necessary to screen them further. Let C_n^j be the initial sample set of the j-th pattern category contained in the kernel with center μ_n^j. For the sample z_m^i in the overlapping region intersecting the k-th and the n-th RBF kernels, the probability density estimate over the heterogeneous sample region is expressed as

q(z_m^i) = (1/P_n) Σ_{x ∈ C_n^j} exp(−‖z_m^i − x‖² / (2θ_n²)),

where P_n is the number of samples in C_n^j and θ_n is the Parzen window width of the n-th kernel. According to the above method, for a randomly generated number r between 0 and 1, when p(z_m^i) ≥ λ·r and p(z_m^i) ≥ c·q(z_m^i), the sample z_m^i in the overlapping region is retained; otherwise, z_m^i is removed. Here, λ > 1 and c ≥ 1. Combined with the above description, Algorithm 1 gives the concrete implementation of the classification method based on kernel holistic learning and kernel interior sample generation.

The Computational Complexity Analysis of KHLD.
In this study, potential density clustering and the center-oriented heterogeneous sample repulsive force are used to generate the optimized kernel parameters. Then, optimized sample filling and screening realize an effective approximation of KHLD. Assume that the number of samples in the initial training set S is L and that the initial training set contains two pattern categories with L_1 and L_2 samples, respectively, where L_1 + L_2 = L. The computational complexity of the proposed method is analyzed as follows: (1) The optimal kernel parameters are generated by the combination of potential density clustering and the heterogeneous sample repulsive force. In the process of quantifying the sample potential values by potential function density clustering, the label information of each category is considered, and the calculation of a sample's potential value needs to traverse all other samples in the current pattern category. Then, Gaussian kernels with different parameters are needed to cover the sample subspace to update the sample potentials. The computational complexity is O((L_1 − 1)² + (L_2 − 1)²). Setting the number of kernels as K, the process of optimizing the kernel parameters considers the distances between all samples and the centers, with computational complexity O(LK). After merging, the computational complexity of this part is O((L_1 − 1)² + (L_2 − 1)² + LK). (2) The process of sample generation and screening also takes a certain amount of time. Let the number of samples generated in all kernels be P, where the number of samples generated in the k-th kernel is P_k; thus, P = Σ_{k=1}^{K} P_k. In the process of calculating the density measure of the generated samples, the distances between the generated samples and the center of the current kernel are considered; here, the computational complexity for the k-th kernel is O(P_k). Combining all kernels, the complexity of sample generation is O(Σ_{k=1}^{K} P_k) = O(P). Then, in the process of sample screening, we need to further determine whether the generated samples in the current kernel are overlapping region samples, which requires comparing the distances between these samples and all other centers; the computational complexity is O((K − 1)·P_k) per kernel, and O(Σ_{k=1}^{K} (K − 1)·P_k) over all kernels. Thus, the computational complexity of sample generation and screening in all established kernels is O(P) + O(Σ_{k=1}^{K} (K − 1)·P_k), which can be simplified to O(KP).
Combining Steps 1 and 2, the computational complexity of the proposed KHLD is O(L² − 2L_1L_2 − 2L + LK + KP). Then, the generated training samples and the original training samples are combined to complete the training of the existing algorithms.

Results and Discussion
In this section, the performance of KHLD is evaluated on two artificial datasets, Double Moon (DM) [25] and Concrete Circle (CC); 8 UCI benchmark datasets [26]: Blood, Climate, Heart Disease (HD), Sonar, SPECT Heart (SH), Image Segmentation (IS), Forest, and Wilt; and 1 LIBSVM benchmark dataset [27]. Figure 3 shows a graphical display of the two artificial datasets. Except for the DM, CC, and IS datasets, all benchmark datasets are imbalanced. In each dataset, the inputs to all the classifiers are scaled appropriately to [−1, 1]; the classification performance of each network is measured by the overall (η_o) and average (η_a) per-category classification accuracies [23]. Table 1 gives the description of the classification datasets. KHLD is combined with existing classification algorithms and compared with these algorithms, including SVM [27], ELM [24], HSARBF-ELM, a constrained optimization method based on the BP neural network (CO-BP) [28], and an optimized RBF network based on fractional-order gradient descent with momentum (FOGDM-RBF) [29]. For SVM, the simulations are implemented with LIBSVM [27]. All simulations are conducted in MATLAB R2013b running on a PC with a 3.2 GHz CPU and 4 GB RAM. Each algorithm is run for 20 trials.
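The two evaluation metrics above can be computed as follows; η_o is the fraction of correct predictions and η_a is the mean of the per-class recalls, which is the fairer score on the imbalanced benchmark sets (the function name is ours):

```python
import numpy as np

def overall_and_average_accuracy(y_true, y_pred):
    """Return (eta_o, eta_a): overall classification accuracy and
    average per-category classification accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    eta_o = float(np.mean(y_true == y_pred))
    # per-class recall: fraction of each class's samples predicted correctly
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    eta_a = float(np.mean(per_class))
    return eta_o, eta_a
```

On a heavily imbalanced set, a classifier that predicts only the majority class scores a high η_o but a low η_a, which is why both are reported.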

Artificial Datasets: DM and CC.
In this section, two artificial datasets are used to verify the graphical and intuitive characteristics of KHLD. In the classification performance comparison, KHLD is combined with HSARBF-ELM and compared with HSARBF-ELM alone. Figure 4 shows that the combination of KHLD and HSARBF-ELM can fill the training sample space and effectively improve the classification performance of HSARBF-ELM.
Figures 5 and 6 show the optimization effect of the kernels and the samples generated in each kernel after adjusting the parameters σ and θ_k, respectively, which shows that the adjustment of σ and θ_k adapts well to the sample space on the DM dataset.
Figure 7 compares the number of generated samples and the classification accuracy under different initial training sets, where θ_k = σ_k/10. It can be seen that, when the number of training samples is small and the initial kernel width is too small, the established kernels cannot effectively cover the sample space, which leads to a decline in network generalization performance; when the kernel width is large and the number of training samples is sufficient, the performance of the proposed method also declines to a certain degree, which shows that KHLD places certain restrictions on the number of training samples and the selection of the kernel width parameter.
Figure 8 shows the learning and classification comparison of the HSARBF-ELM network classifier based on the original training set and on KHLD under the CC dataset. It can be seen that even when the generated kernels of different categories overlap each other seriously, the proposed method can still generate new samples in the different kernels and improve the classification performance of the original HSARBF-ELM network classifier, which shows the effectiveness of the KHLD method for complex classification problems.
Figure 9 shows the learning effect of the proposed method on the training set as the initial kernel width varies.
By varying the kernel width parameter, the method of KHLD can optimize the selection of samples in each kernel. When the kernel width increases, the generated kernels may cover heterogeneous samples, resulting in increased overlap among the samples of different pattern categories within a kernel.
Figure 10 further shows the number of generated training samples and the comparison of classification accuracy under different initial training sets, where θ_k = σ_k/5. When the width parameters of the RBF kernels are in a certain range, the method of KHLD has a good classification effect. Similar to Figure 7(b), when the initial kernel width is too small, the testing accuracy of the proposed method is greatly reduced, which means that the failure of the initial RBF kernels may invalidate kernel holistic learning and further deteriorate the final classification performance. Thus, it is necessary to avoid such a situation. This condition is also a restrictive condition for KHLD in this study.
Figures 11 and 12 show that the combination of KHLD and HSARBF-ELM increases the training time. However, the proposed method improves the network classification performance of HSARBF-ELM, especially when the number of training samples is small. When the number of training samples is sufficient, the proposed method reduces the performance of HSARBF-ELM to a certain extent, which shows that KHLD is suitable for situations with fewer training samples or a sparse spatial distribution of samples.

UCI Benchmark Datasets.
Tables 2 and 3 give the comparisons of the classification performance of the proposed method and other learning algorithms on the benchmark datasets. It can be seen that, on high-dimensional small-sample datasets, the combination of KHLD and other classification algorithms increases the training time. Although the testing results of the different classification algorithms vary across datasets, combining KHLD with these algorithms can improve their testing accuracy to varying degrees. As an auxiliary method, KHLD is effective when the spatial distribution of samples is sparse. The effectiveness of the proposed method is thus further verified. However, for the benchmark large-sample datasets, the combination of KHLD and the existing algorithms reduces their test performance, which further shows that the method of KHLD is not suitable for large-sample-set learning and classification.

Discussion of KHLD.
In this study, given the initial kernel width σ, the parameters {K, μ_k, σ_k}_{k=1}^{K} of KHLD are generated automatically from the distribution of the sample space according to the RBF kernel parameter optimization method in HSARBF-ELM, where σ_k is chosen such that σ − 0.2 ≤ σ_k ≤ σ. Once each kernel's parameters are established, the main parameter affecting KHLD is θ_k, which determines the number of samples generated in the kernel; θ_k is chosen from {σ_k/5, σ_k/10, σ_k/20}. Thus, we mainly discuss the influence of σ and θ_k on KHLD. Figure 13 shows the stress test when KHLD is combined with HSARBF-ELM on the high-dimensional Climate dataset. In general, when σ and θ_k lie in a certain range, combining KHLD with HSARBF-ELM improves the network performance of HSARBF-ELM. When σ is too low, for example, 0.1 or 0.2, the classification performance of KHLD combined with HSARBF-ELM is poor. The main reason is that the generated kernels cannot effectively cover the sample space, so the validity of the kernels cannot be guaranteed, which degrades the performance of the proposed method. When σ is too large or θ_k is too small, the probability of overlapping samples in the generated kernels increases, which also degrades the performance of the proposed method.
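The role of θ_k can be sketched with a simplified filling rule. This is a minimal stand-in for the paper's optimized filling-and-screening method: the count rule (number of generated samples growing with σ_k/θ_k) is an assumption for illustration, as is the uniform hypersphere sampling:

```python
import numpy as np

def fill_kernel(mu, sigma_k, theta_k, rng=None):
    """Generate new pattern vectors inside one RBF kernel, modeled as a
    hypersphere of center mu and radius sigma_k. The spacing parameter
    theta_k controls the sample count: smaller theta_k -> more samples.
    Simplified stand-in for the paper's filling-and-screening rule."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    n = max(1, int(round(sigma_k / theta_k)) ** 2)  # assumed count rule
    # uniform sampling in the hypersphere: random direction times random radius
    dirs = rng.normal(size=(n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = sigma_k * rng.random(n) ** (1.0 / d)
    return mu + dirs * radii[:, None]
```

Choosing θ_k = σ_k/10 instead of σ_k/5 thus quadruples the generated sample count under this rule, matching the qualitative behavior discussed above.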
Experiments on multiple datasets show that KHLD alleviates the degradation of network generalization performance when the sample size is too small or the spatial distribution of samples is too sparse.
However, when the number of training samples is sufficient or the spatial distribution of training samples is dense, the network performance of the proposed method declines to a certain degree compared with directly training the classifier. This situation shows that when the constructed kernels can already be effectively represented by the existing training samples, the samples generated in the kernels are equivalent to added noise samples, which makes network training redundant and is not conducive to improving the boundary partition surface. Thus, the proposed method is not suitable for classification problems with a sufficient number of training samples or a dense spatial distribution of samples. In the selection of parameters, the kernel width should be neither too small nor too large. If the kernel width is too small, the validity of the established kernels may not be guaranteed, which makes KHLD ineffective to a certain extent. If the kernel width is too large, the overlap between the samples generated in the kernels and the heterogeneous samples increases, which also degrades the performance of the proposed method.

Conclusion
An optimized neural network classifier based on KHLD is presented. The established kernels in KHLD are based on the RBF kernel parameters generated by the HSARBF-ELM algorithm. An optimized sample filling and screening method realizes an effective approximation of KHLD in different classification problems. Combining KHLD with other algorithms can effectively improve their network performance, especially when the spatial distribution of samples is sparse. Experiments on artificial datasets and benchmark datasets further verify the effectiveness of our method. One of the main shortcomings of this work is the representation of the kernels: for convenience of problem description, each kernel is represented as a regular hypersphere. The proposed method is mainly suitable for the case of a sparse spatial distribution of samples and is not suitable for large sample set learning and classification.
The establishment and representation of kernels are worthy of further study. Exploring more optimized kernel representations and combining them with KHLD is our future work.

Figure 1: A schematic comparison between the kernel holistic partition and the direct pattern vector classification. (a) Directly partitioning the original sample set. (b) Density clustering of the original sample set and establishing the corresponding RBF kernels to cover the original sample space. (c) Filling each subkernel pattern class to establish new pattern vectors that partition the whole kernel. (d) Merging the original samples and the newly filled samples into a new sample set to obtain a new classification surface. (e) Comparing the modified classification surface with the original one.

Let C_n^j denote the nth kernel of class j, with center μ_n^j and width σ_n^j, and let P_n denote the number of samples in C_n^j. An arbitrary pattern vector x belongs to this kernel when x ∈ S_j and ‖x − μ_n^j‖ ≤ σ_n^j, its samples being written as (x_1^j, …, x_{P_n}^j). Figures 4(a)-4(d) give a comparison of the learning and classification effects based on the original training set and on KHLD under the DM dataset. It can be seen that the RBF kernels generated in HSARBF-ELM can effectively cover the sample space.
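The kernel-membership rule described here (x belongs to a kernel when its distance to the kernel center is within the kernel width) can be sketched as follows; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def kernels_containing(x, centers, widths):
    """Return the indices of RBF kernels whose hypersphere contains x,
    i.e. those k with ||x - mu_k|| <= sigma_k."""
    x = np.asarray(x, dtype=float)
    dists = np.linalg.norm(np.asarray(centers) - x, axis=1)  # distance to each center
    return np.flatnonzero(dists <= np.asarray(widths))       # kernels covering x
```

A vector may fall inside one kernel, several overlapping kernels, or none at all; in the last case it lies outside the covered sample space.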

Figure 4: The learning and classification comparison of the HSARBF-ELM network classifier based on the original training set and on kernel holistic learning and division in the DM dataset, where the number of original training samples is 100 and the initial kernel width is 0.1. (a) Learning the original training set to generate different RBF kernels. (b) Further learning and screening on the basis of each RBF kernel to generate new sample vectors. (c) Classification effect obtained by learning the classifier parameters using the original training set. (d) Classification effect obtained by merging the original sample set with the newly generated sample set and then learning the classifier parameters.

Figure 8: Sample generation of the original training set under the CC dataset and comparison of the classification results on the test set. (a) Learning the original training set to generate the subkernels. (b) Further learning and screening to generate new sample vectors. (c) Classification results obtained by using the original training set to learn the classifier parameters. (d) Merging the original training set with the newly generated training set and then learning the classifier parameters.

Figure 12: Performance comparison of the proposed method and HSARBF-ELM in the CC dataset. (a) Training time versus the number of original training samples. (b) Overall testing accuracy (%) versus the number of original training samples.

Figure 11: Performance comparison of the proposed method and HSARBF-ELM in the DM dataset. (a) Training time versus the number of original training samples. (b) Overall testing accuracy (%) versus the number of original training samples.
for i = 1 : h    % h is the number of pattern categories
    for k = 1 : K_i
        Count the number of initial samples P_k belonging to the ith pattern category covered by each RBF hidden node;
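This per-category counting step can be sketched in Python. The layout in which centers[i] and widths[i] hold the K_i kernels generated for class i is an assumption made for illustration:

```python
import numpy as np

def count_covered_per_class(X, y, centers, widths, h):
    """For each pattern category i (0..h-1) and each of its K_i RBF hidden
    nodes, count the initial samples of class i covered by that node,
    mirroring the nested loop in the pseudocode."""
    counts = []
    for i in range(h):                                   # loop over pattern categories
        Xi = X[y == i]                                   # samples of the ith category
        per_kernel = []
        for mu, sigma in zip(centers[i], widths[i]):     # loop over the K_i kernels
            inside = np.linalg.norm(Xi - mu, axis=1) <= sigma
            per_kernel.append(int(inside.sum()))         # P_k for this hidden node
        counts.append(per_kernel)
    return counts
```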

Table 1 :
Descriptions of the classification datasets.

Table 2 :
Performance comparison on benchmark small sample datasets.

Table 3 :
Performance comparison on benchmark large sample datasets.