Clustering Optimization Algorithm for Data Mining Based on Artificial Intelligence Neural Network

The influence of data on social production and life has become increasingly prominent, and cluster analysis is the basis for further processing of the data. The concept of data mining and the application of neural networks in data mining are introduced. Following the related data mining technology, this article introduces in detail the two-layer perceptron, the backpropagation (BP) neural network, and the radial basis function (RBF) network for classification problems, as well as the self-organizing map (SOM) neural network for unsupervised clustering problems. Drawing on the self-adaptive and self-organizing capabilities of these algorithms, we design and implement data mining clustering optimization algorithms. In this paper, the neural network-based data mining process consists of three stages: data preparation, rule extraction, and rule evaluation. This paper studies the teaching-type and decomposition-type rule extraction algorithms. After analyzing the BP decomposition-type algorithm, the correlation method is used to calculate the correlation between the input and output neurons. After sorting by the degree of correlation, the RBF neural network is used for node selection. This can greatly reduce the number of input nodes of the neural network, simplify the network structure, reduce the number of recursive splits of the subnet, and improve calculation efficiency. Taking the model as an example, the training error is calculated through data mining technology and the clustering algorithm. The data mining clustering optimization algorithm improves the popular neural network in two respects, finer model design and model pruning, and the model complexity, computational complexity, and error rate are measured through simulation experiments.
The results show that the proposed algorithm for differential distributed data mining has higher accuracy and stronger convergence ability and overcomes the shortcomings of several original genetic algorithm optimization neural network data mining models; it can effectively improve the search ability and search accuracy of the algorithm, improve the efficiency and accuracy of data mining, and has a wide range of applications.


Introduction
The amount of data on the Internet is exploding, and the impact of data on many areas of social production and life is becoming more and more prominent, which is difficult to handle using traditional data analysis methods. Against this background, data mining techniques and clustering optimization algorithms have emerged. First, we determine the mining task and then select the corresponding mining algorithm to carry out the data mining operation. The mining process is one of human-computer interaction that is repeated many times. It mainly includes defining problems, establishing data mining libraries, analyzing data, preparing data, establishing models, evaluating models, and implementing them. The whole process of data mining is inseparable from professional knowledge of the application field and from the database, data warehouse, or other information repositories.
The research goal of cluster analysis is to divide the collection of limited data objects in a database or data warehouse into a group of clusters. Theoretical analysis shows that the data mining clustering algorithm is very suitable for neural computing. A cluster is a collection of data objects that are similar to the objects in the same cluster and different from the objects in other clusters. The analysis results can not only reveal the internal connections and differences between the data but also provide an important basis for further data analysis and knowledge discovery. After a global analysis of the similarity between data objects, data objects with high similarity are grouped into the same class, while data objects with low similarity are grouped into different classes. Commonly used techniques include probability analysis and correlation analysis. Learning and training on sample sets yield the required patterns or parameters. Since the various methods have their own functional characteristics and applicable fields, the choice of data mining technology affects the quality and effect of the final results. In practical applications, multiple technologies are usually combined so that their advantages complement each other.

Neural Network.
A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed and parallel information processing. This kind of network relies on the complexity of the system and processes information by adjusting the interconnections between large numbers of internal nodes. A machine learning method learns from samples. Its purpose is to learn the characteristics (expressed in the form of the conditional distribution P(y|x)) of the law that produces the samples. It is mainly composed of three basic components: (1) Random sample generator, which is used to draw random samples x independently from a fixed but unknown distribution P(x). Generally, a sample x is a multidimensional vector, and the distribution P(x) is a multidimensional random distribution. (2) System internal mapping: given a random input vector x, the mapping returns an output y with a certain probability. The input vector x and the output vector y obey a fixed but unknown conditional distribution P(y|x). The definition of this mapping takes the influence of noise into account, so it is actually a kind of random mapping. (3) Learning machine (or algorithm), which can realize a certain kind of function f(x, a) to approximate the internal mapping of the system.
A common BP neural network model is usually composed of an input layer, hidden layer, and output layer, as shown in Figure 1.
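Each layer contains a different number of neuron nodes, where $W_{ij}$ represents the connection weight between neurons $i$ and $j$. Each neuron node has multiple inputs and one output, which can be expressed in the usual weighted-sum form $T_j = f\left(\sum_i W_{ij} T_i\right)$, where $T_i$ represents the output value of node $i$ in the preceding layer and $f$ is the activation function.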
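The neuron output described above, together with a squared-error measure of the kind commonly used for a single training sample in BP networks, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the sigmoid activation and the function names `layer_output` and `sample_error` are assumptions introduced here.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumed choice; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weights):
    """Outputs of one layer.

    inputs  -- list of T_i values from the preceding layer
    weights -- weights[i][j] is W_ij, the weight from node i to node j
    """
    n_out = len(weights[0])
    return [
        sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]


def sample_error(expected, actual):
    """Half the sum of squared differences for one training sample."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(expected, actual))


# Two inputs feeding one output neuron with unit weights.
outputs = layer_output([0.0, 0.0], [[1.0], [1.0]])
print(outputs, sample_error([1.0], outputs))
```

A full BP network would chain several such layers and propagate the gradient of this error back through the weights.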
A training sample P is fed in from the input layer. Through system analysis, the single-sample training error of the BP neural network can be defined, in the standard squared-error form, as $E_P = \frac{1}{2}\sum_k (d_k - y_k)^2$, where $d_k$ is the expected output value and $y_k$ is the actual output value of output node $k$.
With the deepening of research, more and more researchers tend to use evolutionary programming to study evolutionary neural networks and consider it the more appropriate approach. The combination of evolutionary programming and the neural network model is therefore a more effective model, since it can better imitate and evolve learning behavior. Based on this analysis, a typical evolutionary neural network model, EPNet, has been proposed, which is strongly representative and well targeted. As a relatively mature neural network evolution system, EPNet has the following characteristics. First, EPNet emphasizes the evolution of ANN behavior and uses techniques such as local training after each structural mutation and node splitting to maintain the behavioral link between parents and their offspring, whereas some earlier EP systems placed little emphasis on this link. The usual way of structural mutation is to randomly add or delete a hidden-layer neuron or connection. Obviously, this method tends to destroy the behavior that the parents have learned and weakens the behavioral link between parents and offspring. Second, EPNet applies different mutation operations according to a priority order and gives higher priority to mutation operations that produce a simpler network structure. Take structural mutation as an example: before adding nodes, it always tries to delete nodes or connections first. If the deletion increases the fitness of the individual, the subsequent mutation operations are not applied. Compared with existing methods that limit the network scale by adding a network complexity penalty term to the fitness function, this method avoids the laborious search for the parameter of the penalty term. Finally, in order to eliminate the influence of the permutation problem, the EPNet system uses an EP algorithm without a crossover operator. The basic flow of the EPNet model is shown in Figure 2.
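The priority rule described above (try deletion before addition, and skip the remaining mutations once one of them improves fitness) can be sketched as follows. This is a toy illustration, not the EPNet implementation: a "network" is reduced to its number of hidden nodes, and `delete_node`, `add_node`, and `toy_fitness` are hypothetical stand-ins.

```python
def prioritized_mutation(network, fitness, operators):
    """Apply mutation operators in priority order; keep the first one that improves fitness."""
    baseline = fitness(network)
    for operator in operators:
        candidate = operator(network)
        if fitness(candidate) > baseline:
            return candidate  # lower-priority operators are skipped
    return network  # no operator improved fitness, so the parent is kept


# Hypothetical toy setting: a network is represented only by its hidden-node count.
def delete_node(hidden_nodes):
    return max(1, hidden_nodes - 1)


def add_node(hidden_nodes):
    return hidden_nodes + 1


def toy_fitness(hidden_nodes):
    return -abs(hidden_nodes - 3)  # assumed optimum at 3 hidden nodes


operators = [delete_node, add_node]  # deletion has higher priority than addition
print(prioritized_mutation(5, toy_fitness, operators))
```

Ordering the operator list is what encodes the preference for simpler structures, so no explicit complexity penalty is needed in the fitness function.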

Data Mining.
To handle differential distributed data, the global kernel function and the hybrid kernel function are calculated, the hybrid particle swarm optimization (PSO) method is adopted, and the differential distributed data of limited samples are used for training through a nonlinear mapping.

Wireless Communications and Mobile Computing
Each particle in the particle swarm represents a possible solution to a problem. The intelligence of problem-solving is realized through the simple behavior of individual particles and the information interaction within the swarm. Owing to its simple operation and fast convergence, PSO has been widely used in many fields such as function optimization, image processing, and geodesy. The nonlinear time series of differential distributed data is projected into a high-dimensional space $F$ by the global call method of inertia weight. In the training sample set of differential distributed data, $x_i \in R^n$ is the input vector for mining control of differential distributed data, and $y_i \in R^n$ is the target value of particle swarm optimization.
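The particle behavior described above can be illustrated with the standard PSO velocity and position update. The sketch below is a generic textbook version, not the hybrid PSO of this paper: the inertia weight `w`, the coefficients `c1` and `c2`, the search bounds, and the `sphere` test function are all assumptions chosen for illustration.

```python
import random


def pso_minimize(f, dim, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimize f over R^dim with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_val, history = pso_minimize(sphere, dim=2)
print(best, best_val)
```

In a clustering setting, each particle would instead encode a set of candidate cluster centers and `f` would be a clustering cost.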
Thus, the total standard value of the load balance of differential distributed data output in the big data information base is obtained in terms of the following quantities: $\zeta(n)$ is the modulation error of data mining, $\Phi_k$ is the data fusion degree, and $\omega_k$ is the characteristic scale of distributed data. The data mining clustering in this study is shown in Figure 3. The main data mining algorithms are as follows.
(1) Association analysis: in nature, there are many relationships among events, some of which are well known and some of which are not easy to find. For example, in the shopping basket, bread and milk are a well-known combination; promoting them together boosts the sales of both. Without data analysis, people would not know the relationship between beer and diapers, yet putting them together can also promote sales. In short, association analysis is the mining of association rules. For example, website designers can discover the relationship between visitors' habits and website pages from visitor logs; e-commerce companies can analyze customers' preferences from their browsing records and dwell time so as to make targeted recommendations. Association analysis is the basis of other data mining research and has achieved good results in practical applications.
(2) Sequential pattern mining: the core of sequential pattern analysis is to find the before-and-after relevance in the development of things, so as to uncover laws with certain causal properties. Association analysis generally considers only simple association relations, while sequential pattern mining also considers time, space, and other factors. For example, after buying a new mobile phone, we generally consider buying accessories such as a screen protector film; we generally do not buy accessories before buying the phone. This is a typical sequence relation. The main algorithms are the Apriori algorithm and the pattern growth framework.
(3) Classification algorithm: the classification algorithm is a very important mining algorithm. Its main idea is to establish a classification model and then use it to predict the category of input data. Classification mining is usually represented by a predicate. Typical applications of classification algorithms include credit rating, curative effect diagnosis, and customer rating. The main algorithms are k-nearest neighbor classification, decision tree classification, Bayesian classification, and so on.
(4) Clustering algorithm: the clustering algorithm is also called group analysis. In classification, the class of samples is known in advance, whereas in clustering it is not.
There are many other kinds of mining as well, such as text mining and web mining. The design of each module of data mining is shown in Figure 4.
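The association analysis in item (1) is usually quantified with support and confidence, two standard measures that the text does not name explicitly. The sketch below computes them on a small made-up basket data set; the transactions and the function names are illustrative assumptions, not data from this study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of consequent given antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"beer", "diapers"},
    {"bread", "milk", "diapers"},
    {"beer", "diapers", "milk"},
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"beer"}, {"diapers"}))
```

Algorithms such as Apriori search for all itemsets whose support exceeds a threshold and then keep the rules whose confidence is high enough.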
The data mining process is as follows:
(1) Data preprocessing: generally, the data are incomplete and noisy, and sometimes there are inconsistencies in the data, such as differences in codes or names. The quality of the data determines the quality of mining; low-quality data generally produce poor mining results. Therefore, data preprocessing is generally needed before data mining to improve the quality of the data being mined.
(2) Mining process: after data preprocessing, the appropriate data mining algorithm is selected according to the mining purpose and task. There are many data mining algorithms, each with its own characteristics and scope of application. No single mining algorithm is suitable for all types of mining, and different mining algorithms produce different mining results.
(3) Pattern evaluation: generally, the mining algorithm should remove useless output, present the mining results to users in a way that they find interesting and easy to understand, and store the mining results effectively. That is, setting a reasonable user-interest threshold to select the patterns that users are interested in effectively prevents useful patterns from being drowned among the many patterns that users are not interested in.

Finally, data mining outputs knowledge. Generally, this knowledge cannot be found by intuition; some of it even goes against our intuition and is unexpected. But the more unexpected such knowledge is, the more valuable it may be. The detailed process of data mining is shown in Figure 5.

Cluster Optimization.
We call the process of segmentation a clustering process and the method of segmentation a clustering algorithm. In clustering analysis, data are divided according to certain rules. The result is to divide the data into classes so that the similarity between classes is small and the similarity within classes is large. At present, although there are many kinds of clustering algorithms, each has its own characteristics and range of applicability. The five basic types of clustering methods are defined as follows.
(1) Partition method: this method finds spherical, mutually exclusive clusters, and the center of each cluster is represented by a mean value or a center point. This algorithm is suitable for clustering problems with a fixed number of clusters and a small data set. Among these methods, K-means and K-median are the most classical ones.
(2) Hierarchical method: this method is based on the idea of hierarchical decomposition. It can carry out multilevel clustering at different granularities; its disadvantage is that an erroneous merge or split cannot be corrected once it has been made. That is to say, if you want to process very complex data, you have to know how to summarize and count these data in a systematic and purposeful way.
(3) Density-based method: the cluster density in this clustering method refers to the minimum number of samples in a single region of the sample space. This algorithm can find clusters of different shapes without forcing a shape on the clusters. It is suitable for clustering with an irregular number of clusters of arbitrary shape and has the advantage of reducing or even eliminating noise.
(4) Model-based method: this method can analyze the validity of the data model, such as data fitting. It is suitable for data whose distribution type is already known.
(5) Grid-based method: the algorithm clusters on a quantized grid space and has the advantages of speed and strong computing performance.
The boundary between different types of clustering algorithms is usually not very clear. Take the mean shift algorithm as an example: its basic idea is to move sample points from areas of low density to areas of high density, so from the perspective of density estimation and density gradient estimation it can be regarded as a density clustering algorithm. However, some K-means algorithms can be regarded as mean shift with a special kernel function, and the maximum entropy clustering algorithm based on a physical model can also be regarded as a mean shift algorithm with a special kernel function. The five clustering methods are introduced as follows.

Segmentation and Clustering.
K-means and K-medoids are two typical segmentation clustering methods, which usually need the number of clusters as user input. Through continuous iterative optimization, the distance within clusters is minimized and the distance between clusters is maximized. The K-means procedure is as follows: randomly select k clustering centers $\mu_1, \mu_2, \ldots, \mu_k$; then repeat the following steps until convergence: for each sample $x_i$, calculate its class $c_i = \arg\min_j \lVert x_i - \mu_j \rVert^2$; for each class $j$, recompute the center $\mu_j$ as the mean of the samples assigned to it. The K-means algorithm is more efficient for small-scale data sets, while the K-medoids algorithm performs better on large-scale data but has poor scalability. In addition, the K-means algorithm has two main defects: first, the number of clusters must be specified in advance; second, the clustering results are affected by the initial cluster centers. There has been much research on improving K-means, of which K-means++ is representative. The difference between K-means and K-medoids lies in the selection strategy for the cluster center: the cluster center of K-means is located at the average of the coordinates of all points in the cluster, while the cluster center of K-medoids must be a sample point in the cluster, namely the one for which the total distance to all other points in the cluster is the smallest. In general, the complexity of segmentation clustering algorithms is low, and they are suitable for large-scale data. The CLARANS algorithm is a representative example: through a random search strategy, it clusters large-scale data with high efficiency and good scalability. Segmentation clustering algorithms are usually easy to parallelize and have been active on big data processing platforms in recent years.
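The K-means iteration described above (random initial centers, assignment of each sample to its nearest center, recomputation of each center as a mean, repetition until nothing changes) can be sketched in plain Python. This is a minimal illustration under simple assumptions: points are tuples of numbers, the initial centers are drawn from the samples themselves, and the function names are introduced here.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct samples as initial centers
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins the class of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Changing `seed` changes the initial centers, which is a simple way to observe the sensitivity to initialization noted above on less well-separated data.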

Hierarchical Clustering Algorithm.
According to the similarity between data points, the hierarchical clustering algorithm decomposes the data hierarchically and creates a nested clustering tree. The bottom-up hierarchical decomposition corresponds to the agglomerative method, and the top-down hierarchical decomposition corresponds to the splitting method. The basic flow of the agglomerative method and the splitting method is shown in Figure 6. For example, in the zeroth step of the agglomerative method, each of the elements a, b, c, d, and e is its own cluster; in the first step, elements b and c, which are both quadrilaterals, are merged into a cluster; in the second step, elements d and e, which are both circular, are merged into a cluster; in the third step, the polygon elements are merged into a cluster; and in the fourth step, all elements are merged into a single cluster. According to the distance measure used between clusters, hierarchical clustering is divided into single-linkage (SL), complete-linkage (CL), and average-linkage (AL) hierarchical clustering. Typical hierarchical clustering algorithms are BIRCH, CURE, and Chameleon. In recent years, some improved algorithms have raised their efficiency and robustness.
The earliest idea of density-based clustering may come from the DBSCAN algorithm: the algorithm divides regions of sufficient density into clusters and finds clusters of arbitrary shape in noisy spatial databases. It defines a cluster as the largest set of density-connected points; according to the local density of the sample points, the points can be divided into core points, boundary points, and noise points. DBSCAN is very sensitive to its parameters, and slight changes in them may lead to abrupt changes in the clustering results, which seems to be caused by the sensitivity of core points and boundary points to the local density threshold. Besides DBSCAN-like clustering algorithms such as GDBSCAN, ENDBSCAN, and OPTICS, other density clustering algorithms, including the mean shift clustering algorithm and density peak clustering, can also identify nonspherical clusters of any shape.
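The DBSCAN idea described above (core points with enough neighbors grow clusters, boundary points attach to them, and the rest is noise) can be sketched as follows. This is a simplified quadratic-time illustration, not an optimized implementation; the parameter names `eps` and `min_pts` follow common usage, and the sample points are made up.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points)
            if squared_distance(points[i], q) <= eps * eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a boundary point
            continue
        cluster += 1  # points[i] is a core point and starts a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # boundary point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, so keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Rerunning the example with a slightly different `eps` or `min_pts` is an easy way to see the parameter sensitivity mentioned in the text.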

Grid Clustering.
The grid clustering method divides the space into a number of grid cells and analyzes the data on the grid. The complexity of the clustering process depends mainly on the number of grid cells rather than on the number of sample points, so the method is efficient for large data sets. Common grid clustering methods include STING, WaveCluster, CLIQUE, OptiGrid, and ENCLUS. The STING algorithm is a multiresolution clustering method: the data space is divided into several rectangular cells, and each high-level rectangular cell contains many nested fine-grained rectangular cells. Generally speaking, the grid clustering method must consider how to divide the cells, how to choose an appropriate cell size, and how to store and update the cell information. If the grid cells are not fine enough, accuracy is lost; if they are too fine, the computation cost increases. The scalability of the grid clustering method depends to a great extent on the strategy for storing and updating grid cells. Because the boundaries of grid cells are horizontal or vertical, the grid clustering method can only find clusters with axis-aligned boundaries, not inclined ones.
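The general grid idea described above can be sketched as follows: points are binned into cells, cells with enough points are treated as dense, and adjacent dense cells are joined into one cluster. This is a generic two-dimensional illustration and not STING, CLIQUE, or any other named algorithm; `cell_size` and `min_count` are assumed parameters.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, min_count):
    """Cluster 2-D points by connecting adjacent dense grid cells.

    Returns one label per point: 0, 1, ... for clusters and -1 for points in sparse cells.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    dense = {cell for cell, members in cells.items() if len(members) >= min_count}
    labels = [-1] * len(points)
    seen = set()
    cluster = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        while stack:  # flood fill over horizontally or vertically adjacent dense cells
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster
            for neighbor in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        cluster += 1
    return labels


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.1, 5.1), (5.2, 5.3), (9.5, 0.5)]
print(grid_cluster(points, cell_size=1.0, min_count=2))
```

After the binning pass, the work is done per cell rather than per point, and the trade-off between coarse and fine cells shows up directly in the choice of `cell_size`.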

Model Clustering.
The model clustering algorithm assumes that the data are a mixture generated according to specific probability distributions and is dedicated to finding the best fit between the data and the given model. It includes statistical learning/machine learning methods (such as COBWEB) and artificial neural network methods (such as SOM). Taking the SOM algorithm as an example, it is a neural network composed of an input layer and a competition layer, as shown in Figure 7.
In the process of clustering, each node first initializes its own parameters randomly; the best matching node is then found for each input datum, the nodes adjacent to the activated node are updated, and the node parameters are updated by gradient descent; these steps are repeated until convergence. In the course of the development of model clustering, many researchers have put forward improved algorithms, which have been applied to text data.
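The SOM training loop described above (random initialization, best matching node, neighborhood update, repetition) can be sketched in plain Python. This is a minimal illustration, not the configuration used in the experiments: the linearly decaying learning rate, the Gaussian neighborhood, and the names `train_som` and `best_matching_unit` are assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=1000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks over time
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # neighborhood strength
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])   # move the node toward the input
    return weights


data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=1, cols=2)
print({tuple(x): best_matching_unit(som, x) for x in data})
```

After training, samples that share a best matching node (or map to neighboring nodes) are treated as belonging to the same or to similar clusters.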

Subjects.
Three classic data sets, Iris, wine, and zoo, are used for experimental verification. The accuracy and convergence are analyzed and verified. The characteristics of the test data sets are described in Table 1.

Figure 6: Basic flow of the agglomerative method and the splitting method (steps zeroth through fourth, in opposite directions).

Experimental Setup.
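At present, most analyses of clustering quality use the F-measure, which combines recall and precision. Recall and precision examine the completeness and the accuracy of the experimental result, respectively. In their standard form, for a known category $i$ and a cluster $j$, they are defined as $P(i,j) = n_{ij}/n_j$ and $R(i,j) = n_{ij}/n_i$, where $n_{ij}$ is the number of samples of category $i$ in cluster $j$, $n_i$ is the number of samples in category $i$, and $n_j$ is the number of samples in cluster $j$.
The F-measure of category $i$ with respect to cluster $j$ is $F(i,j) = 2P(i,j)R(i,j)/(P(i,j)+R(i,j))$. The commonly used overall measure for cluster analysis is the weighted average over the categories $i$, $F = \sum_i (n_i/n)\max_j F(i,j)$, where $n$ is the total number of samples. This average is the final F-measure value, as shown in Table 2.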
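The F-measure evaluation described above can be sketched in code. The version below uses the common clustering form of the measure, in which each known category is matched with the cluster that gives it the highest F value and the results are weighted by category size; this specific form and the function name are assumptions, since the original formulas are not reproduced here.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Weighted F-measure of a clustering against known categories."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))  # samples of category i in cluster j

    total = 0.0
    for i, n_i in class_sizes.items():
        best = 0.0
        for j, n_j in cluster_sizes.items():
            n_ij = joint[(i, j)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each category by its share of the samples
    return total


print(clustering_f_measure([0, 0, 1, 1], [5, 5, 7, 7]))
print(clustering_f_measure([0, 0, 0, 1], [0, 0, 1, 1]))
```

Because categories are matched to clusters by best F value, the cluster identifiers themselves do not need to agree with the category labels.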
It can be seen from Table 2 that the ADPSO-k-mean algorithm has higher accuracy than the traditional k-mean algorithm and the PSO-k-mean algorithm and yields a relatively large improvement; on the Iris data set in particular, the experimental accuracy increases by 19.5% and 7%, respectively, which is the most obvious effect.

Test of Each Fitness Value When Each Algorithm Converges Stably.
Iris, wine, and zoo are again used to test the stability of the algorithm. The fitness values ($f_{min}$, $f_{max}$, and $f_{ave}$) of each algorithm are recorded when the algorithm converges stably. In the fitness function $f(x)$, $\zeta$ is a constant, taken as $10^3$, $10^5$, and $10^2$ on the three data sets, respectively, and $d(X_i, C_j)$ represents the Euclidean distance from sample $X_i$ to the corresponding cluster center $C_j$. Each test is repeated many times, and the average value over all tests of the same kind is taken as the final value (for example, the maximum fitness value is averaged over all tests). The test records are shown in Table 3.
According to the analysis of the test results in Table 3, on the whole, the improved ADPSO-IKM algorithm has a relatively small fluctuation range of $f(x)$ on these three data sets. The PSO-k algorithm and the ADPSO-IKM algorithm improve by 9.95%, 12.44%, and 20.85%, respectively, on the three data sets. In addition, the improved-center K-means algorithm itself ensures effective search and better convergence performance. To further illustrate the convergence of the ADPSO-IKM algorithm and how the fitness value of each algorithm changes as the iterations increase, convergence graphs on the three data sets are drawn. The format of the training samples of the digit recognizer is shown in Table 4.
After training the network, we test its function approximation ability with a test sample set whose results are unknown. The output of the test sample set in the network model is shown in Figure 8.
It can be seen from the final test results that the network model reaches an accuracy of approximately 78.5%. In the drug classification problem, the drug is used as the classification attribute, with 5 categories in total, and the other attributes are used as input attributes. All nominal attributes must be converted to numbers; for example, BP is encoded as 0: high, 1: low, and 2: normal. We design an RBF network so that, after training, it correctly reflects the drug classification of the sample data. The training sample data is shown in Table 5.
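The numerical encoding of nominal attributes mentioned above can be sketched as follows. The BP coding (0: high, 1: low, 2: normal) is taken from the text; the record layout, the `Age` field, and the function name are hypothetical and serve only as an illustration.

```python
BP_CODES = {"high": 0, "low": 1, "normal": 2}  # coding given in the text


def encode_nominal(records, attribute, codes):
    """Return copies of the records with one nominal attribute replaced by its numeric code."""
    return [
        {**record, attribute: codes[record[attribute].lower()]}
        for record in records
    ]


# Hypothetical sample records.
samples = [
    {"Age": 23, "BP": "HIGH"},
    {"Age": 47, "BP": "normal"},
    {"Age": 61, "BP": "Low"},
]
print(encode_nominal(samples, "BP", BP_CODES))
```

An unknown category raises a `KeyError`, which is usually preferable to silently assigning a default code before training.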
The training sample data for the drug classification problem is also shown in Table 6. In the SOM experiment, the attributes of animals are mapped to the two-dimensional output plane of the SOM, and the self-organizing clustering process is used to test the rule that the attributes of samples in adjacent clusters are similar. There are 13 kinds of animals in the training set, and each animal is represented by a 9-dimensional vector. The training samples are shown in Table 7. After 10,000 training iterations, the SOM network maps the pattern features of the high-dimensional input data onto the two-dimensional output plane in an orderly manner. The training results show that 15 neurons arranged in a 5 × 3 rectangular (Gridtop) structure finally form 9 effective clusters. The clustering results are shown in Table 8.
The learning error of the BP network combined with the genetic algorithm when approximating the sine function is shown in Figure 9. Because the genetic algorithm has a global search property, the optimized initial weights it generates iteratively lie closer to the optimal solution in the solution space; this makes the error of the gradient-descent-based BP training process drop quickly at the beginning. The learning error of the pure BP algorithm is shown in Figure 10. The error convergence of BP network learning combined with the genetic algorithm is better than that of the pure BP algorithm.
We reduced the number of network training iterations by 2,000 each time, and the training results are shown in Table 9.
The attribute fields include the content of certain chemical substances in the DFM sample, the average daily alcohol consumption, etc. We take 180 of the data set samples as the training sample set and 50 as the test sample set. Part of the data is shown in Table 10.
The clustering results are displayed in the form of statistical histograms, drawn mainly with the open-source charting toolkit JFreeChart. The clustering statistical histogram is shown in Figure 11.
In the system's BP neural network parameter settings, the target error is set to 0.001, the number of training iterations to 1,000, and the learning rate to 0.01. The "weight" method reads the previously selected initial connection weights and then starts the formal network training. After the system runs, the output error curves are shown in Figure 12.
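How the three settings above interact (target error 0.001, at most 1,000 training iterations, learning rate 0.01) can be illustrated with a gradient-descent loop that stops as soon as either limit is reached. To stay short, the sketch trains a single linear neuron instead of a full BP network; the model, the sample data, and the function names are assumptions.

```python
def half_mse(weight, samples):
    """Half the mean squared error of the model y = weight * x."""
    return 0.5 * sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


def train_linear_neuron(samples, learning_rate=0.01, target_error=0.001,
                        max_epochs=1000, initial_weight=0.0):
    """Gradient descent that stops at the target error or the epoch limit."""
    weight = initial_weight
    error = half_mse(weight, samples)
    epochs = 0
    while error > target_error and epochs < max_epochs:
        gradient = sum((weight * x - y) * x for x, y in samples) / len(samples)
        weight -= learning_rate * gradient
        error = half_mse(weight, samples)
        epochs += 1
    return weight, error, epochs


# Hypothetical training data generated by y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight, error, epochs = train_linear_neuron(samples)
print(weight, error, epochs)
```

Passing a better `initial_weight` reduces the number of epochs needed, which mirrors the role of the preselected initial connection weights in the text.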

Convergence of Each Algorithm on Iris.
On the Iris data set, the iterative trend line of the PSO-Kmeans algorithm over the first 20 iterations shows that the ADPSO algorithm can significantly expand the overall optimization space between 20 and 30 iterations and avoid falling into local extrema, and by 30 iterations the algorithm has reached a stable convergence state. The convergence on Iris is shown in Figure 13.
On the high-dimensional wine data set, the overall trend chart shows that the algorithms have the same convergence trend, and compared with the k-mean algorithm, the convergence effect is obvious. The ADPSO-IKM algorithm reaches a stable convergence state after 40 iterations. The convergence on wine is shown in Figure 14.

Convergence of Each Algorithm on Zoo.
On the zoo data set, the ADPSO-IKM algorithm converges to the best result after approximately 30 iterations. The introduction of the IKM algorithm clearly accelerates clustering between iterations 25 and 35 and also finds excellent solutions quickly. The convergence on zoo is shown in Figure 15.
On the whole, the algorithm tends toward stable convergence as it iterates on each data set. Compared with the other algorithms, the ADPSO-IKM algorithm has better convergence performance and can reduce the influence of initial cluster center selection on the volatility of the K-means algorithm. Moreover, the improved neighborhood fusion idea allows the algorithm to extend into the effective search area, so it achieves a good clustering convergence effect within a small number of iterations. According to the overall trend charts of the three data sets, the ADPSO-IKM algorithm has good optimization performance and can quickly converge to a fixed clustering value. With the ADPSO-k average algorithm and the introduction of the IKM algorithm, however, the overall clustering optimization effect has not been significantly improved, and further research in this area is needed.
The new data classification result of the drug classification problem is shown in Figure 16. The prediction result based on the improved algorithm is shown in Figure 17.

Conclusions
In this era of massive data, data mining is extremely important, and its applications are increasingly extensive. Any enterprise that has a data warehouse or database with analysis value and demand can carry out purposeful data mining. Data mining will bring a new wave of productivity growth and consumer surplus. In a data mining project, if a reasonable network model cannot be determined in advance when dealing with some complex problems with the BP model, the proposed algorithm can be used for global optimal search; it is very effective for improving the accuracy of data mining in CRM and obtaining a lot of valuable data. Traditional clustering algorithms have difficulty dealing with data that have multidimensional and uncorrelated characteristics. The selection of the clustering method directly determines the quality of data mining. In order to improve the quality of clustering, people continue to explore better clustering analysis methods. Swarm intelligence optimization algorithms exhibit group intelligence, self-adaptability, and robustness; combined with swarm intelligence optimization, cluster analysis has developed rapidly. The generalization ability of particle swarm learning is used to calculate the clustering centers for data mining and thereby realize data mining optimization. The neural network data mining clustering algorithm proposed in this paper can achieve the clustering performed by the K-means method. At the same time, the improved neural network algorithm can automatically merge clustering results of smaller granularity according to a preset threshold, thus effectively preventing unreasonable clustering results caused by specifying too many clusters.
Because the artificial neural network has highly nonlinear learning ability and fault tolerance to noisy data, artificial neural networks, and the BP neural network in particular, are popular. However, the ability of artificial neural networks to extract rule knowledge needs to be further strengthened. The artificial neural network uses a "black box" model to process data and mine knowledge. In some data mining applications, people often expect the system to express deep-level knowledge of laws in an intuitive "if-then" form. Therefore, we need to know how to open this black box and explicitly present the useful knowledge hidden inside.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.