Multimedia Technology of Spatial Data Mining Based on Genetic Algorithm

In order to make key decisions more conveniently according to the massive data information obtained, a spatial data mining technology based on a genetic algorithm is proposed, which is combined with the k-means algorithm. The immune principle and adaptive genetic algorithm are introduced to optimize the traditional genetic algorithm, and the K-means, GK, and IGK algorithms are compared and analyzed. The results show that, in two different datasets, the objective functions obtained by the K-means algorithm are 94.05822 and 4.10373 (×106), respectively, while the objective functions obtained by the GK and IGK algorithms are 89.8619 and 3.9088 (×106), respectively. The difference between the three algorithms can also be reflected in the data comparison of the number of iterations. The number of iterations required for k-means to reach the optimal solution is 8.21 and 8.4, respectively, which is the most among the three algorithms, while the number of iterations required for IGK to reach the optimal solution is 5.84 and 4.9, respectively, which is the least. Although the time required for K-means is short, by comparison, the IGK algorithm we use can get the optimal solution in relatively less time.


Introduction
Spatial data mining is data mining and knowledge discovery in a spatial database. It belongs to a branch of data mining. It mainly obtains some spatial features and patterns that users are interested in, the relationship between spatial data and nonspatial data, and the characteristics of universal data hidden in the database. However, there are many differences between spatial data mining and conventional data mining, which are also different from general affairs' data mining. Spatial data mining expands many spatial scale dimensions in the spatial theory of state discovery [1]. Because spatial data itself has complex and diverse characteristics, its mining technology is very different from the conventional thing data mining technology. e main characteristics of this technology are as follows: spatial data mining technology has rich data sources, huge amount of data, complex access methods, and various data types; spatial data mining has a wide range of application fields, which are closely related to all the data about spatial location; the algorithms and methods of spatial data mining technology are unusual, and their algorithms are usually difficult and complex; there are many ways to express the knowledge of spatial data mining technology, which mainly depends on people's understanding and cognition of the technology. Genetic algorithm (GA) is a computational model that simulates the biological evolution process of Darwin's genetic selection and natural elimination. It was first proposed by Professor Holland J. of Michigan University in 1975. In the sense of simulating the evolution process, GA is a slow-moving structure, which can adopt a variety of different composition schemes.
is is mainly reflected in parameter coding and genetic operation technology, which are related [2]. At present, the main coding techniques of the genetic algorithm include onedimensional chromosome coding, multiparameter mapping coding, discretization, variable chromosome length coding, and two-dimensional chromosome coding. Genetic manipulation technology is embodied in three aspects: selection probability, crossover probability, and mutation probability [3]. e basic steps of the genetic algorithm are shown in Figure 1.

Literature Review
Data mining is to analyze the observed datasets (often huge), in order to find unknown relationships and summarize data in a novel way that data owners can understand. e value of data mining is shown in Figure 2. Saha et al. said that, from a technical point of view, data mining is a process of extracting hidden, unknown but potentially useful information and knowledge from a large number of incomplete, noisy, fuzzy, and random actual data [4]. Fallahpour et al. said that, from the perspective of business, data mining is a new business information processing technology. Its main feature is to extract, transform, analyze, and model a large number of business data from the business database so as to extract the key knowledge to assist business decision-making, that is, to find relevant business models from a database [5]. e research of Zhang et al. shows that the emergence of largescale database, advanced computer technology, the actual needs of operation and management, and the deep computing ability of these data promote the birth, rapid development, and wide application of data mining [6]. Renić et al. said that data mining is actually the result of the gradual evolution of information technology and the result of people's long-term research and development of database technology [7]. Mahbuby et al. found that, with the development and maturity of three basic technologies, massive data collection, powerful multiprocessor computer, and data mining algorithm, data mining technology began to receive extensive attention in commercial applications [8]. Balasubramanian and Rajendran said that genetic algorithm is the most important technology among many mining technologies. It can filter information from a large amount of data in the data warehouse, find possible operation modes in the market, and mine facts that people do not know [9]. Xu et al. said that databases generally have thousands of tables and attributes and millions of tuples. e gigabit level data in the database are no longer rare because the trillion level number database has been born, replacing the gigabit level database [10]. Liu    Computational Intelligence and Neuroscience massive database in high-dimensional space not only makes the search space larger but also makes it easier to find pattern errors. erefore, make full use of relevant knowledge to change the dimension, reduce the dimension, and delete redundant data so as to make the data mining algorithm more efficient [11]. Oliva et al. said that the algorithm for providing knowledge from massive spatial data should be testable and efficient [12]. Zhang et al. proposed that polynomial and exponential algorithms have no practical value, but if the algorithm is changed into a specific model with limited data to obtain appropriate parameters, the value will be considerable [13].

Coding Scheme and Population Initialization.
Firstly, how to encode is the first step in the evolutionary design of a genetic algorithm. For the clustering analysis of the K-means algorithm, the amount of data is large and multidimensional. If binary coding is adopted, the search space of data will increase sharply, which will greatly reduce the efficiency of calculation. erefore, this study adopts real coding [14]. e initialization of the K-means algorithm is the initialization of the cluster center, and we should set the appropriate cluster center. First, we encode the cluster center. Now, assuming that the coordinates of the cluster center are d-dimensional, for K clusters, the length of each chromosome is k * d and the chromosome is . . , p j d randomly selects K from n objects as the center coordinates of the initial cluster for each corresponding chromosome. We are now going to cluster huge data. eoretically, if the number of initial populations is large, there will be many problems. However, in order to diversify the data elements, we should set the population number as large as possible under the conditions allowed by various facilities [15]. In fact, the specific initialization process is to select k of the N-target classification objects as a solution to the problem and encode them into a chromosome. Repeat this for m times, select m chromosomes, obtain the initial population, and complete the population initialization [16]. e ultimate goal of clustering is that, in the obtained clustering, the smaller the distance between similar objects, the better the clustering effect. On the contrary, the greater the distance between different objects, the better the clustering effect. According to such requirements, we select the following fitness function: From the above formula, we can see that f is nonnegative, and when f increases, the criterion function E decreases, and the clustering effect is obviously improved. On the contrary, when f decreases, the criterion function E increases, and the clustering effect is obviously poor. e so-called fitness function can well reflect the quality of clustering results, so as to screen the optimal solution. After determining the fitness function, considering that, in the early stages of evolution, if the offspring of excellent individuals will account for a large proportion of the whole population, the fitness values will be relatively close in the later stages of evolution. At this time, the advantages of the offspring of excellent individuals are not highlighted, and the evolution of the whole population is almost stopped when making selection operations [17]. In view of the problems of premature convergence (premature) in the early stage and slow convergence in the late stage, we also need to process it. Now, we combine genetic algorithm with simulated annealing algorithm and stretch the fitness appropriately with simulated annealing algorithm so that the dominant individuals can be in a very obvious position in the whole evolution process, so as to ensure the smooth progress of the evolution process. e specific stretching method can be seen as follows: where f i is the fitness of the ith individual, M is the population size, g is genetic algebra, T is the temperature, and T 0 is the initial temperature.

Selection Operation.
In order to better carry out the selection operation and overcome the premature problem of genetic algorithm, we adopt a selection operation based on the immune principle to adjust the selection probability [18]. e immune optimization algorithm draws lessons from the evolutionary mechanism of the biological immune system and combines the idea of traditional genetic algorithm, which can accurately improve the performance of the algorithm to a certain extent. When introducing an immune mechanism to solve the problem, we can first take the objective function and constraint conditions as the antigen input, then generate the initial antibody population, then calculate a series of genetic operations and antibody affinity, and finally find the antibody against the antigen while maintaining the antibody diversity: where N m is the number of identical individuals in the population and p size is the size of the population. Find out the m individuals with the largest individual concentration in the population, set as 1, 2, . . . , m; then, the individual concentration probability of these m individuals: e concentration probabilities of the remaining individuals are P, and the sum of the concentration probabilities of all individuals is 1. If the fitness of an individual is f i and the probability of the individual being selected is p f i , then where i � 1, 2, 3, . . . , p size ; through the above definition of individual concentration, it is more convenient to select the Computational Intelligence and Neuroscience 3 best individual and improve the accuracy of selection. It can be seen that the greater the fitness of the individual, the greater the probability of being selected, which accelerates the convergence of the algorithm. It can also be seen that the greater the individual concentration, the smaller the probability of being selected, so as to ensure the diversity of all individuals in the evolutionary population and prevent premature convergence. e parameter crossover probability p c and mutation probability p m of genetic algorithm are very important to the effect of genetic algorithm, which will directly affect the convergence of the algorithm. Too large or too small crossover probability p c and mutation probability p m will greatly affect the performance of the algorithm.
erefore, it requires repeated experiments to determine the crossover probability p c and mutation probability p m for different optimization problems, so as to find the best crossover probability p c and mutation probability p m suitable for specific problems [19]. Kalliamvakou et al. proposed an adaptive genetic algorithm. Now, we can use it to solve the above problems [20]. In the adaptive algorithm, and it can be adjusted dynamically with the fitness. If the average fitness value of the population is lower than the fitness value, the individual corresponds to a lower crossover probability p c and mutation probability p m , and the individual is protected and enters the next generation. On the contrary, if the average fitness value of the population is higher than the fitness value of the individual, the individual corresponds to a higher crossover probability p c and mutation probability p m , and the individual will be eliminated. erefore, the crossover probability p c and mutation probability p m adjusted by the adaptive algorithm provide the best crossover probability p c and mutation probability p m relative to a solution, which not only maintains the population diversity but also ensures the convergence of the genetic algorithm [21].

Multimedia Technology.
e meaning of multimedia is to combine the TV type audio-visual information dissemination ability with the computer's interactive control function. Creating a new information processing model integrating text, pictures, sound, and image, making the computer have the functions of digital, fully dynamic, and full video broadcasting, editing and creating multimedia information, and controlling and transmitting multimedia e-mail, video conference, and other video transmission functions will make the standardization and practicability of the computer a major topic of this new technological revolution. e use and high-speed transmission of digital audio and video data have become a symbol of a country's technical level and economic strength [22]. e world is large, but it is also small, which is true. Multimedia technology can enable people to understand astronomy and geography, history and culture, high and new technology, and local customs across time and space. From the national government to the ordinary people, they are dealing with multimedia technology every day. People's living standards have improved, which can also be seen from the development of multimedia. Communication, digital audio-visual technology, network television, 3G, MP4, MP5, etc., all can reflect the progress of human civilization [23].

Results and Analysis
e crossover probability p c and mutation probability p m are adaptively transformed according to the following formula: where f max is the maximum fitness value of the population, f avg is the average fitness value of each generation of the population, f min is the minimum fitness value of the population, e is the larger fitness value of the two individuals to be crossed, and f ′ is the fitness value of the individuals to be mutated. k 1 , k 2 , k 3 , k 4 take the value of (0,1) interval. If there is no clear basis for the definition of k 1 , k 2 , k 3 , k 4 , we can set the initialization value, analyze the crossover probability p c and mutation probability p m , and compare the evolutionary algebra under the same conditions. e crossover probability p c and the mutation probability p m with fewer evolutionary algebras are better [24]. Similarly, in order to better overcome the premature problem, the original adaptive formula is now improved, and the crossover probability p c and mutation probability p m are adjusted according to the following formula: where parameter f max , f avg , f min , f ′ , andf are the same as the previous one. e value range of p c1 > p c2 > p c3 , p m1 > p m2 > p m3 (0,1) can be dynamically adjusted during evolution. e specific meaning is shown in the figures, which reflects the change of crossover probability p c and mutation probability p m with fitness shown in Figures 3 and 4 [25]. 4 Computational Intelligence and Neuroscience According to the basic idea of the algorithm described above, the basic flow of the algorithm is shown in Figure 5.
It is mainly divided into three algorithms to test, which are the classical k-means algorithm, the general genetic optimization k-means algorithm (GK), and the improved genetic k-means algorithm (IGK). e test environment is programmed and tested with MATLAB on an ordinary PC. e experimental data adopts two classic datasets of machine learning (UCI Machine Learning Repository), Iris and Glass. e above datasets are internationally recognized typical datasets for testing the performance of clustering methods [26].
e Iris dataset has 150 sample information, each sample information includes 4 attributes, a total of 3 categories, and each category has 50 samples.
ere are 214 sample information in the Glass dataset, and each sample information includes 9 attributes, a total of 6 categories.
Some parameters of the test experiment are set as follows: the number of clusters k � 3, the number of iterations t � 100, and the initial temperature T 0 � 10. Population size M � 30, the individual crossover probability corresponding to the minimum fitness value p c1 � 0.7, the individual crossover probability corresponding to the average fitness value p c2 � 0.85, the individual crossover probability corresponding to the maximum fitness value p c3 � 0.95, the individual variation probability corresponding to the minimum fitness value p m1 � 0.05, the individual variation probability corresponding to the average fitness value p m2 � 0.1, and the individual variation probability corresponding to the maximum fitness value p m3 � 0.15. e traditional K-means algorithm, the traditional genetic algorithm's k-means algorithm (GK), and the improved algorithm (IGK) in this study are used to test the two datasets, respectively. All the algorithms run 20 times, respectively, and then compare and analyze the performance of the algorithm from the aspects of criterion function value, iteration times, and running time. e specific experimental results are shown in Figure 6. For the Iris dataset, it can be seen that these three algorithms can finally get the optimal solution at 89.8619, but in the 20 times of operation, the k-means algorithm reaches the optimal solution only ten times, and its objective function falls into the local minimum for other times, while the other two algorithms can reach the global optimal solution every time in the 20 times of operation. For the Glass dataset, it can be seen that these three algorithms can also achieve the optimal solution, but the clustering objective function falls into the local minimum four times in the 20 times of the K-means algorithm. e other operations can converge to the optimal solution 3.9088 × 10 6 every time. Similarly, the other two algorithms can reach the optimal solution every time [27].
It can be seen that the unit in the graph is millions [28]. erefore, we can also conclude that GK and IGK converge faster than the k-means algorithm, and the IGK algorithm is faster than GK and k-means. From the above analysis, we can see that the convergence speed of the IGK algorithm in the K-means clustering stage is the fastest among the three algorithms. erefore, the improved genetic algorithm is more suitable for the k-means clustering problem. Let us synthesize the experimental results and compare the performance of the three algorithms from three aspects: objective function value, iteration times, and running time.    Computational Intelligence and Neuroscience Table 1 shows the average results of 20 clustering experiments of the three algorithms on two experimental datasets.

Conclusion
Firstly, we look at the objective function. In the two groups of experiments, the Iris dataset and the Glass dataset, the objective functions obtained by the K-means algorithm are 94.05822 and 4.10373 (×10 6 ), respectively, which are slightly different from the optimal solution. e objective functions obtained by the other two algorithms are 89.8619 and 3.9088 (×10 6 ), which are very accurate and reach the optimal solution. In the Glass dataset, the gap is more obvious and smaller than the optimal solution because K-means is easy to fall into a local minimum. Looking at the number of iterations, in the Iris dataset or Glass dataset, the number of iterations required for the k-means to reach the optimal solution is 8.21 and 8.4, respectively, followed by GK algorithm, which is 6.1 and 6.58, respectively, e least number of iterations is our algorithm (IGK), which is 5.84 and 4.9, respectively. erefore, the genetically optimized K-means can not only ensure the optimal solution but also make the K-means converge quickly. Finally, comparing the running times of each algorithm, the K-means algorithm takes the least time because GK and IGK spend a lot of time in the process of finding the initial cluster center, but compared with this algorithm (IGK), the time required is relatively small under the condition of ensuring the global optimal solution. Aiming at the classical algorithm of clustering analysis technology of data mining, the K-means algorithm is analyzed in detail, and it is improved by combining it with the genetic algorithm. At the same time, it improves the defects of traditional genetic algorithms. Combined with the idea of simulated annealing algorithm, immune mechanism, and adaptive algorithm, the premature problem of genetic algorithm is well optimized to a certain extent. Finally, the improved algorithm (IGK), the standard k-means algorithm, and the traditional genetic algorithm (GK) are compared.
rough the analysis of the experimental results, it is proved that the improved algorithm has show that although the three algorithms finally converge to the value of the same objective function, it is obvious that GK and IGK converge faster than the k-means algorithm, and the IGK algorithm is faster than the GK and K-means algorithm. (b) e experimental results of the Glass dataset test; similarly, it can be seen from the graph that the final convergence values of the three algorithms are very similar, but the average objective function value of the k-means algorithm is actually much larger than the GK amount IGK. Computational Intelligence and Neuroscience better clustering effect and the performance is also the best.
Research and introduce more methods to improve genetic algorithms, such as the gradient method, the hill-climbing method, or the heuristic algorithm with problem-related heuristic knowledge. If the ideas of these optimization methods are integrated, a hybrid genetic algorithm with stronger performance will be formed. It is also a means to improve the efficiency and accuracy of the genetic algorithm.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e author declares that there are no conflicts of interest.