A Mobile Bayesian Network Structure Learning Method Using Genetic Incremental K2 Algorithm and Random Attribute Order Technology

The application of existing datasets to construct a probabilistic network has always been a primary research focus for mobile Bayesian networks, particularly when the dataset is large. In this study, we improve the K2 algorithm. First, we relax the K2 algorithm's requirement for a given node order and instead generate node orders randomly, keeping the best result over multiple random orders. Second, a genetic incremental K2 learning method is used to learn the Bayesian network structure. The training dataset is divided into two groups, and the standard K2 algorithm is used to find the optimal value for the first set of training data; simultaneously, three similar suboptimal values are recorded. To avoid falling into a local optimum, these four optimal values are mutated into a new genetic optimal value. When the second set of training data is used, only the best Bayesian network structure among the five abovementioned values needs to be identified. The experimental results indicate that the genetic incremental K2 algorithm based on random attribute order achieves higher computational efficiency and accuracy than the standard algorithm. The new algorithm is especially suitable for building Bayesian network structures when the dataset and the number of nodes are large.


Introduction
The effective expression of uncertain knowledge is an important part of intelligent knowledge learning, and in this research field the Bayesian network has long been a focus of attention. A Bayesian network is a probabilistic graphical model: it represents the dependency relationships among a group of random variables through a directed acyclic graph, and the conditional probability table (CPT) attached to each variable represents the probabilistic relationships between variables [1]. It has strong uncertainty reasoning ability and can realise both top-down predictive analysis and bottom-up diagnostic inference [2]. Machine learning technologies have become increasingly important in many applications, such as medicine [3], e-commerce [4,5], transportation [6], and image denoising [7]. As one of these technologies, the Bayesian network is widely used in many fields, such as machine vision, biomedicine, classification, fault diagnosis, prediction, natural language processing, and data mining [8].
The application of a Bayesian network is primarily based on Bayesian network learning, which is divided into two steps: structure learning and parameter learning. Structure learning obtains a directed acyclic graph that represents attribute dependencies from training data and prior knowledge. Parameter learning obtains the conditional probability of each node given the directed acyclic graph, usually expressed as a conditional probability table. Of these two tasks, structure learning is the more difficult and is a research hotspot; it mainly focuses on how to avoid falling into local optima and how to find the best structure when there are many attributes and few sample data [9,10]. Bayesian network structure learning algorithms can be divided into methods based on scoring and searching, methods based on conditional independence tests, and hybrids of the two.
(1) Method based on scoring and searching: this approach uses a scoring function to measure the degree of fit between a Bayesian network structure and the training sample set. After defining the scoring function, a search strategy is applied to find the network structure with the highest score. The K2 algorithm is the most common example [11]. Owing to the constraint of a given node order, the K2 algorithm effectively avoids the problem of likelihood equivalence and outperforms most classical algorithms in running speed and accuracy. However, in most cases the node order is unknown and usually must be determined from expert knowledge. Expert knowledge varies widely, which cannot ensure objectivity and accuracy, and obtaining it is difficult when there are many nodes. Therefore, researchers have put forward many solutions. One proposal uses conditional frequency to determine the node ordering for the K2 algorithm; it does not need a complex search strategy and effectively reduces the time complexity, but it places high demands on dataset quality and does not easily obtain an accurate model [10]. Combining the maximum spanning tree and the ant colony algorithm, the MUST-ACO-K2 (MAK) algorithm was proposed to search for the node order, but it must substitute each obtained node order into the K2 algorithm to produce a network structure before scoring, so its running time is excessive [10].
(2) Method based on conditional independence tests: this method abstracts the learning of the Bayesian network structure as the process of discovering a set of variables hidden in the network structure that satisfy conditional independence tests. Spirtes proposed the SGS algorithm in 1989 [12], which uses conditional independence tests to determine the existence and direction of edges and eliminates the K2 algorithm's prior constraint of a given node order, but the cost of the test calculation is exponential. The following year, he proposed the PC algorithm [13], which improved on the search strategy of the SGS algorithm; it requires less computation when learning a sparse network structure and was used by Tsagris [14].
Cheng combined ideas from information theory with an independence testing method [15], and the proposed learning method exhibited good performance in structure learning. (3) Hybrid algorithms: because score-and-search methods and constraint-based methods each have their own advantages and disadvantages, hybrid optimization algorithms combining the two have gradually become the mainstream of research. An improved whale optimization algorithm has been used to optimize the Bayesian network structure; its optimization efficiency and accuracy are good, but its complexity is very high [16]. An improved particle swarm optimization (PSO) has been proposed to learn the Bayesian network structure: after the initial network is constrained by mutual information, the improved PSO algorithm searches for the optimal Bayesian network, which improves optimization efficiency. However, owing to the instability of the algorithm, the accuracy of the structure cannot be guaranteed [17]. The bird swarm algorithm has been used to improve the search strategy, making the search ability stronger and further improving convergence [18].
For the K2 algorithm, as stated by Cooper, the K2 algorithm can reconstruct a moderately complex belief network rapidly, but it is sensitive to the ordering of the nodes [11]. Further information can be found in [19,20].
This paper presents a new Bayesian network structure learning method based on random node order and a genetic incremental search for an optimal path, and compares this method with the K2 algorithm. Experiments demonstrate that the random node order method can yield a better Bayesian network structure without expert knowledge, and that the genetic incremental structure learning method greatly improves computational efficiency on big datasets, especially when the numbers of samples and nodes are large. Its runtime is consistently shorter than that of the K2 algorithm.

K2 Algorithm
The K2 algorithm [21] effectively integrates prior information into the search process and exhibits good time performance. It is a classic structure learning algorithm based on scoring and searching.

Scoring Function.
The node sequence is given in advance, and each node greedily searches for its parent set among its predecessor nodes according to the Bayesian scoring function, finally obtaining the network structure with the best score. Learning the network structure can thus be reduced to finding, for a given dataset D, the network structure B_s with the largest posterior probability, that is, the B_s that maximises P(B_s|D). Additionally, P(B_s|D) = P(B_s, D)/P(D), and the denominator P(D) is unrelated to B_s, so the ultimate objective is to find the B_s that maximises P(B_s, D). Through a series of derivations (for the specific derivation process, see [6]), we obtain

P(B_s, D) = P(B_s) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!,  (1)

where P(B_s) is the prior probability of B_s, that is, the probability assigned to each structure before any data are provided; r_i is the number of possible values of variable x_i; and w_ij denotes the jth unique instantiation of the parent set π_i relative to D.

Scientific Programming
Suppose there are q_i such unique instantiations of π_i. Define N_ijk to be the number of cases in D in which variable x_i has the value v_ik and π_i is instantiated as w_ij; the value of N_ij can then be obtained from

N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.  (2)

In equation (1), it can be assumed that the probability of each structure obeys a uniform distribution, that is, P(B_s) equals the same constant c for every structure. Replacing P(B_s) with the constant c, equation (1) becomes

P(B_s, D) = c \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!.  (3)

The objective is to obtain the B_s that maximises the posterior probability, as follows:

\max_{B_s} P(B_s, D) = c \prod_{i=1}^{n} \max_{\pi_i} \left[ \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \right].  (4)

As this equation shows, as long as the local maximum for each variable is found, the overall maximum can be obtained. The component for each variable is therefore taken as a new scoring function:

g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!.  (5)
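To make the local scoring function g(i, π_i) concrete, the following sketch computes its logarithm directly from the counts N_ijk and N_ij. This is a minimal illustration, not the authors' MATLAB implementation; the function name `k2_local_score` and the list-of-tuples dataset layout are our assumptions. Working in log space with `lgamma` keeps the factorials from overflowing.

```python
import math
from collections import Counter

def k2_local_score(i, parents, data, r):
    """Log of the K2 local score g(i, pi_i):
    sum_j [log (r_i - 1)! - log (N_ij + r_i - 1)!] + sum_{j,k} log N_ijk!
    data: list of tuples of discrete values; r[i]: number of states of x_i."""
    r_i = r[i]
    n_ijk = Counter()                    # N_ijk: cases with pi_i = w_ij, x_i = v_ik
    for row in data:
        n_ijk[(tuple(row[p] for p in parents), row[i])] += 1
    n_ij = Counter()                     # N_ij = sum over k of N_ijk
    for (w_ij, _), n in n_ijk.items():
        n_ij[w_ij] += n
    score = sum(math.lgamma(r_i) - math.lgamma(n + r_i)       # log (r_i-1)!/(N_ij+r_i-1)!
                for n in n_ij.values())
    score += sum(math.lgamma(n + 1) for n in n_ijk.values())  # log N_ijk!
    return score
```

For instance, a single binary variable observed as 0, 0, 1 with an empty parent set gives g = (1!/4!) · 2! · 1! = 1/12, so the function returns log(1/12).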

Search Strategy.
The core of Bayesian network structure optimization is to narrow the search scope through a search strategy once the scoring function has been determined. The greedy search algorithm is the most commonly used method, but it easily falls into local optima. In 2017, the authors of [1] proposed adding a disturbance factor to local greedy search and, drawing on genetic algorithms, applied a metaheuristic method to improve the performance of local greedy search. In 2020, the authors of [22] introduced a microbial genetic algorithm into Bayesian network structure learning: the undirected graph with the most correct edges is computed via the maximum information coefficient and used as the initial population, and the operators of the microbial genetic algorithm then retain the excellent individuals in that population. Through the combination of the two, learning a structure close to the real network from a small amount of data can be achieved.
The search strategy assumes the nodes are ordered: if x_i precedes x_j, there can be no edge from x_j to x_i. It also assumes that each variable has at most u parent variables. At each step, the candidate parent that most increases the scoring function is selected and inserted into the parent set; the loop terminates when the scoring function cannot be increased further.
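The greedy loop just described can be sketched as follows. This is an illustrative reimplementation rather than the paper's code; `local_score` restates the K2 local score in log space, and all names and the data layout are assumptions.

```python
import math
from collections import Counter

def local_score(i, parents, data, r):
    """Log of the K2 local score g(i, pi_i) for node i with the given parents."""
    n_ijk = Counter()
    for row in data:
        n_ijk[(tuple(row[p] for p in parents), row[i])] += 1
    n_ij = Counter()
    for (w, _), n in n_ijk.items():
        n_ij[w] += n
    return (sum(math.lgamma(r[i]) - math.lgamma(n + r[i]) for n in n_ij.values())
            + sum(math.lgamma(n + 1) for n in n_ijk.values()))

def k2_search(order, data, r, max_parents):
    """Greedy K2: each node may only take parents from its predecessors in
    `order`; keep adding the single best parent while the score improves."""
    dag = {}
    for pos, i in enumerate(order):
        parents, p_old = [], local_score(i, [], data, r)
        while len(parents) < max_parents:
            candidates = [z for z in order[:pos] if z not in parents]
            if not candidates:
                break
            p_new, best = max((local_score(i, parents + [z], data, r), z)
                              for z in candidates)
            if p_new <= p_old:
                break                     # score cannot be increased further
            p_old, parents = p_new, parents + [best]
        dag[i] = parents
    return dag
```

On data where x_1 simply copies x_0, running `k2_search([0, 1], data, [2, 2], 1)` recovers the edge x_0 → x_1.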

Incremental K2 Algorithm Based on Random Attribute Order

Randomly Generated Node Order.
The K2 algorithm must initialise the order of nodes such that only the nodes in front of node i can be parents of node i, that is, π_i ⊆ {x_1, x_2, …, x_{i−1}}. The disadvantages are as follows. First, the order of nodes is not easy to obtain for most actual network structures, and this form of prior knowledge is not conducive to expressing expert knowledge. Second, the fault tolerance with respect to the node order is poor: if the node order input to the K2 algorithm differs greatly from that of the real structure, the accuracy of the K2 algorithm is greatly reduced, owing to its algorithmic theory. In this study, our first objective is to reduce the dependence on node order. For n nodes, each iteration randomly generates a permutation of the nodes; the generation procedure is given in Algorithm 1.
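The random restart idea behind Algorithm 1 can be sketched as follows. This is illustrative only; `learn_with_order` stands in for running K2 under one ordering and returning its total score and learned structure, and the toy scorer at the bottom is purely hypothetical.

```python
import random

def best_of_random_orders(nodes, learn_with_order, n_restarts=10, seed=0):
    """Relax K2's fixed node order: draw several random permutations,
    run the learner under each, and keep the highest-scoring result."""
    rng = random.Random(seed)
    best_score, best_order, best_dag = float("-inf"), None, None
    for _ in range(n_restarts):
        order = list(nodes)
        rng.shuffle(order)                    # one random node order
        score, dag = learn_with_order(order)  # e.g. run K2 with this order
        if score > best_score:
            best_score, best_order, best_dag = score, order, dag
    return best_score, best_order, best_dag

# Toy stand-in for K2: pretend orderings starting with node 0 score best.
toy = lambda order: (1.0 if order[0] == 0 else 0.0, {})
score, order, dag = best_of_random_orders([0, 1, 2], toy, n_restarts=20)
```

Each restart is independent, so the orderings (and the K2 runs under them) could also be evaluated in parallel.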

Genetic Incremental K2 Algorithm.
The basic idea is to divide the training data into two groups and to use the first group to learn a basic Bayesian network structure with the K2 algorithm. During learning, not only the current optimal value, i.e., the algorithm's decision at each step, but also several suboptimal values are saved [23]; a genetic algorithm (GA) is then applied, and the current four optimal values are mutated into a new genetic optimal value. The number of suboptimal values can be adjusted, for example, to three or four; this study keeps three suboptimal values. When the second, incremental set of data is used, the algorithm does not search the full space again; instead, it takes the four stored optimal values and the new genetic value as the next search space. The algorithm thus skilfully eliminates low-quality models, reduces the search space, and improves efficiency.
In addition, the optimal score function value is preserved in each iteration; after the iterations, the node order of the Bayesian network with the optimal score function value is taken as the best node order. The algorithm is divided into two parts. The first part (see Algorithm 2) generates one optimal value and three suboptimal values from the first set of data, mutates them into a new optimal value, and stores all values in the candidate matrix. The second part (see Algorithm 3) uses the second set of data to search among the stored candidates.
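The two-batch candidate scheme can be sketched as follows. This is a simplified, hypothetical rendering, not Algorithms 2 and 3 themselves; the tuple encoding of candidates, the per-position mutation rule, and all names are our assumptions.

```python
import random

def genetic_incremental_step(candidates, score_batch1, score_batch2,
                             pm=0.5, seed=0):
    """Rank candidates (e.g. decisions encoded as equal-length tuples) on
    the first data batch, keep the best four, mutate them into one new
    genetic candidate, and evaluate only these five on the second batch."""
    rng = random.Random(seed)
    top4 = sorted(candidates, key=score_batch1, reverse=True)[:4]
    # Mutation: per position, keep the best candidate's gene with
    # probability 1 - pm, otherwise take it from a random runner-up.
    genetic = tuple(top4[0][k] if rng.random() > pm
                    else rng.choice(top4[1:])[k]
                    for k in range(len(top4[0])))
    pool = top4 + [genetic]               # the reduced search space
    return max(pool, key=score_batch2)    # no full re-search on batch 2
```

The point of the design is in the last two lines: the second batch only has to score five stored candidates instead of re-running the whole greedy search.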

Experimental Results
To test the algorithm, the commonly used ALARM, Asia, and CANCER networks were selected for the experiment. Under different sample sizes, the running time and the structural Hamming distance (SHD) [24] were used to evaluate the different algorithms.
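The structural Hamming distance counts the edge insertions, deletions, and reversals needed to turn the learned DAG into the reference one. A minimal sketch, with the `{child: parents}` representation and names assumed:

```python
def shd(learned, true):
    """Structural Hamming distance between two DAGs given as
    {child: set_of_parents}. A missing, extra, or reversed edge
    each contributes 1."""
    e1 = {(p, c) for c, ps in learned.items() for p in ps}
    e2 = {(p, c) for c, ps in true.items() for p in ps}
    dist, seen = 0, set()
    for edge in e1 ^ e2:               # edges present in only one graph
        pair = frozenset(edge)
        if pair in seen:               # its reversal was already counted
            continue
        seen.add(pair)
        dist += 1
    return dist
```

A reversed edge therefore costs 1 (not 2), e.g. `shd({1: {0}}, {0: {1}})` is 1, and identical graphs score 0.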
The experiment adopted the Bayesian network toolbox on the MATLAB platform. The operating environment was Windows 7 with an Intel(R) Core(TM) i3-4170 3.70 GHz CPU and 8.00 GB RAM. The results of the experiment are listed in Tables 1 and 2. The proposed algorithm relaxes the K2 algorithm's strict requirement on node order and improves the efficiency of learning the Bayesian network structure.
(1) In the ALARM network (comprising 37 nodes), the experiment began with a sample size of 4000: the running time of K2 was 7.928, that of GAK2 was 5.675, and that of GIMK2 was 0.861. When the sample size grew to 50,000, the times were 29.877, 18.429, and 3.128, respectively; at 100,000, they were 55.659, 37.839, and 5.987. The SHD is large at first but eventually reduces to zero.
Algorithm 2: generating the candidate matrix from the first set of data.
(1) Procedure GCDK2 {
(2)   {Input: K2 algorithm parameter initialisation, mutation rate pm = 0.5}
(3)   {Output: a Bayesian network and a matrix containing the optimal values and locations}
(4)   For i = 1 to n {
(5)     P_old = g(i, π_i)
(6)     While OKToProceed and the length is OK {
(7)       pps = mysetdiff(order(1 : i − 1), ps)  {potential parents}
(8)       nps = length(pps)
(9)       For pi = 1 to nps {
(10)        compute pscore
(11)      End for}
(12)      [P_first, best_p_first] = max(pscore)  {best_p_first is a location, P_first is the maximum score}
(13)      Get second_pscore and second_p
(14)      Get third_pscore and third_p
(15)      Get fourth_pscore and fourth_p
(16)      Mutate the optimal values to obtain a new genetic value GA_pscore
(17)      If (P_first > P_old) { P_old = P_first; ps = ps ∪ {best_p_first}
(18)        Put P_first, best_p_first, second_pscore, second_p, third_pscore,
(19)        third_p, fourth_pscore, fourth_p, and GA_pscore into the candidate matrix
(20)      End if}
(21)    End while}
(22) End for}
The results of the program runs indicate that the genetic incremental K2 algorithm has a shorter running time than K2 and GAK2 for the same sample size. For mobile Bayesian networks with different numbers of nodes, particularly when the dataset size increases along with the number of nodes, the running times of K2 and GAK2 become extremely long.

Algorithm 3: searching the reduced candidate space on the second set of data.
(1) Procedure GIMK2(part) {
(2)   For i = 1 to length(candidate) {
(3)     child = candidate(i, 1)
(4)     For j = k to k + 4 {  {from k to k + 4 because the candidate matrix stores 5 values}
(5)       ppar = candidate(i, j)
(6)       u = find(clamped(child, :) == 0)
(7)       Get P_new from child and u
(8)     End for}
(9)     best_action = candidate(i, best_p)
(10)  End for}
(11)  …

Conclusion
As data analysis is now conducted on big data, when big data must be analysed under uncertain knowledge, especially in the case of numerous attributes, the genetic incremental K2 algorithm can reduce the search space and considerably improve the efficiency of the algorithm. The improved algorithm in this paper is effective; however, it has the disadvantage that the search space at each step depends on the current optimal path, so it can still fall into a local optimum. The algorithm should therefore be combined with particle swarm optimization, ant colony optimization, or another optimization algorithm to avoid falling into local optima.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.