Causal Discovery of Flight Service Process Based on Event Sequence

The development of the civil aviation industry has continuously raised the requirements for the efficiency of airport ground support services. Existing ground support research has not yet obtained a process model directly from the ground support log in order to study the causal relationship between service nodes and flight delays. Most ground support studies use machine learning methods to predict flight delays, and the flight support model on which they are based is an idealized model. These studies do not examine in depth the causal mechanism behind the ground support links and do not reveal the true causes of flight delays. As a result, machine learning predictions of flight delays are biased to some degree, and the idealized model deviates from the actual service process. It is therefore of practical significance to obtain the process model from the guarantee log and analyze its causality. However, existing methods for discovering process causal factors have been studied only under the assumption of causal sufficiency and do not consider the existence of latent variables. This article therefore proposes a framework that discovers process causal factors without assuming causal sufficiency. The optimized fuzzy mining process model is used as the service benchmark model, and a local causal discovery algorithm is used to discover the causal factors. Under this framework, this paper proposes a new Markov blanket discovery algorithm that does not assume causal sufficiency and tests it on benchmark data sets. Finally, the framework is applied to actual flight service data.


Introduction
The 2019 Civil Aviation Industry Development Statistics Bulletin [1] shows that, compared with 2018, the total civil aviation transportation turnover in 2019 increased by 7.2%. The increase in total airport transportation turnover requires an improvement in airport flight service efficiency. Flight service includes various ... Through process mining, we obtain a practical business process model and, on this basis, calculate the performance indicators of each link to describe its operating status. By analyzing the causal relationship between the performance indicators of each link and the flight departure delay, we can determine which nodes' operation affects flight departure, and provide effective guidance for reducing flight delays and improving airport operation efficiency.
No existing flight support research has discovered causal factors in a business process based on actual service logs. This paper proposes a framework, automatic discovery of causal factors in the presence of latent causal variables based on process mining (ACLP). Starting from the flight service process model mined by a process mining algorithm, and relaxing the assumption of causal sufficiency, the framework uses the score-based maximal ancestral graph Markov blanket algorithm (SMMB) to generate a local ancestral graph of the flight launch delay. It then uses the direction between nodes extracted from the process model as a supplement to the local ancestral graph to adjust the direction of the edges, thereby realizing the automatic discovery of the causes and effects of flight service business performance. The proposed framework combines process mining with causal discovery in the presence of unobservable variables. When the business process of flight service is unknown, a flight service process model that fits the actual situation is established from the event sequence with process mining. To address unobservable variables in actual scenarios, the newly proposed SMMB algorithm is used to determine the causal relationships between business process performance indicators, so that causal factors explaining business process performance are extracted from event data. The SMMB algorithm extends a score-based local directed acyclic graph (DAG) discovery algorithm, the score-based local learning (SLL) algorithm, according to the characteristics of the maximal ancestral graph (MAG) Markov blanket (MB). It is a topology-based method. The neighbor set and spouse set of the target variable are constructed by scoring. Then, according to the definition and inference of the area set proposed in [16], the adjacent area set is determined and the complete MB is searched. Because the scoring method is advantageous when searching for the neighbor set and spouse set of the target variable, the SMMB algorithm performs better on the F-measure than the previously proposed constraint-based MAG MB algorithm. It provides new ideas for the automatic causal discovery of the flight service process under latent variables.
This article is organized as follows: Section 2 discusses the background, Sections 3 and 4 introduce the proposed framework and the experimental results, respectively, and give some suggestions based on the experimental results. Section 5 draws conclusions and gives an outlook on future work.
... management organizations. Based on the complexity of the process control flow and the related concepts of process repeatability and predictability, Di Ciccio et al. [8] divide business processes into three macro types: structured, semi-structured, and unstructured processes. A structured process is characterized by a clearly defined, predictable, and repeatable sequence of activities whose inputs and outputs are predefined, while semi-structured and unstructured processes have no predefined, repeatable sequence of activities. A semi-structured process can outline the possible sequence of activities based on the case, determine the inputs of the required activities, and change the order of some activities according to the characteristics of a specific situation. The activities of an unstructured process are combined differently for each specific instance. As the level of process flexibility and unpredictability increases, the sequence of activities becomes completely case-dependent.
Process mining is a method of analyzing actual business processes based on event logs generated by the system. The idea is to discover, monitor, and improve real business processes by extracting knowledge from event logs.
Definition 1 (event [9]): An event is the instantiation of an activity in a business process, usually represented by a tuple $(a, caseID, eventID, t_{start}, t_{end})$, where $a$ is the activity name attribute of the event, $caseID$ is the attribute identifying the instance in which the event occurs, $eventID$ is the event ID attribute, $t_{start}$ is the start timestamp attribute, and $t_{end}$ is the end timestamp attribute; each attribute takes values in its own value range.
The event log L for a specific process model comprises a series of events in the process instances. The sequence of all events in a process instance in chronological order is its trajectory. A complete trajectory corresponds to one execution of the process, and all historical execution traces constitute the event log L. The discovery of process models has always been a hot issue in process mining; that is, information about the original process model, organizational context, and execution attributes can be obtained from the execution log in the absence of prior knowledge. Most process discovery algorithms apply a single algorithm to the control flow steps [10], such as the alpha algorithm [11], the heuristic mining algorithm [12], the multi-stage process mining algorithm [13], and the region-based mining algorithm [14]. These algorithms are effective when applied to a structured business process. When applied to a semi-structured or unstructured process, however, the models they find are "spaghetti-like": they describe every detail of the unstructured behavior found in the logs too finely. The reason lies in the assumptions on which these process mining algorithms are based:
Assumption 1 [14]: All logs are reliable and trustworthy.
Assumption 2 [14]: There exists an exact process which is reflected in the logs.
These assumptions are completely reasonable in a structured and controlled environment, but they do not hold true in a less structured real environment. Therefore, the process model discovery algorithm based on the above assumptions will simulate the entire process completely, accurately, and meticulously. The results are often "spaghetti," and process managers cannot obtain effective information from the model.
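The event and trajectory notions of Definition 1 can be illustrated with a short sketch. It assumes a log held in memory as a list of tuples; the names `Event` and `traces`, the field names, and the integer timestamps are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, namedtuple

# One event as in Definition 1: activity name, case ID, event ID, start and end timestamps.
Event = namedtuple("Event", "activity case_id event_id t_start t_end")

def traces(log):
    """Group events by case and order each case chronologically by start timestamp."""
    by_case = defaultdict(list)
    for event in log:
        by_case[event.case_id].append(event)
    return {
        case: [e.activity for e in sorted(events, key=lambda e: e.t_start)]
        for case, events in by_case.items()
    }
```

Each value of the returned dictionary is one trajectory, and the dictionary as a whole plays the role of the event log L grouped by process instance.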
To solve the above problems, Christian W et al. [15] proposed a process mining algorithm based on fuzzy theory. When dealing with unstructured processes, the fuzzy algorithm can distinguish important tasks from unimportant ones and remove unnecessary details. It abstracts a higher-level view, focusing on discovering a higher-level mapping of the behavior in the log rather than trying to discover the true process model.
Definition 2 (Causal sufficiency [16]): The observed variable set V is said to be causally sufficient if and only if any common cause of two or more variables in V is also in V.
Causal sufficiency means that, given a set of observed variables V, no subset of the variables in V has a latent common cause.
Definition 3 (Faithfulness [17]): In a BN with variable set $V$, DAG $G$, and probability distribution $P$, $G$ is faithful to the probability distribution $P$ over $V$ if and only if every independence present in $P$ is entailed by $G$ and the Markov condition. $P$ is faithful if and only if there exists a DAG $G$ that is faithful to $P$. The faithfulness assumption establishes a relationship between the probability distribution $P$ and its underlying DAG $G$. Under this assumption, conditional independence tests can be used in place of d-separation to find all dependencies and independencies of the BN. Under the assumption of causal sufficiency, the MB of the target variable T in the DAG consists of the parents, children, and spouses of T. MB discovery algorithms for DAGs are by now relatively mature. They can be divided into topology-based and non-topological methods. Non-topological methods, such as the IAMB algorithm [18], greedily test each variable against the target using the definition of the Markov blanket. Topology-based methods, such as Min-Max Markov Blanket (MMMB) [19], search for the MB of the target node step by step using the topological characteristics of the MB. The articles [20] [21] introduce a common framework for topology-based methods and conduct extensive experiments to verify its superior performance in various applications.
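When the DAG is known, the statement that the MB consists of parents, children, and spouses can be made concrete. The following is a minimal sketch, assuming the DAG is given as a set of `(parent, child)` pairs; the function name `markov_blanket` and the edge representation are illustrative assumptions, not part of any algorithm cited above.

```python
def markov_blanket(edges, target):
    """Return the Markov blanket of `target` in a DAG given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Spouses are the other parents of the target's children.
    spouses = {u for u, v in edges if v in children and u != target}
    return parents | children | spouses
```

Discovery algorithms such as IAMB or MMMB have to estimate this set from data, whereas the sketch only reads it off a known graph.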
Without assuming causal sufficiency, when the underlying data-generating process has latent common causes, the MAG is used to represent the independence relations among the observed variables, with no need to mark the latent common causes explicitly in the structure. A mixed graph is a collection of nodes and edges whose edges may be one-way edges (→) or bidirectional edges (↔). A node $V_i$ on a path $\tau$ is a colliding node if both edges adjacent to $V_i$ on $\tau$ have arrowheads pointing into $V_i$; otherwise, $V_i$ is a non-colliding node on $\tau$. Every non-endpoint variable on a collision path from the target node T to Y in the MAG is a colliding node.
Definition 5 (m-connection, m-separation [22]): In the ancestral graph G=(E,V), given a set of nodes Z, a path p between A and B is m-connecting given Z if: 1. the non-colliding nodes on the path p do not belong to Z; 2. each colliding node on the path is an ancestor of a member of Z. If there is no m-connecting path given Z between A and B, then A and B are m-separated by Z.
Definition 6 (Maximal Ancestral Graph [23]): An ancestral graph is maximal if, for any two nonadjacent variables in it, there is a set of variables that m-separates them.
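The two conditions of m-connection can be checked mechanically for a single path. The sketch below assumes directed edges are stored as `(tail, head)` pairs and bidirectional edges as two-element frozensets; the function names are hypothetical, every node is treated as its own ancestor, and the code does not verify that the path actually exists in the graph.

```python
def ancestors(directed, nodes):
    """Return `nodes` together with every node that has a directed path into them."""
    result = set(nodes)
    frontier = list(nodes)
    while frontier:
        v = frontier.pop()
        for tail, head in directed:
            if head == v and tail not in result:
                result.add(tail)
                frontier.append(tail)
    return result

def has_arrowhead_at(directed, bidirected, node, other):
    """True if the edge between `node` and `other` has an arrowhead at `node`."""
    return (other, node) in directed or frozenset((node, other)) in bidirected

def is_m_connecting(directed, bidirected, path, z):
    """Check whether `path` (a list of nodes) is m-connecting given the set `z`."""
    anc_z = ancestors(directed, z)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (has_arrowhead_at(directed, bidirected, node, prev)
                    and has_arrowhead_at(directed, bidirected, node, nxt))
        if collider and node not in anc_z:
            return False  # a collider must be an ancestor of a member of z
        if not collider and node in z:
            return False  # a non-collider must not belong to z
    return True
```

Two nodes are m-separated by Z when no path between them passes this check, so a full test would enumerate all paths between the two nodes.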
There are relatively few discovery algorithms for the MAG MB. The article [16] proposed the first constraint-based local causal discovery algorithm (the M3B algorithm) for the MB under the MAG framework without assuming causal sufficiency; it learns the MB directly instead of learning the overall MAG. M3B is a topology-based MB algorithm. It first finds the neighborhood set (parents and children) of the target node and then recursively searches for the area set of the given target to complete the MB. The article [23] has shown that constraint-based methods are sensitive to error propagation, and no score-based algorithm exists for MAG MB discovery. Therefore, the framework of the SLL algorithm is extended according to the characteristics of the MAG MB. First, the neighbor set and spouse set of the target variable are constructed by scoring. Then, according to the definition and inference of the area set proposed in [7], the adjacent area set of the target node is determined, and the MB is finally completed. This paper optimizes the fuzzy process model and obtains the actual flight guarantee process model. Based on the actual business process, the performance indicators of the nodes are calculated to measure the operating status of each node, and finally the SMMB local causal mining algorithm is used to find the root causes of flight launch delays in the presence of latent variables. This method provides new research ideas for discovering causal factors in process mining in the presence of latent variables.
Few scholars have done in-depth research on the automatic discovery of causal factors of business performance. For example, [24][25] proposed a method for automatically discovering process performance bottlenecks and deviations based on event data, but did not explore causality. The study in [26] proposed a method based on time series analysis to detect causal relationships between business process characteristics and process performance indicators. However, the Granger causality test it adopted does not consider the existence of latent variables; that is, it assumes that the actual data satisfy causal sufficiency. The contributions of this article are therefore as follows: (1) It is the first to realize the automatic discovery of causal factors of business performance while relaxing the assumption of causal sufficiency. (2) The proposed score-based local MAG discovery algorithm outperforms the M3B and RFCI algorithms.

Materials and Methods
This section proposes a framework to realize the automatic discovery of flight guarantee causality from the sequence of events, as shown in Figure 1. The framework consists of two parts: (1) process model mining and (2) construction of the local causal structure. The first part is the fuzzy mining algorithm, which has two stages. The first stage is the initialization stage, which establishes the initial process model from the flight guarantee event sequence. The second stage is the simplification stage, which has three parts: conflict resolution, edge filtering, and clustering and abstraction. This article proposes solutions to the unary and N-ary conflicts in the initial model and optimizes the process model. The second part is the construction of the local causal structure. This method is based on the SLL algorithm, which is mainly used to search for the neighboring nodes and spouse nodes of the target node. Based on the neighbor set and spouse set found, and using the characteristics of the MAG MB, the method then searches for the area set of the target node, the parent set of the area set, and so on, and thereby completes the construction of the Markov blanket.
3.1 Process model mining
As mentioned earlier, the fuzzy mining algorithm is divided into two stages. The first stage is the initialization stage, which converts each observed event type into an activity node. The directed edges added to the model represent the ordering relationships between activities. The second stage is the simplification stage, divided into three steps: conflict resolution of binary relations, edge filtering, and aggregation and abstraction. As shown in Figure 2, the possible conflict relationships in the initialization model include binary conflicts, N-ary conflicts, and unary conflicts.
The conflict resolution part of the fuzzy mining algorithm only resolves binary conflicts, so the process model obtained by the fuzzy process mining algorithm can still contain N-ary cycles and self-loops. For the flight service process, self-loops are possible. Take the change of aircraft parking position as an example. When an aircraft enters the airport, a shortage of parking positions may force it to change its parking position several times. In the event sequence, the change of aircraft parking position then appears several times in a row, and the model obtained by the process mining algorithm has a self-loop. Such a self-loop may also be a pure exception. There are no binary or N-ary cycles in the actual operation of flight service, which proceeds in sequence over time. Based on the cause of this phenomenon and on the fuzzy mining algorithm's solution to binary conflicts, solutions to N-ary and unary conflicts are derived to optimize the fuzzy mining process model.
In the fuzzy mining algorithm [27], binary conflicts arise in three situations: binary loop, exception, and concurrency. If the relative importance values $rel(A, B)$ and $rel(B, A)$ of the two conflicting relations both exceed the retention threshold, activities A and B form a binary cycle. If at least one conflicting relation is below this threshold, the offset between the two relative importance values, $ofs(A, B) = |rel(A, B) - rel(B, A)|$, is determined. If the offset exceeds the ratio threshold, the less important relation is deleted. If at least one relation is below the retention threshold and the offset is smaller than the ratio threshold, the relation between A and B is low and balanced, implying that A and B are executed concurrently, so both edges are deleted. The formula for relative importance is defined in terms of the node set of the process model, the priority relationship assigned to each pair of nodes A, B in the node set, and the relative importance $rel(A, B)$ between each pair of nodes A and B.
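The three-way decision for a binary conflict can be sketched in code. The sketch assumes the relative importance of the original fuzzy miner, $rel(A,B) = \frac{1}{2}\,sig(A,B)/\sum_X sig(A,X) + \frac{1}{2}\,sig(A,B)/\sum_X sig(X,B)$, which is an assumption about the formula used here; the dictionary `sig`, the function names, and the default thresholds (the values 0.27 and 0.35 reported for the case study in this paper) are illustrative.

```python
def relative_importance(sig, a, b):
    """Assumed fuzzy-miner relative importance; sig maps (source, target) to significance."""
    s = sig.get((a, b), 0.0)
    if s == 0.0:
        return 0.0
    out_total = sum(v for (src, _), v in sig.items() if src == a)
    in_total = sum(v for (_, dst), v in sig.items() if dst == b)
    return 0.5 * s / out_total + 0.5 * s / in_total

def resolve_binary_conflict(sig, a, b, retention=0.27, ratio=0.35):
    """Classify the conflict between a->b and b->a and return the edges to keep."""
    rel_ab = relative_importance(sig, a, b)
    rel_ba = relative_importance(sig, b, a)
    if rel_ab > retention and rel_ba > retention:
        return "loop", {(a, b), (b, a)}          # binary cycle: keep both edges
    if abs(rel_ab - rel_ba) > ratio:
        keep = (a, b) if rel_ab > rel_ba else (b, a)
        return "exception", {keep}               # drop the less important edge
    return "concurrency", set()                  # low and balanced: drop both edges
```

The same pattern, comparing each relation against the strongest one, is what the N-ary extension generalizes to chains of more than two activities.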
On the basis of the solution of the binary conflict, the solution of the N-ary conflict is derived. First, the relative importance of the binary relationship does not apply to the N-ary conflict, so it needs to be expanded to the relative importance of the N-ary relationship.
Here N is the size of the node set of the process model. Similar to the binary conflict, the situations in which an N-ary conflict arises can be divided into the following three categories:
1) N-ary cycle: N activities $\{A_1, A_2, \ldots, A_N\}$ form a cycle; that is, after $A_1, A_2, \ldots, A_N$ are executed in sequence, $A_N$ can return to activity $A_1$ and start again. In this case, the priority relationship between these activities is allowed in the actual process and therefore needs to be preserved.
2) Exception: The process is executed in the sequence $A_1, A_2, \ldots, A_N$, but exceptions to this order occur in the actual execution. In this case, remove the exception edge in the weaker chain structure.
3) Concurrency: There is a parallel relationship among the N activities that cause the conflict.
As noted above, the flight guarantee process model essentially has no N-ary cycles, so N-ary conflicts are caused by exceptions and concurrency. The solutions are as follows.
Determine the offset between each chain relationship and the chain relationship with the greatest relative importance, as shown in formula (3).
Unlike binary and N-ary conflicts, unary conflicts involve no concurrency, only self-loops or exceptions. This situation can be resolved by creating a virtual node, removing the unary conflict, and passing the result to the edge filtering step of the fuzzy mining algorithm, as shown in Figure 3. After the loop is released, the virtual node $B_1$ and the priority relationship $B \to B_1$ are obtained.
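The release of a self-loop through a virtual node can be sketched as follows. This is a minimal illustration that assumes the model is a dictionary mapping `(source, target)` edges to a weight; the function name `release_self_loops` and the `_1` suffix convention follow the $B \to B_1$ notation above, and how the virtual node is treated afterwards by edge filtering is not modeled.

```python
def release_self_loops(edges):
    """Replace every self-loop X -> X by an edge X -> X_1 to a virtual node."""
    released = {}
    for (src, dst), weight in edges.items():
        if src == dst:
            released[(src, f"{src}_1")] = weight  # virtual node takes over the loop
        else:
            released[(src, dst)] = weight
    return released
```

After this step the model contains no unary conflicts, so the remaining edges can be filtered with the same criteria as ordinary precedence relations.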

Local causal structure construction
The second part of the framework builds the MAG MB of the target variable based on the SLL algorithm while relaxing the assumption of causal sufficiency. Unlike the MB of a DAG, the MB of a MAG includes, in addition to the parent-child set and spouse set, the area set and the node sets related to the area set. The area set, also called the district set, is defined as follows:
Definition 7 (District Set [7]): The district set of a target variable T in a MAG, denoted $dis(T)$, is the set of variables such that, for each $V_i \in dis(T)$, there is a path from $V_i$ to T that contains only bidirectional edges. On this basis, the article [7] proposed a method for determining whether a variable belongs to the district set of the target variable.
The above theorem shows that the variable in sp(T) is a candidate variable for the district set variable in adj(T). From this, we can further conclude whether there is a bidirectional edge between the target variable T and the variable V, as shown below. Corollary 8: Through the above deduction, the district set adjacent to the target variable T can be judged.

MAG MB includes the parents pa(T) of T, the children ch(T) of T, the spouses sp(T) of T, and the district set dis(T) of T, union of parents of each variable
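For a known MAG, the district set of Definition 7 and the MB components named above can be collected with a simple graph search. The sketch assumes directed edges as `(tail, head)` pairs and bidirectional edges as two-element frozensets; the function names are hypothetical, and only the components listed in the text (parents, children, spouses, district set, and parents of district variables) are included.

```python
def district(bidirected, target):
    """Nodes connected to `target` by a path of bidirectional edges only."""
    seen = {target}
    stack = [target]
    while stack:
        v = stack.pop()
        for edge in bidirected:
            if v in edge:
                (other,) = edge - {v}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen - {target}

def mag_mb_components(directed, bidirected, target):
    """Union of pa(T), ch(T), sp(T), dis(T), and the parents of district variables."""
    parents = {u for u, v in directed if v == target}
    children = {v for u, v in directed if u == target}
    spouses = {u for u, v in directed if v in children and u != target}
    dis = district(bidirected, target)
    dis_parents = {u for u, v in directed if v in dis}
    return (parents | children | spouses | dis | dis_parents) - {target}
```

SMMB has to recover these sets from data; the sketch only shows which nodes they contain once the graph is given.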
Compared with MB discovery algorithms that use independence tests to find the MB of the target variable T, score-based MB discovery algorithms rely on a scoring criterion to learn the network structure that best fits the data sample. The criterion has the following property. Let D be a set of data consisting of i.i.d. samples from some distribution $p$. Let G be any BN structure and let $G'$ be the same structure as G but with an added edge from a node T to a node X. Let $Pa_X^G$ be the parent set of X in G. A score criterion $s$ is locally consistent if, as the size of the data D goes to infinity, the following two properties hold: (1) if X is not independent of T given $Pa_X^G$ in $p$, then $s(G', D) > s(G, D)$; (2) if X is independent of T given $Pa_X^G$ in $p$, then $s(G', D) < s(G, D)$. Intuitively, adding an arc that eliminates an independence constraint that does not hold in the data-generating distribution increases the score, whereas adding an arc that does not eliminate such a constraint decreases the score. The scoring function can therefore replace constraints, to some extent, when constructing a causal structure. For MAG structure learning, existing DAG scoring functions cannot be applied directly. The M3HC [29] and GSMAG [22] algorithms proposed a scoring function suitable for MAGs, based on residual iterative conditional fitting, to obtain the maximum likelihood estimate of the parameters of a given MAG. However, the M3HC and GSMAG algorithms make new assumptions on the data, which are not general. Corollary 7 gives a method for judging bidirectional edges. The core idea is to judge whether the spouse nodes of the target node T and the spouse nodes of a neighboring node V_j of T are among each other's neighboring nodes. This paper therefore proposes a topology-based method that finds the target node's neighborhood set and spouse set using the SLL [30] method and then uses Corollary 7 to determine the bidirectional edges between the target node and its neighboring nodes. In this way, the neighboring district nodes of the target node are found, and the complete district set of the target variable is obtained by searching the neighboring district nodes of the district nodes. The algorithm is shown below:

The first step of the SMMB algorithm is to search for the parent and child sets of the target node based on the SLL algorithm, and the fourth to fifth steps call FINDSPOUSES of the SLL algorithm to find the spouse nodes of the target node and its child nodes. The idea is as follows. The FINDNEIGHBORS algorithm is divided into two stages. The first stage searches for the potential neighbor nodes of the target variable: starting from a set Z that contains only the target variable, it adds the remaining nodes to Z one by one and calls the subroutine to learn a structure over Z. Z is then updated with the learned potential neighbor nodes, which completes the search for the set of potential neighbors. The subroutine can be a dynamic programming algorithm or another exact algorithm. In this article, the commonly used GES [31] algorithm serves as the subroutine, and its scoring function is as follows: The scoring form of the local structure formed by the target variable and its parent nodes is as follows:
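A decomposable local score of a target variable given a candidate parent set can be illustrated with a short sketch. It assumes a BIC score for discrete data, which is one commonly used locally consistent criterion; the choice of BIC, the function name `bic_local_score`, and the row-as-dictionary data layout are assumptions for illustration and are not taken from the SMMB implementation.

```python
import math
from collections import Counter

def bic_local_score(data, child, parents):
    """BIC score of `child` given `parents`; `data` is a list of dict rows of discrete values."""
    n = len(data)
    parents = tuple(parents)
    # Counts of (parent configuration, child value) and of parent configurations alone.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    config = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / config[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (child states - 1) times the number of parent configurations.
    r = len({row[child] for row in data})
    q = 1
    for p in parents:
        q *= len({row[p] for row in data})
    return loglik - 0.5 * math.log(n) * (r - 1) * q
```

With such a score, adding a parent that explains the child raises the score, while adding an irrelevant parent only increases the penalty term, which mirrors the two properties of local consistency.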

Experiment
To evaluate the quality of the proposed method, Section 4.1 tests the method proposed in Section 3.2 on the ALARM data set, a benchmark Bayesian network test data set, and evaluates it with the F-measure, which shows its superiority over the common RFCI algorithm and the M3B algorithm. Section 4.2 uses the flight guarantee data of China Xining Airport in July 2018 to generate the airport flight guarantee process model, calculates the operation duration of each link as its performance indicator, and extracts the direction of the edges between nodes in the process model. The SMMB algorithm is then used to construct a local causal model of the performance indicators and flight launch delays, and the direction of the one-way edges in the causal model is adjusted according to the direction between the extracted process model nodes. Finally, the MMHC algorithm is used to construct a local causal model of flight delays as a benchmark for comparison with the causal model constructed by SMMB. The local causal model constructed by the MMHC algorithm is also adjusted according to the direction of the edges between nodes in the process model.
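The adjustment of one-way edges by the process model can be sketched as follows. The sketch assumes that the process model supplies a set of `(earlier, later)` node pairs and that a one-way causal edge is flipped only when it contradicts that order; the function name and this flipping rule are illustrative assumptions, and bidirectional edges are not touched.

```python
def orient_with_process_order(directed, precedes):
    """Flip one-way edges (cause, effect) that contradict the process order `precedes`."""
    adjusted = set()
    for cause, effect in directed:
        contradicts = (effect, cause) in precedes and (cause, effect) not in precedes
        adjusted.add((effect, cause) if contradicts else (cause, effect))
    return adjusted
```

Edges between nodes for which the process model gives no order are left as learned by the causal discovery algorithm.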

Causal discovery algorithm testing
The experimental test data source is the ALARM network, which contains 37 nodes and 46 edges. The network is sparse and is regarded as a de facto standard for measuring the quality of causal network construction programs; many algorithms and programs have been validated on it, and a standard network structure is available for comparison and reference. To test the performance of the above algorithm, this article uses the ALARM network to randomly generate three groups of data. The first group includes 5 data sets of 2500 data instances, the second group includes 5 data sets of 5000 data instances, and the third group includes 5 data sets of 10,000 data instances. Some common causes are then hidden in the generated data sets and treated as latent common causes. The specific steps are as follows:
1) No variable is hidden as a latent variable, and we mine the local causal network of the variable VTUB.
2) The variable INT is hidden as a latent variable, and we mine the local causal network of VTUB, where INT is the latent common cause of the variables SHNT, VLNG, and PRSS.
3) The variables INT and PMB are hidden as latent variables, and we mine the local causal network of VTUB. In this network, INT is the latent common cause of the variables SHNT, VLNG, and PRSS, and PMB is the latent common cause of the variables PAP and SHNT.
The data sets generated by the above steps are used to compare the SMMB algorithm with the RFCI and M3B algorithms. Both RFCI and M3B are constraint-based MAG discovery algorithms. Three types of conditional independence tests are available: the $G^2$ test for discrete variables, Fisher's Z test for continuous variables with linear relations and additive Gaussian errors, and a kernel-based test for continuous variables with nonlinear, non-Gaussian relations.
As in [16], both the RFCI and M3B algorithms are tested with the significance level of the tests set to 0.05. The evaluation index is the F-measure, which combines precision and recall and is defined as
$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
Precision: the percentage of correctly predicted nodes among all nodes predicted by the algorithm. It is used to evaluate the number of false positives in the output of the algorithm.
As shown in Tables 1, 2, and 3, the precision, recall, and F1 value vary with sample size. The precision comparison shows clearly that the SMMB and M3B algorithms are relatively close in precision, and both are better than the RFCI algorithm. In the recall comparison, the RFCI algorithm is much higher than the SMMB and M3B algorithms. The causal network discovered by the RFCI algorithm therefore contains more redundant nodes than those discovered by the M3B and SMMB algorithms. Compared with the M3B algorithm, the SMMB algorithm is competitive in precision and has a higher recall. As a result, on the comprehensive evaluation index F1, the SMMB algorithm is better than both the M3B algorithm and the RFCI algorithm.
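The three indices can be computed directly from a predicted node set and the true node set. The following is a minimal sketch; the function name `precision_recall_f` and the convention of returning 0.0 for empty sets are assumptions made for illustration.

```python
def precision_recall_f(predicted, true):
    """Return (precision, recall, F-measure) for two sets of node names."""
    tp = len(predicted & true)                      # correctly predicted nodes
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

An algorithm that returns many redundant nodes keeps recall high but lowers precision, which is why the F-measure is used as the combined index.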
The SMMB, RFCI, and M3B algorithms all construct causal networks based on topology, and all first need to find the variables adjacent to a given target variable. For the RFCI algorithm, the key is to find the correct graph skeleton from the data set. For the SMMB and M3B algorithms, the key is to find the neighboring nodes of the target variable. The RFCI algorithm uses the PC-stable algorithm to find the network skeleton, while the M3B algorithm uses the AdjV algorithm to find the neighboring nodes of the target variable. The SMMB algorithm uses the score-based SLL framework to find the target node's parent-child set and spouse set. Compared with constraint-based methods such as the AdjV and PC-stable algorithms, this approach is more advantageous when searching for the target node's neighborhood set.
ProM is used in this paper as the tool for mining the flight service process, with the flight guarantee data of Xining Airport in China in July 2018 as a case study. The data comprise 122839 records containing the case, type, Activity, Resource, Timestamp, and other information of Xining Airport from July 1st to July 31st. A data sample is shown in Table 4. The fuzzy process mining plug-in in ProM performs process mining on the flight service event log. The retention threshold and the ratio threshold are set to 0.27 and 0.35, respectively, to obtain the flight service process model. As described in Section 3.1, the process model is then optimized by resolving the N-ary and unary conflicts. The optimized model is shown in Figure 7. As the evaluation index of a node, this paper takes the timestamp difference between the node and the next node as the duration of the node, which measures the service operation efficiency of each node. For link structures in a parallel relationship, the durations of the branches are compared and the longer one is taken as the overall duration of the parallel structure. In addition, the departure time at the front station and at the local station are used as evaluation indices. By looking for the causal relationships between the flight launch delay and the evaluation indices of each flight guarantee node, we can analyze which flight guarantee nodes cause the final delay. The evaluation indices of each node are imported into the SMMB algorithm with flight delay as the target node. The resulting causal network of flight guarantee is shown in Figure 8. Figure 9 shows the local causal model of flight delays obtained after importing the evaluation indices of each flight guarantee node into the MMHC algorithm; it is extracted from the complete causal model. The meaning of each node label in the figures is given in Table 4.
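The duration indicators described above can be computed from one case of the event log as follows. The sketch assumes each case is a list of `(activity, timestamp)` pairs, names each indicator by joining the two consecutive activity names with an underscore, and reports minutes; these conventions, the function names, and any activity names not listed in this paper are assumptions for illustration.

```python
from datetime import datetime

def link_durations(trace):
    """Map 'PREVIOUS_NEXT' to the minutes elapsed between consecutive activities."""
    ordered = sorted(trace, key=lambda event: event[1])
    return {
        f"{a}_{b}": (tb - ta).total_seconds() / 60
        for (a, ta), (b, tb) in zip(ordered, ordered[1:])
    }

def parallel_duration(branch_durations):
    """Overall duration of a parallel structure: the longest of its branches."""
    return max(branch_durations)
```

The resulting per-case indicator values, together with the flight delay, form the data table that is passed to the causal discovery step.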
It can be seen from Figure 9 that CHECKINSTART_ALANDINTIME, REGEAR_REALTAKEOFF, TRACTORBIND_TRACTORSTART, MANIFESTCONVEY_CLOSECABINGATE, CLEANPERSONARR_CLEANSTART, REARCABINTRANSPORTCAREND_TRACTORBIND, DSHUTTLEARR_BOARDINGSTART, TAKEOFFATTHEFRONTSTATION_PUSHPERSONARR, PASSENGERUPPERSONARR_DSHUTTLEARR, and DELIVERYPERSONARR_REGEAR are considered to be the main causes of flight delays. However, in Figure 8 there are bidirectional edges between flight delays and each of CHECKINSTART_ALANDINTIME, REGEAR_REALTAKEOFF, TRACTORBIND_TRACTORSTART, and MANIFESTCONVEY_CLOSECABINGATE, which means that unobservable latent variables affect both endpoints of each bidirectional edge. This is the advantage of representing the causal model as a MAG: a MAG can represent the existence of latent variables through bidirectional edges, avoiding the influence of confounding effects caused by latent variables. The causal network discovered by the MMHC algorithm does not consider the existence of latent variables, which leads to the mistaken belief that CHECKINSTART_ALANDINTIME, REGEAR_REALTAKEOFF, TRACTORBIND_TRACTORSTART, and MANIFESTCONVEY_CLOSECABINGATE are direct causes of flight delays, when in fact no such direct causal relationship exists. Therefore, improving these four links in airport flight guarantee cannot effectively reduce flight delays. Figures 8 and 9 both identify CLEANPERSONARR_CLEANSTART, REARCABINTRANSPORTCAREND_TRACTORBIND, DSHUTTLEARR_BOARDINGSTART, TAKEOFFATTHEFRONTSTATION_PUSHPERSONARR, and PASSENGERUPPERSONARR_DSHUTTLEARR as direct causes of flight delays.
Therefore, if the airport wants to reduce flight delays, it can start by improving the interval times of the five links CLEANPERSONARR_CLEANSTART, REARCABINTRANSPORTCAREND_TRACTORBIND, DSHUTTLEARR_BOARDINGSTART, TAKEOFFATTHEFRONTSTATION_PUSHPERSONARR, and PASSENGERUPPERSONARR_DSHUTTLEARR, that is, by optimizing the efficiency of cleaning personnel and release personnel and by reasonably planning the driving routes of tractors and shuttles.
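The comparison between the two graphs can be made explicit with a small sketch. The edge lists below are transcribed from the prose of this section, not read from Figures 8 and 9, and the function and variable names are hypothetical. The text does not state how DELIVERYPERSONARR_REGEAR appears in Figure 8, so the sketch simply reports it as not listed for the MAG.

```python
# Edge marks between each indicator and the flight-delay target in the MAG,
# as described in the text: "<->" is a bidirectional edge (latent confounder),
# "->" is a directed edge from the indicator to flight delay.
mag_edges = {
    "CHECKINSTART_ALANDINTIME": "<->",
    "REGEAR_REALTAKEOFF": "<->",
    "TRACTORBIND_TRACTORSTART": "<->",
    "MANIFESTCONVEY_CLOSECABINGATE": "<->",
    "CLEANPERSONARR_CLEANSTART": "->",
    "REARCABINTRANSPORTCAREND_TRACTORBIND": "->",
    "DSHUTTLEARR_BOARDINGSTART": "->",
    "TAKEOFFATTHEFRONTSTATION_PUSHPERSONARR": "->",
    "PASSENGERUPPERSONARR_DSHUTTLEARR": "->",
}

# Indicators named as causes of flight delay in the graph without latent variables.
dag_parents = set(mag_edges) | {"DELIVERYPERSONARR_REGEAR"}

def classify(mag_edges, dag_parents):
    """Split the DAG's candidate causes by how the MAG marks them."""
    confounded = sorted(n for n in dag_parents if mag_edges.get(n) == "<->")
    direct_in_both = sorted(n for n in dag_parents if mag_edges.get(n) == "->")
    not_listed_in_mag = sorted(dag_parents - set(mag_edges))
    return confounded, direct_in_both, not_listed_in_mag

confounded, direct_in_both, not_listed_in_mag = classify(mag_edges, dag_parents)
```

Under these assumptions, `confounded` holds the four indicators whose link to flight delay is explained by latent variables, and `direct_in_both` holds the five indicators on which the two graphs agree.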

Conclusions and Future Work
This paper uses a new framework to automatically discover the root causes of business process performance problems. Compared with the automatic causal discovery of business process performance based on the Granger causality test proposed by some researchers, the method proposed in this paper uses the SMMB algorithm instead of the Granger causality test to handle potential confounding effects: node pairs affected by latent variables are marked by bidirectional edges. However, this article only discovers the causal relationships between the business performance indicator, represented by flight delays, and the node operation durations, and does not estimate the specific causal effect of each node duration on flight delays. This is essentially a causal effect estimation problem with missing covariates. In future work, we will further explore this problem by estimating the causal effect of each node on flight delays, looking for the nodes that have a greater impact on delays, and taking appropriate measures at these nodes to improve the business performance of the airport.