Set-Based Differential Evolution Algorithm Based on Guided Local Exploration for Automated Process Discovery

Evolutionary algorithms are an effective way to solve the process discovery problem, which aims to mine process models that are consistent with the real business processes from event logs. However, current evolutionary algorithms, such as GeneticMiner, ETM, and ProDiGen, converge slowly and with difficulty because all of them employ genetic crossover and mutation, which are highly random. This paper proposes a hybrid evolutionary algorithm for automated process discovery, which consists of a set-based differential evolution algorithm and guided local exploration. There are three major innovations in this work. First, a hybrid evolutionary strategy is proposed, in which a differential evolution algorithm first searches the solution space and rapidly approximates the optimal solution, and then a specific local exploration method joins to help the algorithm escape the local optimum. Second, two novel set-based differential evolution operators are proposed, which can efficiently perform differential mutation and crossover on the causal matrix. Third, a fine-grained evaluation technique is designed to assign a score to each node in a process model, which is employed to guide the local exploration and improve the efficiency of the algorithm. Experiments were performed on 68 different event logs, including 22 artificial event logs, 44 noisy event logs, and two real event logs. Moreover, the proposed algorithm was compared with three popular process discovery algorithms. Experimental results show that the proposed algorithm achieves good performance and converges quickly.


Introduction
Process-based information systems (PIS), including workflow management systems (WfMS), customer relationship management (CRM), and enterprise resource planning (ERP), have become the fundamental infrastructure of modern enterprises. A PIS can greatly improve the operational efficiency of an enterprise. Besides that, it records information about business processes, such as activity names, timestamps, and activity life cycles, to form event logs.
The XES Standard, published by IEEE in 2016, provides a unified and extensible language to standardize the content and format of event logs. Process mining techniques can be used to discover a process model from an XES event log. It is hoped that the mined process model is as consistent as possible with the real business process. The obtained process model can be used to improve the business processes of enterprises, increase production efficiency, and optimize products. For example, ASML employs process discovery techniques to optimize the wafer scanner during the production of lithography machines. The ERP system of SAP uses process discovery to assist users in designing business processes, analyzing business bottlenecks, and planning resources. Philips collects event logs from their medical devices around the world to analyze customers' habits. In this way, they can optimize their medical products and shorten product development time.
Generally, there are three major tasks in process mining: process discovery, conformance checking, and process enhancement [1]. Process discovery aims to obtain a process model that is as consistent as possible with the real business process. Most studies focus on mining binary relations between any two activities in event logs from the perspective of control flow. Beyond that, process discovery can also extract the knowledge contained in an event log from other perspectives, such as organization, time, and resources. Conformance checking measures the deviation of a mined process from the real business process by replaying the event log on the mined process model. This technique can be used for diagnosing the process model as well as analyzing business bottlenecks. Process enhancement focuses on changing or extending a prior process model. For example, by using the timestamps in an event log, a model can be enhanced to analyze bottlenecks, estimate remaining time, and discover hierarchical process models. In this paper, I only focus on process discovery from the perspective of control flow. The ɑ-algorithm, proposed by van der Aalst et al., is usually regarded as a milestone in the field of process mining [2]. It models the workflow by a Petri net and can effectively find the causal, parallel, and choice relations between any two activities from an event log. After that, some variants of the ɑ-algorithm were proposed, such as the ɑ+ algorithm [3] and the ɑ++ algorithm [4]. However, the ɑ-series algorithms have many shortcomings in terms of noise tolerance, alignment-based fitness, and precision. To address these problems, more efficient algorithms were proposed, such as the ILP Miner [5] and the inductive miner [6,7]. The former, proposed by van Zelst et al., is based on integer linear programming; the latter was proposed by Leemans et al.
Both show good performance when dealing with small event logs.
Evolutionary algorithms are an effective way to solve the process mining problem. de Medeiros et al. [8] first applied the genetic algorithm (GA) to process mining, in an approach named GeneticMiner. By defining a good fitness function as well as genetic operators (i.e., crossover and mutation), GeneticMiner can find a process model that is consistent with the real process. Cheng et al. [9,10] indicated that GeneticMiner cannot effectively discover parallel structures from event logs; therefore, they proposed a hybrid technique based on the integration of GeneticMiner, particle swarm optimization, and differential evolution to improve the process mining results. Vázquez-Barreiros et al. [11] proposed another algorithm, named ProDiGen, which improves GeneticMiner by introducing a hierarchical fitness function to find complete, precise, and minimally structured process models. Buijs et al. [12-14] employed tree structures to represent process models and proposed an alignment-based technique to guide the mutation in GA. However, embedding alignment-based local search in the mutation operation is unbearable because it is too time-consuming. In general, the advantages of genetic algorithms include good noise tolerance and the ability to deal with most key problems in process mining within a unified framework, such as invisible tasks, non-free-choice structures, and tasks with duplicate names. However, the convergence speed of current GA-based algorithms is too slow because all of them adopt random search.
In this paper, a hybrid evolutionary algorithm for process mining is proposed, named DEMiner. The innovations of this work include three parts: (1) A hybrid evolutionary strategy is proposed. In our method, DEMiner first approximates the optimal process model by a set-based DE algorithm; when prematurity is detected, a guided local exploration method joins the evolution process to help the algorithm escape the local optimum.
(2) Two set-based DE operators, i.e., a set-based mutation operator and a set-based crossover operator, are designed for the differential evolution of causal matrices. (3) A fine-grained evaluation method is proposed to guide the local exploration by assigning scores to all nodes in the candidate process models. This method can not only help the population avoid prematurity but also improve the efficiency of DEMiner. The rest of this paper is organized as follows. Section 2 introduces basic knowledge of process mining, such as Petri nets and the causal matrix, as well as the DE algorithm. Section 3 explains the proposed algorithm in detail. The experiments and the analysis of the experimental results are given in Section 4. Finally, Section 5 gives conclusions.

Process Mining.
In the problem of process mining, a process model is generally modeled as a place/transition net (abbreviated as P/T Net), which is a variant of the classic Petri net. The definition of a P/T Net is given below.
Definition 1 (P/T Net) [2]. A P/T Net is a tuple N = (P, T, F), where P is a finite set of places, T is a finite set of transitions, P ∩ T = ∅, and F ⊆ (P × T) ∪ (T × P) is a finite set of directed arcs.
Let N = (P, T, F) be a P/T Net. Elements of P ∪ T are called nodes. A node x is an input node of another node y if ⟨x, y⟩ ∈ F. Similarly, a node x is an output node of another node y if ⟨y, x⟩ ∈ F. Furthermore, the symbol •x denotes the set of all input nodes of node x, that is, •x = {y | ⟨y, x⟩ ∈ F}; similarly, x• denotes the set of all output nodes of x. Based on the P/T Net, a formal definition of the workflow net (abbreviated as Wf-Net) is given.

Definition 2 (Wf-Net) [2]. Let N = (P, T, F) be a P/T Net, and let t be a fresh identifier not in P ∪ T. N is a Wf-Net if N has a single source place i and a single sink place o, and the short-circuited net N = (P, T ∪ {t}, F ∪ {⟨o, t⟩, ⟨t, i⟩}) is strongly connected.

Figure 1 shows a process model represented by a Wf-Net. The circles denote places, and the squares denote transitions. The transitions in a Wf-Net represent the activities (also called tasks) in the real business process. The black dot in the initial place denotes a token. A transition is enabled (i.e., ready to be fired) if all of its input places contain tokens; firing a transition corresponds to executing the activity. If a transition fires, the tokens in its input places are removed, and tokens are put into its output places. For example, if transition "A" is fired, the token in place "start" is removed, and places "P1" and "P2" each get a token. After that, the three transitions "B," "C," and "D" are enabled. Note that "P1" has just one token; in other words, although two transitions ("B" and "C") are enabled, only one can be fired. Thus, the possible sequences between "A" and "E" include ⟨B, D⟩, ⟨C, D⟩, ⟨D, B⟩, and ⟨D, C⟩.
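The token game described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function name `fire`, the dict-based marking, and the arc tables are our own choices.

```python
def fire(marking, transition, inputs, outputs):
    """Token-game sketch for a P/T net: a transition is enabled only when every
    input place holds a token; firing consumes one token per input place and
    produces one token per output place."""
    pre, post = inputs[transition], outputs[transition]
    if any(marking.get(p, 0) < 1 for p in pre):
        raise ValueError(f"{transition} is not enabled")
    m = dict(marking)               # leave the original marking untouched
    for p in pre:
        m[p] -= 1                   # consume tokens from input places
    for p in post:
        m[p] = m.get(p, 0) + 1      # produce tokens in output places
    return m

# A fragment of Figure 1: "A" splits into P1 and P2; "B" and "C" compete for P1.
inputs = {"A": ["start"], "B": ["P1"], "C": ["P1"], "D": ["P2"]}
outputs = {"A": ["P1", "P2"], "B": ["P3"], "C": ["P3"], "D": ["P4"]}
m = fire({"start": 1}, "A", inputs, outputs)   # now B, C, and D are enabled
```

Firing "B" afterwards empties "P1", so "C" can no longer fire, mirroring the choice between "B" and "C" in the text.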
Given a sound Wf-Net N = (P, T, F), we say δ ∈ T* is an event trace and W ∈ P(T*) is an event log, which consists of traces. The above process model, for example, can induce many event traces, such as ABDEG, ACDEH, ADBEFCDG, and so on. The first problem to be solved in evolution-based process mining is the encoding of chromosomes. Unfortunately, it is hard to directly employ Petri nets for evolution. de Medeiros proposed the causal matrix CM = (A, C, I, O) to represent the process model, where A is a set of activities, C ⊆ A × A is a causality relation, and I and O map each activity to a set of subsets of A describing its input and output conditions [8]; the causal matrix has been applied in many evolutionary process mining algorithms, such as ProDiGen and GeneticMiner.
Since we usually need to compare a process model represented by a causal matrix with other models represented by Petri nets, a method of mapping a P/T Net to a causal matrix is required. Definition 4 shows such a method.
Definition 4 (Mapping of a P/T Net to a Causal Matrix) [8]. Let N = (P, T, F) be a P/T Net; the mapping of N is a causal matrix whose input and output condition functions are derived from the places of N. To explain the mapping process, the process model in Figure 1 is converted to a causal matrix, which is shown in Table 1. Take activity "E" as an example: it has two input places, "P3" and "P4." The input transitions of "P3" are "B" and "C," and the input transition of "P4" is "D"; thus, I_Π(E) = {{B, C}, {D}}. It should be noted that the activities in the same subset of I_Π(t) have an OR-join relation, while different subsets of I_Π(t) have an AND-join relation. Conversely, activities in the same subset of O_Π(t) have an OR-split relation, and different subsets have an AND-split relation. Besides, I_Π(t) = {} indicates that the input of the activity is empty, and O_Π(t) = {} indicates that its output is empty.
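The nested-set structure of Table 1 can be illustrated with Python frozensets. This is a sketch of the data shape only; the names `I_sets` and `O_sets` are ours, not the paper's notation, and the entries shown are the ones derivable from the Figure 1 discussion above.

```python
# Input/output condition functions as sets of subsets of activities.
# Subsets inside I(t) are AND-joined; activities within one subset are OR-joined.
I_sets = {
    "A": frozenset(),  # empty input: "A" is the start activity
    # (B OR C) AND D must deliver tokens before "E" can fire
    "E": frozenset({frozenset({"B", "C"}), frozenset({"D"})}),
}
O_sets = {
    # AND-split of "A" into the OR-set {B, C} (via P1) and {D} (via P2)
    "A": frozenset({frozenset({"B", "C"}), frozenset({"D"})}),
}

assert len(I_sets["E"]) == 2              # two AND-branches feed "E"
assert frozenset({"D"}) in I_sets["E"]    # one of them is the singleton {D}
```

Using frozensets makes the subsets hashable, so they can themselves be members of an outer set, exactly matching the P(P(A)) structure used later in the set-based operators.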

Differential Evolution
Algorithm. Differential evolution (DE), first proposed by Storn and Price in 1995, is a stochastic method simulating biological evolution, in which the individuals adapted to the environment are preserved through repeated iterations [15]. Compared to other evolutionary algorithms, DE has advantages such as better global search ability, fast convergence, and strong robustness. The major steps of DE are mutation, crossover, evaluation, and selection, which is similar to GA. DE starts with a population containing N randomly generated individuals (also known as chromosomes), each represented by a vector X_{i,G}, where i indexes the individual and G denotes the generation. For each target vector X_{i,G}, the mutation operation randomly selects three other vectors X_{r1,i,G}, X_{r2,i,G}, and X_{r3,i,G} from the population; the indices r1,i, r2,i, and r3,i are mutually exclusive integers randomly chosen from the range [1, N]. Then, the difference of two of these vectors (i.e., X_{r2,i,G} and X_{r3,i,G}) is scaled by a scalar factor F, and the scaled difference is added to the remaining vector to obtain the donor vector:

V_{i,G} = X_{r1,i,G} + F · (X_{r2,i,G} − X_{r3,i,G}).   (1)

To enhance the potential diversity of the population, a crossover operation comes into play after the donor vector is generated: the donor vector exchanges its components with the target vector X_{i,G} under this operation to form the trial vector U_{i,G}. There are two popular ways to perform crossover in DE, exponential crossover (or two-point modulo) and binomial crossover (or uniform). This paper only uses the latter, given in formula (2), where Cr is the crossover rate and j_rand is a random index:

u_{j,i,G} = v_{j,i,G}, if rand_j ≤ Cr or j = j_rand; otherwise, u_{j,i,G} = x_{j,i,G}.   (2)

The condition j = j_rand guarantees that at least one element of the donor vector is selected. The obtained trial vector is evaluated by a predefined fitness function. If the fitness of the trial vector is higher than that of the target vector, the DE algorithm replaces the target vector with the trial vector; otherwise, it keeps the target vector.
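The classic mutation (formula (1)), binomial crossover (formula (2)), and selection steps can be sketched as one DE generation over real-valued vectors. This is a textbook DE/rand/1/bin sketch for illustration, not the set-based variant the paper builds later; the function name `de_step` is ours.

```python
import random

def de_step(pop, fitness, F=0.5, Cr=0.9):
    """One generation of DE/rand/1/bin on lists of floats (illustrative)."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Mutation: pick three mutually exclusive partners, none equal to i
        r1, r2, r3 = random.sample([j for j in range(n) if j != i], 3)
        donor = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Binomial crossover: take the donor component when rand <= Cr or d == j_rand
        j_rand = random.randrange(dim)
        trial = [donor[d] if (d == j_rand or random.random() <= Cr) else target[d]
                 for d in range(dim)]
        # Selection: the better of target and trial survives in this slot
        new_pop.append(trial if fitness(trial) >= fitness(target) else target)
    return new_pop
```

Because selection is greedy per slot, the best fitness in the population can never decrease from one generation to the next.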

Framework of DEMiner.
The GA-based process mining algorithms, including GeneticMiner [8], ProDiGen [11], and ETM [12], suffer from the problem that all of them need hundreds or even thousands of generations to converge to a solution. The reason behind this problem is that the genetic operators follow a completely random path, without taking advantage of the information in the log or the errors made by the mined model while parsing the traces. ProDiGen alleviates this problem in a simple way: it selects an incorrectly parsed activity as the crossing point in the crossover step. However, it cannot significantly improve the convergence speed because the crossover process itself is still random. ETM employs an alignment-based technique for local exploration to accelerate convergence. However, its total running time is also unacceptable because the alignment algorithm is too time-consuming.
In this section, I introduce a hybrid evolutionary algorithm for automated process discovery, called DEMiner. The main steps of DEMiner are shown in Figure 2. As the figure shows, the major difference between DEMiner and a traditional evolutionary algorithm is that DEMiner needs to select a specific evolutionary strategy in each iteration of the loop.
Step V is a set-based DE algorithm (abbreviated as DE or the DE algorithm), which is in charge of fast approximation of the optimal solution. However, the DE algorithm often falls into a local optimum. To overcome premature convergence, I employ Step VI, a guided local exploration algorithm. The local exploration algorithm can take advantage of the error information collected while parsing the log and help DEMiner quickly escape the local optimum. The pseudocode of DEMiner is given in Algorithm 1. The algorithm exits the loop when the number of generations exceeds a predefined threshold maxGenerations or when timesNotChange exceeds maxNotChange. The variable timesNotChange records for how many generations the population has not changed. In the loop, two statistics, meanFitness and devFitness, are used to detect whether the algorithm exhibits premature convergence. The former is the mean fitness of the population, and the latter is the standard deviation of the fitness values. If meanFitness rises above the threshold MF while devFitness falls below the threshold DF, the algorithm is considered premature. Besides the two statistics, a random number rand is used in the condition. The reason behind this consideration is that the global search ability of the local exploration algorithm is lower than that of the DE algorithm; sometimes the local exploration may make an individual move forward without causing significant changes in the two statistics. Therefore, the proposed algorithm randomly chooses a strategy when it falls into a local optimum. Next, I introduce these steps in detail.
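The strategy-selection step can be sketched as follows. This is our reading of the condition, not the paper's exact code: we take "premature" to mean a high mean fitness with a collapsed spread (consistent with MF = 0.7 delaying local exploration), and a coin flip then decides between the two strategies, since the text randomly chooses a strategy once prematurity is detected.

```python
import random
import statistics

def choose_strategy(fitnesses, MF=0.7, DF=0.2, rng=random):
    """Pick this generation's evolutionary strategy (sketch of Algorithm 1's
    selection step). Thresholds MF/DF follow the parameter table in the text."""
    mean_f = statistics.mean(fitnesses)
    dev_f = statistics.pstdev(fitnesses)     # population standard deviation
    premature = mean_f > MF and dev_f < DF   # clustered near a good plateau
    if premature and rng.random() < 0.5:     # random choice once premature
        return "guided_local_exploration"
    return "differential_evolution"
```

A diverse population always evolves by DE; once the fitness values cluster above MF, roughly half of the generations switch to guided local exploration.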

Population Initialization.
The population initialization method used in DEMiner follows the heuristic method proposed in [8], which is based on the causal relations between activities. Beyond that, two mechanisms are added in our method, called the gene bank and the taboo list, which improve the performance of DEMiner.
The gene bank is a set of chromosomes (i.e., individuals), including the individuals in the current population and the individuals that have been eliminated during the evolution. To reduce memory cost, the individuals in the gene bank are serialized; in other words, they are converted to a simple string format. If the algorithm generates an individual that is already in the gene bank, the individual is discarded without calculating its fitness value.
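A minimal sketch of the gene bank, under the assumption that an individual maps each activity to an (input sets, output sets) pair; the serialization format and the names `serialize`/`is_new` are illustrative, not the paper's.

```python
def serialize(ind):
    """Canonical string form of a causal-matrix individual (assumed layout:
    activity -> (input subsets, output subsets)). Sorting removes any
    dependence on set iteration order, so equal individuals serialize equally."""
    parts = []
    for act in sorted(ind):
        i_sets, o_sets = ind[act]
        fmt = lambda ss: sorted(sorted(s) for s in ss)
        parts.append(f"{act}:I{fmt(i_sets)}:O{fmt(o_sets)}")
    return ";".join(parts)

gene_bank = set()

def is_new(ind):
    """Register an individual; return False if it was already seen, so the
    caller can skip the (expensive) fitness evaluation."""
    key = serialize(ind)
    if key in gene_bank:
        return False
    gene_bank.add(key)
    return True
```

Storing only the serialized strings keeps the memory cost of the bank low, which is the point made in the text.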
The taboo list keeps a set of historical local exploration operations. In DEMiner, an important step, called guided local exploration, is employed to search around a specified node. The local exploration randomly selects one of three operations: adding an arc, deleting an arc, or redistributing a node. Operations that have already been performed, whether useful or not, are forbidden from being selected again. Every node has a taboo list, and these lists are initialized to empty at the beginning.
Table 1: Causal matrix of the process model.

Fitness Function.
Generally, two metrics should be considered when evaluating a process model: completeness and precision [16]. Completeness quantifies the ability of a discovered process model to accurately parse the traces recorded in the event log. A natural way to define a completeness metric is the number of correctly parsed traces divided by the total number of traces. However, such a definition is too coarse because it cannot indicate how much of an individual is correct when the individual does not properly parse a trace. Consider two process models: one is totally incorrect, and the other merely misses an arc; the above method cannot distinguish the two individuals because neither of them can correctly parse the log. For this reason, I employ the partial completeness given in [8], which takes into account the correctly parsed activities as well as the number of tokens that are missing or not consumed during the parse. I use the symbol "C_f" to denote the completeness metric. A discovered process model may not be appropriate even if it achieves full completeness. For example, a flower model can parse arbitrary event logs, but it is useless. Precision quantifies the fraction of behavior allowed by the model that is not seen in the event log. However, it is hard to give a proper definition of precision because all the extra behavior has to be detected, i.e., paths possible in the model but absent from the log. In [4], precision is defined as allEnabledActivities(L, CM) divided by max(allEnabledActivities(L, CM)), where allEnabledActivities(L, CM) is the number of enabled activities when a log L is parsed by a model CM, and the denominator is a function that returns the maximum number of enabled activities in the population. It is easy to see that the precision of each individual then depends on the rest of the population. In this work, I consider another definition of precision, proposed in [11] (see formula (3)).
It is easy to see that the more activities a process model enables, the lower its precision is. Generally, weighted coefficients are needed to combine the two metrics in a weighted sum [17]. However, it is difficult to combine the two metrics appropriately because the precision used here is not normalized. Therefore, a hierarchical method is employed to define the fitness function in this work. Because completeness is more important than precision when evaluating a discovered process model, I first compare the completeness of two process models; if their completeness is equal, I then compare their precision. In this way, when the completeness of all individuals equals 1, the individual with better precision wins. It is easy to notice that the hierarchical fitness function can be easily extended with other metrics, such as structural complexity and generalization.

Figure 2: The main steps of DEMiner: (I) initialize the population; (II) evaluate the individuals in the population; (III) check whether the stopping condition is achieved; (IV) select an evolutionary strategy; (V) build a new population by the DE algorithm; (VI) build a new population by guided local exploration.
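The hierarchical comparison described above maps directly onto tuple ordering. A minimal sketch; the name `hierarchical_key` and the sample numbers are illustrative.

```python
def hierarchical_key(metrics):
    """Hierarchical fitness as a sort key: completeness dominates, and
    precision only breaks ties. Python's tuple comparison does exactly this."""
    completeness, precision = metrics
    return (completeness, precision)

# Three hypothetical individuals as (completeness, precision) pairs.
population = [(0.9, 0.5), (1.0, 0.3), (1.0, 0.8)]
best = max(population, key=hierarchical_key)   # -> (1.0, 0.8)
```

Extending the hierarchy with further metrics, as the text suggests, amounts to appending more elements to the key tuple.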

(1) Initialize population
(2) Evaluate population
(3) Calculate meanFitness and devFitness of the population
(4) while generation < maxGenerations and timesNotChange < maxNotChange do
(5)   rand ⟵ a random number in [0, 1]
(6)   if premature convergence is detected (based on meanFitness, devFitness, and rand) do
(7)     Generate the trial individuals by the guided local exploration
(8)   else
(9)     Generate the trial individuals by the DE algorithm
(10)  Evaluate the trial individuals
(11)  if the fitness of a trial individual is higher than the fitness of its target individual do
(12)    Replace the target individual in the population
(13)    timesNotChange ⟵ 0
(14)  else
(15)    timesNotChange++
(16)  Update meanFitness and devFitness
(17)  generation++
ALGORITHM 1: Pseudocode of DEMiner.

Differential Evolution Algorithm.
The DE algorithm contains a loop that goes through all individuals in the population. For each individual (called the target individual), it first generates a donor individual based on three randomly selected individuals (the mutation operation) and then combines the target individual and the donor individual to get a trial individual (the crossover operation). It must be emphasized that, because the obtained donor/trial individuals may be inconsistent, both of them should be repaired before going to the next step. Then, the trial individual is evaluated if it is not in the gene bank. If the fitness of the trial individual is higher than that of the target individual, the target individual is replaced by the trial individual, and the trial individual is added to the gene bank. It can be seen that there are three key steps in the DE algorithm: mutation, crossover, and repair. Next, the details of the three steps are explained.

Mutation.
Current set-based evolutionary algorithms usually employ crisp sets to represent the candidate solutions (called individuals or chromosomes in GA). For example, Chen et al. [18] proposed a set-based particle swarm optimization algorithm in which the candidate solutions are represented by sets of ordinal pairs. However, the causal matrix is a much more complex type of set: the elements of I(Activity) and O(Activity) are themselves crisp sets, such as I_Π(E) = {{B, C}, {D}} in Table 1. Therefore, traditional set-based mutation operators cannot be directly used in this work. In Ou-Yang's method [9,10], the mutation operator randomly selects ingredients from three individuals and then uses them to update the target individual to obtain a donor individual. The advantage of this method is that some good ingredients (e.g., a parallel structure) can be directly transplanted into the target individual, which improves the search ability of GeneticMiner. However, Ou-Yang's method cannot be directly employed in this work because the proposed algorithm is entirely based on the DE algorithm; in other words, more flexible mutation operators are needed.
This section introduces two novel operators, which allow the proposed algorithm to perform differential mutation on the causal matrix. The definitions of the two operators are given below.

Definition 5 (Minus Operator between Two Sets). Given a causal matrix CM = (A, C, I, O) and two sets S1, S2 ∈ P(P(A)), the minus operator is defined as S1 − S2 = {e | e ∈ S1 ∧ e ∉ S2}; that is, the result keeps the subsets of S1 that do not appear in S2.
Definition 6 (Plus Operator between Two Sets). Given a causal matrix CM = (A, C, I, O) and two sets S1, S2 ∈ P(P(A)), S1 + S2 is defined as S1 + S2 = {e′ \ ∪S2 | e′ ∈ S1} ∪ S2 (discarding empty results), where ∪ denotes a generalized union operation and e′ \ ∪S2 means that the elements of ∪S2 are removed from e′. It is easy to see that the plus operator keeps the elements of S2. The reason behind this consideration is that S2 in formula (4) is, in fact, the result of formula (3) (i.e., the difference of two sets). In this way, the operator can greatly change the structure of S1 and enhance the potential diversity of a trial individual. Figure 3 gives an illustrative example with three sets that represent three distinct input sets of activity D. In terms of the definition of mutation in DE (see formula (1)), the difference of S2 and S3 is calculated first and then added to S1, based on Definitions 5 and 6. Note that the scale factor F is not used in this set-based mutation.
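Under one plausible reading of Definitions 5 and 6 (the original formulas are incompletely reproduced above, so this is a sketch of our interpretation, not the paper's exact operators), the two operators can be written as:

```python
def set_minus(s1, s2):
    """Definition 5, as we read it: keep the subsets of s1 that do not
    appear in s2."""
    return frozenset(e for e in s1 if e not in s2)

def set_plus(s1, s2):
    """Definition 6, as we read it: strip the activities of U(s2) out of every
    subset of s1 (dropping subsets that become empty), then keep s2 itself."""
    union2 = frozenset().union(*s2) if s2 else frozenset()
    stripped = frozenset(e - union2 for e in s1 if e - union2)
    return stripped | frozenset(s2)
```

For example, with S2 = {{B}, {C}} and S3 = {{B}}, the difference is {{C}}; adding it to S1 = {{B, D}} removes nothing from {B, D} (it contains no C) and appends {C}, yielding {{B, D}, {C}}, so the donor inherits the structure contributed by the difference, as the text describes.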

Crossover.
The aim of crossover is to combine a donor individual and a target individual to generate a trial individual. The trial individual takes the place of the target individual if its fitness is higher than that of the target individual.
There are two popular kinds of crossover methods, the exponential and the binomial; I employ the latter in this work. The pseudocode of the binomial operator is shown in Algorithm 2. "Cr" is the crossover rate. The binomial crossover is performed on each activity node whenever a randomly generated number "rand" between 0 and 1 is less than or equal to "Cr." The index "r" is randomly chosen, which ensures that the trial individual gets at least one component from the donor individual.
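A minimal sketch of the binomial crossover on causal matrices, assuming each individual maps an activity to its (input sets, output sets) component; the function name and the per-node granularity follow the description above, while the data layout is our simplification.

```python
import random

def binomial_crossover(target, donor, Cr=0.8, rng=random):
    """Binomial crossover over activity nodes (sketch of Algorithm 2):
    take the donor's component for a node with probability Cr; the forced
    index r guarantees at least one component comes from the donor."""
    activities = sorted(target)
    r = rng.randrange(len(activities))
    trial = {}
    for idx, act in enumerate(activities):
        if idx == r or rng.random() <= Cr:
            trial[act] = donor[act]     # inherit this node from the donor
        else:
            trial[act] = target[act]    # keep the target's node
    return trial
```

With Cr = 1.0 the trial equals the donor; with Cr = 0.0 exactly one node (index r) still comes from the donor, which is the guarantee the text attributes to "r".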

Repair.
Individuals obtained during the iterations of an evolutionary algorithm are often inconsistent. For example, it is possible to obtain a trial individual that does not contain activity "E." Besides that, the input of the "start" activity as well as the output of the "end" activity may not be empty.
Therefore, a repair operation must be performed on the donor individual as well as the trial individual. In GA-based process mining algorithms, such as GeneticMiner and ProDiGen, the repair operation is simple because crossover and mutation are performed at a designated point. The repair operation in this work is much more complex because mutation and crossover are performed on all nodes of a causal net; in other words, all nodes of a causal matrix need to be repaired. Before introducing the repair algorithm, I first give the definition of consistency for a causal matrix.
The pseudocode of repair is given in Algorithm 3. Steps 1-6 are in charge of repairing the "start" node and the "end" node. The algorithm first sets I_ind(start) and O_ind(end) to empty; then it generates a new output set for O_ind(start) (and a new input set for I_ind(end)) if it is empty. In Steps 7-19, the algorithm goes through all nodes in the causal matrix and repairs I_ind(t1) and O_ind(t1), respectively. There are two choices in the repair operation. Take the repair of the input of t1 as an example: if the output of t2 does not contain t1, the algorithm may randomly add t1 to O_ind(t2) or remove t2 from I_ind(t1). In this way, a consistent individual is finally obtained.
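The core of the repair step can be sketched on a simplified individual in which subset structure is flattened to plain sets of activity names; this simplification, and the function name `repair`, are ours, not Algorithm 3 itself.

```python
import random

def repair(ind, start, end, rng=random):
    """Sketch of Algorithm 3's two choices: ind maps each activity to
    (inputs, outputs) as plain sets. One-sided arcs are either completed
    on the other side or dropped, chosen at random."""
    ins = {a: set(i) for a, (i, o) in ind.items()}
    outs = {a: set(o) for a, (i, o) in ind.items()}
    ins[start].clear()           # the start node has no input ...
    outs[end].clear()            # ... and the end node has no output
    for t1 in ind:
        for t2 in list(ins[t1]):
            if t1 not in outs[t2]:          # arc recorded on t1's side only
                if rng.random() < 0.5:
                    outs[t2].add(t1)        # complete the arc on t2's side
                else:
                    ins[t1].discard(t2)     # or drop it from t1's side
        for t2 in list(outs[t1]):
            if t1 not in ins[t2]:           # arc recorded on t1's side only
                if rng.random() < 0.5:
                    ins[t2].add(t1)
                else:
                    outs[t1].discard(t2)
    return {a: (ins[a], outs[a]) for a in ind}
```

After the pass, every arc mentioned in some input set is mirrored in the corresponding output set and vice versa, which is the consistency property the text requires.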

Guided Local Exploration.
Although evolutionary algorithms, including the GA and the DE algorithm, have a strong global search ability, all of them suffer from the problem of premature convergence. In [14], van Eck et al. proposed a local exploration method based on alignment, in which an A* algorithm is employed to find the optimal alignment between a process model and an event trace.
In this way, it can find the abnormal areas in the process model. However, the alignment-based local exploration has two drawbacks. First, it can only locate an abnormal area rather than a specific node, which is too coarse to guide the exploration. Second, although the technique can accelerate the convergence of the GA, the execution time of the A* algorithm is so long that the total execution time becomes unbearable.
(1) Add the activities A into the trial individual
(2) Generate a random number r between 1 and length(A)
ALGORITHM 2: Pseudocode of the binomial crossover.

This paper proposes an efficient and simple method to guide the local exploration, which can help DEMiner escape the local optimum and move toward the global optimum. The method is based on token-based log replay, which is also employed to evaluate the process models (i.e., causal matrices) in this work. The original algorithm for replaying a log on a causal matrix records only three types of information: "allParsedActivities," "allMissingTokens," and "allExtraTokensLeftBehind." The allParsedActivities denotes the total number of correctly parsed activities, the allMissingTokens denotes the number of missing tokens over all event traces, and the allExtraTokensLeftBehind denotes the number of tokens that are not consumed after the replay. In our method, beyond these, the nodes where the parsing errors happen are also recorded, i.e., nodes that miss tokens during the replay or hold leftover tokens after it. In this way, a fine-grained evaluation of all nodes is achieved.
It is easy to see that the proposed method has several advantages. First, it can accurately locate the abnormal nodes and thus improve the efficiency of the local exploration. Second, its time complexity is much lower than that of the alignment-based method because no extra computation is needed: the evaluation of the nodes is finished along with the evaluation of the individual. The formulas of the fine-grained evaluation are given below, in which C_f^i(t) represents the score of I_ind(t) and C_f^o(t) represents the score of O_ind(t). The pseudocode of the guided local exploration is shown in Algorithm 4.
Step 2 employs a roulette-wheel strategy to randomly select a node for local exploration; a node with a lower score has a greater probability of being selected. Step 3 randomly selects a direction for exploration, i.e., "input" or "output." Steps 4-28 randomly choose a mutation operation: randomly adding an arc to the node, randomly deleting an arc from the node, or randomly redistributing the structure of the node, where the redistribution regroups the activities of the node's input (or output) set into different subsets. In the algorithm, a taboo list is used to record the history of the local exploration. Operations that turn out to be useless (i.e., that cannot make the individual move forward) are recorded in the taboo list. In this way, the efficiency of the local exploration is improved.
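The roulette-wheel selection of Step 2 can be sketched as follows, assuming node scores lie in [0, 1] with lower scores marking more parsing errors; the inversion `1 - score` and the name `pick_node` are our illustrative choices.

```python
import random

def pick_node(scores, rng=random):
    """Roulette-wheel pick of a node for local exploration: a node with a
    lower fine-grained score gets a proportionally larger slice of the wheel."""
    weights = {n: 1.0 - s + 1e-9 for n, s in scores.items()}  # invert scores
    total = sum(weights.values())
    x = rng.random() * total            # spin the wheel
    for node, w in weights.items():
        x -= w
        if x <= 0:
            return node
    return node                          # numerical fallback: last node
```

Over many spins, a node scoring 0.0 is selected almost every time against a node scoring 0.99, which matches the intent of biasing exploration toward erroneous nodes.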

Experiments
In this section, I present the experiments as well as their analysis. The experiments focus on two aspects: the first is to evaluate whether the DE algorithm and the guided local exploration effectively accelerate the convergence of the proposed algorithm; the second is to evaluate the performance of the proposed algorithm (i.e., DEMiner). Next, the event logs used in the experiments are introduced.

Event Logs.
In the experiments, 68 event logs were used to evaluate the proposed algorithm. The event logs can be classified into three groups. The first group contains 22 artificial event logs, which are from [8,19] and can be downloaded from https://svn.win.tue.nl/repos/prom/DataSets/GeneticMinerLogs/. The description of these event logs is shown in Table 2. The process models that generated these logs include different structures, such as sequence, choice, parallelism, loops, and invisible tasks. These process models, represented as Petri nets and heuristic nets, can be found in [19].
ALGORITHM 3: Pseudocode of repair of a candidate individual.

In the event logs, traces with the same event sequence are grouped together. The second group, which is used to evaluate the noise tolerance of DEMiner, contains 44 event logs. These event logs were generated from the first group of event logs and contain 5% and 10% noise, respectively. Three different types of operations were used for noise generation: randomly adding an event to a trace, randomly deleting an event from a trace, and randomly swapping two adjacent events in a trace. To incorporate noise, traces of the original noise-free logs were randomly selected, and then one of the three noise types was applied, each with an equal probability of 1/3. The third group includes two real event logs, both downloaded from https://data.4tu.nl/repository/collection:event_logs. The first event log, named "BPI2013cp," records process information from the Volvo IT problem management system; it includes 1487 traces as well as 6660 events. The second event log, named "Sepsis," records the events of sepsis cases from a hospital ERP system; it includes 1050 traces as well as 15214 events.
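The three noise operations can be sketched as follows. This is an illustrative reconstruction of the procedure described above, with one assumption of ours: an inserted event is drawn from the trace's own alphabet (the text does not specify where inserted events come from).

```python
import random

def add_noise(trace, rng=random):
    """Apply one of the three noise operations, each with probability 1/3:
    insert a random event, delete an event, or swap two adjacent events."""
    trace = list(trace)
    op = rng.randrange(3)
    if op == 0 and trace:                       # insert a copied event
        pos = rng.randrange(len(trace) + 1)
        trace.insert(pos, rng.choice(trace))    # assumption: reuse the alphabet
    elif op == 1 and len(trace) > 1:            # delete an event
        trace.pop(rng.randrange(len(trace)))
    elif len(trace) > 1:                        # swap two adjacent events
        pos = rng.randrange(len(trace) - 1)
        trace[pos], trace[pos + 1] = trace[pos + 1], trace[pos]
    return trace
```

Applying this to a random sample of traces, as described in the text, yields the 5% and 10% noisy variants of each log.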

Convergence Speed and Running Time.
This section evaluates whether the DE algorithm and the guided local exploration are efficient at accelerating the convergence speed of the proposed algorithm. Four different strategies are considered in the experiment: the DE algorithm without local exploration (denoted as DE), the DE algorithm with random local exploration (denoted as DE + Random Search), the DE algorithm with guided local exploration (denoted as DE + Guided Search), and the GA. It should be explained that (1) the random search used in the second strategy is the genetic mutation in the GeneticMiner, (2) the third strategy is the DEMiner proposed in this work, and (3) the GA follows the framework in this work but uses the genetic operators of the GeneticMiner. Three metrics are employed in the experiments: completeness, precision, and generation (i.e., number of iterations). To avoid inaccuracy of the experimental results caused by randomness, each algorithm was run 10 times and the average value of each metric as well as its standard deviation was calculated. The first group of event logs was used for the evaluation. The computer for the experiments is equipped with a 2.5 GHz CPU and 8 GB memory. The parameter settings are shown in Table 3. It should be explained that the population size is the number of activities multiplied by 1∼2. The parameter "MF" is set to 0.7 because the local exploration should not be involved in the search too early. The parameter "DF" is set to 0.2 and is used for the detection of premature convergence.
In fact, a slight change of these parameters, e.g., setting the MF to 0.6∼0.8 and the DF to 0.1∼0.2, would not affect the performance of the algorithm, including the quality of the mining results as well as the convergence speed.

(1) foreach ind in population do
(2)   selected ⟵ randomly select an activity from ind
(3)   mutationType ⟵ randomly select a direction for exploration
(4)   if rand < 1/3 do // randomly add an arc
(5)     if mutationType = "input" do
(6)       precursor ⟵ select a precursor of the selected activity which is not in I_ind(selected)
(7)       arc ⟵ <precursor, selected>
(8)     else
(9)       successor ⟵ select a successor of the selected activity which is not in O_ind(selected)
(10)      arc ⟵ <selected, successor>
(11)    if add arc is allowed in taboo list do
(12)      Add arc into ind
(13)    else go to 5
(14)  else if rand < 2/3 do // randomly delete an arc
(15)    if mutationType = "input" do
(16)      precursor ⟵ select an activity from I_ind(selected)
(17)      arc ⟵ <precursor, selected>
(18)    else
(19)      successor ⟵ select an activity from O_ind(selected)
(20)      arc ⟵ <selected, successor>
(21)    if delete arc is allowed in taboo list do
(22)      Delete arc from ind
(23)    else go to 15

ALGORITHM 4: Pseudocode of the guided local exploration.

The experimental results are shown in Table 4. For completeness, it can be seen that the four algorithms (from left to right in Table 4) always achieve a completeness of 1.00 on 1 event log, 10 event logs, 20 event logs, and 12 event logs, respectively. The result demonstrates that the "pure" DE algorithm cannot achieve the best model; i.e., it always falls into a local optimum. Then, for precision, the "DE" algorithm achieves the best precision on just one event log. Beyond that, the "DE + Random Search" and the "GA" achieve the best precision on 2 event logs, while the "DE + Guided Search" stably achieves the best precision on 14 event logs. From the two metrics, it can be seen that the "DE + Guided Search" performs much better than the other three strategies.
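A runnable rendering of Algorithm 4 might look as follows. The individual layout (two dicts I and O giving a flattened view of the causal matrix) and the single-attempt handling of the "go to" retries are simplifying assumptions, not the paper's exact data structures.

```python
import random

def guided_local_exploration(population, precursors, successors, taboo, rng=random):
    """Sketch of Algorithm 4.  Each individual is a pair of dicts, ind["I"] and
    ind["O"], mapping an activity to its set of input / output activities.
    `precursors`/`successors` give the candidate neighbours observed in the log;
    `taboo` is a set of forbidden ("add"/"delete", arc) moves."""
    for ind in population:
        selected = rng.choice(sorted(ind["I"]))          # (2) pick an activity
        mutation_type = rng.choice(["input", "output"])  # (3) pick a direction
        r = rng.random()
        if r < 1 / 3:                                    # (4) randomly add an arc
            if mutation_type == "input":
                cand = precursors[selected] - ind["I"][selected]
            else:
                cand = successors[selected] - ind["O"][selected]
            if not cand:
                continue                                 # no legal arc to add
            other = rng.choice(sorted(cand))
            arc = (other, selected) if mutation_type == "input" else (selected, other)
            if ("add", arc) not in taboo:                # (11) consult taboo list
                ind["I"][arc[1]].add(arc[0])
                ind["O"][arc[0]].add(arc[1])
        elif r < 2 / 3:                                  # (14) randomly delete an arc
            cand = ind["I"][selected] if mutation_type == "input" else ind["O"][selected]
            if not cand:
                continue                                 # no arc to delete
            other = rng.choice(sorted(cand))
            arc = (other, selected) if mutation_type == "input" else (selected, other)
            if ("delete", arc) not in taboo:             # (21) consult taboo list
                ind["I"][arc[1]].discard(arc[0])
                ind["O"][arc[0]].discard(arc[1])
    return population
```

Updating both I and O for every move keeps the input and output views of the causal matrix consistent with each other.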
For generation, the "DE" algorithm must be excluded because it always suffers from premature convergence. Among the remaining algorithms, it is obvious that the "DE + Guided Search" has the fastest convergence speed and the "GA" has the slowest convergence.
To illustrate the time performance of the DEMiner, the running time of "DE + Guided Search" was also recorded, which is shown in Figure 4. From the figure, it can be seen that the minimum running time is about 3 seconds ("a6nfc") and the maximum running time is about 80 seconds ("bn3").
This proves the time performance of the DEMiner. Based on the above results, some conclusions can be drawn. (1) The "DE" algorithm always falls into a local optimum. (2) The "DE + Random Search" and the "GA" can discover process models of similar quality, but the former has a faster convergence speed than the latter. Note that both algorithms use random search (i.e., the genetic mutation). The difference is that the "DE + Random Search" employs the DE algorithm. This proves that the DE algorithm can quickly approximate the optimal solution and accelerate the convergence speed. (3) Based on the experimental results of the "DE + Random Search" and the "DE + Guided Search," it can be seen that the latter achieves much better results than the former. This shows that the guided local exploration is efficient at helping the DE algorithm skip out of the local optimum and improves the searching ability of the DEMiner.
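Based on the description of the MF and DF parameters, the hybrid strategy can be sketched as follows; the exact trigger conditions for the local exploration are my interpretation of those parameters, not the paper's formal rule.

```python
def hybrid_search(population, evaluate, de_step, local_exploration,
                  max_gen=100, mf=0.7, df=0.2):
    """Sketch of the hybrid strategy: pure DE search first; the local
    exploration joins only after MF * max_gen generations (so it is not
    involved too early), or when the best fitness has stagnated for
    DF * max_gen consecutive generations (premature-convergence detection).
    These trigger conditions are an assumed interpretation of MF and DF."""
    best = max(evaluate(ind) for ind in population)
    stagnant = 0
    for gen in range(max_gen):
        population = de_step(population)              # DE mutation + crossover
        if gen > mf * max_gen or stagnant > df * max_gen:
            population = local_exploration(population)  # local exploration joins
        cur = max(evaluate(ind) for ind in population)
        stagnant = 0 if cur > best else stagnant + 1
        best = max(best, cur)
    return population, best
```

With mf = 0.7 the local exploration stays out of the first 70% of the run, matching the stated intent that it should not be involved in the search too early.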

Setup.
This section compares the performance of the DEMiner with three popular process mining algorithms.
Through the experiments, I want to evaluate the performance as well as the antinoise ability of the DEMiner. The selected process mining algorithms for comparison include Heuristics Miner (HM) [20], ILP Miner [5], and ETM Miner [12]. Among these algorithms, HM is a popular tool of process mining, which outputs a heuristic net as a mining result, while ILP Miner and ETM are two state-of-the-art algorithms in the field of process mining. ProM 6.9, the most popular process mining platform, was used in the experiments [21]. Parameters of the three algorithms were set to their defaults. Because ILP Miner and ETM output a Petri net and a process tree as mining results, respectively, the obtained models must be converted to causal matrices based on Definition 4. In this way, the four algorithms can be evaluated in a unified way. Specifically, if the output model contains invisible transitions (e.g., for the ETM), it was converted to causal matrices by hand. Four metrics defined in [8,19] were used to evaluate the algorithms in the experiments. The metrics include behavioral precision (Bp), behavioral recall (Br), structural precision (Sp), and structural recall (Sr). The Bp and Br are based on the parsing of an event log by the mined model and the original model. The former detects how much behavior is allowed by the mined model that is not allowed by the original model, and the latter detects the opposite. Moreover, the closer the values of Bp and Br are to 1.0, the higher the similarity between the original and the mined models. The Sp and Sr metrics are based on the causality relations of the mined and original models: the former detects how many causality relations the mined model has that are not in the original model, and the latter detects the opposite. Different from Bp and Br, Sp and Sr measure the similarity from the structural point of view.
When the original model has connections that do not appear in the mined model, Sr will take a value smaller than 1.0; in the same way, when the mined model has connections that do not appear in the original model, Sp will take a value lower than 1.0.
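The verbal definitions of Sp and Sr suggest a simple set-overlap computation over causality relations. The sketch below follows that reading; the cited papers [8,19] define the metrics over causal matrices in finer detail.

```python
def structural_precision_recall(mined, original):
    """Compute Sp and Sr over two sets of causality relations (arcs).
    Sp drops below 1.0 when the mined model has relations absent from the
    original model; Sr drops below 1.0 when the original model has relations
    missed by the mined model.  Empty-set cases default to 1.0 (assumption)."""
    common = mined & original
    sp = len(common) / len(mined) if mined else 1.0
    sr = len(common) / len(original) if original else 1.0
    return sp, sr
```

For example, a mined model with arcs {<A,B>, <B,C>} against an original model with only {<A,B>} yields Sp = 0.5 (one spurious relation) and Sr = 1.0 (nothing missed).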

Noise-Free Event Logs.
First of all, the experiments were performed on the 22 noise-free event logs. The results are listed in Table 5, in which the best results on each log are in italics. From the results, it is easy to find that the performance of the DEMiner is slightly better than that of the other three algorithms.
The DEMiner achieves the optimal solutions (i.e., all four metrics are equal to 1) on 18 event logs. The HM, ILP Miner, and ETM achieve the optimal solutions on 16 event logs, 14 event logs, and 5 event logs, respectively. To compare the four algorithms more intuitively, a combinatorial metric, called the average f-score, is designed, which combines the four metrics Bp, Br, Sp, and Sr. The results are shown in Figure 5. From the figure, it is easy to find that the DEMiner only lost to the other algorithms on four event logs, which are "a5," "a6nfc," "a7," and "h6p18." Apart from these, the DEMiner obtained the best results on the remaining 18 event logs. Moreover, the average f-score achieved by the DEMiner on those 4 event logs is over 0.9. This demonstrates that the DEMiner has good performance. Next, a deeper analysis is given. There are four event logs on which the DEMiner could not achieve the best results, which are "a5," "a7," "h6p18," and "h6p36." The mining results which were repeated most often are shown in Figure 6. In the figure, the incorrect parts have been labeled in red. Moreover, the dotted lines denote the missing arcs (i.e., arcs in the original model that were not discovered by the DEMiner), and the solid lines denote the incorrect connections (i.e., structural errors).
In Figure 6(a), it can be seen that the mined model lacks a cycle <E, E>. In the original model, there are two cycles on the node "E." The reason behind this phenomenon is that the operators of differential mutation proposed in this work (Definitions 5 and 6) will remove such a structure during the evolution. Assume S1 = {{E}, {E}, {B, C}} and S2 = {{E}, {B}}; it is easy to find that the two sets will obtain the same completeness value, but the precision value of S1 is lower than that of S2 (the former has more enabled activities). In other words, S1 would be replaced by S2 during the evolution. Similarly, this phenomenon also appears in the event log "h6p18." Next, in the mined model of "a7," the input set of node "D" is {{2, 4}, {3, 7, 8}}, whereas it is {{2}, {3, 7}, {4, 7, 8}} in the original model. The reason behind this incorrectness is also the differential mutation operations. It is easy to find that removing the intersection of two sets is a high-probability event in the proposed algorithm. Assume two sets S1 = {3, 7} and S2 = {4, 7, 8}; then S1 + S2 = {{3, 7}, {8}} in terms of Definition 6. It can be seen that the intersection of the two sets is removed. From the above analysis, it can be found that the proposed method could not achieve the best results in some rare cases (e.g., two cycles on the same node, or redundant structures). However, from another perspective, it shows that the proposed method prefers models with low structural complexity. For "h6p36," the four algorithms discovered the same model, shown in Figure 6(d). The mined model lacks two arcs <KB, NB> and <KA, NA>, which exist in the original model (heuristic net) [19]. Through analysis, I find that the original heuristic net is incorrect. From the CPN model of "h6p36," it can be seen that the model has just two parallel paths starting with "KA" and "KB," respectively.
However, the given heuristic net has two arcs, i.e., <KA, NA> and <KB, NB>, which may lead to two nonexistent paths <Start, KA, NA, End> and <Start, KB, NB, End>. Therefore, the mining results of the four algorithms in fact fit the event log perfectly.
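The formula of the average f-score does not survive in this copy of the text. A plausible reconstruction, assuming the metric averages the harmonic means (f-scores) of the behavioral pair (Bp, Br) and the structural pair (Sp, Sr), is sketched below; this is a guess consistent with the metric's name, not the paper's definition.

```python
def f_score(p, r):
    """Harmonic mean of a precision/recall pair (0.0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0

def average_f_score(bp, br, sp, sr):
    """Assumed reconstruction of the combinatorial metric: the mean of the
    behavioral f-score (from Bp, Br) and the structural f-score (from Sp, Sr)."""
    return (f_score(bp, br) + f_score(sp, sr)) / 2
```

Under this reading, an optimal model (all four metrics equal to 1) scores exactly 1.0, and any spurious or missing behavior/structure pulls the score below 1.0.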

Noisy Event Logs.
Next, the experiments were performed on the 22 event logs with 5% noise. The experimental results are shown in Table 6. The performance of the HM degrades significantly. Noise also affects the other two algorithms, but their performance degradation is smaller than that of the HM algorithm. Moreover, the ETM can now also discover the optimal process model on "paral5," on which it did not achieve the best result before. Similarly, the average f-scores of the four algorithms were calculated (see Figure 7). It can be seen from the figure that the performance of the DEMiner degrades only slightly and it achieves the best results on 13 event logs. Moreover, the average f-score of the DEMiner is between 0.8 and 1.0.
Later, the experiments were performed on the event logs with 10% noise. The experimental results are listed in Table 7, and the average f-scores of the four algorithms are shown in Figure 8. From the table, we can see that the DEMiner achieves the best results on 14 event logs, while the other three algorithms (from left to right) achieve the best models on 2, 7, and 4 event logs, respectively. We can see that the ETM discovers the optimal model on "a7" and "a8" with 10% noise, but it does not find the optimal model on the same logs with 5% noise. This is because the two logs are independent; i.e., the inserted noise may be totally different. Through a careful comparison of Figures 7 and 8, it can be found that the performance of the DEMiner does not degrade significantly and remains at a stable level. Based on the experiments on the two groups of noisy event logs, a conclusion can be drawn that the DEMiner has good antinoise ability. However, we should notice that the DEMiner cannot discover the optimal solutions on all event logs with noise. This phenomenon demonstrates that the DEMiner cannot yet fully avoid noise interference.

Performance on Real Event Logs.
This section shows the performance of the DEMiner on two real event logs, i.e., "BPI2013cp" and "Sepsis." In the experiments, a "start" event as well as an "end" event was added to each trace at running time. This ensures that each path has the same "start" node and "end" node. To eliminate the randomness of evolution, the DEMiner was executed 10 times on each event log. The evolution processes on the two event logs are shown in Figures 9 and 10. It can be seen that the DEMiner converges fast on both event logs. Furthermore, three metrics were employed to evaluate the efficiency of the DEMiner: alignment-based fitness [22], alignment-based precision [23], and the combined f-score [24]. The calculated results, including the average value and the standard deviation, are listed in Table 8. Compared with the results given in [24], it is easy to find that the results obtained by the DEMiner are slightly lower than those obtained by the ETM on "Sepsis," while the results obtained by the DEMiner are better than those obtained by the ETM on "BPI2013cp." This proves that the DEMiner can perform well on real event logs.

Conclusions
This paper proposes a new process mining algorithm, named DEMiner. The proposed algorithm is based on a hybrid evolutionary strategy, which consists of a set-based DE algorithm and a guided local exploration algorithm. Meanwhile, some techniques are employed to improve the efficiency of the DEMiner, such as the gene bank, the taboo list, and consistency repair. To evaluate the performance, 68 event logs were used in the experiments. Some conclusions can be drawn based on the experimental results: (1) Through the comparison of the four different strategies (i.e., DE, DE + Random Search, DE + Guided Search, and GA), the "DE + Guided Search" outperforms the other strategies and achieves the best-quality solutions as well as the fastest convergence speed. Moreover, the results prove that the DE algorithm can rapidly approximate the optimal solution but always suffers from premature convergence. The guided local exploration can help the DE algorithm skip out of the local optimum and improve the efficiency of the proposed algorithm.
(2) Comparing the performance of the DEMiner with three popular process mining algorithms (i.e., HM, ILP Miner, and ETM) on 22 noise-free event logs and 44 noisy event logs shows that the DEMiner can achieve the best result on most of the event logs. Furthermore, based on the experimental results on 2 real event logs, it can be concluded that the DEMiner works well on real-world event logs. This proves the effectiveness and the efficiency of the proposed algorithm.
However, we can see that the DEMiner did not discover the optimal process model on every one of the 44 noisy event logs. This demonstrates a drawback of the DEMiner. Besides that, it is hard for the DEMiner to discover some rare structures, such as two cycles on a node. Addressing these issues is our future work.

Data Availability
The event logs used to support the findings of this study are included within the article and can be downloaded from https://svn.win.tue.nl/repos/prom/DataSets/GeneticMinerLogs/. The corresponding process models can be found in [19].

Conflicts of Interest
The author declares that there are no conflicts of interest.