Automatic Emergence Detection in Complex Systems

Complex systems consist of multiple interacting subsystems, whose nonlinear interactions can result in unanticipated (emergent) system events. Extant systems analysis approaches fail to detect such emergent properties, since they analyze each subsystem separately and arrive at decisions typically through linear aggregations of individual analysis results. In this paper, we propose a quantitative definition of emergence for complex systems. We also propose a framework to detect emergent properties of a complex system given observations of its subsystems. This framework, based on a probabilistic graphical model called Bayesian Knowledge Bases (BKBs), learns individual subsystem dynamics from data, probabilistically and structurally fuses said dynamics into a single complex system dynamics, and detects emergent properties. Fusion is the central element of our approach, since it accounts for situations in which a common variable has different probability distributions in different subsystems. We evaluate our detection performance against a baseline approach (Bayesian Network ensemble) on synthetic testbeds from UCI datasets. To do so, we also introduce a method to simulate, and a metric to measure, discrepancies that occur with shared/common variables. Experiments demonstrate that our framework outperforms the baseline. In addition, we demonstrate that this framework has uniform polynomial time complexity across all three procedures: learning, fusion, and reasoning.


Introduction
Complex systems usually consist of multiple subsystems, whose nonlinear interactions can cause unpredictable and disastrous outcomes. However, it is intractable to analyze all possible outcomes in complex systems directly due to the combinatorial nature of this problem. Extant analysis approaches often build separate models for all subsystems and make conclusions about the entire system by linearly aggregating individual analysis results. This approach, although simple, cannot model emergence of complex systems. In fact, emergence is one of the most challenging concepts of complex systems. However, there exists significant discrepancy about the nature of emergence. Some researchers, such as Mill and John [1] and Broad [2], model complex systems using a layered approach, in which the world consists of different strata. Per this approach, higher-level emergent properties result from lower-level causal interactions. Others, such as Wears et al. [3], study emergent properties by predictive approaches and claim that emergent properties are system-level features that could not have been anticipated. Per this definition, emergent properties are those that cannot be predicted even by individuals who possess thorough knowledge of the parts of this system. Popper and Eccles [4] relate emergence to unpredictability by studying the nondeterminism within complex systems. Another viewpoint identifies a spectrum of approaches to emergence. Bedau [5] distinguishes between weak and strong emergence. Per his definition, weak emergence can be derived from the knowledge of the system's microdynamics and external conditions but only by simulation. Strong emergence, on the other hand, cannot be derived even by simulation.
No matter how emergence is defined, the consensus among these definitions is that emergence stems from the interaction of subsystems of a complex system. To model subsystem interactions and detect resulting emergence in a complex system, we need to model this complex system first. Extant complex systems modeling techniques can be classified into three groups: (1) subject matter experts (SMEs) manually analyze system dynamics and create a descriptive model, such as the model in [3]; (2) experts simulate system dynamics via an agent-based complex systems model (ACS), such as the model in [6]; and (3) data scientists collect data about individual subsystems, learn subsystem models from data via machine learning approaches, and integrate subsystem models via ensemble methods, such as the model in [7]. The first approach is only useful for post-event analysis, since SMEs can manually analyze only the event-related scenarios out of all possible scenarios, whose number is combinatorial. The second approach requires that experts manually build behavioral models for each agent and set up proper parameters. It is both time-consuming and expensive to build such models for large-scale complex systems. The third approach, even though easy to apply and suitable for large complex systems, cannot detect emergence, because ensemble methods integrate subsystem models by their outputs, neglecting their interactions among shared/common variables.
To overcome the drawbacks in extant approaches, we need a new framework that can automatically build complex system models from data, can detect emergence, and can scale to large complex systems simultaneously. The first requirement for the new framework is learning a complex system model from data automatically. Given a single dataset drawn from the entire complex system, we could simply learn a single model in hopes of capturing all interactions and then detect emergence within it. However, since a large-scale complex system usually consists of multiple (loosely) coupled (possibly competing) subsystems, it is impractical (and likely infeasible) to construct a single dataset which captures all its features and dynamics. At best, we typically have access only to multiple datasets corresponding to different subsystems. Due to this limitation, we can only learn a separate model from each subsystem dataset and fuse these models into one model via the variables shared between different submodels. Ensemble methods also learn separate models for different subsystems, but they run inference on these models separately and choose the (weighted) majority opinion as the final opinion. However, the true result may differ from the majority opinion.
We provide an alternative definition of emergence in complex systems derived as follows: Given some target variable, we query its state on the subsystem models learned from corresponding datasets and group their opinions into majority and minority sets. Then we observe its state at the entire system level. If its true state (observed over the entire system) is different from the majority opinion given by subsystems, we consider this situation as emergent. This definition resembles the one given by predictive approaches in that it "cannot be predicted even by individuals who possess thorough knowledge of the parts of this system." As such, based on the existence of majority and minority opinions, we can define emergence as composed of four types. If all subsystems form a unanimous opinion, and the true result differs from it, we call it Type 1 emergence. If both majority and minority opinions exist, but the true result differs from both opinions, we call it Type 2 emergence. If both majority and minority opinions exist, and the true result is consistent with the minority opinion, we call it Type 3 emergence. If only minority opinions exist, but the true result differs from all minority opinions, we call it Type 4 emergence. This emergence definition is complete for a complex system with an arbitrary number of subsystems, if each subsystem can provide a valid opinion about the queried target. However, if some subsystem cannot provide a direct opinion on the target variable but can provide an opinion about variables which also exist in other subsystems, its opinion will implicitly affect other subsystems' opinions about the target variable. Even worse, if such feedback exists among these subsystems, a conclusion will not be reached easily. Such complex scenarios will be studied in the future.
In this paper, we describe our approach to modeling and detecting emergence in complex systems according to our proposed definition of emergence. In brief, we first learn subsystem dynamics through a probabilistic graphical model called Bayesian Knowledge Bases (BKBs) [8] from observations on each subsystem. Then we fuse these BKBs into one BKB via the BKB fusion algorithm [9], which captures interactions among subsystems in a way that is both probabilistically and structurally sound. We call the fused BKB the FBKB. Lastly, we perform belief updating on the FBKB to detect emergence in this complex system. The entire framework, which consists of learning, fusing, and reasoning blocks, is named the Bayesian Knowledge Fusion for Complex System (BKFCS).
Experiments on synthetic datasets show that our proposed method outperforms extant approaches at detecting emergence. We also show that our proposed algorithm has polynomial time complexity for all three phases of learning, fusion, and reasoning.
The contribution of this paper is twofold: (i) It defines four types of emergence in a complex system based on deviations from majority and minority opinions observed from each subsystem, derived from observations/datasets of its subsystems. This quantitative data-driven emergence definition is different from extant descriptive definitions of emergence in that it sets up a concrete boundary between different kinds of emergence. This unique quantitative approach is the first of its kind to the best of our knowledge.
(ii) It designs an automatic emergence detection algorithm based on supervised machine learning techniques. This framework is built upon a probabilistic graphical model named Bayesian Knowledge Base, which not only detects the four types of emergence, but also traces back variable interactions resulting in emergence.
The rest of this paper is organized as follows: We begin with brief backgrounds on Bayesian Knowledge Bases (BKBs), learning BKBs from data, multiple BKB fusion, and belief updating on BKBs. These are the principal components used in our detection framework and algorithm. Next, in Section 3, we formally define emergence in complex systems according to our four proposed types, provide an illustrative example of a complex system, apply our emergence detection framework to this example, explore factors underlying emergence in complex systems with respect to our framework, and briefly recap the framework and its operation. Having established our framework, we detail our experiments and analyses on synthetically generated complex systems testbeds against extant approaches and our proposed factors and measures.

Background
This section first introduces Bayesian Knowledge Bases (BKBs), the building block of our proposed framework. Next, we summarize the BKB learning approach for subsystem data and describe how to fuse multiple such BKBs into one fused BKB, which represents the subsystem interactions that can cause emergence. Lastly, we present how to run belief updating on a fused BKB and its role in emergence.

Bayesian Knowledge Bases (BKBs).
Before introducing BKBs, we would like to provide some intuitions behind our choice of building blocks for our framework. Researchers have proposed various methods and modeling strategies to explore different aspects of complex systems. In this paper, since the research objective is to detect and explain emergence in complex systems, we opted for probabilistic graphical models, which are powerful tools to explore variable relationships and provide quantitative explanations. In fact, probabilistic graphical models such as Bayesian Networks (BN) [10] and Markov Random Fields (MRF) [11] have been widely applied to model causal relationships and/or interactions among variables in a system. Many researchers have also proposed different methods to learn a Bayesian Network or Markov Random Field from data [12][13][14][15][16][17][18][19][20].
However, neither BNs nor MRFs serve our purpose well. In an MRF, variable connections are undirected and therefore cannot express a causal relationship, yet one of our goals is to understand causal relationships in emergence. For BNs, extant methods of fusing multiple BNs into one BN have several drawbacks. First, if two BNs include contradictory information about the direction of causality between variables, extant fusion algorithms require compromise and consensus regarding this direction [12,13], which results in unrecoverable information loss. Second, if two BNs contain incompatible variable distributions, a new distribution is created by merging them [12]. Unfortunately, this new distribution no longer represents the observed causal relationships found in the subsystems.
To solve these problems, we incorporate Bayesian Knowledge Bases [8] into our emergence detection framework. BKBs are an alternative to Bayesian Networks (BNs) that specify dependence at the instantiation level (whereas BNs specify it only at the random variable level), allow cycles between variables, and loosen the requirements for specifying complete probability distributions. Figure 1 illustrates a simple BKB.
In general, a BKB  is specified by a set of I-nodes I (instantiation nodes, rectangles), a set of S-nodes S (support nodes, circles), and edges E between I and S, namely, the tuple {I, S, E}. In a BKB, a variable is called a component (denoted as ). A BKB does not include an icon for a component; instead it represents all instantiations/states of a component with multiple I-nodes. This is different from a BN, which represents a variable/component with a single icon. In Figure 1, there are two components,  and . Each component can take two states, Yes and No. An I-node    is drawn as a rectangle, and it represents the th state of the th variable. In this example, an I-node  1  1 :  = Yes corresponds to the first rectangle in the first row with remaining I-nodes  1  2 :  = No,  2 1 :  = Yes, and  2 2 :  = No, respectively. An S-node is represented as a circle, and it contains a value for some prior or conditional probability. A directed edge connects an S-node and an I-node, which represents direct conditional dependency between the single immediate I-node descendant of the S-node (also called its head, denoted as head  ()) and the immediate I-node predecessors (also called its tail, denoted as tail  ()). The conditional probability (head  () | tail  ()) is denoted as Pr(). In the example, the I-node  2  1 :  = Yes is the descendant or head of the S-node with value 0.01, and the I-node  1  1 :  = Yes is a predecessor or tail of the same S-node. This connection  1  1 → 0.01 →  2 1 represents the conditional probability ( = Yes |  = Yes) = 0.01. If an S-node only has a descendant but no predecessor, the connection from it to its descendant represents the prior probability. In the example, one such connection is 0.02 →  1  1 , representing that ( = Yes) = 0.02.
The set of components to which the set {tail  ()} belongs is the parent component set of I-node head  (), denoted as Pa(head  ()). In the example, component  is the parent set of the I-node  2  1 and the I-node  2 2 . This relationship is similar to that in a BN, where all states of one variable have the same set of parent variables. In a general BKB, however, different states of a component can have different parent variable sets.
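As an illustration of the structure just described, the following is a minimal sketch of how I-nodes and S-nodes might be represented in code. It is not the authors' implementation: the component names `A` and `B` are placeholders chosen for this sketch, and `SNode` and `parent_components` are hypothetical helpers. Only the two probabilities quoted in the text (a prior of 0.02 and a conditional of 0.01) are encoded.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

# An I-node is one (component, state) pair, e.g. ("A", "Yes").
INode = Tuple[str, str]


@dataclass(frozen=True)
class SNode:
    """A support node holding Pr(head | tail); an empty tail means a prior."""
    head: INode
    tail: FrozenSet[INode]
    prob: float


# "A" and "B" are placeholder component names (an assumption of this sketch).
A_YES: INode = ("A", "Yes")
B_YES: INode = ("B", "Yes")

bkb = [
    # Prior: the first component takes state Yes with probability 0.02.
    SNode(head=A_YES, tail=frozenset(), prob=0.02),
    # Conditional: second component is Yes given the first is Yes, 0.01.
    SNode(head=B_YES, tail=frozenset({A_YES}), prob=0.01),
]


def parent_components(s_nodes: Iterable[SNode], head: INode) -> Set[str]:
    """Components that appear in the tail of any S-node supporting `head`."""
    return {comp for s in s_nodes if s.head == head for comp, _ in s.tail}
```

Because parents are attached to individual I-nodes rather than to whole variables, two states of the same component may return different parent sets from `parent_components`, which is the property that distinguishes a general BKB from a BN.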
2.2. Subsystem Learning from Data. This subsection describes a BKB learning algorithm, inspired by extant BN learning algorithms.
The first step of building our Bayesian Knowledge Fusion for Complex System (BKFCS) emergence detection framework is to learn a probabilistic model from subsystem data. In the machine learning literature, scoring function-based methods have been widely applied to the BN learning problem. Scoring functions can be classified into two categories: information theory-based scoring functions and Bayesian scoring functions [14].
Instead, we propose a modified scoring function designed for learning a BN-like BKB and a greedy algorithm to learn a BKB from a given dataset. This algorithm learns a BKB that maximizes the scoring function (1) given dataset , assuming it contains  cases and  variables/features, and each feature/component   ,  ∈ [1, ], has (  ) states/I-nodes. The notation () means the number of cases in which condition "" holds. The penalty constant  is set to 0.01 in our algorithm. This function consists of two parts: the first part computes the log likelihood of the BKB given dataset , and the second part is the penalty for complexity and overfitting, which is proportional to the difference between the number of possible S-nodes and the number of S-nodes that appear in the BKB. The difference between existing scoring functions and our proposed function is that in the penalty term (the second part), MDL, BIC, and AIC only penalize network fitness by the total number of parent-child patterns, namely, (  )∏ ∈{Pa(  )} ().
We have also learned BKBs using the Bayes, BDeu, MDL/BIC, MIT (entropy), and AIC scoring functions and tested their performance against BKBs learned by our proposed function on thirteen UCI datasets. We chose these five popular scoring functions because their usefulness has been widely tested and validated. Their average accuracies are 84%, 83%, 82%, 70%, and 85%, respectively (details in Table 12). Our scoring function achieves 85% average accuracy on the same testbed (details in the first column of Table 13). Our function thus outperforms four of the five scoring functions and performs comparably to AIC. However, BKBs learned using AIC tend to have simple structures. Even though simple BKBs based on AIC scores can perform equally well in classification tasks compared to BKBs learned with our proposed method, a BKB learned from the AIC score drops variable interactions within a BKB and across BKBs. Without sufficient interactions across BKBs, the capability to detect emergence is reduced. As such, we cannot use AIC.
In general, learning a BN or BKB from data is NP-hard; therefore we make several tradeoffs to achieve polynomial time complexity. A detailed complexity analysis is provided in Appendix A. In the worst case, the time complexity of the entire learning algorithm is (100 * 1000 *  2 ) = ( 2 ), where  is the number of variables and  is the number of cases. The other two constants are explained in the Appendix.
To test its performance against other kinds of models on a general supervised classification task on the same testbed, we compare the BKB classifier's performance with a wide range of popular classifiers: Adaboost [21], Bayesian Network [22], Sequential Minimal Optimization (SMO) [23], logistic regression [24], and decision tree [25]. Experiment results show that our classifier has comparable accuracy. Since learning a BN-like BKB is not the central contribution of this paper, these results are detailed in Appendix A.

Subsystem Probabilistic Fusion. This subsection describes how to fuse multiple BKBs learned from multiple subsystem-related datasets into one fused BKB (FBKB) that represents the entire system dynamics.
To fuse multiple BKBs, we apply the BKB fusion algorithm developed by Santos Jr. et al. [9]. The resulting fused BKB (FBKB) is the Knowledge Base that the BKFCS framework will reason on.
Figure 3 shows a second BKB (BKB 2), which contains the same set of variables as BKB 1 in Figure 1 but different probability distributions. Fusing BKB 1 and BKB 2 yields the fused BKB in Figure 2. Briefly, the idea is to associate each component from each BKB with a special component called a source fusion component. In this example, there are two such components: src and src, and each has two source I-nodes: "tom" and "john." Each source I-node connects to an I-node via all S-nodes pointing to it. Each source I-node also has one S-node that points to it, representing the reliability/weight of its source. In this example, this weight is 0.5/0.5, meaning that the two sources "tom" and "john" are equally reliable.
This source fusion component is the glue that connects variables from different subsystems together. Therefore, it fuses BKBs from various subsystems at the variable instantiation level. In this way, fusion not only computes inferences originating from each subsystem, but also computes new inferences generated by subsystem interactions through their shared variables. The accumulated probability of these new inferences contributes to the detection of emergence. Fusion also preserves the distributions and variable relationships in the base subsystems without loss of information. In general, a fused BKB cannot be represented as a BN since both cycles and different parent I-node combinations can occur for each target I-node drawn from the different BKBs being fused together [9]. We provide the details of the BKB fusion algorithm in Appendix B. The time complexity of the BKB fusion algorithm is also polynomial. In particular, its worst-case complexity is (|  |+ |  |+|  |), where |  | is the number of I-nodes in all subsystem BKBs, |  | is the number of S-nodes, and |  | is the total number of edges/arcs. Please refer to Appendix B for details.
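The fusion idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of [9] (which is detailed in Appendix B): the `SNode` type, the `fuse` function, and the `src_<component>` naming scheme are hypothetical, and the sketch only shows how a source I-node is added to the tail of every S-node and how source weights become prior S-nodes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

INode = Tuple[str, str]  # (component, state)


@dataclass(frozen=True)
class SNode:
    head: INode
    tail: FrozenSet[INode]
    prob: float


def fuse(bkbs: Dict[str, List[SNode]], weights: Dict[str, float]) -> List[SNode]:
    """Tag every S-node with its source and add one prior per source I-node."""
    fused: List[SNode] = []
    source_inodes = set()
    for source, s_nodes in bkbs.items():
        for s in s_nodes:
            # Source I-node for the component of the head, e.g. ("src_A", "tom").
            src: INode = ("src_" + s.head[0], source)
            source_inodes.add(src)
            # The original probability is kept; only the tail is extended.
            fused.append(SNode(s.head, s.tail | {src}, s.prob))
    # One S-node per source I-node carries the reliability of that source.
    for src in sorted(source_inodes):
        fused.append(SNode(src, frozenset(), weights[src[1]]))
    return fused


# Hypothetical one-component fragments from two sources.
tom = [SNode(("A", "Yes"), frozenset(), 0.02)]
john = [SNode(("A", "Yes"), frozenset(), 0.30)]
fused = fuse({"tom": tom, "john": john}, {"tom": 0.5, "john": 0.5})
```

Note that no probability from either source is averaged or overwritten; both S-nodes for the same I-node survive side by side, distinguished only by their source I-node, which is how the fused model keeps the original distributions intact.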

Belief Updating for Emergence Detection. This subsection describes an efficient belief updating algorithm on the FBKB and briefly demonstrates how to detect emergence and perform general classification tasks at the same time.
In general, performing belief updating on a BN or a BKB is an NP-hard problem. It is also NP-hard to find an approximate solution [26]. Bayesian belief updating involves computing the probability that target variable Tar takes a certain state  based on an observation that some other feature variables take certain states. It is denoted as (Tar =  | Evidence), where Evidence is a set of observed feature variable instantiations. Since it is proportional to the joint probability (Tar = , Evidence), we only compute this joint probability. We can compute this probability by summing up the probabilities of all inferences which are consistent with Evidence and Tar = . Exact inferencing simply enumerates all inferences, picks out the consistent ones, and sums their probabilities as the joint probability.
If we do belief updating on BKB 1, BKB 2, and their fused BKB, we will discover emergence. As a simple demonstration of the belief updating procedure, we first name all S-nodes of the three BKBs in Table 1. Notice that, in this example, each pair of S-nodes sums up to 1, so only half of all S-nodes need to be marked. Based on these marks and belief updating rules, we compute variable 's state probability in the three BKBs, as shown in Table 2. In the last two rows, the constant 0.25 is the product of two source fusion variable priors, namely, 0.5 * 0.5. In fact, since the two sources have equal weights, and the constant appears in all inferences, it does not change the relative ordering of 's (two) states' probabilities.
From the last column , we see that, for both BKBs 1 and 2, ( = Yes) > ( = No). In the fused BKB, on the other hand, we see that ( = Yes) < ( = No). This is one type of emergence, which cannot be detected by aggregating separate analyses on the subsystems. This is just a simple example of emergence. For general purpose complex systems, we will fully describe our detection framework through real-world examples and provide the underlying mathematical formulations and solutions. Finally, we note that, in the example fused BKB, the number of inferences doubled compared with that in each single BKB, which is the result of variable interaction. In a fused BKB, there can exist an exponential number of inferences, which makes the exact inferencing algorithm extremely demanding for multivariate systems. Instead, we provide a sampling-based approach to approximate the joint probability. To sidestep the NP-hardness of exact inference, we set a constant threshold on the number of valid samples we collect before termination. Therefore, our approximation approach has uniform polynomial time complexity and maintains decent performance compared to the exact inferencing algorithm. We also compared its running time and accuracy against the exact inferencing algorithm and conclude that it is sufficient to serve our purposes for detecting emergence efficiently. In the worst case, the time complexity of the approximation algorithm is (SV * |Evid|),  ∈  + , while that of the exact inferencing algorithm is ( SV * |Evid|), where SV is the number of shared variables among subsystems, |Evid| is the number of evidences in a testing case, and  is the average number of states across all shared variables. Details are described in Appendix C.
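To make the inversion described above concrete, the following sketch enumerates the source combinations of a two-component fused model exactly. The numbers are hypothetical and are not those of Tables 1 and 2; the component names `A` and `B`, the dictionary layout, and the helper functions are assumptions of this sketch. It also uses a simplified reading of fusion in which each component independently draws its distribution from one of the two equally weighted sources.

```python
from itertools import product

# Hypothetical distributions for two sources (not taken from the paper).
sources = {
    "tom":  {"p_a_yes": 0.9, "p_b_yes_given_a": {"Yes": 0.6, "No": 0.0}},
    "john": {"p_a_yes": 0.1, "p_b_yes_given_a": {"Yes": 0.0, "No": 0.6}},
}
weights = {"tom": 0.5, "john": 0.5}  # equal source reliabilities


def p_b_yes(src_a: str, src_b: str) -> float:
    """P(B = Yes) when A's prior comes from src_a and B's conditional from src_b."""
    p_a = sources[src_a]["p_a_yes"]
    cond = sources[src_b]["p_b_yes_given_a"]
    return p_a * cond["Yes"] + (1.0 - p_a) * cond["No"]


def fused_p_b_yes() -> float:
    """Sum over all (source of A, source of B) combinations, weighted by 0.5 * 0.5."""
    return sum(
        weights[sa] * weights[sb] * p_b_yes(sa, sb)
        for sa, sb in product(sources, repeat=2)
    )


single = {name: p_b_yes(name, name) for name in sources}
fused = fused_p_b_yes()
```

With these numbers each source alone gives P(B = Yes) of about 0.54, so both favor Yes, while the fused value is about 0.30, so the fused model favors No. The two cross-source combinations, which exist only after fusion, are what flip the ordering; an ensemble that votes over the two separate answers would never see them.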

Automatic Detection of Emergence
This section first formally defines different types of emergence in complex systems and explains the intuitions behind these definitions. Next, it applies our proposed framework to a real-world example about a historical US blackout. Lastly, we analyze some major factors causing emergence in a general complex system and how to detect emergence from data automatically. We briefly summarize our proposed emergence detection framework.

Definition of Emergence in Complex Systems. We define emergence in complex systems formally in this subsection, which forms the basis for all the following subsections.
As mentioned in the Introduction, emergence refers to unpredictable system behavior caused by nonlinear interactions among its subsystems. However, many other factors can cause unexpected/unpredictable system behaviors. In such cases, those unpredictable behaviors should not be categorized as emergence. To rule out alternative explanations of unexpected behavior of a complex system, such as incomplete information, inconsistent measurements, or inexpert judgments, we make three assumptions about this definition: (i) Assumption one is that all subsystems within a complex system are observed, and their features/behaviors are recorded descriptively and/or quantitatively. This assumption indicates that there is no hidden subsystem or obscured subsystem behavior, which may result in unpredictable behavior in the overall system.
(ii) Assumption two is that someone with sufficient expert knowledge can build consistent models based on these observables for each subsystem and analyze subsystem behaviors from the constructed models. In this assumption, "consistent" means that the same modeling technique and logic are applied across all subsystems, and no discrimination is allowed.
(iii) Assumption three is that we have access to ground truth about both subsystem and overall system behaviors, so that the emergence definition is based on ground truth, rather than on relative metrics influenced by the applied modeling techniques.
In our framework, we require that datasets are available for both subsystems and the overall system and that a maximum likelihood logic is applied in the system behavior modeling. In this way, all three assumptions are satisfied.
Prior work [28] studied an emergent border crossing behavior during the 2009 H1N1 pandemic in Mexico using the BKB framework. In that paper, two types of emergence were defined: strong emergence and weak emergence. However, the BKBs were manually constructed from descriptive data sources. In this paper, we apply a data-driven approach for automatic emergence detection whenever data is available.
Given subsystem data and maximum likelihood logic, we can query the most likely state of the target variable (Tar) in all subsystems. Each subsystem then makes a decision based on its partial knowledge of Tar, learned from the corresponding subsystem dataset. The subsystems' opinions can form multiple sets: a majority opinion set and/or minority opinion set(s). In an extreme case, all subsystems form a unanimous opinion, and there is no minority opinion. In another case, each subsystem has a different opinion from the others', or each opinion has an equal number of supporters. In this case, there is no majority opinion.
Finally, we apply the same logic on an overall system dataset to determine the most likely state of Tar. Intuitively, if there is a majority opinion from the subsystems' side, it is expected to coincide with the overall system opinion. Otherwise, we claim this discrepancy as one form of emergence. If there is no majority opinion from the subsystems' side, and the overall system opinion agrees with one of the minority opinions, this is also accepted as expected. Otherwise, we also claim it as one type of emergence. Based on these intuitions, we illustrate four types of emergence in Table 3. In this table, Sub 1 to Sub 3 represent three subsystems. Whole means the opinion from the overall system. Type labels the type of emergence this case belongs to. The states "a", "b", "c", and so forth represent different opinions about Tar from the subsystems and/or the overall system.
In general, a complex system can have an arbitrary number of subsystems, but three is the minimum number needed to exhibit all types of emergence. Note that not all types (if any) will necessarily occur in a given complex system. If Tar is binary, only Type 1 and Type 3 can occur; if it is multinomial, all four types can occur. Furthermore, per this definition, we believe that Type 3 emergence should be observed most often. The condition for Type 2 emergence is harder to meet, so it should occur less frequently. Type 1 and Type 4 are likely the rarest, as their conditions are the most stringent.
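The four types defined in this subsection can be written as a small decision procedure. The sketch below is one possible reading, not the authors' code: the function names are hypothetical, and it assumes that a "majority" is a single opinion held by strictly more subsystems than any other opinion, so that a tie for first place counts as "no majority", in line with the case where each opinion has an equal number of supporters.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_opinion(opinions: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the unique most frequent opinion, or None if first place is tied."""
    if not opinions:
        raise ValueError("at least one subsystem opinion is required")
    ranked = Counter(opinions).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return None


def emergence_type(opinions: Sequence[Hashable], truth: Hashable) -> int:
    """Return 0 for no emergence, or 1 to 4 for the emergence type."""
    distinct = set(opinions)
    majority = majority_opinion(opinions)
    if len(distinct) == 1:
        # Unanimous subsystems: emergent only if the whole system disagrees.
        return 0 if truth == majority else 1
    if majority is not None:
        if truth == majority:
            return 0
        # Truth matches a minority opinion (Type 3) or no opinion at all (Type 2).
        return 3 if truth in distinct else 2
    # No majority: emergent only if the truth matches none of the opinions.
    return 0 if truth in distinct else 4
```

For instance, `emergence_type(["a", "a", "b"], "b")` falls under Type 3 and `emergence_type(["a", "b", "c"], "d")` under Type 4. With a binary target, the third and fourth states needed for Types 2 and 4 do not exist, which matches the remark that only Types 1 and 3 can occur in that case.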

Emergence Detection: BKFCS. This subsection details emergence detection through BKFCS.
If we have a dataset about system behaviors under various circumstances, we can apply our BKFCS to detect emergence within the system from data. We refer to a system configuration as a case in the dataset. A system configuration is a tuple of variable-state pairs, representing the working status of the system. For instance, if a system has two binary variables,  and , then it will have at most four different configurations, namely,  = Yes,  = Yes;  = No,  = Yes;  = Yes,  = No;  = No,  = No. In the system dataset, which is stored in a two-dimensional matrix format, each row corresponds to one configuration, and each column corresponds to a feature/variable in that system. We also call each row an entry or case of the system. In addition, we assume both subsystem datasets and the overall system dataset are available. Therefore, we can set up ground truth for each case. To distinguish an emergent case from a nonemergent case, we need to label each testing case as emergence or nonemergence based on majority and minority opinions. Assume that subsystem datasets are labeled as   ,  ∈ [1, #subsystems ], and the overall system dataset is labeled as   . We use   to label the ground truth of each case, but only provide BKFCS with subsystem datasets   . By comparing its prediction with ground truth, we can measure BKFCS's performance.
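The notion of a system configuration can be illustrated with a short enumeration. This is only a sketch: the variable names `A` and `B` are placeholders, and `configurations` is a hypothetical helper that lists every combination of variable states, each of which would correspond to one possible row of the system dataset.

```python
from itertools import product
from typing import Dict, List


def configurations(variables: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Enumerate every assignment of one state to each variable."""
    names = list(variables)
    return [
        dict(zip(names, states))
        for states in product(*(variables[name] for name in names))
    ]


# Two binary variables give at most four configurations.
configs = configurations({"A": ["Yes", "No"], "B": ["Yes", "No"]})
```

The number of configurations is the product of the state counts of all variables, which is why exhaustively studying every configuration becomes impractical as the number of variables grows.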
To classify a testing case as emergent or nonemergent, we first run belief updating on all BKBs learned from the subsystem datasets. Then we form majority and minority opinions based on the individual opinions from all BKBs. Based on these opinions, we know which states of the target lead to an emergent case and which do not. Next, we perform belief updating again on the fused BKB, which gives probabilities for both emergence and nonemergence states.
To simplify this procedure, we first treat emergence detection as a binary classification problem; namely, all types of emergence cases are viewed as positive, while nonemergence cases are viewed as negative. For each testing case, we compute the accumulated probability of this case being positive  + and the accumulated probability of it being negative  − per function (2). Then we normalize  + and  − into   + and   − and compare them to determine if this case is emergent per (3). In this equation, if the difference th =   + −   − is bigger than a predefined threshold (discussed in the experiments section), we declare it as emergence. Then we compare the claimed result with ground truth to evaluate BKFCS's performance.
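The accumulate, normalize, and compare steps described above can be sketched as follows. This is an illustration under assumptions rather than a transcription of functions (2) and (3): the function names are hypothetical, the accumulation is taken to be a sum of the fused-BKB joint probabilities over emergent and nonemergent target states, and normalization is taken to be division by the sum of the two accumulated values.

```python
from typing import Dict, Set, Tuple


def accumulate(joint: Dict[str, float], emergent_states: Set[str]) -> Tuple[float, float]:
    """Split fused-BKB joint probabilities of Tar into positive and negative mass."""
    p_pos = sum(p for state, p in joint.items() if state in emergent_states)
    p_neg = sum(p for state, p in joint.items() if state not in emergent_states)
    return p_pos, p_neg


def is_emergent(p_pos: float, p_neg: float, threshold: float) -> bool:
    """Normalize the two masses and compare their difference with a threshold."""
    total = p_pos + p_neg
    if total <= 0.0:
        raise ValueError("no probability mass to normalize")
    norm_pos = p_pos / total  # assumed normalization: divide by the sum
    norm_neg = p_neg / total
    return (norm_pos - norm_neg) > threshold


# Hypothetical joint probabilities for a binary target whose emergent state is "b".
p_pos, p_neg = accumulate({"a": 0.01, "b": 0.03}, {"b"})
decision = is_emergent(p_pos, p_neg, threshold=0.2)
```

Because only the normalized difference is compared with the threshold, constant factors shared by all inferences (such as the product of equal source priors) cancel out and do not affect the decision.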

An Example of Emergence in Complex System. This subsection details a real-world emergence example.
We selected the 1996 US west coast blackout [29] as our conceptual demonstration example. On July 2, 1996, a blackout occurred on the west coast of the US, which impacted over two million customers. The first event was a single phase-to-ground fault on the 345 kV Jim Bridger-Kinport line. System protection removed this line from service, clearing the fault. Twenty milliseconds later, system protection opened the 345 kV Jim Bridger-Goshen line due to misoperation of the ground element in a relay at Bridger. Loss of the two lines correctly initiated a remedial action scheme (RAS) that removed two generating units from service. The next event was system protection opening the 230 kV Round Up-LaGrande line due to misoperation of a zone 3 relay at Round Up. These three events together caused a series of disturbances to the entire system and overloaded other lines, which in turn took more lines offline.
Per the incident report [30], "the simultaneous combination of operating conditions on July 2 was not anticipated or studied. The speed of the collapse seen July 2 was not observed in this region and was not anticipated in studies." In fact, due to the combinatorial nature of interactions that could happen in such complex systems, it is impractical to evaluate all combinations in such studies and prevent all possible adverse outcomes before they happen.
This incident meets all three assumptions of the proposed emergence definition. First, all behaviors and features of each subsystem, which here are the power supply and delivery systems, are recorded. Their designed features all functioned as expected. For each subsystem, its individual purposes, such as line protection, power delivery rebalancing, and overload protection, were all achieved as well. In theory, these measures should be sufficient to protect the entire system from collapsing. In short, this meets the first assumption of no hidden behavior or missing information. Second, all subsystems handle incidents according to the same logic, which is prebuilt into hardware and software action rules. Employees in that company also followed operation procedures to handle the situations they met and solve immediate problems. This satisfies the second assumption of equal treatment in all subsystems. Finally, the entire system behavior is also recorded, which represents system-scale failure. Therefore, we know the ground truth behavior of both the subsystems and the overall system.
Since all three assumptions are met, we can claim that the observed behavior belongs to Type 1 emergence. That is, since all subsystems have been reviewed separately, power delivery in the entire network should not fail. However, the overall system observation tells us the opposite. In the next subsection, we apply our proposed framework to model this incident and compute the emergence.

Applying Emergence Detection Framework on 1996 US West Coast Blackout Incident.
This subsection details how to apply our proposed framework to model this incident.
In this accident, the first three major events are the Jim Bridger-Kimport line opening, the Jim Bridger-Goshen line opening, and the Round Up-LaGrande line opening. Since the details of the incident are recorded in a descriptive manner, we manually build three BKBs, one representing each event (Figure 4). Next, we fuse them into one FBKB (Figure 5) with the BKB fusion algorithm. Then we perform BKB belief updating on the three event BKBs and the FBKB and choose the variable "system failure" (abbr. "SF") as target. For demonstration purposes, we label S-nodes as before in Table 4. Next, we list target state probabilities for every subsystem (single event) and for the entire system in Table 5. In the last two rows of this table, the listed variable is the product of the source fusion variable probabilities that correspond to that inference. Remember that the BKBs in this case are simplified such that only instantiated variable states are depicted, so some S-nodes do not occur in any subsystem BKB but do occur in the overall system BKB.
Checking the target state probabilities in Table 5, we see that, in all three events, $P(\mathrm{SF} = \text{true}) < P(\mathrm{SF} = \text{false})$, but in the overall system BKB, $P(\mathrm{SF} = \text{true}) > P(\mathrm{SF} = \text{false})$. This is a Type 1 emergence per our definition. Now we study this emergence from a mathematical point of view. The values of the S-nodes in these BKBs are just one solution to the set of inequalities (4). Other types of emergence can be constructed in a similar way if the feasible region for these inequalities is not empty. This is the mathematical foundation for emergence in this work. However, this real-world example only displays one type of emergence. In the next section, we will discuss emergence detection in a general system.

Relevant Factors Underlying Emergence.
This subsection describes relevant factors affecting emergence in complex systems from a data-driven perspective, as well as emergence detection on general systems. Recall the US blackout example above. We noticed that its subsystems shared multiple parameters, both variables and probabilities. In a general complex system, however, all kinds of divergence can occur across different subsystems. We now discuss these variations from a data-driven perspective, which provides quantitative metrics of these factors.
In some complex systems, different subsystems have similar structures and parameters, as in power delivery systems; in other complex systems, subsystems differ from each other, as in health care delivery systems. It is reasonable to believe that subsystem variation also plays a role in the emergence of complex systems. Therefore, if we collect multiple datasets for subsystems of a complex system, we should consider how datasets coming from different subsystems differ from each other. To quantify their difference, we define dataset similarity metrics. These metrics introduce relevant factors for emergence and will be used in the Experiments.
This measures the difference of a certain variable between two experts' views.
Assume that variable $V$ only exists in the sets $D_{i_1}, D_{i_2}, \ldots, D_{i_k}$. Define variable similarity, $\omega_V$, as the average pairwise variable similarity for variable $V$ in these sets, namely, $\omega_V = \sum_{1 \le a < b \le k} \omega_V(D_{i_a}, D_{i_b}) / \binom{k}{2}$. This measures the difference of a certain variable in all experts' views on average.
Define datasets similarity, Ω, as the ratio between , where () denotes number of variables of set .This measures the difference in the variable selection criteria of two experts.
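As an illustration of the average pairwise variable similarity defined above, the following sketch averages a pairwise score over all pairs of sources that contain the variable. The pairwise score used here (one minus the total variation distance between two probability mass functions) is an assumption chosen for the example, and both function names are hypothetical.

```python
from itertools import combinations

def pmf_similarity(p, q):
    """Assumed pairwise similarity: 1 - total variation distance.

    p and q map each state of the shared variable to its probability.
    """
    states = set(p) | set(q)
    tv = 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)
    return 1.0 - tv

def variable_similarity(pmfs, pairwise=pmf_similarity):
    """Average the pairwise similarity over all unordered pairs of sources."""
    pairs = list(combinations(pmfs, 2))
    if not pairs:
        raise ValueError("need at least two sources that contain the variable")
    return sum(pairwise(p, q) for p, q in pairs) / len(pairs)
```

Any other pairwise score with the same signature can be passed through the `pairwise` argument without changing the averaging step.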
A related question is how these differences could happen in real-world systems. The answer is complicated. Sometimes different subsystems observe partially overlapping subsets of features of a system, and each shared variable has the same probability distribution in the different subsystems. Such systems should have high dataset similarity scores between their subsystems. In other situations, different subsystems observe the same variable from various perspectives, resulting in contradictory probability distributions on each shared variable. These systems will have low dataset similarity scores between their subsystems. In the second kind of situation, shared variables have different distributions from one source to another because of perspective differences, sample representativeness, random noise, and system biases.
Therefore, once we have datasets about subsystem characteristics under various system configurations, we should be able to identify which configurations lead to emergent behaviors.
As for ground truth, we apply a model independent criterion.Let  we can determine whether it is an emergent case and which type of emergence it belongs to.This forms the ground truth for each case in testing set   .
Based on the ground truth, we can perform an emergence detection task. Given several datasets $D_i$, $i \in [1, k]$, representing subsystem dynamics, we first learn one BKB from each dataset with the BKB learning algorithm introduced in the Background. Then we fuse these BKBs into one FBKB per the BKB fusion algorithm. Lastly, we run belief updating via the sampling method on both the individual BKBs and the FBKB for each testing case. To separate emergence from nonemergence cases, we form majority and minority opinions by querying the most probable state of the target variable Tar on the individual BKBs and compare them with the opinion obtained by querying the FBKB on Tar. Per the emergence definition, we decide whether this case is emergent and which type of emergence it belongs to. Finally, we compare our decision with the ground truth label to see whether we made the right call.

Emergence Detection Framework Recap.
We now provide a step-by-step recap description of our framework.
Step 1. Collect data from multiple subsystems. These datasets contain subsystem feature states as well as target variable states.
Step 2. Learn BKBs for each subsystem via the BKB learning algorithm if subsystem data are presented in a structured form. Otherwise, build BKBs manually based on descriptive data about subsystem features and target variable states.
Step 3. Fuse the subsystem BKBs into one FBKB via the BKB fusion algorithm. If we have information about BKB reliabilities, we pass them to the fusion algorithm; otherwise, we simply assign equal reliabilities to all subsystem BKBs.
Step 4. Analyze the single BKBs and the FBKB using belief updating. Compute individual BKB opinions and FBKB opinions for each combination of system feature states.
Step 5. Determine which cases belong to emergence and the emergence type according to the definitions in Table 3.
Step 6. Compare the BKFCS decision of emergence with the ground truth, if we have access to it, and evaluate its performance.

Experiments
This section begins with designing synthetic datasets that simulate various types of complex systems. Then, it details building complex system models from the synthetic datasets via BKB learning and fusion. Finally, we summarize the framework's performance in comparison with existing methods.

Designing Synthetic Datasets.
Even though various types of complex systems exist in the real world, subsystem datasets for emergence modeling typically have not been available for one of two reasons. (1) Extant subsystem behaviors and features are usually described in natural language or equations in postmortem briefings, and we cannot yet apply the framework directly to such forms of knowledge. (2) In the cases when subsystem datasets have been recorded, they are not available to the public for commercial, security, or political reasons. As such, we test our proposed framework BKFCS against baselines using synthetic testbeds. We selected thirteen datasets (Table 6) from the UCI machine learning repository [27] according to several rules. First, both independent and dependent variables are categorical or binary, since BKBs do not currently handle continuous variables. If we chose continuous features and discretized them, we would introduce an uncontrolled level of noise. Second, the number of samples is sufficient compared to the number of variables; otherwise, no algorithm would extract useful patterns from that dataset, and the comparison would be meaningless. Finally, these datasets include various combinations of variable and sample numbers so that they represent a diversity of scenarios, covering different scales of complex systems, different amounts of available data, and various kinds of variable interactions between subsystems and within a subsystem.
To evaluate BKFCS performance, we split one dataset into training and testing sets in a 10-fold cross-validation fashion. Each training set is then split into multiple subsets, where a subset includes a part of all features and all cases. Different subsets have varying numbers of shared/common variables, representing their interactions in complex systems (Algorithm 1). To simulate the dataset similarity difference, we introduce ten popular perturbing functions that transform an original distribution to a perturbed one on shared variables (Table 7). These functions have various effects on the original distribution: some transform a uniform or relatively even distribution to a skewed one, others lessen the skewness of the distribution, and others flip the density of the distribution, making rare cases more popular and common ones less popular. In short, they cover most scenarios that can arise when fusing multiple inconsistent sources of information. The perturbation procedure is as follows: (1) for each shared variable, we compute its original probability mass function (pmf); (2) choose a function randomly for each source; (3) compute the perturbed pmf for each source; and (4) modify shared variable instantiations so that the distribution of the modified shared variable follows the perturbed pmf with minimal change.

Algorithm 1: Generating original synthetic datasets.
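The four-step perturbation procedure can be sketched as follows. This is an illustrative reading rather than the authors' code: `flip_pmf` stands in for one of the ten perturbing functions in Table 7 (its exact form is an assumption), the helper names are hypothetical, and "minimal change" is interpreted as relabeling the fewest possible instantiations.

```python
from collections import Counter

def pmf_of(values):
    """Step 1: empirical probability mass function of a shared variable."""
    n = len(values)
    return {state: count / n for state, count in Counter(values).items()}

def flip_pmf(pmf):
    """Assumed perturbing function: invert densities, then renormalize."""
    inverted = {state: 1.0 - p for state, p in pmf.items()}
    total = sum(inverted.values())
    if total == 0:  # single-state variable: nothing to flip
        return dict(pmf)
    return {state: p / total for state, p in inverted.items()}

def target_counts(pmf, n):
    """Round n * pmf to integer counts that sum to n (largest remainder)."""
    raw = {state: p * n for state, p in pmf.items()}
    counts = {state: int(r) for state, r in raw.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(raw, key=lambda s: raw[s] - counts[s], reverse=True)
    for state in by_remainder[:leftover]:
        counts[state] += 1
    return counts

def relabel_minimal(values, counts):
    """Step 4: relabel only the surplus instantiations to match the counts."""
    current = Counter(values)
    surplus = {s: current[s] - counts.get(s, 0) for s in current}
    deficit = [s for s in counts for _ in range(counts[s] - current.get(s, 0))]
    fill = iter(deficit)
    out = list(values)
    for i, state in enumerate(out):
        if surplus[state] > 0:
            out[i] = next(fill)
            surplus[state] -= 1
    return out
```

In this sketch the relabeled positions are simply the first surplus occurrences; a randomized choice of positions would satisfy the same counts.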

Applying BKFCS on One Synthetic Dataset. This subsection demonstrates learning BKBs and BKB fusion from synthetic datasets.
We first demonstrate BKFCS on the balloon dataset, which has 76 cases, each with five variables. Therefore, per Algorithm 1, each training set contains 68 cases, and each testing set contains 8 cases. The five variables are "size," "act," "age," "color," and "class" (target variable), all of which are binary. If we set a low variable overlap (Ω = 0.1), we have one shared variable. In one round, we pick "size" as the shared feature variable and split the remaining three evenly into three subsystems.
Based on three subsets created via this manner, we learn three BKBs via the BKB learning algorithm mentioned in the Background, which are drawn in Figure 6.
Then, we apply the BKB fusion algorithm detailed in the Background to fuse the three BKBs into one and perform belief updating on the fused BKB. For space reasons, we omit the fused BKB here. Finally, we run the emergence detection algorithm on the testing set. The details of emergence detection on this and the other datasets are presented in the following subsection.

Emergence Detection on All Synthetic Datasets. This subsection details the emergence detection algorithm on all synthetic datasets.
Here we evaluate BKFCS performance on these datasets. A typical way of evaluating classifier performance is to compare the true positive rate against the false positive rate and plot the results as ROC curves. To study the ratio of correct claims of emergence versus false claims, we need to know how many cases are truly emergence cases. After all, emergence can only be detected if it occurs in the testing sets. We analyze the emergence rate in the synthetic datasets by comparing the majority and minority of individual subsystem dataset opinions against the overall system opinion on each case in the testing sets. For instance, if, for a test case, the three subsets' opinions are the same but the overall set opinion differs from them, we label this case as a Type 1 emergence case. If our model predicts the same opinion as the overall opinion, we classify the case as correctly identified; otherwise, we count it as a false negative. To evaluate the overall emergence rate in a dataset, we collapse the different types of emergence. The aggregated emergence rate, which sums up all four types of emergence for each dataset under different parameters, is summarized in Table 8.
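The labeling rule in the example above, together with the two ROC quantities, can be sketched as follows. Only the Type 1 condition stated in the text (unanimous subsystem opinions contradicted by the overall opinion) is encoded; the other types depend on Table 3 and are left out. The function names are hypothetical.

```python
def is_type1_emergence(subsystem_opinions, overall_opinion):
    """True when all subsystems agree but the overall system disagrees."""
    unanimous = len(set(subsystem_opinions)) == 1
    return unanimous and subsystem_opinions[0] != overall_opinion

def roc_point(truth, predicted):
    """Return (true positive rate, false positive rate) for boolean labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Evaluating `roc_point` once per decision threshold yields the points from which a ROC curve is drawn.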
Perturbation is also involved in some experiments to simulate probability distribution variations in subsystem datasets. We name these datasets perturbed sets and name those that have the same distribution of shared variables original sets. In most datasets, for both original and perturbed sets, the emergence rate is positively correlated (with p value < 0.05) with dataset similarity, Ω. This is because the more shared variables there are among different subsystems, the more interactions exist among them. Recall that, in Section 3.2, we need to compare the computed accumulated state difference th in (3) with a predefined decision threshold. In our experiments, we vary this threshold from 0.05 to 0.25 in steps of 0.05 and list all results. The results for different decision thresholds and different dataset similarities are shown in Figure 7, which contains results for original sets only. We also compute ROCs for perturbed sets; they show similar relationships, so we omit them due to space limitations. From this figure, we see that, in both original and perturbed sets, all ROCs are above the baseline (the line where "true positive rate" = "false positive rate"). In addition, as Ω grows from 10 percent to 60 percent, most ROC curves move northwest (ensemble method), indicating an improved performance. Thirdly, in most datasets, the decision threshold has a significant impact on precision and recall. Finally, at a fixed threshold, precision and recall vary widely among different datasets. However, in most conditions, our proposed algorithm can reach a 50 percent true positive rate while keeping the false positive rate under 20 percent.
This figure demonstrates the overall performance of BKFCS on all types of emergence. However, we still want to break it down by type. Therefore, we need to know the emergence rate in each dataset for each type and evaluate the detection efficiency.
Here, we treat the different types of emergence cases separately and show the emergence rate for each dataset in Table 9. In this table, the first column in the first row shows 9%, meaning that, in the original datasets, when Ω is set to 0.1 (10% overlapping features), the average Type 1 emergence rate across thirteen datasets is 9 percent. The second column of the first row shows 12%, meaning that the average Type 1 emergence rate for thirteen datasets is 12 percent, and so on. For each Ω, Type 3 emergence occurs most often, followed by Type 1 emergence. Type 2 and Type 4 emergence are observed less often. These results are consistent with our emergence definition, because Type 2 and Type 4 emergence indicate more divergent opinions from the various subsystems and hence a harder decision-making process. Type 1 and Type 3 emergence, on the other hand, occur more often in practice, and they should be easier to detect as well.
To test this hypothesis, we compute a confusion matrix for the detection rates of BKFCS on each type of emergence and report the average detection rate across thirteen datasets over all Ω values in Table 10. In this table, the sum of each row represents the percentage of total cases that really belong to a certain type of emergence. The sum of each column is the percentage of cases that are predicted to be a certain type of emergence. In each column and each row, the number denotes the percentage of cases that is classified as that kind of emergence. The results indicate that BKFCS can detect most Type 3 and Type 1 emergence, but it performs worse on Type 2 emergence. It cannot detect any Type 4 emergence. This performance is reasonable in that the proposed BKB learning algorithm learns a BKB model from subsystem data by maximizing a likelihood score penalized by BKB structure complexity. As a result, it has limited capability in capturing extremely low-frequency patterns, which map to Type 4 emergence.

Performance Comparison against Ensemble Methods.
This subsection compares the performance of BKFCS with BN fusion baselines.
The baseline is set up as follows: for each subsystem dataset, we learn a BN using the Weka machine learning package. Then we learn a BN for the whole system from the union of the subsystem datasets. Remember that we only provide classifiers with the subsystem datasets and keep the whole system dataset as ground truth. Then, by comparing the majority and minority opinions of the subsystem BNs with the opinion of the BN learned from the union dataset, we evaluate the baseline's emergence detection capability.
We repeat this procedure on all datasets with all parameters and list the true positive rate and false positive rate in Table 11. For comparison, we list BKFCS results at threshold 0.05 in the same table. Finally, we summarize their average performance in six configurations in Figure 8. In this figure, we organize results into six groups from left to right, where group 1 represents the original set with Ω = 0.1, and group 6 maps to the perturbed set with Ω = 0.6. Remember that, for false positive rates, lower is better, and for true positive rates, higher is better. Then we do a one-tailed paired t-test for both true positive and false positive rates on all six configurations. Results show that eleven out of twelve tests are significantly different at the 0.05 level. All six true positive rates in BKFCS are those of BN, but two are significantly larger than those of BN (groups 3 and 6). Only group 5 shows no significant difference in false positive rate between the two classifiers. In short, BKFCS is much better than the BN ensemble approach at detecting emergence in these synthetic datasets.

Conclusion
In this paper, we propose a quantitative definition of emergence and an emergence detection algorithm that learns and fuses several subsystem models through variable interaction, which preserves all inconsistent information. Experiments on synthetic datasets show that this algorithm can better detect emergence in complex systems than extant methods. To the best of our knowledge, this automatic emergence detection approach of fusing graphical models is the first in this field.

A. BKB Learning Algorithm
This appendix provides details on a new BKB learning algorithm, analyzes its time complexity, and compares its performance against five baselines on thirteen datasets from the UCI machine learning repository [27]. We assume that there exists at least one dataset $D$ for each subsystem, from which we learn a BKB. Even though a BKB can contain cycles, we concentrate on learning acyclic BKBs for simplicity. Learning cyclic BKBs and their modeling impacts will be studied in future work. This algorithm first learns a component-level structure $G$ and then builds a BKB $K$ from $G$ and dataset $D$. In a general BKB, different states of a component $A$ can have different sets of parent components, namely, $\{\mathrm{Pa}(a_1)\} \neq \{\mathrm{Pa}(a_2)\}$ for two states $a_1 \neq a_2$. However, different states of a component have the same set of parent components in a BKB built from $G$, for $G$ specifies the parent-children relationship at the variable level rather than the variable instantiation level. The set of parent components of a component $A$ is denoted as $\mathrm{Pa}(A)$. Based on this simplification, our algorithm learns a BKB that maximizes the score function (1) given dataset $D$, assuming it contains $m$ cases and $n$ variables/features, and each feature/component $A_i$, $i \in [1, n]$, has $r(A_i)$ states/I-nodes. The notation $N(c)$ means the number of cases in which condition "$c$" holds. The penalty constant $\lambda$ is set to 0.01 in our algorithm. This function consists of two parts: the first part computes the log likelihood of the BKB given dataset $D$, and the second part is the penalty for complexity and overfitting, which is proportional to the difference between the maximum number of possible S-nodes and the number of S-nodes that appear in the BKB. A nonzero difference means this BKB only associates component $A_i$ with some instantiations of $\mathrm{Pa}(A_i)$ that occur in the training set but cannot generalize to unobserved instantiations, which is a sign of overfitting. What is more, a parent set $\mathrm{Pa}(A_i)$ containing $\prod_{A \in \mathrm{Pa}(A_i)} r(A) > m$ possible instantiations must overfit to the training data, since $m$ is the upper bound of observed patterns.
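To make the two-part score concrete, the sketch below computes a per-component score as a log likelihood minus a penalty on unobserved S-nodes. The exact functional form is an assumption assembled from the verbal description above, not a transcription of the score function; the function name `node_score` and the row-as-dictionary data layout are hypothetical.

```python
import math
from collections import Counter

def node_score(data, child, parents, n_states, penalty=0.01):
    """Assumed per-component score: log likelihood minus S-node penalty.

    data     : list of dicts mapping variable name -> observed state
    child    : name of the component being scored
    parents  : list of parent component names
    n_states : dict mapping variable name -> number of states
    """
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Log likelihood over the (parent instantiation, child state) patterns seen.
    loglik = sum(n * math.log(n / parent_counts[pa]) for (pa, _), n in joint.items())
    # Maximum possible S-nodes versus S-nodes that actually appear in the data.
    max_snodes = n_states[child] * math.prod(n_states[p] for p in parents)
    return loglik - penalty * (max_snodes - len(joint))
```

With this form, a parent set whose instantiations are only partly observed is penalized in proportion to the number of missing patterns, which matches the overfitting argument in the text.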
Based on (1), we design a polynomial time BKB learning algorithm that finds a near-optimal solution, as shown in Algorithm 2. It is well known that learning a general BN from data is an NP-hard problem, so we set up some constraints to make a polynomial time algorithm possible. First, we cap the number of iterations (1000). Second, we set an upper bound on the number of parents each feature/variable can have in the BKB (C.1), which also avoids overfitting per the previous analysis of the parent pattern limit. Last but not least, since this algorithm takes a greedy strategy, it can only find a local maximum from a given starting search point, so we precompute multiple starting points with various densities and search for multiple local maxima in parallel. Then we choose the best BKB among these local maxima as an approximation of the global maximum.
The algorithm works as follows: a fully connected DAG (directed acyclic graph) has $n(n-1)/2$ edges/arcs, and we would like to search from multiple initial graphs with different densities. Remember that an "edge/arc" in structure $G$ connects two components/variables, while an edge in BKB $K$ connects an I-node and an S-node. In line (2), we generate a random DAG $G_0$ with density ratio $\cdot\, n(n-1)/2$, where ratio ranges from 0.01 to 1 at an interval of 0.01.
To compute it, we first do a random shuffle of variables 1 to  and build a fully connected graph based on this shuffle.Namely, the variable in the front of shuffle points to all variables behind it.It guarantees acyclic property and its time complexity is ().Then we pick the first ⌊ratio * ( − 1)/2⌋ edges from this fully connected graph to form  0 , and its time complexity is  (1).From  0 , we iteratively search for a better graph from all its immediate neighbors.
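The construction of a random starting graph can be sketched as below: shuffle the variables, orient every pair from earlier to later in the shuffle, and keep the first ⌊ratio · n(n − 1)/2⌋ edges. The function name `random_dag` and the use of a seeded `random.Random` instance are illustrative choices, not details from the paper.

```python
import random

def random_dag(n, ratio, rng=None):
    """Return (order, edges) for a random DAG over variables 0..n-1.

    Every edge points from a variable earlier in the shuffled order to a
    later one, so the result is acyclic by construction.
    """
    rng = rng or random.Random()
    order = list(range(n))
    rng.shuffle(order)
    # Fully connected DAG induced by the shuffle: n(n-1)/2 edges.
    full = [(order[i], order[j]) for i in range(n) for j in range(i + 1, n)]
    k = int(ratio * n * (n - 1) / 2)  # floor for nonnegative values
    return order, full[:k]
```

Sweeping `ratio` over a grid of values, as the text describes, gives starting points that range from nearly empty to fully connected graphs.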
Here an immediate neighbor of  0 means a graph which can be built by adding, deleting, or reversing an arc/edge (, ),  ∈ [1, ],  ∈ [1, ],  ̸ = .Here  and  represent two components this edge connects with, while, in a BKB , head  () and tail  () represent I-nodes.There are three possible scenarios, and each scenario corresponds to two potential neighbors, as shown from line (7) to line (11).In each scenario, we test the acyclic property of two potential neighbors through topological ordering in line (13) Therefore, we compute Δ via (A.1).If this neighbor is built by reversing an arc, both  and  will change their scores, and we compute Δ via (A.2).In both cases, we count all instantiations of Pa(  ) in one loop over  cases and compute the log likelihood score for each instantiation.In the worst case, dataset  contains min((  )∏ ∈{Pa(  )} (), ) different patterns.Therefore, the time complexity of computing a node score difference is ( + min(, (  )∏ ∈{Pa(  )} ())) = ().After we get Δ, we compare it with current largest improvement Δ max and update its value, as shown from line (15) to line (17).After we evaluate all neighbors of   , we update   with the best neighbor  max for the next iteration.However, if no neighbor has a higher score, then   is a local maximum, and the iteration stops, as shown from line (22) to line (25).In each iteration, the worst-case time complexity is ( 2 ).For all iterations from each starting point, the worstcase time complexity is (1000 2 ) = ( 2 ).The time complexity of entire algorithm is (100 * 1000 *  2 ) = ( 2 ).Therefore, this is a polynomial time complexity algorithm.
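The neighbor search relies on an acyclicity test through topological ordering. The sketch below uses Kahn's algorithm for that test and enumerates the graphs that are one added, deleted, or reversed arc away. It is a generic illustration of this kind of greedy neighborhood, under the assumption that arcs are stored as a set of ordered pairs; it does not reproduce the line-by-line scenarios of Algorithm 2.

```python
from collections import deque

def is_acyclic(n, edges):
    """Kahn's algorithm: the graph is acyclic iff every node gets ordered."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return ordered == n

def neighbors(n, edges):
    """Yield acyclic edge sets one add, delete, or reverse away from edges."""
    edges = set(edges)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if (i, j) in edges:
                yield edges - {(i, j)}  # deleting an arc never creates a cycle
                reversed_arc = (edges - {(i, j)}) | {(j, i)}
                if is_acyclic(n, reversed_arc):
                    yield reversed_arc
            elif (j, i) not in edges:
                added = edges | {(i, j)}
                if is_acyclic(n, added):
                    yield added
```

A greedy search would score each yielded neighbor, move to the best one, and stop when no neighbor improves the score.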
In practice, we can optimize running time in several ways. First, we compute local maxima from different initial graphs in parallel. Second, within each iteration, we compute the node scores of neighbors in parallel. Third, we memoize all node scores for patterns already computed and do a constant-time lookup for existing patterns. The platform we use is a 16-node Dell cluster, and each node contains two Intel® Xeon® E5-2640 CPUs clocked at 2.6 GHz. Each node has 512 GB of RAM. We have a total of 512 hyperthreaded cores/216 physical cores, which can speed up the algorithm by 2 orders of magnitude.
First, we learn BNs from UCI single datasets with five scoring functions using Weka.The classification accuracies are listed in Table 12.In the first column, each abbreviation corresponds to one dataset in the same order as in Table 6.For instance, "Bs" refers to "Balance-scale" and "Co" is short for "Connect 4".The last row "Avg" denotes the average result of all datasets.In the first row, each abbreviation denotes one learning scoring function.Next, we compare In addition, we notice that BKBs learned by proposed scoring function also outperform most BNs learned by several extant scoring functions.In fact, according to the "no free lunch theorem" [31], all classifiers have their strength and weakness, as shown in performance variation on various datasets.

Complexity
(1) Let $K' = \{I', S', E'\}$ be an empty BKB and $w'$ a weight function

Evidence is a set of observed feature variable instantiations. Since the posterior of the class is proportional to the joint probability $P(\text{Class} = c, \text{Evidence})$, we only compute this joint probability. We compute this probability by summing up the probabilities of all inferences that are consistent with Evidence and Class $= c$. Exact inferencing simply enumerates all inferences, picks out the consistent ones, and sums their probabilities as the joint probability. In general, since BNs are a special case of BKBs, exact belief updating is NP-hard [32], and even an approximation of the posterior is NP-hard [33]. However, the emergence detection or general classification task does not require general belief updating. It is possible to design a polynomial time approximation algorithm for our purposes. Sampling-based approximation methods such as importance sampling, MCMC, and Gibbs sampling have been widely applied in BN updating [34,35]. However, we cannot directly apply an existing approximation method to an FBKB. First, an FBKB is not a BN, because it may include cycles and have different parent sets for the same variable. Cycles are introduced when fusing two fragments with conflicting causality graphs: expert one believes that variable $X_1$ causes $X_2$, but expert two believes the opposite. Extant BN approximation methods do not consider these. Second, sampling methods such as importance sampling perform poorly on low-frequency samples or extreme CPT entries, while emergence is a rare event with low frequency. Third, in a fused BKB, a source fusion node is a special kind of node, which represents source reliability beyond an ordinary prior probability. Extant approximation methods cannot distinguish such nodes from normal feature variables and therefore cannot fit our special purpose. Therefore, we need to design an approximation method for running belief updating and for emergence detection.

𝑃 SI
The approximation algorithm is shown in Algorithm 4. The algorithm works as follows.First, assuming  is the set of source fusion components, we compute the number of combinations of source fusion I-nodes, UB, in lines (2) and (3).Next, we begin to sample min( num , UB) valid inferences from BKB .In experiments, sample number  num is set to  * |SV|, where  is a constant ranging from 1 to 5. From line (5) to line (9), we create a random vector   for sample , where   consists of state indices of all components in , kept in a fixed order.If it has not been visited before, we continue processing it from line (10) to line (19).We instantiate all components in  based on   , denoted as I-nodes set   , and all feature components based on evidence set Evid, denoted as I-nodes set   , in lines (11) and (12).In line (13), we do belief updating on this sample inference SI  based on (C.1).
Next, we classify this sample inference as valid or invalid in line (14) based on (C.2) and place its source fusion I-node index into the corresponding pool, namely, VP (valid pool) or IP (invalid pool), as shown in lines (14) to (18). After we sample num valid inferences or reach the sampling upper bound min(UB, 10 * num), we compute each target state probability by aggregating over the inferences related to VP, which form the valid inference set SI, through function (C.3) in line (22). This sampling upper bound is needed for the following reason: since the number of inferences is combinatorial in the worst case, we may have to sample a large portion of them before we obtain enough valid inferences, which would result in exponential time complexity. We avoid this by capping the total number of sampled inferences at 10 * num, which guarantees worst-case polynomial time complexity and still maintains decent relative accuracy according to the experiments below. The outputs, in line (27), are each target state's aggregated probability and the target state with the largest aggregated probability. This sampling approach gives every inference equal sampling frequency, so that emergence inferences with low probabilities have a chance of being sampled together with nonemergence inferences with high probabilities, thus overcoming the shortcomings of importance sampling for emergence detection.

        (8) Append the state index of the component to the sample index
      (9) Endfor
      (10) If (sample index ∉ VP) && (sample index ∉ IP)
        (11) Set evidence on feature variables based on Evid
        (12) Set evidence on the source fusion variables based on the sample index
        (13) Do belief updating for sample inference SI
        (14) If test(SI) == true
          (15) Add the sample index to VP and increment the valid inference count
        (16) else
          (17) Add the sample index to IP
        (18) Endif
      (19) Endif
    (20) End
    (21) For each state of Tar
      (22) Compute the joint probability Pr() of this state
      (23) If Pr() > current maximum
        (24) Let the maximum be Pr() and the predicted state be this state
      (25) Endif
    (26) End for
    (27) Output Pr() for each state of Tar, and report the predicted state of Tar
Algorithm 4: BKB updating by sampling method.
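The sampling loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `sample_inferences`, `toy_evaluate`, and all parameter names are hypothetical, belief updating and the validity test (C.2) are replaced by a caller-supplied `evaluate` callback, and aggregation (C.3) is assumed here to be a plain sum over valid inferences.

```python
import random


def sample_inferences(state_counts, evaluate, n_valid, cap, seed=0):
    """Uniformly sample distinct inferences and aggregate the valid ones.

    state_counts: number of states per source fusion component (assumed input).
    evaluate: hypothetical callback, index tuple -> (is_valid, {state: prob}).
    n_valid: number of valid inferences to collect.
    cap: upper bound on the total number of sampled inferences.
    """
    rng = random.Random(seed)
    total = 1
    for count in state_counts:
        total *= count
    limit = min(total, cap)  # never ask for more distinct inferences than exist

    valid_pool = {}       # VP: index -> target state probabilities
    invalid_pool = set()  # IP: indices of inferences that failed the test
    while len(valid_pool) < n_valid and len(valid_pool) + len(invalid_pool) < limit:
        # Every component state is equally likely, so every inference is too.
        index = tuple(rng.randrange(count) for count in state_counts)
        if index in valid_pool or index in invalid_pool:
            continue  # this inference was already sampled
        is_valid, probs = evaluate(index)
        if is_valid:
            valid_pool[index] = probs
        else:
            invalid_pool.add(index)

    # Assumed aggregation: sum each target state's probability over VP.
    aggregated = {}
    for probs in valid_pool.values():
        for state, prob in probs.items():
            aggregated[state] = aggregated.get(state, 0.0) + prob
    predicted = max(aggregated, key=aggregated.get) if aggregated else None
    return aggregated, predicted, len(valid_pool), len(invalid_pool)


def toy_evaluate(index):
    # Toy rule: an inference is invalid when the first component is in state 2.
    return index[0] != 2, {"Yes": 0.02, "No": 0.01}


aggregated, predicted, n_ok, n_bad = sample_inferences(
    [3, 3, 3], toy_evaluate, n_valid=5, cap=50
)
print(predicted, n_ok, n_bad)
```

Because each index is drawn uniformly, a low-probability inference is as likely to be sampled as a high-probability one, which is the property the text relies on for emergence detection.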
Now we consider the correctness of our proposed sampling method. Consider a binary classification problem where all feature variables are observed and where the exact inferencing algorithm predicts $P(\text{class} = \text{Yes}) > P(\text{class} = \text{No})$. An individual inference provides probabilities for both class states. We call an inference positive if it computes $P(\text{Yes}) > P(\text{No})$, with difference $\Delta = P(\text{Yes}) - P(\text{No})$. Otherwise, we call it negative, and the difference is $\Delta' = P(\text{No}) - P(\text{Yes})$. Assume the number of consistent (positive) inferences is $m$ and the number of inconsistent (negative) ones is $n$. Since the exact inferencing algorithm predicts that $P(\text{Yes})$ wins, it follows that $P(\text{Yes}) - P(\text{No}) = \sum_{i=1}^{m} \Delta(i) - \sum_{j=1}^{n} \Delta'(j) = m\bar{\Delta} - n\bar{\Delta}' > 0$, where $\bar{\Delta}$ is the mean of the positive inference prediction value differences and $\bar{\Delta}'$ is the corresponding mean for negative inferences.
Based on these definitions, we consider the task where we have V source fusion components, each with three source fusion I-nodes. Exact inferencing algorithms need to compute $3^{V}$ inferences in the worst case. The fraction of positive inferences is $p = m/(m + n)$. If we sample a sufficient number $k \ll 3^{V}$ of valid inferences, then the number of positive inferences follows a binomial distribution $B(k, p)$, and the number of negative inferences follows $B(k, 1 - p)$. The expected probability mass is $kp\bar{\Delta}$ for positive inferences and $k(1 - p)\bar{\Delta}'$ for negative ones. If we replace $p$ with $m/(m + n)$, we get a positive probability mass $k(m/(m + n))\bar{\Delta}$ and a negative probability mass $k(n/(m + n))\bar{\Delta}'$. Since we already know $m\bar{\Delta} - n\bar{\Delta}' > 0$, the expected positive mass exceeds the expected negative mass, and we conclude that sampling gives the same prediction as an exact algorithm with a high probability that is positively correlated with the sample size $k$.
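The argument above can be checked empirically with a small simulation. The sketch below is illustrative only: the population of per-inference differences is synthetic (600 positive and 400 negative values drawn uniformly from an assumed range), and the function names are hypothetical. It estimates how often a uniform sample of a given size reproduces the sign of the exact sum.

```python
import random


def agreement_rate(diffs, k, trials=2000, seed=0):
    """Fraction of trials in which k sampled inferences reproduce the exact sign."""
    rng = random.Random(seed)
    exact_positive = sum(diffs) > 0
    hits = 0
    for _ in range(trials):
        sample = rng.sample(diffs, k)  # uniform sampling without replacement
        if (sum(sample) > 0) == exact_positive:
            hits += 1
    return hits / trials


def make_population(m, n, seed=1):
    """Synthetic signed differences: m positive values and n negative values."""
    rng = random.Random(seed)
    positives = [rng.uniform(0.0, 0.02) for _ in range(m)]
    negatives = [-rng.uniform(0.0, 0.02) for _ in range(n)]
    return positives + negatives


population = make_population(m=600, n=400)
for k in (5, 30, 200):
    print(k, agreement_rate(population, k))
```

Under these assumptions the agreement rate should grow with the sample size, in line with the claim that the probability of matching the exact prediction is positively correlated with the number of sampled inferences.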
However, the mean of sampled inferences converges to the real mean $\bar{\Delta}$ or $\bar{\Delta}'$ only if sufficient samples are collected. This is caused by the variation in the stochastic process and is independent of the valid inference distribution. If we sample $k$ inferences with replacement, the sample mean approximately follows a normal distribution $N(\mu, \sigma^{2}/k)$, where $\mu$ and $\sigma$ are the mean and standard deviation of the original inference distribution. If we sample inferences without replacement, the sample mean is still approximately normal, $N(\mu, (\sigma^{2}/k)((N - k)/(N - 1)))$, where $N$ is the total number of valid inferences and the other parameters are the same as before. In practice, the normal approximation works well once the sample size reaches 30, a threshold that does not depend on the population size, given that the actual population size is much larger than that. This number works well for large BKBs with hundreds or more inferences, where it is much smaller than the actual number of inferences. On a small BKB, however, 30 samples are too many relative to the number of valid inferences. Therefore, we propose a linear sample size $\omega * |SV|$, where $\omega$ is a constant ranging from 1 to 5. We also set an upper bound on the total number of sampled inferences at $10 * \omega * |SV|$. In addition, we know that the inferencing time for one inference is a linear function of the number of evidences in this inference (the number of I-nodes set as evidence); therefore, the time complexity of the approximation algorithm is $O(\omega |SV| * |Evid|)$, $\omega \in \mathbb{Z}^{+}$, while that of the exact inferencing algorithm is $O(r^{|SV|} * |Evid|)$, where $r$ is the average number of states for source fusion nodes. To evaluate our algorithm's performance with various sampling sizes, we test its running time on the original sets against an exact inferencing algorithm under various conditions, as shown in Figure 9.
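To make the sample-size rule concrete, the following sketch computes the linear sampling target, its tenfold cap, the worst-case number of exact inferences, and the standard error of the sample mean with and without replacement. The function names and the example values (a sampling constant of 3, 8 source fusion variables, 3 states each) are assumptions chosen for illustration, not figures from the experiments.

```python
import math


def sampling_budget(omega, n_sv):
    """Return (target number of valid inferences, cap on sampled inferences)."""
    target = omega * n_sv
    return target, 10 * target


def exact_inference_count(avg_states, n_sv):
    """Worst-case number of inferences an exact algorithm must enumerate."""
    return avg_states ** n_sv


def standard_error(sigma, k, population=None):
    """Standard deviation of the sample mean for sample size k.

    With `population` given, apply the finite population correction
    used when sampling without replacement.
    """
    se = sigma / math.sqrt(k)
    if population is not None:
        se *= math.sqrt((population - k) / (population - 1))
    return se


target, cap = sampling_budget(omega=3, n_sv=8)
print(target, cap, exact_inference_count(avg_states=3, n_sv=8))
print(standard_error(1.0, 25), standard_error(1.0, 25, population=100))
```

The budget grows linearly in the number of source fusion variables while the exact count grows exponentially, which is the gap the complexity comparison in the text describes.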
We also computed this on perturbed datasets with similar results. We run our approximation algorithm with 5 different sampling rates $\omega \in [1, 5]$. In this figure, the $x$-axis is the product of the number of sampled inferences (SIF) and the average number of evidences in one inference, that is, |SIF| * |Evid|; the $y$-axis is the average running time in seconds for each case/observation. We apply a linear model to fit these two values. The average goodness-of-fit measure (adjusted $R^{2}$) is 0.94 across the different $\omega$ values on the original sets and 0.96 on the perturbed sets. The two algorithms have comparable running times on small datasets. However, as the number of variables becomes large, the advantage of our sampling approach becomes obvious. The advantage also becomes more obvious when the $\omega$ value increases. Therefore, it

Figure 7: ROC for all parameter combinations.

Figure 9: Running time comparison of approximation and exact inferencing.

Table 2: Belief updating of 's state.

Table 5: Target state probability in US blackout.

Table 7: Perturbation function to create perturbed sets.

Table 8: Emergence rate in all datasets.

Table 9: Rate for 4 types of emergence.

Table 10: Confusion matrix for four emergence types.
The time complexity of the acyclicity check is $O(n + |E_{nb}|)$, where $|E_{nb}|$ is the number of edges in the graph $G_{nb}$. If a neighbor is acyclic, we compute its score and compare it with the current graph's score $S(G_{c})$. In fact, we only compute the scores of nodes in the set $\{X \mid S_{X}(G_{nb}) \neq S_{X}(G_{c})\}$ through function (A.3). In this function, structure graph $G$ is associated with BKB $K$, and $G$ can be either the current graph $G_{c}$ or its neighbor graph $G_{nb}$. In line (4), each node score of the current best graph $G_{c}$ is stored as $S_{X}(G_{c})$. In line (14), the change of score $\Delta$ is computed as follows: if this neighbor is built by adding or removing an arc $(X, Y)$, only $Y$ will change its score.
$\Delta = S_{Y}(G_{nb}) - S_{Y}(G_{c})$ (A.1)
$\Delta = S_{X}(G_{nb}) + S_{Y}(G_{nb}) - S_{X}(G_{c}) - S_{Y}(G_{c})$
$S_{\max} = S(G_{c}) = \sum_{X} S_{X}(G_{c})$, $\Delta_{\max} = 0$, $G_{\max} =$

Table 12: Performance comparison of BNs learned by different scoring functions.

Table 13: Performance comparison on single dataset.