The Emergence of Informative Higher Scales in Complex Networks



Introduction
Networks provide a powerful syntax for representing a wide range of systems, from the trivially simple to the highly complex [1][2][3]. It is common to characterize networks based on structural properties like their degree distribution or clustering, and the study of such properties has been crucial for the growth of Network Science. Yet there remains a gap in our treatment of the information contained in the relationships between nodes in a network, particularly in networks that have both weighted connections and feedback, which are hallmarks of complex systems [4,5]. As we will show, analyzing this information allows for modeling the network at the most appropriate, informative scale. This is especially critical for networks that describe interactions or dependencies between nodes, such as contact networks in epidemiology [6], neuronal and functional networks in the brain [7], or interaction networks among cells, genes, or drugs [8], as these networks can often be analyzed at multiple different scales.
Here we introduce information-theoretic measures that capture the information contained in the connectivity of a network, which can be used to identify when these networks possess informative higher scales. To do so, we focus on the out-weight vector, $W_i^{out}$, of each node, $v_i$, in a network. This vector consists of the weights $w_{ij}$ between $v_i$ and its neighbors, $v_j$, with $w_{ij} = 0$ if there is no edge from $v_i$ to $v_j$. For each $W_i^{out}$ we assume $\sum_j w_{ij} = 1$, which means $w_{ij}$ can be interpreted as the probability $p_{ij}$ that a random walker on $v_i$ will transition to $v_j$ in the next timestep, where a random walker might represent the passing of a signal, an interaction, or a state-transition [9]. The information contained in a network's connectivity can be characterized by the uncertainty among its nodes' out-weights and in-weights. The total information in the relationships between nodes is a function of this uncertainty and can be derived from two properties.
The first is the uncertainty of a node's outputs, which is the Shannon entropy [10] of its out-weights, $H(W_i^{out})$. The average of this entropy across all nodes, $\langle H(W_i^{out}) \rangle$, is the amount of noise present in the network's relationships; the network is deterministic only if $\langle H(W_i^{out}) \rangle = 0$. The second property is how weight is distributed across the whole network, $\langle W_i^{out} \rangle$. This vector is composed of elements that are the sum of the in-weights $w_{ji}$ to each node $v_i$ from each of its incoming neighbors, $v_j$ (then normalized by the total weight of the network). Its entropy, $H(\langle W_i^{out} \rangle)$, reflects how certainty is distributed across the network. If all nodes link only to the same node, then $H(\langle W_i^{out} \rangle) = 0$, and the network is totally degenerate since all nodes lead to the same node.
The effective information (EI) of a network is the difference between these two quantities:

$$EI = H(\langle W_i^{out} \rangle) - \langle H(W_i^{out}) \rangle.$$

The entropy of the distribution of out-weights in the network forms an upper bound on the amount of unique information in the network's relationships, from which the information lost due to the uncertainty of those relationships is subtracted. Networks with high EI contain more certainty in the relationships between nodes in the network (since the links represent less uncertain dependencies, unique associations, or deterministic transitions), whereas networks with low EI contain less certainty. Note that EI can be interpreted simply as a structural property of random walkers on a network and their behavior, similar to other common network measures [9].
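The definition above can be made concrete with a short calculation. The following is a minimal sketch, assuming the network is given as a dense weighted adjacency matrix in which every node has at least one outgoing edge; the function names (`entropy`, `normalize_rows`, `effective_information`) are illustrative and do not come from the original work.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability vector; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def normalize_rows(A):
    """Turn a weighted adjacency matrix into out-weight vectors that each sum to 1.
    Assumes every row has positive total weight."""
    return [[w / sum(row) for w in row] for row in A]

def effective_information(W):
    """EI = H(<W_out>) - <H(W_out)> for a row-normalized matrix W."""
    n = len(W)
    # Average out-weight vector: column sums divided by the number of nodes.
    mean_out = [sum(W[i][j] for i in range(n)) / n for j in range(n)]
    # Average uncertainty of each node's outputs.
    mean_entropy = sum(entropy(row) for row in W) / n
    return entropy(mean_out) - mean_entropy

# Directed 3-cycle: every node has exactly one target, and no two nodes share one.
cycle = normalize_rows([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
# Fully connected with self-loops: every node spreads its weight uniformly.
dense = normalize_rows([[1, 1, 1], [1, 1, 1], [1, 1, 1]])

ei_cycle = effective_information(cycle)  # log2(3): no noise, no overlap
ei_dense = effective_information(dense)  # 0: all uncertainty, nothing unique
```

The two toy networks sit at the extremes: the cycle has no uncertainty in its outputs and no overlap in its targets, while the dense network's output entropy cancels the entropy of its averaged out-weights exactly.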
Here, we use this measure to develop a general classification of networks (key terms can be found in Supplementary Materials, SM V A). Furthermore, we show how the connectivity and different growth rules of a network have a deep relationship to that network's EI. This also provides a principled means of quantifying the amount of information among the micro-, meso-, and macroscale dependencies in a network. We introduce a formalism for finding and assessing the most informative scale of a network: the scale that minimizes the uncertainty in the relationships between nodes. For some networks, a macroscale description of the network can be more informative in this manner, demonstrating a phenomenon known as causal emergence [11,12], which here we generalize to complex networks. This provides a rigorous means of identifying when networks possess an informative higher scale.

Effective Information Quantifies a Network's Dependencies.
This work expands to networks previous research on using effective information to measure the amount of information in the causal relationships between the mechanisms or states of a system. Originally, EI was introduced to capture the causal influence between two subsets of neurons as a step in the calculation of integrated information in the brain [13]. Later, a system-wide version of EI was shown to capture fundamental causal properties in Boolean networks of logic gates, particularly their determinism and degeneracy [11].
Our current derivation from first principles of an EI for networks is equivalent to this system-wide definition (SM V B), which was based originally on interventions upon system states. For example, if a system in a particular state, A, always transitions to state B, the causal relationship between A and B can be represented by a node-link diagram wherein the two nodes (A and B) are connected by a directed arrow, indicating that B depends on A. This might be a node pair in a "causal diagram" (often represented as a directed acyclic graph, or a DAG) such as those used in [14,15] to represent interventions and causal relationships. In such a case, the information in the causal relationship between A and B can be assessed by intervening to randomize A, $do(A = H^{max})$, and measuring the effects on B. The EI would be the mutual information between A and B under such randomization: $EI = I(A; B) \mid do(A = H^{max})$ [16].
To expand this framework to networks in general, we relax this intervention requirement by assuming that the elements in $W_i^{out}$ sum to 1. In this case, an "intervention" can be interpreted as dropping a random walker on the network. For example, if the network represents a DAG or Markov chain, then dropping a random walker on a node $v_i$ would be equivalent to $do(v_i)$. The entropy of the transitions of the random walkers and how those transitions are distributed defines the EI of a network. In this generalized formulation, only in networks where the nodes and edges actually represent dynamics, interactions, or couplings does EI indicate information about causation. In the case where edges represent correlations, or where what nodes or edges represent is undefined, EI is merely a structural property of the information contained in the behavior of hypothetical random walkers (however, this situation is no different from other analysis methods that rely on random walkers).
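The link between the intervention-based and random-walker readings can be checked in a few lines. The following is a sketch, assuming the random walker is dropped on a node chosen uniformly at random (the maximum-entropy intervention) and writing $X$ for the starting node and $Y$ for the node reached one step later; these symbols are introduced here for illustration only.

```latex
\begin{align*}
P(X = i) &= \tfrac{1}{N}, \qquad P(Y = j \mid X = i) = w_{ij}, \\
P(Y = j) &= \tfrac{1}{N} \sum_i w_{ij} = \langle W_i^{out} \rangle_j, \\
I(X; Y)  &= H(Y) - H(Y \mid X)
          = H(\langle W_i^{out} \rangle) - \tfrac{1}{N} \sum_i H(W_i^{out})
          = H(\langle W_i^{out} \rangle) - \langle H(W_i^{out}) \rangle .
\end{align*}
```

Under this assumption, the mutual information between where a walker is dropped and where it lands is exactly the difference between the entropy of the averaged out-weights and the average entropy of the out-weights.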
Here we describe how this generalized structural EI property behaves in common network models, asking basic questions about the relationship between a network's EI and its size, density, and structure. These inquiries allow for the exhaustive classification and quantification of the information contained in the connectivity of real networks. It is intuitive that the EI of a network will increase as the network grows in size. In general, adding more nodes should increase the entropy, which should in turn increase the amount of information. However, in cases of randomness rather than structure, EI should reflect this randomness. We found this is indeed the case.
Figure 1(a) shows the relationship between a network's EI and its size under several parameterizations of Erdős-Rényi (ER) random graphs [17,18]. As the size of an ER network increases (while keeping constant the probability that any two nodes will be connected, $p$), its EI converges to a value of $-\log_2(p)$.
That is, in random networks, EI is dominated solely by the probability that any two nodes are connected, a key finding which demonstrates that, after a certain point, a random network structure does not contain more information as its size increases. This shift occurs in ER networks at approximately $\langle k \rangle = \log_2(N)$, which is also the point at which we can expect all nodes to be in a giant component [1]. This finding illustrates that network connectivity must be nonrandom to increase the amount of information in the relationships between nodes (see SM V C 1 for derivation). Note that if a network is maximally dense (i.e., a fully connected network, with self-loops), EI = 0.0. However, we expect such dense low-EI structures to be uncommon, since network structures found in nature and society tend to be sparse [19].
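A rough heuristic shows why the limit takes this form. This is an informal sketch, not the derivation given in the supplementary material, and it assumes a large ER network in which every node has close to $pN$ neighbors with equal out-weights.

```latex
\begin{align*}
w_{ij} \approx \frac{1}{pN} \text{ on each of } \approx pN \text{ edges}
  &\;\Rightarrow\; \langle H(W_i^{out}) \rangle \approx \log_2(pN), \\
\text{in-weights spread roughly evenly}
  &\;\Rightarrow\; H(\langle W_i^{out} \rangle) \approx \log_2(N), \\
EI &\approx \log_2(N) - \log_2(pN) = -\log_2(p).
\end{align*}
```

The size of the network cancels out, which is why, under these assumptions, growing a random network at fixed $p$ adds no further information to its connectivity.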
We report another key relationship between a network's connectivity and its EI in Figure 1(b). We again compare the EI of a network to its size, focusing on networks grown under different parameterizations of a preferential attachment model [20,21]. Under a preferential attachment growth model, a new node is added to the network at each time step, contributing $m$ new edges to the network; these $m$ edges connect to nodes already in the network, $v_j$, with a probability proportional to $k_j^{\alpha}$. Here, $k_j$ is the degree of node $v_j$ and $\alpha$ tunes the amount of preferential attachment. A value of $\alpha = 0.0$ corresponds to each node having an equal chance of receiving a new node's link (i.e., no preferential attachment).
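The growth rule described above can be sketched as follows. This is a minimal illustration, assuming a seed graph of two connected nodes and undirected edges; the function name `preferential_attachment`, the seed graph, and the `seed` parameter are choices made here and are not specified by the text.

```python
import random

def preferential_attachment(n, alpha, m=1, seed=0):
    """Grow an undirected network of n nodes.

    Each new node attaches to up to m distinct existing nodes, chosen with
    probability proportional to (degree ** alpha).
    Returns (edges, degree) where edges is a list of (new_node, target).
    """
    rng = random.Random(seed)
    degree = [1, 1]      # assumed seed graph: nodes 0 and 1 joined by one edge
    edges = [(1, 0)]
    for new in range(2, n):
        weights = [k ** alpha for k in degree]
        targets = set()
        # Redraw until we have the required number of distinct targets.
        while len(targets) < min(m, new):
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in sorted(targets):
            edges.append((new, t))
            degree[t] += 1
        degree.append(len(targets))
    return edges, degree

# alpha = 0: uniform attachment (a random tree when m = 1).
tree_edges, tree_degree = preferential_attachment(30, alpha=0.0)
# Very large alpha: the first node to gain an edge absorbs all later ones.
star_edges, star_degree = preferential_attachment(30, alpha=100.0)
```

With a very large exponent, the first node to receive a second edge outweighs every other node so heavily that it receives every later edge, which is the star-like limit of superlinear attachment.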
The classic Barabási-Albert network corresponds to linear preferential attachment, $\alpha = 1.0$ [21]. Superlinear preferential attachment, $\alpha > 1.0$, creates networks that have less and less EI, eventually resembling star-like structures (see SM V C 2 for derivation). As shown in Figure 1(b), only in cases of sublinear preferential attachment, $\alpha < 1.0$, does the network's EI continue to increase with its size. When $\alpha = 0.0$ (creating a random tree), the network's EI increases logarithmically as its size increases. The maximum possible EI in a network of $N$ nodes is $\log_2(N)$.
This can be seen in the case of a directed ring network where each node has one incoming link and one outgoing link, each with a weight of 1.0, so each node has one node uniquely connecting to it. In such a network, each node contributes zero uncertainty, since $\langle H(W_i^{out}) \rangle = 0.0$, and $H(\langle W_i^{out} \rangle) = \log_2(N)$, and therefore its EI is always $\log_2(N)$. In general, the EI of an undirected ring lattice is fixed entirely by its size and dimension (i.e., $d = 1$ is an undirected ring, $d = 2$ is a torus, etc. [22]), so for such lattices $EI = \log_2(N) - \log_2(2d)$ (see SM V C 2 for derivation).
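Both lattice results can be verified numerically. The sketch below assumes periodic lattices with at least three nodes per side (so that each node's 2d neighbors are distinct); the builders `directed_ring` and `undirected_lattice` are illustrative helpers, not code from the original work.

```python
import itertools
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def effective_information(W):
    """EI = H(<W_out>) - <H(W_out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out = [sum(W[i][j] for i in range(n)) / n for j in range(n)]
    return entropy(mean_out) - sum(entropy(row) for row in W) / n

def directed_ring(n):
    """Each node points to exactly one successor with weight 1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = 1.0
    return W

def undirected_lattice(side, d):
    """Periodic d-dimensional lattice with side**d nodes (requires side >= 3)."""
    nodes = list(itertools.product(range(side), repeat=d))
    index = {node: k for k, node in enumerate(nodes)}
    W = [[0.0] * len(nodes) for _ in nodes]
    for node in nodes:
        for axis in range(d):
            for step in (-1, 1):
                neighbor = list(node)
                neighbor[axis] = (neighbor[axis] + step) % side
                # 2d neighbors, each reached with equal probability.
                W[index[node]][index[tuple(neighbor)]] = 1.0 / (2 * d)
    return W

ei_ring = effective_information(directed_ring(16))           # log2(16)
ei_line = effective_information(undirected_lattice(16, 1))   # log2(16) - log2(2)
ei_torus = effective_information(undirected_lattice(5, 2))   # log2(25) - log2(4)
```

In each lattice the in-weights are spread evenly, so the only thing lowering EI below $\log_2(N)$ is the $\log_2(2d)$ bits of uncertainty in each node's outputs.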
The picture that emerges is that EI is inextricably linked with a network's connectivity and growth (even network motifs, as shown in SM V D) and therefore to the fundamentals of Network Science. Random networks have a fixed amount of EI, and scale-freeness ($\alpha = 1.0$) represents the critical bound for the growth of EI. In general, dense networks and star-like networks have less EI. The next section explores how EI's components explain these associations.

Determinism and Degeneracy.
Determinism and degeneracy are the two fundamental components of EI [11].
They are based on a network's connectivity (see Figure 2(a) for a visual explanation), specifically the degree of overlapping weight in the network. Determinism and degeneracy are derived from the uncertainty over outputs and the uncertainty in how those outputs are distributed, respectively:

$$\text{determinism} = \log_2(N) - \langle H(W_i^{out}) \rangle,$$

$$\text{degeneracy} = \log_2(N) - H(\langle W_i^{out} \rangle).$$

In a maximally deterministic network wherein all nodes have a single output, $w_{ij} = 1.0$, the determinism is $\log_2(N)$ because $\langle H(W_i^{out}) \rangle = 0.0$. Conceptually, this means that a random walker will move deterministically starting from any node. Degeneracy is the amount of information in the connectivity lost via an overlap in input weights (e.g., if multiple nodes output to the same node). In a perfectly nondegenerate system where all nodes have equal input weights, the degeneracy is zero since $H(\langle W_i^{out} \rangle) = \log_2(N)$. Together, determinism and degeneracy can be used to define EI:

$$EI = \text{determinism} - \text{degeneracy}.$$

These two quantities provide clear explanations for why different networks have the EI they do. For example, as the size of an Erdős-Rényi random network increases, its degeneracy approaches zero, which means the EI of a random network is driven only by the determinism of the network, which is in turn the negative log of the probability of connection, $p$. Similarly, in $d$-dimensional ring lattice networks, the degeneracy term is always zero, which means the EI of a ring lattice structure also reduces to the determinism of that structure. Ring networks with an average degree $\langle k \rangle$ will have a higher EI than ER networks with the same average degree because ring networks will have a higher determinism value. In the case of star networks, the degeneracy term alone governs the decay of the EI such that hub-and-spoke-like structures quickly become uninformative in terms of cause and effect (see SM V C for derivations concerning these cases). In general, this means that canonical networks can be characterized by their ratio of determinism to degeneracy (see Figure 2(b)).
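The contrast between ring-like and star-like structures can be illustrated by computing the two components separately. This is a sketch under the assumption that determinism is $\log_2(N)$ minus the average output entropy and degeneracy is $\log_2(N)$ minus the entropy of the averaged out-weights, as the surrounding text describes; the `star` and `directed_ring` builders are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def determinism(W):
    """log2(N) minus the average uncertainty of each node's outputs."""
    n = len(W)
    return math.log2(n) - sum(entropy(row) for row in W) / n

def degeneracy(W):
    """log2(N) minus the entropy of the network-wide average out-weight vector."""
    n = len(W)
    mean_out = [sum(W[i][j] for i in range(n)) / n for j in range(n)]
    return math.log2(n) - entropy(mean_out)

def directed_ring(n):
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = 1.0
    return W

def star(n):
    """Undirected star: node 0 is the hub, nodes 1..n-1 are the spokes."""
    W = [[0.0] * n for _ in range(n)]
    for s in range(1, n):
        W[0][s] = 1.0 / (n - 1)  # hub spreads its weight over all spokes
        W[s][0] = 1.0            # every spoke points only at the hub
    return W

ring = directed_ring(8)
ring_det, ring_deg = determinism(ring), degeneracy(ring)  # log2(8) and 0

ei_star_small = determinism(star(5)) - degeneracy(star(5))
ei_star_large = determinism(star(50)) - degeneracy(star(50))
```

For the star, nearly all in-weight lands on the hub, so degeneracy grows with size and EI shrinks, whereas the ring keeps zero degeneracy at any size.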

Effective Information in Real Networks.
So far, we have been agnostic as to the origin of the network under analysis.
As described previously, to measure the EI of a network, one can create each $W_i^{out}$ by normalizing each node's out-weight vector to sum to 1.0. Regardless of what the relationships between the nodes represent, the network's determinism reflects how targeted the out-weights of the nodes are (networks with more targeted links possess higher EI), while the degeneracy captures the overlap of the targeting of nodes. High EI reflects greater specificity in the connectivity, whereas low EI indicates a lack of specificity (as in Figure 2(a)). This generalizes our results to multiple types of representations, although the origin of the normalized network should be kept in mind when interpreting the value of the measure.
Since the EI of a network will change depending on the network's size, we use a normalized form of EI known as effectiveness in order to compare the EI of real networks. Effectiveness ranges from 0.0 to 1.0 and is defined as

$$\text{effectiveness} = \frac{EI}{\log_2(N)}.$$

Figure 2: (a) Left: a maximally deterministic out-weight vector (exemplified by node A in the inset network motif) is when a random walker on $v_i$ transitions to one of its neighbors with probability 1.0, whereas indeterminism occurs when $v_i$ has a uniform probability of visiting any node in the network in the next time step. Right: three example in-weight vectors to a given $v_j$. A maximally degenerate vector, $\langle W_i^{out} \rangle$ (top right, exemplified by the inset network motif), is when every outgoing edge in the network connects to a single node, whereas minimal degeneracy occurs when each value in $\langle W_i^{out} \rangle$ is uniformly $1/N$. (b) By comparing the determinism and degeneracy of canonical network structures, we find a great deal of heterogeneity in different network models' ratios between their determinism and degeneracy. High degeneracy is characterized by hub-and-spoke topology, as in the case of the star network. Networks with high determinism are characterized by longer average path lengths, as in the case of a ring lattice.

As the determinism and degeneracy of a network approach their minimum and maximum possible values, respectively, the effectiveness of that network will trend to 0.0. Regardless of its size, a network wherein each node has a deterministic output to a unique target has an effectiveness of 1.0.
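The size-independence of effectiveness is easy to check on toy networks. The following sketch assumes effectiveness is EI divided by $\log_2(N)$ and that the network has more than one node; the helper names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def effective_information(W):
    """EI = H(<W_out>) - <H(W_out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out = [sum(W[i][j] for i in range(n)) / n for j in range(n)]
    return entropy(mean_out) - sum(entropy(row) for row in W) / n

def effectiveness(W):
    """EI normalized by its maximum possible value, log2(N). Requires N > 1."""
    return effective_information(W) / math.log2(len(W))

def directed_ring(n):
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = 1.0
    return W

def complete_with_self_loops(n):
    return [[1.0 / n] * n for _ in range(n)]

eff_small_ring = effectiveness(directed_ring(4))
eff_large_ring = effectiveness(directed_ring(64))
eff_dense = effectiveness(complete_with_self_loops(16))
```

Both rings reach an effectiveness of 1.0 despite differing in size by a factor of sixteen, while the fully connected network sits at 0.0.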
In Figure 3, we examine the effectiveness of 84 different networks corresponding to data from real systems. These networks were selected primarily from the Konect Network Database [23], which was used because its networks are publicly available, range in size from dozens to tens of thousands of nodes, often have a reasonable interpretation as being based on interactions between nodes, and are diverse, ranging from social networks, to power networks, to metabolic networks. We defined four categories of interest: biological, social, informational, and technological. We selected our networks by using all the available networks (under 40,000 nodes due to computational constraints) in the domains corresponding to each category within the Konect database, and where it was appropriate, the Network Repository as well [24]. See the Materials & Methods section and SM Table II for a full description of this selection process.
Lower effectiveness values correspond to structures that have either high degeneracy (as in right column, Figure 2(a)) or low determinism (as in left column, Figure 2(a)) or a combination of both. In the networks we measured, biological networks on average have lower effectiveness values, whereas technological networks on average have the highest effectiveness. This finding aligns intuitively with what we know about the relationship between EI and network structure, and it also supports long-standing hypotheses about the role of redundancy, degeneracy, and noise in biological systems [25,26]. On the other hand, technological networks like power grids, autonomous systems, or airline networks on average are associated with higher effectiveness values. One explanation for this difference is that efficiency in human-made technological networks tends to create sparser, nondegenerate networks with higher effectiveness on average, wherein the nodes' relationships are more specific in their targeting.
It might be surprising to find that evolved networks have such low effectiveness. But, as we will show, a low effectiveness can actually indicate that there is informative higher-scale (macroscale) connectivity in the system. That is, a low effectiveness can reflect the fact that biological systems often contain higher-scale structure, which we demonstrate in the following section.

Causal Emergence in Complex Networks.
This new global network measure, EI, offers a principled way to answer an important question: what is the scale that best captures the connectivity of a complex system? The resolution to this question is important because science analyzes the structure of different systems at different spatiotemporal scales, often preferring to intervene and observe systems at levels far above that of the microscale [12].
This is likely because relationships at the microscale can be extremely noisy and therefore uninformative, and coarse-graining can minimize this noise [11]. Indeed, this noise minimization is actually grounded in Claude Shannon's noisy-channel coding theorem [10], wherein dimension reductions can operate like codes that use more of a channel's capacity [16]. Higher-level causal relationships often perform error-correction on the lower-level relationships, thus generating extra effective information at those higher scales. Measuring this difference provides a principled means of deciding when higher scales are more informative (emergence) or when higher scales are extraneous, epiphenomenal, or lossy (reduction).
Bringing these issues to network science, we can now ask: what representation will minimize the uncertainty present in a network? We do this by examining causal emergence, which is when a dimensionally reduced network contains more informative connectivity, in the form of a higher EI than the original network. Note that, as discussed, EI can be interpreted solely as a general structural property of networks. Therefore, while we still call this phenomenon "causal emergence" because it has the same mathematical formalization as previous work in Boolean networks and Markov chains [11,12,16], here we focus on how it can be used to identify the informative higher scales of networks regardless of what those networks represent.
Notably, the phenomenon can be measured by recasting networks at higher scales and observing how the EI changes, a process which identifies whether the network's higher scales add information above and beyond lower scales.

Network Macroscales.
First, we must introduce how to recast a network, $G$, at a higher scale. This is represented by a new network, $G_M$. Within $G_M$, a micronode is a node that was present in the original $G$, whereas a macronode is defined as a node, $\mu$, that represents a subgraph, $S_i$, from the original $G$ (replacing the subgraph within the network). Since the original network has been dimensionally reduced by grouping nodes together, $G_M$ will always have fewer nodes than $G$.
A macronode $\mu$ is defined by some $W_{\mu}^{out}$, derived from the edge weights of the various nodes within the subgraph it represents. One can think of a macronode as being a summary statistic of the underlying subgraph's behavior, a statistic that takes the form of a single node. Ultimately there are many ways of representing a subgraph, that is, building a macronode, and some ways are more consistent than others in capturing the subgraph's behavior, depending on the connectivity. We highlight here that macroscales of networks should in general be consistent with their underlying microscales in terms of their dynamics. While this has never been assessed within networks or systems generally, there has been previous research that has asked whether the macroscales of structural equation models are consistent with the effect of all possible interventions [27].
Here, to decide whether or not a macronode is a consistent summary of its underlying subgraph, we formalize consistency as a measure of whether random walkers behave identically on $G$ and $G_M$. We do this because random walks are often used to represent dynamics on networks [9], and therefore many important analyses and algorithms (such as PageRank for determining a node's centrality [28] or InfoMap for community discovery [29]) are based on random walks.
Specifically, we define the inconsistency of a macroscale as the Kullback-Leibler divergence [30] between the expected distribution of random walkers on $G$ vs. $G_M$, given some identical initial distribution on each.
The expected distribution over $G$ at some future time, $t$, is $P_m(t)$, while the distribution over $G_M$ at some future time $t$ is $P_M(t)$. To compare the two, the distribution $P_m(t)$ is summed over the same nodes in the macroscale $G_M$, resulting in the distribution $P_{M|m}(t)$ (the microscale given the macroscale). We can then define the macroscale inconsistency over some series of timesteps $T$ as

$$\text{inconsistency} = \sum_{t=0}^{T} D_{KL}\left[P_M(t) \,\|\, P_{M|m}(t)\right].$$

This consistency measure addresses the extent to which a random dynamical process on the microscale topology will be recapitulated on a dimensionally reduced topology (for how this is applied in our analysis, see Materials & Methods).
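The comparison described above can be sketched in code. This is a minimal illustration, assuming the macroscale is specified by a list (`mapping`, a name introduced here) that assigns each micronode to a macroscale node index, that both walks start from the same initial distribution projected onto each scale, and that the divergence is taken from the macroscale distribution to the projected microscale distribution.

```python
import math

def step(p, W):
    """Advance a distribution over nodes by one random-walk step."""
    return [sum(p[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def project(p_micro, mapping, n_macro):
    """Sum microscale probabilities over the nodes grouped into each macroscale node."""
    q = [0.0] * n_macro
    for i, m in enumerate(mapping):
        q[m] += p_micro[i]
    return q

def kl_divergence(p, q):
    """D_KL[p || q] in bits; infinite if q lacks support where p has mass."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log2(a / b)
    return total

def inconsistency(W, W_macro, mapping, p0, T):
    """Sum over t = 0..T of D_KL[P_M(t) || P_{M|m}(t)]."""
    p_micro = list(p0)
    p_macro = project(p0, mapping, len(W_macro))
    total = 0.0
    for _ in range(T + 1):
        total += kl_divergence(p_macro, project(p_micro, mapping, len(W_macro)))
        p_micro = step(p_micro, W)
        p_macro = step(p_macro, W_macro)
    return total

# Microscale: A splits its weight between B and C, which both return to A.
W = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
mapping = [0, 1, 1]                    # B and C are grouped into one macronode
good_macro = [[0.0, 1.0], [1.0, 0.0]]  # macronode always returns to A
bad_macro = [[0.0, 1.0], [0.5, 0.5]]   # macronode wrongly keeps half its walkers
uniform = [1 / 3, 1 / 3, 1 / 3]

consistent = inconsistency(W, good_macro, mapping, uniform, T=10)
inconsistent = inconsistency(W, bad_macro, mapping, uniform, T=10)
```

In this toy case B and C behave identically, so grouping them loses nothing and the inconsistency stays at zero, whereas a macronode with the wrong out-weights diverges from the microscale after a single step.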
What constitutes a consistent macroscale depends on the connectivity of the subgraph that gets grouped into a macronode, as shown in Figure 4.
The $W_{\mu}^{out}$ can be constructed based on the collective $W^{out}$ of the subgraph (shown in Figure 4(a)). For instance, in some cases, one could just coarse-grain a subgraph by using its average $W^{out}$ as the $W_{\mu}^{out}$ of some new macronode $\mu$ (as in Figure 4(b)). However, it may be that the subgraph has dependencies not captured by such a coarse-grain. Indeed, this is similar to the recent discovery that when constructing networks from data, it is often necessary to explicitly model higher-order dependencies by using higher-order nodes so that the dynamics of random walks stay true to the original data [31]. We therefore introduce higher-order macronodes (HOMs), which draw on similar techniques to consistently represent subgraphs as single nodes [31].
Different subgraph connectivities require different types of HOMs to consistently represent them. For instance, HOMs can be based on the input weights to the macronode, which take the form μ | j. In these cases, $W_{\mu|j}^{out}$ is a weighted average of each node's $W^{out}$ in the subgraph, where the weight is based on the input weight to each node in the subgraph (Figure 4(c)). Another type of HOM that generally leads to consistent macronodes over time is when $W_{\mu}^{out}$ is based on the stationary output from the subgraph to the rest of the network, which we represent as μ | π (Figure 4(d)).
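The three weighting schemes mentioned so far (plain average, input-weighted, and stationary) differ only in how the subgraph's out-weight vectors are averaged. The sketch below makes that explicit under several assumptions that are not spelled out in the text: the macronode's out-weights are returned over the original micronodes (entries pointing back into the subgraph would become a self-loop), the stationary distribution is estimated by iterating a lazy random walk, and the function names are illustrative.

```python
def stationary_distribution(W, iterations=1000):
    """Estimate the stationary distribution by iterating a lazy random walk.

    The lazy walk stays put with probability 1/2; it has the same stationary
    distribution as W but cannot oscillate on periodic networks.
    """
    n = len(W)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [0.5 * p[j] + 0.5 * sum(p[i] * W[i][j] for i in range(n))
             for j in range(n)]
    return p

def macronode_out_weights(W, group, weights):
    """Weighted average of the out-weight vectors of the nodes in `group`."""
    total = sum(weights)
    return [sum(weights[k] / total * W[i][j] for k, i in enumerate(group))
            for j in range(len(W))]

# Toy network: nodes 0 and 1 form the subgraph; nodes 2 and 3 feed into it.
W = [
    [0.00, 0.00, 0.90, 0.10],
    [0.00, 0.00, 0.10, 0.90],
    [0.75, 0.25, 0.00, 0.00],
    [0.50, 0.50, 0.00, 0.00],
]
group = [0, 1]

# Plain coarse-grain: unweighted average of the subgraph's out-weights.
w_average = macronode_out_weights(W, group, [1.0, 1.0])
# mu | j with j = 2: weight each member by the input it receives from node 2.
w_given_2 = macronode_out_weights(W, group, [W[2][i] for i in group])
# mu | pi: weight each member by its stationary probability.
pi = stationary_distribution(W)
w_stationary = macronode_out_weights(W, group, [pi[i] for i in group])
```

Here the three choices give visibly different macronodes (0.5/0.5, 0.7/0.3, and 0.625/0.375 over nodes 2 and 3), which is why the appropriate construction depends on the subgraph's connectivity.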
These types of HOMs may have minor inconsistencies given some initial state, but will almost always trend toward perfect consistency as the network approaches its stationary dynamics (outlined in Section 4).
Subgraphs with complex internal dynamics can require a more complex type of HOM in order to preserve the macronode's consistency.For instance, in cases where subgraphs have a delay between their inputs and outputs, this can be represented by a combination of μ | j and μ | π, which when combined captures that delay (Figure 4(e)).In these cases, the macronode μ has two components, one of which acts as a buffer over a timestep.
This means that macronodes can possess memory even when constructed from networks that are memoryless at the microscale, and in fact, this type of HOM is sometimes necessary to consistently capture the microscale dynamics.
We present these types of macronodes not as an exhaustive list of all possible HOMs, but rather as examples of how to construct higher scales in a network by representing subgraphs as nodes and also sometimes using higher-order dependencies to ensure those nodes are consistent. This approach offers a complete generalization of previous work on coarse-grains [11] and also black boxes [16,32,33], while simultaneously solving the previously unresolved issue of macroscale consistency by using higher-order dependencies.
The types of macronodes formed by subgraphs also provide substantive information about the network, such as whether the macroscale of a network possesses memory or path-dependency.

Causal Emergence Reveals the Scale of Networks.
A network has an informative macroscale when a recast network, $G_M$ (a macroscale), has more EI than the original network, $G$ (the microscale). In general, networks with lower effectiveness (low EI given their size) have a higher potential for such emergence, since they can be recast to reduce their uncertainty. Searching across groupings allows the identification or approximation of a macroscale that maximizes the EI. Checking all possible groupings is computationally intractable for all but the smallest networks. Therefore, in order to find macronodes which increase the EI, we use a greedy algorithm that groups nodes together and checks if the EI increases. By choosing a node and then pairing it iteratively with its surrounding nodes, we can grow macronodes until pairings no longer increase the EI, and then move on to a new node (see the Materials & Methods section for details on this algorithm).
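The idea of growing macronodes only while EI increases can be sketched with a much simpler search than the one used in the paper. The version below is an illustrative simplification, not the authors' algorithm: it tries merging any pair of current groups (rather than a node and its surrounding nodes), builds each macronode as the unweighted average of its members' out-weights (rather than a higher-order macronode), and accepts the first merge that raises EI.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def effective_information(W):
    """EI = H(<W_out>) - <H(W_out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out = [sum(W[i][j] for i in range(n)) / n for j in range(n)]
    return entropy(mean_out) - sum(entropy(row) for row in W) / n

def coarse_grain(W, groups):
    """Collapse each group into one node, averaging its members' out-weights."""
    M = [[0.0] * len(groups) for _ in groups]
    for a, members in enumerate(groups):
        for i in members:
            for b, targets in enumerate(groups):
                M[a][b] += sum(W[i][j] for j in targets) / len(members)
    return M

def greedy_macroscale(W):
    """Repeatedly merge the first pair of groups whose merger increases EI."""
    groups = [[i] for i in range(len(W))]
    best = effective_information(W)
    improved = True
    while improved:
        improved = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                candidate = [g for k, g in enumerate(groups) if k not in (a, b)]
                candidate.append(groups[a] + groups[b])
                ei = effective_information(coarse_grain(W, candidate))
                if ei > best + 1e-12:
                    groups, best, improved = candidate, ei, True
                    break
            if improved:
                break
    return groups, best

def star(n):
    """Undirected star: node 0 is the hub, nodes 1..n-1 are the spokes."""
    W = [[0.0] * n for _ in range(n)]
    for s in range(1, n):
        W[0][s] = 1.0 / (n - 1)
        W[s][0] = 1.0
    return W

micro_ei = effective_information(star(5))
groups, macro_ei = greedy_macroscale(star(5))
```

On a five-node star, the search rejects every merge involving the hub and ends with the hub plus a single macronode containing all four spokes, raising EI from about 0.72 bits to 1 bit.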
By generating undirected preferential attachment networks and varying the degree of preferential attachment, $\alpha$, we observe a crucial relationship between preferential attachment and causal emergence. One of the central results in network science has been the identification of "scale-free" networks [21]. Our results show that networks that are not "scale-free" can be further separated into micro-, meso-, and macroscales depending on their connectivity. This scale can be identified from their degree of causal emergence (Figure 5(a)). In cases of sublinear preferential attachment ($\alpha < 1.0$), networks lack higher scales. Linear preferential attachment ($\alpha = 1.0$) produces networks that are scale-free, which is the zone of preferential attachment right before the network develops higher scales. Such higher scales only exist in cases of superlinear preferential attachment ($\alpha > 1.0$). Past $\alpha = 3.0$, the network begins to converge to a macroscale where almost all the nodes are grouped into a single macronode. The greatest amount of causal emergence is found in mesoscale networks, which occur at intermediate values of $\alpha$ above 1.0. This is because after $\alpha > 1.0$ the decreasing EI of the microscale leaves room for improvement of the EI at the macroscale, following a grouping of nodes.
Correspondingly, the size of $G_M$ decreases as $\alpha$ increases and the network develops an informative higher scale, which can be seen in the ratio of macroscale network size, $N_M$, to the original network size, $N$ (Figure 5(b)). As discussed previously, networks generated with higher values for $\alpha$ will be more and more star-like. Star-like networks have higher degeneracy and thus less EI, and because of this, we expect that there are more opportunities to increase the network's EI through grouping nodes into macronodes. Indeed, the ideal grouping of a star network is when $N_M = 2$ and EI = 1 bit.
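The star case can be worked through by hand. The following is a sketch, assuming an undirected star with unit edge weights (one hub and $N - 1$ spokes) and a macroscale in which all spokes are replaced by a single macronode $\mu$ whose out-weight points entirely to the hub.

```latex
\begin{align*}
\text{Microscale:}\quad
  \langle H(W_i^{out}) \rangle &= \tfrac{1}{N} \log_2(N - 1), \qquad
  \langle W_i^{out} \rangle = \Big( \tfrac{N-1}{N},\;
      \underbrace{\tfrac{1}{N(N-1)}, \ldots, \tfrac{1}{N(N-1)}}_{N-1 \text{ spokes}} \Big), \\
  EI_{micro} &= -\tfrac{N-1}{N} \log_2\!\big(\tfrac{N-1}{N}\big) + \tfrac{1}{N} \log_2(N)
  \;\longrightarrow\; 0 \text{ as } N \to \infty . \\[4pt]
\text{Macroscale:}\quad
  W^{out}_{hub} &= (0, 1), \quad W^{out}_{\mu} = (1, 0), \quad
  \langle H \rangle = 0, \quad H\big(\tfrac12, \tfrac12\big) = 1, \\
  EI_{macro} &= 1 \text{ bit}.
\end{align*}
```

Under these assumptions the microscale EI equals the binary entropy of $1/N$, which vanishes as the star grows, while the two-node macroscale retains a full bit regardless of size.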
This result is similar to recent advances in spectral coarse-graining that also observe that the ideal coarse-graining of a star network is to collapse it into a two-node network, grouping all the spokes into a single macronode [34], which is what happens to star networks that are recast as macroscales.
Our results offer a principled and general approach to such community detection by asking whether there is an informational gain from replacing a subgraph with a single node. Therefore, we can define causal communities as being when a cluster of nodes, or some subgraph, forms a viable macronode (note that this assumes the connections in the network actually represent possible causal interactions, but it is also merely a topological property). Fundamentally, causal communities represent noise at the microscale. The closer a subgraph is to complete noise, the greater the gain in EI by replacing it with a macronode (see SM V G). Minimizing the noise in a given network also identifies the optimal scale to represent that network. However, there must be some structure that can be revealed by noise minimization in the first place. In cases of random networks that form a single large component which lacks any such structure, causal emergence does not occur (as shown in SM V G).

Causal Emergence in Real Networks.
The presence and informativeness of macroscales should vary across real networks, depending on connectivity. Here, we investigate the disposition toward causal emergence of real networks across different domains. We draw from the same set of networks that are analyzed in Figure 3, the selection process and details of which are outlined in the Materials & Methods section. The network sizes span up to 40,000 nodes, thus making it infeasible to find the best macroscales for each of them. Therefore, we focus specifically on the two categories that previously showed the greatest divergence in terms of the EI: biological and technological. Since we are interested in the general question of whether biological or technological networks show a greater disposition or propensity for causal emergence, we approximate causal emergence by calculating the causal emergence of sampled subgraphs of growing sizes. Each sample is found using a "snowball sampling" procedure, wherein a node is chosen randomly and then a weakly connected subgraph of a specified size is found around it [35]. This subgraph is then analyzed using the previously described greedy algorithmic approach to find macronodes that maximize the EI in each network. Each available network is sampled 20 times for each size taken from it. In Figure 6, we show how the causal emergence of these real networks differentiates as we increase the sampled subgraph size, in a sequence of 50, 100, 150, and finally 200 nodes per sample. Networks of these sizes previously provided ample evidence of causal emergence in simulated networks, as in Figure 5(a). Comparing the two categories of real networks, we observe a significantly greater propensity for causal emergence in biological networks, and this difference is more pronounced the larger the samples are. Note that constructing a random null model of these networks (e.g., a configuration model) would tend to create networks with minimal or negligible causal emergence, as is the case for ER networks (Figure 13 in SM V G).
That subsets of biological systems show a high disposition toward causal emergence is consistent with, and may even help explain, many long-standing hypotheses surrounding the existence of noise and degeneracy in biological systems [36]. It also explains the difficulty of understanding how the causal structure of biological systems functions, since they are cryptic in that they contain certainty at one level and uncertainty at another.
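The snowball sampling step described above can be sketched as a breadth-first expansion around a starting node. This is one plausible reading of the procedure, not the authors' implementation: it assumes the network is given as a list of directed edges, ignores edge direction when expanding (so the sample is weakly connected), and visits neighbors in sorted order to make the result reproducible.

```python
import random
from collections import deque

def snowball_sample(edges, size, start=None, rng=None):
    """Return up to `size` nodes forming a weakly connected sample around `start`.

    If `start` is None, a starting node is chosen at random using `rng`.
    """
    rng = rng or random.Random()
    neighbors = {}
    for u, v in edges:
        # Treat edges as undirected for the purpose of expansion.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if start is None:
        start = rng.choice(sorted(neighbors))
    sample = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(sample) < size:
        node = queue.popleft()
        for nb in sorted(neighbors[node]):
            if nb not in seen:
                seen.add(nb)
                sample.append(nb)
                queue.append(nb)
                if len(sample) == size:
                    break
    return sample

# Toy directed chain 0 -> 1 -> 2 -> 3 -> 4 -> 5.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
around_middle = snowball_sample(chain, size=3, start=2)
from_end = snowball_sample(chain, size=4, start=0)
```

Each sampled node set would then be used to extract the induced subgraph, normalize its out-weights, and search it for EI-increasing macronodes.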

Discussion
We have shown that the information in the relationships between nodes in a network is a function of the uncertainty intrinsic to their connectivity as well as how that uncertainty is distributed. To capture this information, we adapted a measure, effective information (EI), for use in networks and analyzed what it reveals about common network structures that have been studied by network scientists for decades. For example, the EI of an ER random network tends to $-\log_2(p)$, and whether the EI of a preferential attachment network grows or shrinks as new nodes are added is a function of whether its degree of preferential attachment, α, is greater or less than 1.0. In networks where the mechanisms or transitions are unknown, but the structure is known, EI captures the degree of unique targeting in the network. In real networks, we showed that the EI of biological networks tends to be much lower than that of technological networks.
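As a concrete illustration of the measure, the following minimal sketch computes EI for a small network. It assumes that EI is the entropy of the average out-weight vector minus the average entropy of the individual out-weight vectors (this section does not restate the definition), and that the network is supplied as a row-stochastic matrix in which every row sums to 1. The function names are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def effective_information(W):
    """EI of a row-stochastic matrix W (list of rows).

    Assumed definition: H(average out-weight vector) minus the
    average of H(out-weight vector) over all nodes.
    """
    n = len(W)
    avg_out = [sum(row[j] for row in W) / n for j in range(n)]
    return entropy(avg_out) - sum(entropy(row) for row in W) / n


def effectiveness(W):
    """EI normalized by log2(N), as used for the real-network comparison."""
    return effective_information(W) / math.log2(len(W))


# A directed ring: every node has one certain, unique target.
ring = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0]]

# Every node points at node 0: certain, but fully degenerate.
hub = [[1, 0, 0, 0] for _ in range(4)]

ei_ring = effective_information(ring)
ei_hub = effective_information(hub)
```

Under these assumptions the ring reaches the maximum of $\log_2(4) = 2$ bits, while the hub network has an EI of 0 because all out-weights target the same node.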
We also illustrated that what has been called "causal emergence" can occur in networks. This is the gain in EI that occurs when a network, $G$, is recast as a new network, $G_M$. Finding this sort of informative higher scale means balancing the minimization of uncertainty while simultaneously maximizing the number of nodes in the network.
These methods may be useful in improving scientific experimental design, the compression and search of big data, model choice, and even machine learning. Importantly, not every recast network, $G_M$, will have a higher EI than the $G$ that it represents; that is, these same techniques can identify cases of reduction. Ultimately, this is because comparing the EI of different network representations provides a basis for comparing the effectiveness of any two network representations of the same complex system. These techniques allow for the formal identification of the scale of a network. Scale-free networks can be thought of as possessing a fractal pattern of connectivity [37], and our results show that the scale of a network is the breaking of that fractal in one direction or the other. Note that a future area of research is how to efficiently identify such informative higher scales, as well as how network properties beyond the EI change across scales [38].
The study of higher-order structures in networks is an increasingly rich area of research [29, 39-42], often focusing on constructing networks that better capture the data they represent. Here, we introduce a formal and generalized way to recast networks at a higher scale while preserving random walk dynamics. In many cases, a macroscale of a network can be just as consistent in terms of random walk dynamics and also possess greater EI. Some macronodes in a macroscale may be of different types with different higher-order properties. In other words, we show how to turn a lower-order network into a higher-order network. One noteworthy and related aspect of our work is demonstrating how a system that is memoryless at the microscale can actually possess memory at the macroscale, indicating that whether a system has memory is a function of scale.
While some [43] have previously recast subgraphs as individual nodes as we do here, they have not done so in ways that are based on noise minimization and maximizing consistency, focusing instead on gains to algorithmic speed via compression. Explicitly creating macronodes to minimize noise brings the dependencies of the network into focus. This means that causal emergence in networks has a direct relationship to community detection, a vast subdiscipline that treats dense subgraphs within a network as representing shared properties, membership, or functions [44,45]. However, the relationship between causal emergence and traditional community detection is not as direct as it may seem. For one, causal emergence is high in networks with high degeneracy (i.e., networks with high-degree hubs, as we show in Figure 5(a)). Community detection algorithms do not typically select for such structural properties, instead focusing on dense subgraphs that connect more highly within the subgraph than outside [44]. In SM Figure 12, we show a landscape of stochastic block model networks and their associated values for causal emergence. Indeed, in networks that would have high modularity [46] (e.g., two disconnected cliques), we do observe causal emergence, but only when the two disconnected cliques are of different sizes. This distinction is key and situates networks that display causal emergence in a meaningful place in the study of complex networks. In light of this, macronodes offer a sort of community detection where the micronodes that make up a macronode are a community and ultimately can be replaced by a macronode that summarizes their behavior while reducing the subgraph's noise. Under this interpretation, community structure is characterized by noise rather than shared memberships.

Selection of Real Networks.
Networks were chosen to represent the four categories of interest: social, informational, biological, and technological (see SM Figure 10, where we detail the same information as in Figure 3, but also include the source of the network data in addition to the effectiveness value of each network). We used all the available networks under 40,000 nodes (due to computational constraints) within all the domains in the Konect database that reflected our categories of interest. For our social category, we used the domains Human Contact, Human Social, Social, and Communication. For our informational category, we used the domains Citations, Coauthorship, Hyperlinks, Lexical, and Software. For our biological category, we used the domains Trophic and Metabolic. Due to overlaps between the Konect database and the Network Repository [24] in these domains, and the paucity of other biological data in the Konect database, we also included the Brains domain and the Ecology domain from the Network Repository to increase our sample size (again, all networks within these domains under 40,000 nodes were included). For our technological category, we used the domains Computer and Infrastructure from the Konect database. Again due to overlap between the Konect database and the Network Repository, we also included the Technological and Power Networks domains from the Network Repository. For a full table of the networks used in this study, along with their source and categorization, see Table II.

Creating Consistent Macronodes.
Previously we outlined methods for creating consistent macronodes of different types. Here, we explore their implementation, which requires deciding which macroscales are consistent. Inconsistency is measured as the Kullback-Leibler divergence between the expected distribution of random walkers on both the microscale ($G$) and the macroscale ($G_M$), given an initial distribution, as in equation (5).
To measure the inconsistency we use an initial maximum entropy distribution on the shared nodes between $G$ and $G_M$, that is, only the set of nodes that are left ungrouped in $G_M$. Similarly, we only analyze the expected distribution over that same set of micronodes. Since such distributions are only over a portion of the network, to normalize each distribution to 1.0, we include a single probability that represents all the nonshared nodes between $G$ and $G_M$ (representing when a random walker is on a macronode).
We focus on the shared nodes between $G$ and $G_M$ for the inconsistency measure because (a) it is easy to calculate, which is necessary during an algorithmic search, (b) except for unusual circumstances, the inconsistency over the shared nodes still reflects the network as a whole, and (c) even in cases of the most extreme macroscales (such as when α > 4 in Figure 5), there are still nodes shared between $G$ and $G_M$.
Here, we examine our methods of using higher-order dependencies in order to demonstrate that they create consistent macronodes. We use 1000 simulated preferential attachment networks, with parameters sampled uniformly at random: α between 1.0 and 2.0, $n$ between 25 and 35, and $m$ equal to either 1 or 2. These networks were then grouped via the algorithm described in the following section. All macronodes were of the μ|π type, and their inconsistency was checked over 1000 timesteps. These macronodes generally have consistent dynamics, either because they start that way or because they trend toward consistency over time, and of the 1000 networks, only 4 had any divergence greater than 0 after 1000 timesteps. In Figure 11 in SM V F, we show 15 of these simulated networks, along with their parameters, number of macronodes, and consistencies. Note that even in the cases with early nonzero inconsistency, the inconsistency is always very low in absolute terms of bits, and all of the 15 randomly chosen networks trend toward consistency over time. In our observations, most macronodes converge before 500 timesteps, so in analyzing the real-world networks using the μ|π macronode, we check all macronodes for consistency and only reject those that are inconsistent at 500 timesteps. More details about the algorithmic approach to finding causal emergence can be found in the following section.
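The consistency check described in this section can be sketched as follows. This is a minimal illustration rather than the implementation used for the results. It assumes row-stochastic transition matrices for both scales, takes the divergence in the direction from the microscale to the macroscale (equation (5) is not restated here, so the direction is an assumption), and uses hypothetical function names and index lists for the shared nodes.

```python
import math


def step(dist, W):
    """Advance a distribution of random walkers by one timestep."""
    n = len(W)
    return [sum(dist[i] * W[i][j] for i in range(n)) for j in range(n)]


def collapse(dist, shared):
    """Keep the shared nodes and pool all remaining mass into one extra bin."""
    kept = [dist[i] for i in shared]
    return kept + [max(0.0, 1.0 - sum(kept))]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def inconsistency(W_micro, W_macro, shared_micro, shared_macro, timesteps):
    """Divergence between micro and macro walker distributions at each timestep.

    shared_micro[k] and shared_macro[k] index the same ungrouped node.
    """
    p = [0.0] * len(W_micro)
    q = [0.0] * len(W_macro)
    for i in shared_micro:  # maximum entropy start on the shared nodes
        p[i] = 1.0 / len(shared_micro)
    for i in shared_macro:
        q[i] = 1.0 / len(shared_macro)
    series = []
    for _ in range(timesteps):
        p, q = step(p, W_micro), step(q, W_macro)
        series.append(kl_divergence(collapse(p, shared_micro),
                                    collapse(q, shared_macro)))
    return series


# Microscale: A goes to B or C with equal weight; B and C both return to A.
W_micro = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# Macroscale: B and C are grouped into one macronode (index 1).
W_macro = [[0.0, 1.0], [1.0, 0.0]]
series = inconsistency(W_micro, W_macro, [0], [0], timesteps=10)
```

In this toy example the macroscale reproduces the random walk over the shared node A exactly, so every entry of `series` is 0. A macronode would be rejected if the divergence were still above 0 at the chosen cutoff (500 timesteps in the text above).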

Greedy Algorithm for Causal Emergence.
The greedy algorithm used for finding causal emergence in networks is structured as follows: for each node, $v_i$, in the shuffled node list of the original network, collect a list of neighboring nodes, $\{v_j\}$, with each $v_j \in B_i$, where $B_i$ is the Markov blanket of $v_i$ (in graphical models, the Markov blanket, $B_i$, of a node, $v_i$, corresponds to the "parents," the "children," and the "parents of the children" of $v_i$ [47]).
This means that $\{v_j\}$ consists of nodes with outgoing edges leading into $v_i$, nodes that the outgoing edges from $v_i$ lead into, and nodes that have outgoing edges leading into the out-neighbors of $v_i$. For each node in $\{v_j\}$, the algorithm calculates the EI of a macroscale network after $v_i$ and $v_j$ are combined into a macronode, $v_M$, according to one of the macronode types in Figure 4. If the resulting network has a higher EI value, the algorithm stores this structural change and, if necessary, supplements the queue of nodes, $\{v_j\}$, with any new neighboring nodes from the Markov blanket of $v_j$ that were not already in $\{v_j\}$. If a node, $v_j$, has already been combined into a macronode via a grouping with a previous node, $v_i$, then it will not be included in new queues, $\{v_j'\}$, of later nodes to check.
The algorithm iteratively combines such pairs of nodes until every node, $v_j$, in the Markov blanket of every node, $v_i$, is tested.
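The steps above can be sketched in a compact form. This is an illustrative simplification, not the implementation used for the results. It assumes the network is a dictionary of out-weight dictionaries whose rows sum to 1, uses only the simplest macronode type (an average of the members' out-weights, as in Figure 4(b)), assumes the EI definition of entropy of the average out-weights minus average entropy of the out-weights, omits the consistency check, and skips seed nodes that are already grouped. All function names are hypothetical.

```python
import math
import random


def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)


def effective_information(W):
    """Assumed definition: H(mean out-weights) - mean H(out-weights)."""
    n = len(W)
    avg = {v: 0.0 for v in W}
    for u in W:
        for v, w in W[u].items():
            avg[v] += w / n
    return entropy(avg.values()) - sum(entropy(W[u].values()) for u in W) / n


def markov_blanket(W, i):
    """Parents, children, and parents of the children of node i."""
    children = set(W[i])
    parents = {u for u in W if i in W[u]}
    coparents = {u for u in W if children & set(W[u])}
    return (parents | children | coparents) - {i}


def recast(W, group_of):
    """Macroscale network; each macronode averages its members' out-weights."""
    members = {}
    for v, g in group_of.items():
        members.setdefault(g, []).append(v)
    macro = {}
    for g, S in members.items():
        row = {}
        for u in S:
            for v, w in W[u].items():
                h = group_of[v]
                row[h] = row.get(h, 0.0) + w / len(S)
        macro[g] = row
    return macro


def greedy_causal_emergence(W, seed=None, tol=1e-12):
    rng = random.Random(seed)
    group_of = {v: v for v in W}  # start with every node in its own group
    grouped = set()               # nodes already placed in a macronode
    best = effective_information(W)
    order = sorted(W)
    rng.shuffle(order)
    for i in order:
        if i in grouped:          # assumption: grouped nodes are not reused as seeds
            continue
        queue = [j for j in sorted(markov_blanket(W, i)) if j not in grouped]
        seen = set(queue) | {i}
        while queue:
            j = queue.pop(0)
            trial = dict(group_of)
            trial[j] = group_of[i]
            ei = effective_information(recast(W, trial))
            if ei > best + tol:   # keep the grouping only if EI increases
                best, group_of = ei, trial
                grouped.update({i, j})
                for k in sorted(markov_blanket(W, j)):
                    if k not in seen and k not in grouped:
                        seen.add(k)
                        queue.append(k)
    return group_of, best


# Three nodes with uniformly noisy transitions among themselves, plus one
# isolated node with a self-loop.
third = 1.0 / 3.0
W = {0: {0: third, 1: third, 2: third},
     1: {0: third, 1: third, 2: third},
     2: {0: third, 1: third, 2: third},
     3: {3: 1.0}}
groups, macro_ei = greedy_causal_emergence(W, seed=1)
causal_emergence = macro_ei - effective_information(W)
```

In this toy network the three noisy nodes are merged into a single macronode, which raises the EI from roughly 0.81 bits to 1 bit, while the isolated node is left ungrouped. Supporting the other macronode types of Figure 4 would require replacing `recast` with the corresponding weighting scheme.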

Figure 1: Effective information depends on network structure. (a) In Erdős-Rényi (ER) networks, we see the network's EI level off at EI = $-\log_2(p)$ as $N$, the network's size, increases (log scale shown). (b) The EI of networks grown under a preferential attachment mechanism, which depends on the preferential attachment exponent, α. Under this network growth model, new nodes add their $m$ edges (here, $m = 1$) to existing nodes in the network with a probability proportional to $k^{\alpha}$. Only sublinear preferential attachment (α < 1.0) allows for the continuous growth of EI with the growth of the network. The ribbons around the data represent standard deviations after 100 simulations of each.

Figure 2: Comparing determinism and degeneracy. (a) Left column: three example out-weight vectors, $W_i^{out}$, of a given node, $v_i$. A maximally deterministic vector (top left, where $W_A^{out}$ corresponds to node A in the inset network motif) is when a random walker on $v_i$ transitions to one of its neighbors with probability 1.0, whereas indeterminism occurs when $v_i$ has a uniform probability of visiting any node in the network in the next time step. Right: three example in-weight vectors to a given $v_j$. A maximally degenerate vector, $\langle W_i^{out} \rangle$ (top right, exemplified by the inset network motif), is when every outgoing edge in the network connects to a single node, whereas minimal degeneracy occurs when each value in $\langle W_i^{out} \rangle$ equals $1/N$. (b) By comparing the determinism and degeneracy of canonical network structures, we find a great deal of heterogeneity in different network models' ratios between their determinism and degeneracy. High degeneracy is characterized by hub-and-spoke topology, as in the case of the star network. Networks with high determinism are characterized by longer average path lengths, as in the case of a ring lattice.

Figure 3: Effective information of real networks. Effectiveness, a network's EI normalized by $\log_2(N)$ [11], of 84 real networks from the Konect network database [23], grouped by domain of origin. To look further at the names and domains of the networks in question, see SM V E. Networks in different categories have varying effectiveness (t-test, comparison of means).

Figure 4: Macronodes. (a) The original network, $G$, along with its adjacency matrix (left). The shaded oval indicates that subgraph $S$ member nodes $v_B$ and $v_C$ will be grouped together, forming a macronode, μ. All macronodes are some transformation of the original adjacency matrix via recasting it as a new adjacency matrix (right). The manner of this recasting depends on the type of macronode. (b) The simplest form of a macronode is when $W_\mu^{out}$ is an average of the $W_i^{out}$ of each node in the subgraph. (c) A macronode that represents some path-dependency, such as input from A. Here, in averaging to create the $W_\mu^{out}$, the out-weights of nodes $v_B$ and $v_C$ are weighted by their input from $v_A$. (d) A macronode that represents the subgraph's output over the network's stationary dynamics. Each node has some associated $\pi_i$, which is the probability of $v_i$ in the stationary distribution of the network. The $W_\mu^{out}$ of a μ|π macronode is created by weighting each $W_i^{out}$ of the micronodes in the subgraph $S$ by $\pi_i / \sum_{k \in S} \pi_k$. (e) A macronode with a single timestep delay between input μ|j and its output μ|π, each constructed using the same techniques as its components. However, μ|j always deterministically outputs to μ|π. See SM V A for the full equations governing the creation of the $W_\mu^{out}$ of each of the different HOMs shown.

Figure 5: The emergence of scale in preferential attachment networks. (a) By repeatedly simulating networks with different degrees of preferential attachment (α values) with $m = 1$ new edge per new node, and running them through a greedy algorithm (described in Materials & Methods), we observe a distinctive peak of causal emergence once the degree of preferential attachment is above α = 1, yielding networks that are no longer "scale-free." (b) The log of the ratio of original network size, $N$, to the size of the macroscale network, $N_M$. Networks with higher α values (more star-like networks) show drastic dimension reductions, and in fact, all eventually reach the same $N_M$ of 2. Comparatively, random trees (α = 0.0) show essentially no informative dimension reductions.

Figure 6: Propensity for causal emergence in real networks. Growing snowball samples of the two network domains that previously showed the greatest divergence in effectiveness: technological and biological networks. At each snowball size, $N_s$, each network is sampled 20 times. Across these samples, the total amount of causal emergence for a given sample size is significantly different between the two domains (t-test, comparison of means).