From the difference of structures to the structure of the difference

When dealing with evolving or multi-dimensional complex systems, network theory provides elegant ways of describing their constituent components, through time-varying and multi-layer complex networks respectively. Nevertheless, the analysis of how these components are related is still an open problem. We here propose a framework for analysing the evolution of a (complex) system, by describing the structure created by the difference between multiple networks by means of the Information Content metric. As opposed to other approaches, for instance the use of global overlap or entropies, the proposed one makes it possible to understand whether the observed changes are due to random noise or to structural (targeted) modifications. We validate the framework by means of sets of synthetic networks, as well as networks representing real technological, social and biological evolving systems. We further propose a way of reconstructing network correlograms, which make it possible to convert the system's evolution to the frequency domain.


I. INTRODUCTION
Although complex networks theory 1,2 was initially used to describe the structure underpinning individual complex systems, in recent years there has been an explosion in the number of situations in which (potentially large) sets of networks have to be studied in a comparative way. The availability of multiple related networks may be the natural result of analysing different, yet compatible systems -as, for instance, functional brain networks obtained from a large set of healthy people, with the aim of identifying common connectivity patterns 3 ; or from control subjects and patients suffering from a given condition 4 , to detect differences between them. This can nevertheless also stem from the analysis of a single system across its parameters' and temporal dimensions. Following on the previous example, neuroscientists may be interested in characterising the temporal evolution of such networks during a long cognitive task 5,6 , or across different frequency bands 7 . Potential examples are not limited to neuroscience, and indeed appear in all research fields in which complex networks have been applied 8 , i.e. across social, biological and technological systems -a clear example of the latter being air transport networks 9,10 .
The analysis of the differences between two or more networks is a two-fold problem. On one hand, it entails the quantification of such differences 11 , e.g. by calculating a set of topological metrics and by comparing their normalised values 12 ; and, on the other hand, the understanding of the dynamical processes causing such changes. These two aspects of the problem are orthogonal, and both of them have to be taken into account for the correct understanding of an observed evolution. The fact that two networks are not equal does not imply the presence of a structured evolutionary process, as they may be the result of describing the same system under observational noise. Such a conclusion cannot be drawn even from a statistically significant change in some topological metric: e.g. a reduction in the modularity may be the result of a random link rewiring, but also of a targeted process aimed at disrupting the modular structure. Even an increase in modularity may be the result of a random process, albeit with low probability. Lastly, and along the same lines, one should not correlate the magnitude of the changes with the presence of targeted processes: noise does not necessarily result in small fluctuations only. These two aspects, i.e. description and structureness, are also highly relevant for real-world applications. For instance, in the specific case of brain functional networks, the presence of an unstructured difference between control subjects and patients may be ascribed to a global loss of brain connectivity, while structured changes may suggest a focused reorganisation of the information flow.
The second previously discussed point, i.e. the understanding of the dynamical processes causing a change, is a specific aspect of the more general problem known as phenotype to genotype 13,14 . While we can observe only the phenotype of a system, in this case the resulting physical or functional network, what we would really like to understand is the genotype that has created it. If several phenotypes are available, e.g. we can observe the temporal evolution of the system, we can in principle use the phenotype's dynamics to (partly) reconstruct the genotype: in other words, we can use the "difference of structures" to unveil the underlying "structure creating such difference".
Inspired by this, we here present a framework designed to answer the following specific question: do the observed changes follow a structure, or are they simply the result of random fluctuations? This framework is based on a) the calculation of the difference between the two observed networks, b) the representation of such difference as a new difference network, and c) the analysis of its structural characteristics. Specifically, we start from the assumption that changes resulting from non-random processes are characterised by correlations, which are reflected in the presence of a meso-scale in the difference network. Such meso-scale can then be detected using a broad-band topological metric, i.e. the Information Content 15 , and its significance assessed through a statistical test based on ensembles of equivalent random networks. By means of a set of synthetic evolving networks, we show that this approach outperforms other alternatives, such as those based on cross-network correlations 16 or von Neumann entropy 17,18 . We further demonstrate the usefulness of the proposed solution by analysing three real systems, respectively technological (the evolution of the world-wide air transport network), social (human contact networks in a hospital) and biological (comparison of functional brain networks corresponding to different frequency bands). We conclude this work by showing how this approach can be used to construct a network correlogram, which, among other things, can be used to detect the natural frequency of a time-evolving network.

A. Information Content
For the sake of completeness, we here include a short overview of the Information Content metric, which is the basis of the proposed methodology. For a more complete description, the reader may refer to Ref. 15 .
The rationale behind the definition of the Information Content is that a regular network, or more generally any network presenting a meso-scale structure, displays strong correlations between the nodes' connectivity patterns. The information encoded by pairs of such correlated nodes is thus redundant, as the connections of one of them almost completely define those of the other. A clear example is provided by networks with a strong community structure, in which two nodes belonging to the same community usually share most of their neighbours.
Following this idea, and given an initial network, the proposed algorithm identifies the pair of nodes whose merging would entail the smallest information loss, i.e. the pair sharing most of their connections. The analysis of two nodes i and j thus entails, firstly, the creation of a vector of differences m, with $m_k = 1 - \delta_{a_{i,k}, a_{j,k}}$, $a_{i,k}$ being the elements of the adjacency matrix and $\delta$ the Kronecker delta. Secondly, the information encoded by m is assessed through the classical Shannon entropy, defined as:
$$I_{i,j} = -N \left( p_0 \log_2 p_0 + p_1 \log_2 p_1 \right),$$
$p_0$ and $p_1$ being respectively the frequencies of zeros and ones in m, and N the number of nodes in the network. Note that $I_{i,j}$ represents the quantity of information required to reconstruct j's connections given those of i, and thus the quantity of information lost when both nodes are merged.
The pair of nodes minimising I is then merged, and the quantity of information lost in the process is approximated by I. The process is iteratively repeated until one single node remains, the final Information Content IC being the sum of the information lost in all steps.
As shown in a previous work 15 , low IC values indicate the presence of some kind of regularity in the link arrangement, including communities, hubs, or core-periphery configurations.
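The iterative merging described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the reference implementation of Ref. 15: the function names are hypothetical, and three details that the overview leaves open are filled in as explicit assumptions (the merged node inherits the union of the links of the two original nodes, the difference vector spans all current nodes, and N is the current number of nodes).

```python
import numpy as np

def pair_information(row_i, row_j):
    """N times the Shannon entropy (base 2) of the difference vector m."""
    m = row_i != row_j                 # m_k = 1 where the two rows disagree
    p1 = m.mean()                      # frequency of ones in m
    p0 = 1.0 - p1                      # frequency of zeros in m
    h = 0.0
    for p in (p0, p1):
        if p > 0:                      # convention: 0 * log2(0) = 0
            h -= p * np.log2(p)
    return m.size * h

def information_content(adj):
    """Sum of the information lost while merging nodes until one remains."""
    a = np.array(adj, dtype=bool)      # work on a copy
    total = 0.0
    while a.shape[0] > 1:
        n = a.shape[0]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                info = pair_information(a[i], a[j])
                if best is None or info < best[0]:
                    best = (info, i, j)
        info, i, j = best
        total += info
        # ASSUMPTION: the merged node keeps the union of both link sets.
        merged = a[i] | a[j]
        merged[i] = False              # no self-loop on the merged node
        merged[j] = False
        a[i, :] = merged
        a[:, i] = merged
        a = np.delete(np.delete(a, j, axis=0), j, axis=1)
    return float(total)
```

The exhaustive pair search makes this sketch costly for large networks; it is meant to make the definition concrete rather than to be efficient.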

B. Comparing two networks
Consider two networks, described by their adjacency matrices $A_1$ and $A_2$, which have been observed under different conditions. Firstly, the simplest case includes two independent networks, representing two different systems, albeit of the same size, i.e. with the same number of nodes. Secondly, these adjacency matrices can represent different layers of a multiplex network 19 . Finally, the networks may represent different snapshots of the same time-evolving system 20 . In all cases, changes between $A_1$ and $A_2$ can be encoded in a matrix $D = |A_1 - A_2|$, whose element $d_{i,j}$ is equal to 1 when the corresponding link differs between the two analysed networks, and zero otherwise. Note that D can be interpreted as the adjacency matrix of a network whose links depict a corresponding change between $A_1$ and $A_2$.
With respect to the meso-scale structure of the difference network D, only two situations can be encountered. First, changes between $A_1$ and $A_2$ can be random, for instance due to measurement noise, or more generally due to uncorrelated forces; D would then resemble the adjacency matrix of a random network. Second, if changes between $A_1$ and $A_2$ are somehow correlated, the resulting network should present some kind of meso-scale structure. For instance, if changes only affect the connections of one node, D will be star-shaped. All intermediate situations, e.g. with only a part of the links modified at random, can be interpreted as a special (and noisy) case of the latter situation.
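As an illustration of the difference network, the following minimal sketch (Python with NumPy) builds D for two binary networks of equal size. The function name `difference_network` is hypothetical; the example reproduces the star-like case in which only the links of one node change.

```python
import numpy as np

def difference_network(a1, a2):
    """Return D = |A1 - A2| for two binary adjacency matrices of equal size."""
    a1 = np.asarray(a1, dtype=int)
    a2 = np.asarray(a2, dtype=int)
    if a1.shape != a2.shape:
        raise ValueError("both networks must have the same number of nodes")
    return np.abs(a1 - a2)

# Example: only the connections of node 0 change, so D is a star centred on it.
a1 = np.zeros((4, 4), dtype=int)
a2 = a1.copy()
a2[0, 1:] = 1
a2[1:, 0] = 1
d = difference_network(a1, a2)
```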
If changes are not random, and thus are correlated and form a meso-scale structure, the latter should be detected by the IC metric. An algorithm for the comparison of different networks can thus be designed, composed of the following steps: i) calculate D as $|A_1 - A_2|$; ii) calculate the IC of the network D; iii) compare IC(D) with the value obtained in an ensemble of equivalent random networks. As for the latter point, several ways of normalising the obtained value are available. Firstly, one can simply calculate:
$$IC^* = \frac{IC(D)}{\mu(IC_r)},$$
where $\mu(IC_r)$ is the average Information Content obtained in an ensemble of random networks with the same number of nodes and links as D. Note that $IC^*$ usually takes values in (0, 1), with values close to one indicating a random structure of the network D, and thus a random difference between $A_1$ and $A_2$; and values clearly below one indicating the presence of a structure in the changes. Further note that, while $IC^* > 1$ is possible, it would indicate a structure more random than a random network, and can thus only be the result of random fluctuations.
While $IC^*$ provides a quantitative assessment of the structure of changes, it yields little information about their statistical significance. In order to tackle this issue, a normalisation based on a Z-Score can be used:
$$IC^\dagger = \frac{IC(D) - \mu(IC_r)}{\sigma(IC_r)},$$
where $\sigma(IC_r)$ is the standard deviation of the Information Content in the same ensemble of random networks. $IC^\dagger$ values close to zero indicate random modifications between $A_1$ and $A_2$, while negative values indicate modifications driven by some structure. The advantage of this formulation is that $IC^\dagger$ can easily be transformed into a p-value, provided $IC_r$ follows a normal distribution, a condition that fails only for very small random networks.
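The two normalisations can be sketched as below, in Python with NumPy. This is a hedged illustration under stated assumptions: `metric` stands for any function mapping an adjacency matrix to a number (for instance an implementation of the IC), the null model draws random undirected networks with the same number of nodes and links as D, and the p-value is taken as the one-sided lower tail of a normal distribution. All function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def random_like(d, rng):
    """Random undirected network with the same number of nodes and links as d."""
    n = d.shape[0]
    n_links = int(np.triu(d, k=1).sum())
    iu, ju = np.triu_indices(n, k=1)          # all candidate node pairs
    chosen = rng.choice(iu.size, size=n_links, replace=False)
    r = np.zeros((n, n), dtype=int)
    r[iu[chosen], ju[chosen]] = 1
    return r + r.T

def normalised_scores(d, metric, n_random=100, seed=0):
    """Return (ratio, z_score, p_value) of metric(d) against a random ensemble."""
    rng = np.random.default_rng(seed)
    observed = metric(d)
    null = np.array([metric(random_like(d, rng)) for _ in range(n_random)])
    mu = null.mean()
    sigma = null.std(ddof=1)                  # assumes a non-degenerate ensemble
    ratio = observed / mu                     # analogue of IC*
    z = (observed - mu) / sigma               # analogue of IC-dagger
    # ASSUMPTION: one-sided test, P(Z <= z) under a normal null distribution.
    p_value = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return ratio, z, p_value
```

With the IC as `metric`, a strongly negative Z-Score (and hence a small p-value) would point to a structured difference network.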

C. Validation on synthetic networks
A simple way of validating the proposed algorithm involves the use of a set of controlled evolutions, i.e. governed by rules ensuring that the start and end points are known topologies. Given these two networks $A_{start}$ and $A_{end}$, we construct a third network A whose links are drawn from $A_{end}$ with probability α, and from $A_{start}$ with probability 1 − α; and finally compare A with the initial network $A_{start}$. Note that, for α = 0, $A = A_{start}$ and $D = 0_{N \times N}$; on the other hand, α = 1 implies that $A = A_{end}$ and $D = |A_{end} - A_{start}|$. Therefore, α controls the degree of morphing between $A_{start}$ and $A_{end}$.
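The morphing procedure can be sketched as follows in Python with NumPy. The function name `morph` is hypothetical, and the sketch assumes undirected networks, so that one random draw is made per pair of nodes and mirrored to keep the result symmetric.

```python
import numpy as np

def morph(a_start, a_end, alpha, rng):
    """Take each link from a_end with probability alpha, else from a_start."""
    a_start = np.asarray(a_start)
    a_end = np.asarray(a_end)
    n = a_start.shape[0]
    # One draw per node pair (upper triangle), mirrored to the lower triangle.
    take_end = np.triu(rng.random((n, n)) < alpha, k=1)
    take_end = take_end | take_end.T
    return np.where(take_end, a_end, a_start)

# Example: morph an empty network towards a complete one.
rng = np.random.default_rng(0)
a_start = np.zeros((6, 6), dtype=int)
a_end = np.ones((6, 6), dtype=int) - np.eye(6, dtype=int)
a_half = morph(a_start, a_end, 0.5, rng)
d_half = np.abs(a_half - a_start)      # difference with the initial network
```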
Several evolutions of interest are analysed in Fig. 1. The four columns, from left to right, respectively represent the initial (rewiring α = 0) and final (α = 1) networks; D, for the maximum rewiring α = 1; and the evolution of the $\log_{10}$ of the p-value of $IC^\dagger$, as a function of the rewiring α, calculated between the original and the rewired network. While, for the sake of clarity, the depicted adjacency matrices have a small size, all results have been obtained with networks of 100 nodes and 100 random realisations.
The first row describes the rewiring of a random network into a second random one. As there is neither correlation nor structure between the links that have changed, the resulting matrix D presents a random connectivity and no meso-scale; consequently, the drop in IC never becomes statistically significant, as depicted in the right panel. The second example, while similar, presents an important difference: although both the initial and final networks are random, the second is obtained by reversing the set of neighbours of one single node (see the corresponding matrix D). Note that, in this case, while the initial and final points are random, the evolution process is a structured one. This is correctly detected by the proposed metric, with the p-value dropping below 0.01 for α ≈ 0.15.
Similar behaviours are observed in the third and fourth examples, which describe two different networks converging towards a community structure. As creating or modifying a community requires links to be activated and de-activated in a targeted way, the metric detects the presence of a meso-scale in D. Finally, the last example consists of a situation in which both the starting and final networks have the same community structure, both being contaminated by random noise. Accordingly, the difference between them has a random nature, and the p-value never becomes statistically significant.
Some general conclusions can be drawn from these results. Firstly, and most importantly, the structure of the two networks A 1 and A 2 is not relevant; instead, only the changes that are required to evolve from the former to the latter are. Specifically, two completely random networks may be associated with a structured change between them; and two well-structured networks may differ in a random fashion. Secondly, the presence of a statistically significant

Correlation
An interesting, and yet simple way of comparing two networks, or two layers in a multiplex network, is to calculate the correlation between the links present in both of them. In other words, given two networks $A_1$ and $A_2$, the correlation expresses the probability that $a^{A_2}_{i,j} = 1$ given that $a^{A_1}_{i,j} = 1$. More generally, one can calculate a global overlap $O_{A_1,A_2}$ as the total number of pairs of nodes connected at the same time by a link in networks $A_1$ and $A_2$, as proposed in Ref. 16 , i.e.:
$$O_{A_1,A_2} = \sum_{i<j} a^{A_1}_{i,j} \, a^{A_2}_{i,j}.$$
Nevertheless, O does not provide information on the underlying mechanism driving such difference, as the same correlation value may be the result of random or structured changes.
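The limitation of the global overlap can be made concrete with a small sketch in Python with NumPy. The function name `global_overlap` and the toy networks are hypothetical: two modified versions of the same network share the same overlap with the original, although one results from a targeted change (all links of one node removed) and the other from scattered removals.

```python
import numpy as np

def global_overlap(a1, a2):
    """Number of node pairs linked in both networks (each pair counted once)."""
    a1 = np.asarray(a1, dtype=int)
    a2 = np.asarray(a2, dtype=int)
    return int(np.triu(a1 * a2, k=1).sum())

def remove_links(adj, pairs):
    """Return a copy of adj with the given undirected links removed."""
    out = np.array(adj, dtype=int)
    for i, j in pairs:
        out[i, j] = 0
        out[j, i] = 0
    return out

a1 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)     # complete network
targeted = remove_links(a1, [(0, 1), (0, 2), (0, 3)])      # all links of node 0
scattered = remove_links(a1, [(0, 1), (2, 3), (0, 2)])     # links spread out
same_overlap = global_overlap(a1, targeted) == global_overlap(a1, scattered)
```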

von Neumann entropy
The von Neumann entropy ($S_{VN}$) is a metric that was initially introduced in quantum mechanics to assess the degree of mixing of the quantum states encoded in a probability distribution, and hence in a density matrix ρ. While the concept of a state probability distribution is not defined for complex networks, the metric can still be calculated over any density matrix, i.e. any Hermitian and positive semidefinite matrix. As previously shown 17,18 , $S_{VN}$ can be calculated over the density Laplacian matrix as:
$$S_{VN} = -\mathrm{Tr}\left( \rho \log_2 \rho \right), \qquad \rho = \frac{L}{N \langle k \rangle},$$
where $\langle k \rangle$ is the average degree, N the number of nodes composing the network, and L the Laplacian matrix of the network.
It is easy to construct situations in which the difference network D is equal to $A_r$ or $A_m$.
For instance, starting from a random network with a link density of 0.5, the first case is obtained when this is compared with another random network with the same size and link density; on the other hand, the second case is obtained by inverting the activation of links in the upper left and bottom right quarters of the adjacency matrix. The behaviour of the von Neumann entropy in these two situations is depicted in Fig. 2 -note that the right panels depict the evolution of the Z-Score of the S V N , as calculated against ensembles of equivalent random networks. It is clear that, even though the von Neumann entropy may be an alternative metric for comparing network structures and has a substantially lower computational cost, our proposed methodology is able to detect more complex changes, and is therefore more reliable in real-world situations.
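For reference, the von Neumann entropy of a network can be computed with a few lines of Python and NumPy. The sketch below is an assumption-laden illustration rather than the exact formulation of Refs. 17,18: it takes the density matrix to be the combinatorial Laplacian rescaled to unit trace, uses base-2 logarithms, and adopts the convention 0 log 0 = 0. The function name is hypothetical.

```python
import numpy as np

def von_neumann_entropy(adj):
    """Entropy of the Laplacian rescaled to unit trace (undirected network)."""
    a = np.asarray(adj, dtype=float)
    degrees = a.sum(axis=1)
    if degrees.sum() == 0:
        raise ValueError("the network must contain at least one link")
    laplacian = np.diag(degrees) - a
    # ASSUMPTION: rho = L / sum of degrees, i.e. L / (N * average degree).
    rho = laplacian / degrees.sum()
    eigenvalues = np.linalg.eigvalsh(rho)          # rho is symmetric
    eigenvalues = eigenvalues[eigenvalues > 1e-12] # convention: 0 * log(0) = 0
    return float(-(eigenvalues * np.log2(eigenvalues)).sum())
```

For a complete network of n nodes, the non-zero eigenvalues of ρ are all equal to 1/(n − 1), so the entropy is log2(n − 1); this offers a simple sanity check.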

[Figure panel titles: "Random to random"; "Random to random (structured)"]

III. APPLICATION TO REAL-WORLD SYSTEMS
A. World-wide air transport network
As a first test case, we here consider the network created by flights between the top-50 and top-200 world airports, as extracted from the Sabre Airport Data Intelligence data set.
As previously proposed 25,26 , nodes represent airports, pairwise connected when the total number of passengers per month who used a direct flight between both airports is larger than 1000, i.e. at least ≈ 33 passengers / day. 72 snapshots are available, representing the monthly evolution of the system between January 2010 and December 2015.
The air transport network is known to present a strong seasonality, both on the short (i.e. daily) and long scales (monthly and yearly) 27 . This magnifies the importance of using a correct temporal representation, as projecting the system into a single atemporal network may result in severe topological distortions 28 . This fact is here confirmed by Fig. 3, which represents the evolution of three topological metrics (link density, modularity and assorta- more complete view of the evolution of the network, providing information (specifically, the nature of the changes) that is disregarded by other metrics.

B. Hospital contact network
As a second example, we here consider the temporal network of contacts in the geriatric unit of a Lyon university hospital, including patients and health care workers, as described

C. Brain functional networks
As a third case study, we present an analysis of the brain activity of multiple healthy sub- to the overall resting state activity, and therefore not to be equivalent 35,36 .
A quite different picture nevertheless arises when one shifts the focus to subjects.
Let us denote by S the matrix of similarity, whose element $s_{i,j}$ encodes the similarity of the two networks respectively representing the system at times i and j; note that such a matrix is completely equivalent to the results presented in Figs. 4 and 5. The auto-correlation of the sequence of N networks, for a time displacement of t > 0, is given by:
$$C(t) = \frac{1}{N - t} \sum_{i=1}^{N-t} s_{i,i+t}. \qquad (6)$$
In the r.h.s. of Eq. 6, the IC measure is used as a proxy of the similarity between two networks; to be more precise, this self-correlation assesses to what extent a sequence of networks is intentionally equivalent to itself, excluding the presence of uncorrelated noise (unintentional changes) in the links. C(t) is, by construction, equivalent to the average of the t-diagonal of S, or of the matrices depicted in Figs. 4 and 5.
By calculating C(t) over all values of t, it is possible to construct a full correlogram of the evolution of the studied system, with the maxima representing its natural frequencies. In order to illustrate this idea, Fig. 7 depicts the correlograms for the air transport networks (left panel) and the hospital networks (right panel); the brain functional networks have not been considered here, as they do not represent a temporal evolution. The respective matrices S encode different variants of the IC metric: the $\log_{10}$ of the p-value of $IC^\dagger$ for the former (Fig. 5), and $IC^*$ for the latter (Fig. 4); as a consequence, the y axes of the two panels have different scales. This is not a problem as long as the meaning is similar; in this case, both $IC^* \to 1$ and $IC^\dagger \to 0$ indicate highly similar networks, and both lie in the top part of the graph. As should be expected, the maximum in both correlograms is located at t = 0. Local maxima can additionally be found at 24k (k ∈ Z) for the hospital data set, corresponding to a daily activity cycle; and at 12k (k ∈ Z) in the case of the air transport, indicating a yearly seasonality.
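Since C(t) is the average of the t-diagonal of S, a correlogram can be sketched in a few lines of Python with NumPy. The function name `correlogram` is hypothetical, and the similarity matrix used in the example is a synthetic one with a period of 12 time steps, not one of the data sets analysed here.

```python
import numpy as np

def correlogram(s):
    """C(t) for t = 0..N-1, as the mean of the t-th diagonal of S."""
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    return np.array([np.diagonal(s, offset=t).mean() for t in range(n)])

# Synthetic similarity matrix with a recurrence every 12 time steps.
steps = np.arange(36)
s = np.cos(2 * np.pi * (steps[:, None] - steps[None, :]) / 12)
c = correlogram(s)
```

In this toy case the correlogram peaks at t = 0, 12 and 24, mirroring the way a yearly seasonality would appear in monthly snapshots.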
As a final issue, it has to be highlighted that any other metric can be used within Eq. 6 to calculate this correlogram, for instance the global overlap O. Results would nevertheless have a different meaning, as the IC makes it possible not just to quantify the time required for returning to a given configuration, but also to ensure that differences are not due to structural changes.

V. DISCUSSION AND CONCLUSIONS
Beyond the assessment of the raw differences between two networks, a more complex and challenging problem is to detect if these differences are due to random modifications or to organised forces. The two problems are complementary and not necessarily correlated. The network structure of a system may substantially change between two measurements, but still be the same topology deformed by strong observational noise. On the other hand, small changes may be due to the targeted (intentional) attempt of, e.g., promoting a node.
In this contribution we presented the use of Information Content 15 as a way of assessing the presence of meso-scale structures in the difference between two networks. The effectiveness of the metric has been demonstrated in several synthetic network evolutions, and tested with three real data sets respectively representing social, technological and biological systems. We additionally discussed the differences between the proposed approach and two a priori similar metrics, i.e. the network correlation 16 and the von Neumann entropy 17,18 .
The availability of a similarity metric further makes it possible to adapt some standard techniques in time series analysis to the study of the evolution of networked systems. We here considered the case of self-correlations and correlograms, and showed that the natural frequency of the system, in terms of recurrence of intentional network changes, can be estimated by the maxima in the network self-correlation. While not explicitly discussed here, the proposed analysis can be extended to the more general case of the cross-correlation, in which multiple sequences of networks, for instance representing two or more systems, can be pair-wise analysed. Correlograms could also be used to select the best time resolution for sampling temporal networks, a topic still to be explored 39 .
As a final thought, a hidden assumption of this work is that the networks to be compared are expected to be topologically compatible, i.e. to have the same number of nodes. While this holds for multiplex networks, general multi-layer and temporal graphs can have a variable size. The proposed methodology can still be used, provided an initial pre-processing is performed: for instance, the cores composed of nodes common to both networks could be isolated; while some information would be lost, the main evolutionary trends could still be characterised. Furthermore, networks coming from different systems, e.g. respectively representing brain activity and air transport, could in principle be compared. Nevertheless, it would first be necessary to match nodes between both networks, that is, to create a map relating each node of the first network with the topologically equivalent one of the second, by means of e.g. the SimRank 40 or similar algorithms.