Rhythms of the collective brain: Metastable synchronization and cross-scale interactions in connected multitudes

Crowd behaviour challenges our fundamental understanding of social phenomena. Involving complex interactions between multiple organizational levels and spanning a variety of temporal and spatial scales of activity, its governing mechanisms defy conventional analysis. Using a data set comprising 1.5 million Twitter messages from the 15M movement in Spain as an example of multitudinous self-organization, we investigate the processes that underlie the coordination its spatial and temporal scales. We propose a generic description of the coordination dynamics of the system measuring phase-locking statistics at different frequencies using wavelet transforms, identifying 8 frequency bands of entrained oscillations between 15 geographical urban nodes. Then we apply maximum entropy inference methods to describe Ising models capturing transient synchrony between geographical nodes in our data at each frequency band. The models show that 1) all frequency bands of the system are operating near critical points of their parameter space and 2) while fast frequencies present only a few metastable states displaying all-or-none synchronization, slow frequencies present a diversity of metastable states of partial synchronization. Furthermore, describing the state at each frequency band using the energy landscape of the corresponding Ising model, we compute transfer entropy to characterize cross-scale interactions between frequency bands, showing 1) a cascade of upward information flows in which each frequency band influences its contiguous slower bands and 2) downward information flows where slow frequencies modulate distant fast frequencies.


Introduction
Coordinated activity is a powerful force in creating and maintaining social ties [1]. From communal dances in ancient human groups to civic festivals in the French Revolution or collective muscular manifestations in Nazi marches and rallies [1, p.136, p.148-149], visceral, emotional sensations of shared movement have been used to create communal identities and to shape political landscapes. Historically, forms of distributed communication and coordination have often come together with episodes of large-scale mobilization and social change, as the widespread print-shop networks of radical reforming movements with the generalization of the printing press during the 16th century German Reformation [2] or postal networks of the Republic of Letters in the Age of Enlightenment a century later [3]. Today, amidst unprecedented development of communication technologies, new forms of coordination for large and scattered communities have been unleashed around the globe.
The rise of new digital communication tools and network technologies is accelerating fast bidirectional communication, generating new forms of collective communication and action. Digital communications increase the autonomy and influence of the social groups using them facilitating forms of mass self-communication [4], collective intelligence using pools of social knowledge [5] or smart mobs exploiting new found communication and computing capabilities via ubiquitous devices [6]. From protest movements including the Arab Spring or the Occupy movements to autonomous responses in the face of natural disasters, for example Hurricane Sandy or the Tōhoku earthquake, several examples highlight the increasing power of digitally connected social and political grassroots movements to shape events. Recognition of a growing influence has brought with it heightened scholarly interest in its explanation: how such movements arise and self-organize, what mechanisms underlie their formation and how are they able to constitute autonomous social and political subjects? [7]. Recent advances have described specific elements of connected multitudes: the geographical diffusion of trends [8]; the interplay between exogenous and endogenous dynamics [9]; or the connection between social media and collective activity in physical spaces [10]. Nevertheless, many of the mechanisms so far explored are specific to a particular scale or level of description of social dynamics. General mechanisms offering explanatory insights across different levels remain poorly articulated. The same problem applies to qualitative analyses trying to capture general principles of connected multitudes. These include perspectives stressing the individualistic logic pervading digital communication tools operating through sharing personalized content in social media [11], in sharp contrast

Results
We use a data set of 1,444,051 time-stamped tweets from 181,146 users, collected through the Twitter streaming API between 13 May 2011 and 31 May 2011 [20] using T-Hoarder [24]. Messages were captured during 17 days during the Spanish 15M social unrest events in 2011 containing at least one of a set of 12 keywords or hashtags related to the protest (see reference [20] for a detailed description). We extracted geographical information from the location information of users (see SI Appendix), selecting the 15 urban areas with the largest number of messages. Using this information, we generated time-stamped series reflecting the number of tweets emitted from each city for intervals of 60 seconds.

Synchronization at multiple frequencies
One of the most prominent features of the 15M movement was its fast territorial development. Without any coordination centre or any formal organization, the movement was able to reproduce a network of camps across Spanish cities in a period of a few days. As this coordination between geographical nodes takes place at several temporal scales, we propose a generic description of these interactions based on the temporal coordination of oscillations at multiple frequencies. We analyse the coordination between populations at main Spanish cities using Morlet wavelet filtering to extract the phase content θ i ( f ,t) of the activity time series at city i at time t and frequency f , with a span of frequencies in the range [1.67 · 10 −3 Hz, 9.26 · 10 −5 Hz] (from 10 minutes to 3 hours) logarithmically distributed with intervals of 10 0.01 . We use phase-locking statistics [25] to define phase-locking values between two cities i and j as: where δ is the size of the window of temporal integration: δ = n c f , being n c the number of cycles in which we analyse phase-locking. We use a value of n c = 8 cycles, similar to the values typically used in neuroscience, ensuring that we are detecting sustained synchronization. A i j (t) is a corrector factor removing spurious synchronization when the network is inactive (e.g. during nighttime, see SI Appendix).
Statistical significance of phase-locking values is determined by comparing them to phase-locking values of surrogate time series obtained using the amplitude adjusted Fourier transform [26]. We use 200 surrogate time series to estimate a significance threshold for the values of φ i j ( f ,t) for all values of f . The average phase-locking values of surrogate time series were used to compute a threshold φ th ( f ), indicating a value higher than 99% of surrogate data. Using this threshold, we define phase-locking links between two cities i and j as statistically salient values of φ i j ( f ,t): As we document in reference [27], using phase-locking statistics we find widespread moments of significant synchrony at different instants often corresponding with important moments of the 15M protests. For illustrative purposes, in Figure 1 we show the total number of phase-locking links S( f ,t) = ∑ i, j Φ i j ( f ,t) for a specific day of the protests. At faster frequencies (lower period) we observe short and less intense instants of synchrony, while at slower frequencies synchrony lasts for longer periods of time. Using wavelet pattern matching [28] over S( f ) = S( f ,t) after applying a linear detrending, we detect frequency peaks of synchronization in the system (see SI Appendix), identifying eight main frequency bands of synchronization f k , k = 1, ..., 8, where larger k corresponds to larger timescales (i.e. slower frequencies).

Pairwise maximum entropy modelling of phase-locking statistics
In order to inspect how these phase-locked coalitions are operating at each frequency band, we derive from our data statistical mechanics models of the system. With these models we can infer macroscopic properties from microscopic descriptions of the system. Using maximum entropy models we infer the probability distribution of possible states s of the network, corresponding to all the combinations of binary possibilities of each node being or not being phase-locked to other nodes in the network at a particular frequency of synchronization. For simplicity, we consider the state of a node i being phase-locked to a synchronized cluster, that is s i = 1, when it has at least one synchronization link and (i.e. Φ i j = 1 for at least one value of j), and otherwise the state of the node is set to s i = −1. We extract a pairwise maximum entropy model described by the Boltzmann distribution of an Ising model. This is the least-structured model that is consistent with the mean activation rate and correlations of the nodes in the network. Pairwise correlation maximum entropy models have been successfully used to map the activity of networks of neurons [29], antibody sequences [30] or flocks of birds [31]. These models, instead of being postulated as approximations of real phenomena, can infer exact mappings capturing measured properties of a system (means and correlations in our case), making them good candidates for capturing the structures underlying social coordination.
The maximum entropy distribution consistent with a known average energy is the Boltzmann distribution P(s) = Z −1 e −β E(s) , where s is a state of the network, Z is the partition function and β = 1 T k B , being k B Boltzmann's constant and T the temperature. The energy of the model with pairwise interactions is defined as E(s) = − ∑ i h i s i − 1 2 ∑ i< j J i j s i s j , where 'magnetic fields' h i represent influences in the activation of individual nodes and 'exchange couplings' J i j stand for the tendencies correlating the activity between nodes. Without loss of generality we can set the temperature T = 1. Considering a pairwise model, the resulting distribution of the maximum entropy model is: where the h i and J i j are adjusted to reproduce the measured mean and correlation values between nodes in the network.
From the frequency bands f k extracted in the previous section we extract models of pairwise correlations at the corresponding frequencies. For each frequency band, we infer an Ising model P f k (s) solving the corresponding inverse Ising problem, using a coordinate descent algorithm (see Methods) for fitting the parameters h i and J i j that reproduce the means and correlations found in the series of states s for the description of phase-locking relations at each frequency.
The accuracy of the inferred models can be evaluated by testing how much of the correlation structure of the data is captured. One measure to evaluate this is the ratio of multi-information between model and real data [32]. In our case, our data limits us to computing the entropy of small sets of nodes (between 5 and 7, see SI Appendix). Limiting our entropy calculations to random sets of five to seven nodes, we can see in Figure S2 and Table S3 that our models are able to capture around 70% of the correlations in the data for subsets of the indicated sizes (see SI Appendix for a detailed description).
Once we have extracted a battery of models P f k (s), indicating the probability distributions of phase-locking configurations at different frequency bands, we explore the thermodynamic (macroscopic) properties associated to them. First, we observe how all the models are poised at critical points. One signature of criticality we find is that the probability distribution of P f k (s) follows a Zipf's law ( Figure 2.A), specially for slower values of f k . Finding a scale-free distribution in our model is consistent with power laws appearing in the dynamics of the temporal series of tweet activity found in this data set [27] or in structural parameters in similar data sets [22]. Furthermore, the Ising models allow us to find further evidence of the critical behaviour of the model by exploring divergences in their heat capacity. By introducing a fictitious temperature value changing the temperature parameter T (previously assumed to be equal to 1), we compute the heat capacity of the system as: where H[P(s)] is the Shannon Entropy of the probability distribution of an Ising model. A divergence in the heat capacity of the system is an indicator of critical phenomena. As we observe in Figure 2.B, for all f k the peak of the heat capacity is around the value T = 1 suggesting that the models are poised just at critical points. Inferring the Ising models to match subsets of the network nodes (see SI Appendix), we observe how the normalized peak in the heat capacity diverges with the system size (the specific -although representative -case of f 5 is shown in Figure 2.C), where C(T )/N grows with N with a linear rate in the range [0.012, 0.021] (see SI Appendix). Together with the Zipf distribution, the divergence of the heat capacity strongly suggests that social coordination phenomena in the 15M social network are operating in a state of criticality [32]. The fact that all frequency bands are operating near critical points does not mean that they are displaying the same behaviour. We can extract more information about the behaviour of the system at each frequency by analysing the presence of locally-stable or metastable states in the system. Metastable states are defined as states whose energy is lower than any of its adjacent states, where adjacency is defined by single spin flips. This means that in a deterministic state (i.e. a Hopfield network with T = 0) these points would act as attractors of the system. In our statistical model metastable states are points in which the system tends to be poised, since their probability is higher than any of its adjacent states. Finding the metastable states of the models at each frequency, we observe how the number of metastable states increases for slower frequencies ( Figure S4.B), as the model presents a higher number of negative (inhibitory) couplings J i j (see Figure S4.A).
Moreover, if we count the number of nodes that are phase-locked (i.e. the sum of all nodes with s i = 1) for each metastable state represented in Figure 3, we observe important distinctions among frequency bands. For faster values of f k there are only a few metastable states: a state where all nodes are not phase-locked (i.e. the system is completely desynchronized), and a few values where almost all nodes are phase-locked. Thus, at fast frequencies synchronization rapidly spreads from zero to all nodes in the network. On the other hand, for slower frequencies the number of metastable states grows and the number of phase-locked nodes for each state decreases. This shows that slow frequency synchronizations allows the creation of a variety of clusters of partial synchronization, allowing parts of the network to sustain a differentiated behaviour.
These results suggests that fast and slow synchronization frequencies in the network operate in complementary regimes -all operating near critical points-the former rapidly propagating information to all the network and the latter sustaining a variety of configurations responding to specific situations. Systems in critical points present a wide range of dynamic scales of activity and maximal sensitivity to external fluctuations. These features may be crucial for large systems that are self-organized in a distributed fashion. The presence of these complementary modes of critical behaviour at different frequency bands suggests  that the system might be operating in a state of self-organized criticality, in which frequency bands adaptively regulate each other in order to maintain a global critical behaviour.

Cross-scale interactions in synchronization dynamics
Modelling phase-locking statistics provides a characterization of the interactions within frequency bands of synchronization. Furthermore, differences in the metastable states at each frequency band suggest what kind of interactions take place between distinct temporal scales. Because our definition of phase-locking statistics is restricted to interactions within the same frequency, we cannot use the computed phase-locking statistics to directly model inter-scale phase-locking between different frequencies (e.g. 2:1 phase-locking). However, we can use the thermodynamic descriptions of the system provided by maximum entropy models to simplify the analysis of inter-scale relations in real data. Analysis of multiscale causal relations is typically a difficult task, and in our case we have to deal with a system of a high number of dimensions (15 · 8 = 120 dimensions). Nevertheless, the Ising models describe the stability of the configurations of the 15 nodes in the network at each frequency band with an energy value. Thus, an easier way to describe multiscale interactions is to observe how fluctuations in the energy at one level affect the energy of the system at other levels, reducing the dimensions we have to deal with to only the 8 frequencies of synchronization.
We characterize the information flow between frequency bands using transfer entropy [33] between energy levels at each frequency E f k (s). Transfer entropy captures the decrease of uncertainty in the state of a variable Y derived from the past state of other variable X: where x t denotes the state of X at time t and τ indicates the temporal distance used to capture interactions. In order to compute transfer entropy over energy values between timescales, we discretize the values of energy E f k (s) into a variable with 3 discrete bins E * f k (s) using the Jenks-Caspall algorithm [34]. The value of 3 bins was selected to optimize the computation of joint probability density functions (see SI Appendix) although we tested values from 2 to 6 bins with similar results. Using transfer entropy we estimate the causal interactions between energetic states at each timescale by computing the values of T E * observe upward and downward flows of information. As we can see, upward flows decrease importantly with distance between scales. In contrast, downward flows increase slightly with distance between scales. These results show an interesting picture of cross-scale interactions. While in upward interactions energy at each frequency band only influences neighbouring slower bands, in downward interactions slow frequency bands modulate distant faster bands. We also observe this in the schematic in Figure 5.A, where for simplicity only the largest values of F down ( f k , d) and F down ( f k , d) are displayed for each frequency band. These results suggest that there might be general rules for scaling up and scaling down social coordination dynamics in a nested structure of frequency bands. The mechanisms involved might resemble those found in neuroscience, where upward cascades have been found to take place in the form of avalanches propagating local synchrony and downward cascades take the form of phase-amplitude modulation of local high-frequency oscillations by large-scale slow oscillations [16]. Future research is required for testing the application of these rules to other social coordination phenomena and the specific mechanisms operating behind upward and downward cross-scale interactions.

Discussion
It is appealing to think that general coordinative mechanisms may be suited to explain the behaviour of social systems at different scales. Here, using a large-scale social media data set, we have shown how the application of maximum entropy inference methods over phase-locking statistics at different frequencies offers the prospect of understanding collective phenomena at a deeper level. The presented results provide interesting insights about the self-organization of digitally connected multitudes. Our contribution shows that phase-locking mechanisms at different frequencies operate in a state of criticality for rapidly integrating the activity of the network at fast frequencies while building-up an increasing diversity of distinct configurations 7/9 at slower frequencies. Moreover, the asymmetry between upward and downward flows of information suggests how social systems operating through distributed transient synchronization may create a hierarchical structure of temporal timescales, in which hierarchy is not reflected in a centralized control but in the asymmetry of information flows between the coordinative structures at different frequencies of activity. This offers a tentative explanation of how an unified collective agency, such as the 15M movement in Spain, might emerge in a distributed manner from mechanisms of transient large-scale synchronization. Of particular interest would be to test the extent our findings about the structural and functional relations of social coordination apply to other self-organizing social systems, or their relation with mechanisms of cross-scale interactions known from large-scale systems neuroscience. A new generation of experimental findings based on statistical mechanics models may provide the opportunity to discover the mechanisms behind multitudinous social self-organization.

Data availability
The data employed in this study was kindly provided by the authors of reference [20].
Learning pairwise maximum entropy models from data Ising models are inferred using an adapted version of the coordinate descent algorithm described in reference [35]. The coordinate descent algorithm works by iteratively adjusting a single weight h i or J i j that will maximize an approximation of the change in the empirical logarithmic loss between the observed data and the model, computed through the means and correlations present in the empirical data and the model. The code implementing the coordinate descent algorithm is available at https://github.com/MiguelAguilera/ising.

Data preprocessing
We use a data set of 1,444,051 tweets from 181,146 users, collected between 13 May 2011 and 31 May 2011. This data set was extracted from the Twitter streaming API, which provides information on the time and content of the tweet, as well as information on the sender, including location. Messages were captured when they contained one of the following hashtags or keywords (which were selected as some of the most relevant during the emergence of the 15M movement): #15M, 15-M, #democraciarealya, #tomalacalle, #Nolesvotes, #spanishrevolution, #acampadasol, #acampadabcn, #indignados, #notenemosmiedo, #nonosvamos, #yeswecamp. We filter messages in the data set using the location field in the description of the user that sent the message. Since the 15M was (at least during the first days) mainly an urban phenomena, we analyse geographical interactions between the 15 cities with more activity in Twitter during 17 days of the protests. We find the 15 names of cities most repeated in the data set, and counted messages corresponding to a specific city when the city name appeared in the location field. Since the location is a field of the description of the user, it does not necessarily correspond to the real location of the user at the moment the message was sent. We ran a test on geolocalized Twitter data from Spain, observing that for a set of 20.000 random tweets in a 80.25% the profile location corresponds with the actual geolocation of the user, giving the information of the user's location field a moderately high reliability.

Phase-locking statistics
Time series of activity at each city are generated by counting the number of messages from users located at the city in intervals of 60 seconds for a period comprising 17 days, starting at 2AM May 14th 2011. Each time series is filtered using Morlet wavelets at different frequencies. For each city i and frequency f we extract the phase content θ x ( f ,t) for each moment of time t, with a frequency span between [1.67 · 10 −3 Hz,9.26 · 10 −5 Hz] (from 10 minutes to 3 hours) mapped into a logarithmic sequence with intervals of 10 0.01 .
Phase-locking values are defined for each pair of cities i and j as defined in Equation 1. We introduce a corrector factor A i j (t) to remove spurious synchronization when the network is inactive. A i j (t) is zero when the mean activity of node i or node j for a moving window of 30 minutes is below a threshold of 0.25 times its mean activation, which generally happened during some periods at night.
From phase-locking values we extract phase-locking links, which are activated when the phase-locking value is higher than 99% of a set of 200 surrogate time series we generate for purposes of statistical validation, as indicated in Equation 2. Surrogate time series are generated using amplitude adjusted Fourier transform using the TISEAN software (Available at http://www.mpipks-dresden.mpg.de/˜tisean/Tisean_3.0.1/). Amplitude adjusted Fourier transform surrogates are time series that preserve the power spectrum of a distribution and a distribution of values, but remove the temporal correlations present in the original signal.

Detection of salient synchronization frequency bands
We localize frequency bands synchronization by detecting peaks of salient phase-locking links in the logarithmic frequency space. We compute the mean number of synchronization links for each frequency as the temporal mean of phase locking links at that frequency S( f ) = ∑ i, j Φ i j ( f ,t) ( Figure S1.A). As S( f ) increases with slower frequencies following approximately a log-linear trend, we approximate the trend computing a least squares first order polynomial fit respect to the logarithm of f and remove it from S( f ) obtaining a detrended function. S( f ) * In order to robustly detect peaks, we apply a twodimensional wavelet transform of the detrended S( f ) over the vector of logarithmic frequencies. Using 10 Ricker wavelets  Figure S1.B. of widths from 1 to 10 steps in the selected logarithmic range of frequency (i.e. a range of [1.67 · 10 −3 Hz, 9.26 · 10 −5 Hz] logarithmically distributed with intervals of 10 0.01 ) we compute the wavelet transform matrix and detect its ridge lines to find eight peaks of salient synchronization (code available at https://github.com/scipy/scipy/blob/v0.14. 0/scipy/signal/_peak_finding.py#L410). As the position of the detected peaks vary slightly depending on the parameters employed, we adjust the position of each peaks by climbing to the nearest local maxima if one is found within a distance of two steps. In Figure S1.B we observe the result of the process and the 8 detected peaks. From these peaks we extract 8 frequencies f k , with k = 1, ..., 8, indicating the position of the peaks in S( f ) ( Table S1).

Number of samples required to compute probability distributions
When we compute multi-informations and transfer entropies from the data set, we face a compromise between the size of the probability distribution function of the system (corresponding to 2 N states) and the number of samples we employ for calculating it. In order to correctly compute these probability distributions, we need to ensure that the number of samples found in our data is sufficiently large for describing the frequency of occurrence of all possible states. Although the number of samples in our data is large, as data changes at different frequencies, slower frequencies may present a smaller number of transitions between states than fast frequencies, therefore offering a reduced effective sample of visited states. In order to quantify the number of states visited at each frequency, we count the number of transitions between states s used to infer the Ising models at each frequency (Table S2). Knowing that number, we can estimate a threshold of how many states can have a probability distribution to be accurately estimated from our samples at different frequencies. We arbitrarily establish a requirement of the number of transitions being larger than 2 4 times the number of possible states of the objective probability distribution function. Although the exact value of the threshold is arbitrary, during the analysis we tried different thresholds to ensure the robustness of the results.

Multi-information measures for assessing the accuracy of maximum entropy models
Once we infer the maximum entropy models that correspond to the means and correlations found in phase-locking data, it is important to characterize what is the accuracy of the model, that is, to what extent the statistical model generated is mapping the data we used in the inference. The accuracy of the model can be further evaluated by asking how much of the correlative structure found in the data is captured. We can measure the overall strength of correlations in the network using multi-information, which is defined as the total reduction in entropy relative to an independent model H[P r ] is the entropy of the distribution of the real system whose data we are analysing and H[P 1 ] is the entropy of an independent model. In our case, an independent model would be the equivalent of adjusting an Ising model in which the couplings are zero, and thus its energy function is defined as E(s) = − ∑ i h i s i . Multi-information can as well be used to compute the reduction of entropy of the distributions P 2 of the pairwise Ising model we inferred from data as The ratio between these two quantities gives the fraction of the correlations captured by the pairwise Ising model: Unfortunately, the data available is not enough for reliably computing P r . The probability distribution P r has a number of possible states of 2 15 , while in our data the number of different states transited by the system is one or two orders of magnitude inferior, depending on the frequency. However, we can compute accurately subsets of the complete probability distribution P r (s i ), with {s i } ⊂ {s i }. For each frequency, we count the number of transitions between states found in the time series in our data, and contrast that number with the dimension of the subset of the probability function using a number n of nodes, i.e. 2 n . We use an arbitrary threshold requiring the number of states being at least 2 4 times larger than the number of values of the probability distribution function. Different thresholds yield slightly different results, although they don't change significantly the final results. We find that for frequencies from f 4 to f 8 we can compute reliably subsets with up to 5 nodes. For frequencies f 2 and f 3 the number increases to up to 6 and for f 7 it is 7 nodes.
In Figure S2 we can observe the distribution of the values of I 2 I for 100 random choices of subsets for each number of nodes. We can observe that most of the subsets the values of I 2 I indicate that between 60% and 80% of the correlations are captured (Table S3) The limited availability of data, specially for slower frequencies, prevents us to compute the accuracy of the model for subsets with larger number of nodes. Future analysis applied to larger data sets should test if the accuracy of the model holds for capturing the correlations between larger subsets of nodes.

Parameters of the Ising models
Here we display the parameters h and J inferred for the Ising models at each frequency. As we observe in Figure S5, as we move from faster to slower frequencies, the amplitude of h and J increases. As well, the percentage of negative couplings increases. We compute the ratio of negative couplings as:  Table S3. Distributions of multi-information ratios. Mean and standard deviation for each distribution in Figure S2. In figure S3.A we can observe how the ratio of negative coupling increases with slower frequencies. It is known from spin glass theory that metastable states emerge when some of the couplings between variables are negative. In Figure S3 we can observe how there is a correlation between the ratio of negative couplings and the number of metastable states.

Divergence of the heat capacity
At critical points, derivatives of thermodynamic quantities of the system as the entropy may diverge. An example of this is the heat capacity, whose divergence is a sufficient condition for criticality (though not a necessary one). To test the divergence of the heat capacity of the system, we extract Ising models of different sizes related to each frequency f k . From sizes 5 to 15, we extract models inferring the set of means and correlation from the first N nodes of the system in order of increasing number of tweets emitted. For each model, we compute the normalized heat capacity C(T )/N. In Figure S4.A we observe the divergence of the maximum value of the peaks for sizes N = 6, 9, 12, 15. As size increases, the peak is higher and closer to T = 1. In Figure S4.B we represent the linear trend of the peaks from size 5 to 15. Trends are computed using a least squares first order polynomial fit. We identify trends with slopes in the range [0.012, 0.021]. A linear trend of max[C(T )/N] corresponds to a quadratic increment in the peak of the heat capacity C(T ) as N increases, suggesting a divergence of the heat capacity of the system.

Transfer entropy
Using the energy of the Ising models E f k at different frequencies, we compute transfer entropy by discretizing energy functions into clusterized variables E * f k . We apply natural Break classification through the Jenks-Caspall algorithm (code available at https://github.com/domlysz/Jenks-Caspall.py), which for each cluster minimizes the average deviation from the cluster's mean to determine the best arrangement of values intro different clusters. Since computing transfer entropies requires to compute joint probability functions with three variables, to meet the same criteria we used to compute multiinformation, we use a number of 3 clusters to ensure that for all the frequencies we have a number of samples of transited states which is at least 2 4 times larger than the values in the probability distribution. After discretizing the energy functions we compute the values of transfer entropies T kl (τ) = T E *