The application of Graph Theory to the brain connectivity patterns obtained from the analysis of neuroelectrical signals has been an important step toward the interpretation and statistical analysis of such functional networks. The properties of a network are derived from the adjacency matrix describing a connectivity pattern obtained by one of the available functional connectivity methods. However, no common procedure is currently applied for extracting the adjacency matrix from a connectivity pattern. To understand how the topographical properties of a network inferred by means of graph indices can be affected by this procedure, we compared one of the methods extensively used in Neuroscience applications (i.e., fixing the edge density) with an approach based on the statistical validation of the achieved connectivity patterns. The comparison was performed on simulated data and on signals acquired from a polystyrene head used as a phantom. The results showed (i) the importance of the assessment process in discarding spurious links and in defining the real topographical properties of the network, and (ii) a dependence of the small-world properties obtained for the phantom networks on the spatial correlation of neighboring electrodes.

The concept of brain connectivity (i.e., how cortical areas communicate with each other during the execution of a specific task) is central to understanding the organized behavior of cortical regions beyond the simple mapping of their activity [

Cortical connectivity estimation techniques aim at describing interactions between cortical areas as connectivity patterns that hold the direction and strength of the information flow between such areas. Functional connectivity between cortical areas is defined as the temporal correlation between spatially remote neurophysiological events, and it can be estimated by different methods, in both the time and the frequency domain, based on bivariate or multivariate autoregressive models [

The extraction of salient characteristics from brain connectivity patterns is a challenging topic, given the often complex structure of the estimated cerebral networks. For this reason, over the last ten years, a graph theoretical approach has been proposed for characterizing the topographical properties of real complex networks [

Graph indices can be computed on adjacency matrices obtained by applying a threshold to the connectivity values estimated by means of different estimators. The thresholding procedure converts the connectivity values into edges: an edge connecting two nodes exists if the connectivity value between those nodes is above a certain threshold; otherwise the edge is absent. The choice of the threshold should not depend on the application and, if made arbitrarily, can affect the results. In fact, the threshold determines the number of connections considered in the subsequent graph analysis and thus affects the indices extracted from the networks [
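A minimal sketch of the thresholding step, assuming the connectivity values are stored in an N × N NumPy array with `conn[i, j]` holding the value from node j to node i (a convention chosen here for illustration only). The helper names `binarize` and `edge_density` and the toy values are hypothetical. The example shows how the threshold alone changes the number of surviving edges.

```python
import numpy as np


def binarize(conn: np.ndarray, threshold: float) -> np.ndarray:
    """Binary adjacency matrix: 1 where conn > threshold, 0 elsewhere.

    Self-loops (the diagonal) are always discarded.
    """
    adj = (conn > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj


def edge_density(adj: np.ndarray) -> float:
    """Fraction of the N*(N-1) possible directed edges that are present."""
    n = adj.shape[0]
    return float(adj.sum()) / (n * (n - 1))


# Toy 3-node connectivity pattern (hypothetical values).
conn = np.array([[0.0, 0.8, 0.1],
                 [0.3, 0.0, 0.6],
                 [0.05, 0.2, 0.0]])

for t in (0.1, 0.25, 0.5):
    adj = binarize(conn, t)
    print(f"threshold={t}: {int(adj.sum())} edges, density={edge_density(adj):.2f}")
```

With these values the three thresholds keep 4, 3, and 2 of the 6 possible edges, so the same connectivity pattern yields graphs of very different density.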

All the approaches described above are empirical and do not take into account the intrinsic statistical significance of the estimator used in the functional connectivity estimation process. In fact, when the adjacency matrix is obtained by imposing a threshold that fixes the number of residual connections in the network, we cannot exclude a priori that a percentage of such residual connections has been estimated by chance. The idea is thus to take into account the statistical significance of the functional connectivity estimator when constructing the adjacency matrix. In the case of PDC, the threshold is obtained as the percentile, corresponding to a defined significance level, of the distribution of the estimator in the null case. Thus, an edge exists in the adjacency matrix describing the considered network only if it is statistically different from the null case.
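The percentile rule can be sketched as follows, assuming the null distribution is already available as a stack of S surrogate connectivity matrices (the text explains that it must be built empirically). One threshold is computed for each ordered pair of nodes; `significant_edges` and the uniform null values are hypothetical.

```python
import numpy as np


def significant_edges(conn, null_samples, alpha=0.05):
    """Keep an edge only if it exceeds the (1 - alpha) percentile of its null distribution.

    conn:         (N, N) observed connectivity values.
    null_samples: (S, N, N) values of the same estimator in the null case,
                  one (N, N) matrix per surrogate.
    Returns the binary adjacency matrix and the per-edge thresholds.
    """
    thresholds = np.percentile(null_samples, 100 * (1 - alpha), axis=0)
    adj = (conn > thresholds).astype(int)
    np.fill_diagonal(adj, 0)
    return adj, thresholds


rng = np.random.default_rng(0)
# Hypothetical null case: values spread uniformly in [0, 0.2] for every edge.
null_samples = rng.uniform(0.0, 0.2, size=(1000, 3, 3))
conn = np.full((3, 3), 0.1)
conn[0, 1] = 0.5   # the only value clearly outside the null distribution

adj, thr = significant_edges(conn, null_samples, alpha=0.05)
print(adj)
```

Only the connection whose value lies outside its null distribution survives, independently of how many edges that leaves in the graph.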

Due to the nonlinear dependence of the PDC estimator on the MVAR parameters, the theoretical distribution of PDC in the null case is not known, so it has to be constructed empirically. The shuffling procedure, which was introduced in 2001 for the similar directed transfer function (DTF) estimator [
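A sketch of the shuffling idea, under the assumption (one common variant; the exact scheme of the cited work may differ) that the samples of each channel are permuted in time independently. This destroys temporal relations between channels while preserving each channel's amplitude distribution; note that it also destroys each channel's autocorrelation. To keep the example short, the estimator is a placeholder, the absolute Pearson correlation `abs_corr`, not PDC; any function returning an N × N matrix can be passed instead. All names are hypothetical.

```python
import numpy as np


def shuffle_surrogate(data, rng):
    """Permute the samples of every channel independently in time.

    data: (channels, samples). The amplitude distribution of each channel is
    preserved, but inter-channel (and intra-channel) temporal relations are lost.
    """
    return np.stack([rng.permutation(channel) for channel in data])


def null_distribution(data, estimator, n_surrogates, rng):
    """Stack estimator(surrogate) for n_surrogates shuffled copies of data."""
    return np.stack([estimator(shuffle_surrogate(data, rng))
                     for _ in range(n_surrogates)])


def abs_corr(x):
    """Placeholder estimator (NOT PDC): absolute Pearson correlation, zero diagonal."""
    c = np.abs(np.corrcoef(x))
    np.fill_diagonal(c, 0.0)
    return c


rng = np.random.default_rng(0)
t = np.arange(600)
common = np.sin(2 * np.pi * t / 50)          # shared component of channels 0 and 1
data = np.vstack([common + 0.5 * rng.standard_normal(600),
                  common + 0.5 * rng.standard_normal(600),
                  rng.standard_normal(600)])

null = null_distribution(data, abs_corr, n_surrogates=200, rng=rng)
threshold = np.percentile(null, 95, axis=0)   # 5% significance level
print(abs_corr(data) > threshold)
```

The coupled pair (channels 0 and 1) clearly exceeds its surrogate-based threshold; pairs involving the independent channel exceed it only at roughly the nominal 5% rate.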

The general aim of this study is to understand how the methods for extracting the adjacency matrix can affect graph theory indices and their interpretation, in order to define a reliable approach for deriving salient indices from connectivity networks estimated by means of multivariate methods. In particular, we used two different datasets to compare one of the methods extensively used in graph theory applications for extracting adjacency matrices from connectivity patterns (i.e., the method based on fixing the edge density) with the statistical validation of the achieved connectivity patterns by means of a shuffling procedure. The first dataset consisted of a set of random uncorrelated signals, which should represent a null model for functional connectivity estimates and a random case for graph theory indices. In fact, since no correlation exists between the signals, the connectivity estimation process should discard almost all of the information flows between them, leaving only a small percentage of connections, estimated by chance and organized as a random network. This dataset can be seen as an ideal “null case” model, but it does not take into account some factors strictly related to an electroencephalographic recording, such as the existence of correlation between the recorded signals due to volume conduction effects, to the spatial positions of the electrodes on the scalp, and to the location of the reference [

We estimated the functional connectivity patterns associated with both datasets and extracted the corresponding adjacency matrices by means of two approaches: fixed edge density

The PDC [

It is then possible to define PDC as

PDC values are in the interval

Although this formulation derives directly from information theory, the original definition was modified in order to give a better physiological interpretation to the estimation results achieved on electrophysiological data. In particular, two modifications have been proposed. First, a new type of normalization, already used for another connectivity estimator, the directed transfer function [
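For reference, the original (column-normalized) PDC can be computed from MVAR coefficients as sketched below, using the standard definition Ā(f) = I − Σₖ Aₖ exp(−i2πfk/fs) and dividing each squared entry by the total outflow of its source channel; `squared=True` returns the squared values. The sPDC used in this study also adopts a different (DTF-style) normalization that the sketch does not reproduce; replacing `axis=0` with `axis=1` in the normalization is only our assumption of what that change would look like. `fit_mvar` is a plain least-squares fit, not necessarily the estimator used by the authors, and the simulated model is hypothetical.

```python
import numpy as np


def fit_mvar(x, p):
    """Least-squares fit of x[:, t] = sum_{k=1..p} A_k x[:, t-k] + e[:, t].

    x: (channels, samples). Returns A with shape (p, channels, channels),
    where A[k-1][i, j] weights channel j at lag k in the equation of channel i.
    """
    n, T = x.shape
    # Stack the lagged copies of the signals: block k-1 holds x delayed by k samples.
    X = np.vstack([x[:, p - k:T - k] for k in range(1, p + 1)])  # (n*p, T-p)
    Y = x[:, p:]                                                 # (n, T-p)
    coef, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)             # (n*p, n)
    return coef.T.reshape(n, p, n).transpose(1, 0, 2)


def pdc(A, freqs, fs, squared=True):
    """Original PDC; out[f, i, j] is the flow from channel j to channel i."""
    p, n, _ = A.shape
    lags = np.arange(1, p + 1)
    out = np.empty((len(freqs), n, n))
    for idx, f in enumerate(freqs):
        # A_bar(f) = I - sum_k A_k exp(-i 2 pi f k / fs)
        phase = np.exp(-2j * np.pi * f * lags / fs)
        a_bar = np.eye(n) - np.tensordot(phase, A, axes=1)
        power = np.abs(a_bar) ** 2
        # Column normalization: divide by the total outflow of source channel j.
        ratio = power / power.sum(axis=0, keepdims=True)
        out[idx] = ratio if squared else np.sqrt(ratio)
    return out


# Synthetic 2-channel VAR(1) in which channel 0 drives channel 1 (hypothetical).
rng = np.random.default_rng(1)
A_true = np.array([[[0.5, 0.0],
                    [0.4, 0.5]]])
x = np.zeros((2, 5000))
for t in range(1, 5000):
    x[:, t] = A_true[0] @ x[:, t - 1] + rng.standard_normal(2)

A_hat = fit_mvar(x, p=1)
spdc = pdc(A_hat, freqs=np.linspace(0.0, 100.0, 51), fs=200.0)
print("0 -> 1:", spdc[:, 1, 0].mean(), " 1 -> 0:", spdc[:, 0, 1].mean())
```

The driving direction (0 → 1) receives clearly higher values than the absent direction (1 → 0), and with column normalization the squared values of each column sum to one at every frequency.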

Random correlation between signals, induced by environmental noise or by chance, can lead to the presence of spurious links in the connectivity estimation process. To assess the significance of the estimated patterns, each functional connectivity value has to be statistically compared with a threshold level corresponding to the absence of transmission between the considered signals at a given probability. A possible procedure is to generate an empirical distribution of the null case from sets of surrogate data [
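Once surrogate values are available, each observed connectivity value can be compared with its null distribution through an empirical p-value, as in the sketch below. The (count + 1)/(S + 1) form is a common convention that counts the observed value among the permutations, so that the p-value is never exactly zero; the text does not specify which variant was used, and `empirical_p_values` is a hypothetical name.

```python
import numpy as np


def empirical_p_values(observed, null_samples):
    """One-sided empirical p-values under the surrogate (null) distribution.

    observed:     array of any shape, e.g. (N, N) or (F, N, N).
    null_samples: array of shape (S,) + observed.shape, one entry per surrogate.
    """
    s = null_samples.shape[0]
    # Number of surrogates at least as large as the observed value.
    exceed = (null_samples >= observed).sum(axis=0)
    return (exceed + 1) / (s + 1)


# 99 hypothetical surrogate values 0, 1, ..., 98 for two connections.
null_samples = np.tile(np.arange(99.0)[:, None], (1, 2))
observed = np.array([98.5, 49.0])
print(empirical_p_values(observed, null_samples))
```

Here the first connection exceeds every surrogate and gets p = 1/100 = 0.01, while the second is matched or exceeded by 50 surrogates and gets p = 51/100 = 0.51.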

The statistical validation process has to be applied to each pair of signals at each frequency sample. This leads to a high number of simultaneous univariate statistical tests, which increases the occurrence of type I errors (false positives). Statistical theory provides several techniques that can be usefully applied to the assessment of connectivity patterns in order to avoid the occurrence of false positives [

This means that if

This
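The size of the multiple-comparison problem, and the Bonferroni rule of testing each of the m hypotheses at level α/m, can be illustrated numerically. With 20 channels there are 20 × 19 = 380 ordered pairs; the number of frequency samples is not given in this section, so `n_freqs = 50` below is a hypothetical value, and `bonferroni` is an illustrative helper rather than a library function.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m, m being the total number of tests."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size


n_channels = 20
n_freqs = 50                       # hypothetical number of frequency samples
n_pairs = n_channels * (n_channels - 1)
m = n_pairs * n_freqs
alpha = 0.05

print("ordered pairs:", n_pairs)
print("tests:", m)
print("expected false positives without correction:", alpha * m)
print("Bonferroni per-test level:", alpha / m)
# Three tests: only p = 0.004 passes the corrected level 0.05 / 3.
print(bonferroni([0.004, 0.02, 0.3]))
```

Under these assumptions, about 950 spurious links would be expected by chance alone without correction, which is why uncorrected thresholds are unsuitable for this number of tests.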

The Bonferroni method can be too conservative, for instance when the statistical tests are highly dependent, as in the case of physiological measurements. This may lead to an increase in type II errors (false negatives). To mitigate the severity of the Bonferroni approach, the false discovery rate (FDR) approach was proposed [

Let

Finally, the hypothesis
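The Benjamini-Hochberg step-up procedure, the standard way of controlling the FDR, can be sketched as follows; we assume it corresponds to the formulation in the text, whose equations are not reproduced here. With the first example and q = 0.05, the procedure rejects the three smallest p-values, whereas Bonferroni (0.05/5 = 0.01) would reject only the first and an uncorrected test would reject four.

```python
import numpy as np


def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m, and
    reject the hypotheses with the k smallest p-values. Returns a boolean mask
    with the same shape as p_values.
    """
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    ranked = flat[order]
    below = ranked <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # position of the largest passing rank
        reject[order[:k + 1]] = True     # step-up: all smaller p-values are rejected too
    return reject.reshape(p.shape)


print(fdr_bh([0.001, 0.012, 0.025, 0.045, 0.5]))
# Step-up behavior: 0.02 fails its own rank test (0.0167) but is still rejected
# because the larger p-value 0.025 passes at rank 2 (0.0333).
print(fdr_bh([0.5, 0.025, 0.02]))
```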

A graph consists of a set of vertices (or nodes) and a set of edges (or connections) indicating the presence of some sort of interaction between the vertices. The adjacency matrix

Once the functional connectivity pattern is estimated, it is necessary to define an associated adjacency matrix for each network, on which graph theory will be applied to extract salient indices able to characterize the network properties. The generic

Different approaches have been developed for evaluating the threshold values, as already described in Section
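For comparison, the fixed edge density approach keeps the strongest connections until a prescribed density is reached, regardless of their statistical significance. A minimal sketch, assuming directed N × N matrices with the diagonal ignored and ties broken arbitrarily; `fixed_density_adjacency` is a hypothetical name, and rounding the number of edges to the nearest integer is our choice.

```python
import numpy as np


def fixed_density_adjacency(conn, density):
    """Keep the k strongest off-diagonal values, k = round(density * N * (N - 1))."""
    n = conn.shape[0]
    off = ~np.eye(n, dtype=bool)            # mask of the N*(N-1) candidate edges
    values = conn[off]                      # off-diagonal entries in row-major order
    k = int(round(density * n * (n - 1)))
    flat = np.zeros(values.size, dtype=int)
    flat[np.argsort(values)[::-1][:k]] = 1  # mark the k largest values
    adj = np.zeros((n, n), dtype=int)
    adj[off] = flat                         # scatter back in the same order
    return adj


conn = np.array([[0.0, 0.8, 0.1],
                 [0.3, 0.0, 0.6],
                 [0.05, 0.2, 0.0]])
print(fixed_density_adjacency(conn, 0.5))
```

Because the selection depends only on rank, this method always returns exactly the requested number of edges, even when the data are pure noise.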

Different indices can be defined on the basis of the adjacency matrix extracted from a given connectivity pattern. In this study, we evaluated the most commonly used ones, described below.

The characteristic path length is the average shortest path length in the network, where the shortest path length between two nodes is the minimum number of edges that must be traversed to get from one node to another. It can be defined as follows:
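A sketch of this index, assuming the usual definition L = 1/(N(N−1)) Σ_{i≠j} d_ij, with each d_ij obtained by breadth-first search on the binary adjacency matrix. How disconnected pairs (d_ij = ∞) are handled is not stated here; the sketch simply averages over reachable pairs, which is an assumption. Since the sum runs over all ordered pairs, the result does not depend on whether rows or columns denote the source node. Function names are hypothetical.

```python
from collections import deque

import numpy as np


def shortest_path_lengths(adj):
    """BFS from every node; returns (N, N) hop counts, np.inf if unreachable."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(adj[u])[0]:   # follow the edges leaving u
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest path length over ordered pairs i != j that are reachable."""
    d = shortest_path_lengths(adj)
    off = ~np.eye(adj.shape[0], dtype=bool)
    finite = d[off][np.isfinite(d[off])]
    return finite.mean() if finite.size else np.inf


ring = np.roll(np.eye(4, dtype=int), 1, axis=1)   # directed ring 0->1->2->3->0
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])                       # undirected path 0-1-2
print(characteristic_path_length(ring), characteristic_path_length(path))
```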

The clustering coefficient describes the intensity of interconnections between the neighbors of a node [
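The clustering coefficient of node i is commonly computed as C_i = t_i / (k_i(k_i − 1)/2), where t_i is the number of triangles through i and k_i its degree; the network value is the average over nodes. The sketch below assumes that the directed adjacency matrix is first symmetrized (an edge in either direction counts as a link) and assigns C_i = 0 to nodes with fewer than two neighbors; directed variants of this index also exist and may be the one used in the study. `clustering_coefficient` is a hypothetical name.

```python
import numpy as np


def clustering_coefficient(adj):
    """Mean local clustering coefficient of the symmetrized binary graph."""
    a = ((adj + adj.T) > 0).astype(float)
    np.fill_diagonal(a, 0)
    k = a.sum(axis=1)                        # node degrees
    triangles = np.diag(a @ a @ a) / 2       # (A^3)_ii counts each triangle twice
    pairs = k * (k - 1) / 2                  # possible links among the neighbors
    c = np.divide(triangles, pairs, out=np.zeros_like(triangles), where=pairs > 0)
    return c.mean()


k4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)   # complete graph
star = np.zeros((4, 4), dtype=int)
star[0, 1:] = 1                                          # hub with three leaves
tri_pendant = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 0), (0, 3)]:
    tri_pendant[i, j] = 1                                # triangle plus one pendant node
print(clustering_coefficient(k4), clustering_coefficient(star),
      clustering_coefficient(tri_pendant))
```

In the last graph, node 0 has three neighbors but only one of their three possible links, so C_0 = 1/3; nodes 1 and 2 have C = 1 and the pendant node has C = 0, giving a mean of 7/12.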

A network
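A widely used small-worldness index compares a network with random graphs of the same size and number of edges: σ = (C/C_rand)/(L/L_rand), and a network is usually called small-world when σ > 1, i.e., when it is more clustered than random graphs while keeping a similar path length. We assume this is the definition referred to in the text; the numbers in the example are hypothetical.

```python
def small_worldness(c, l, c_rand, l_rand):
    """sigma = gamma / lambda, with gamma = C / C_rand and lambda = L / L_rand."""
    gamma = c / c_rand       # normalized clustering coefficient
    lam = l / l_rand         # normalized characteristic path length
    return gamma / lam


# Hypothetical values: clustering twice the random one, path length 5% longer.
sigma = small_worldness(c=0.6, l=2.1, c_rand=0.3, l_rand=2.0)
print(f"sigma = {sigma:.3f}, small-world: {sigma > 1}")
```

Because the index is a ratio, an inflated clustering coefficient alone is enough to push σ above 1, which is why the correlations between σ and its two components are examined in the results.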

The first dataset used to compare the two approaches was generated to build the null case (complete lack of correlation between the signals). For this purpose, we generated random datasets of signals with the same average amplitude and the same standard deviation as the data acquired on the mannequin head (see the following paragraph for details), to avoid differences between the two datasets due to different signal amplitudes. In particular, each dataset is composed of 20 signals segmented into 50 trials of 3 s each. Twenty electrodes is a typical number of sensors for connectivity measures estimated by means of multivariate methods on scalp EEG signals.
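A sketch of how such a null dataset could be generated. The text specifies 20 signals, 50 trials, and 3 s per trial; the Gaussian white-noise model, the 200 Hz sampling rate (borrowed from the mannequin recording), and the `mean`/`std` values are assumptions made for illustration, and `simulate_null_dataset` is a hypothetical name.

```python
import numpy as np


def simulate_null_dataset(mean, std, n_channels=20, n_trials=50,
                          trial_seconds=3.0, fs=200.0, seed=None):
    """Independent Gaussian white noise with shape (trials, channels, samples).

    Every sample is drawn independently, so the channels share no correlation
    and carry no temporal structure: an ideal null case for connectivity.
    """
    rng = np.random.default_rng(seed)
    n_samples = int(round(trial_seconds * fs))
    return rng.normal(mean, std, size=(n_trials, n_channels, n_samples))


# Hypothetical amplitude statistics standing in for the mannequin recording.
data = simulate_null_dataset(mean=0.0, std=10.0, seed=0)
print(data.shape)
# Between-channel correlations of the concatenated trials are close to zero.
flat = data.transpose(1, 0, 2).reshape(data.shape[1], -1)
r = np.corrcoef(flat)
print(np.abs(r[~np.eye(len(r), dtype=bool)]).max())
```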

In the following, we will refer to this dataset as “simulated data”.

We simulated an EEG recording on the head of a synthetic mannequin by using a 61-channel system (Brain Amp, Brain-Products GmbH, Germany). The sampling frequency was set to 200 Hz. In order to keep the impedance below 10 kΩ, the mannequin was equipped with a cap positioned over a humidified towel. It should be noted that no electromagnetic sources were inserted within the mannequin’s head, which is instead composed only of polystyrene. Thus, the mannequin head cannot produce any electromagnetic signal on the electric sensors of the recording cap. Figure

Experimental setup employed for the simulated electrical recording on a mannequin head by means of a 61-channel EEG cap. The polystyrene mannequin head was placed in front of a screen to include the interference on the signals due to the presence of a monitor.

We will refer to this dataset as “mannequin data.”

Both datasets were subjected to the same signal processing procedure, consisting of the following steps:

1. generation of 20 simulated signals (simulated data) or selection of 20 channels randomly chosen among the 61 used for the recording (mannequin data);

2. functional connectivity estimation, performed by means of sPDC;

3. extraction of the corresponding binary adjacency matrices by applying a threshold

(a) by means of the shuffling procedure for a significance level of 5% in two conditions: (i) not corrected for multiple comparisons and (ii) adjusted for multiple comparisons by the false discovery rate, and

(b) by fixing the edge density

4. extraction of the graph indices described above from the adjacency matrices achieved with both methodologies;

5. normalization of the indices achieved at point 4 by those extracted from 100 random graphs generated with the same number of connections as the corresponding adjacency matrix, to normalize the values with respect to the model dimension.
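The normalization against random graphs can be sketched as dividing an index by its mean over random graphs with the same number of nodes and edges as the observed adjacency matrix. Below, the random graphs are directed, with the m edges placed uniformly among the N(N−1) ordered pairs, and the clustering coefficient (computed on the symmetrized graph) serves as the example index; both choices, and all function names, are assumptions for illustration.

```python
import numpy as np


def random_graph_same_edges(n, m, rng):
    """Directed random graph with exactly m edges and no self-loops."""
    candidates = np.flatnonzero(~np.eye(n, dtype=bool))   # off-diagonal positions
    chosen = rng.choice(candidates, size=m, replace=False)
    adj = np.zeros(n * n, dtype=int)
    adj[chosen] = 1
    return adj.reshape(n, n)


def normalized_index(adj, index_fn, n_random=100, seed=None):
    """index_fn(adj) divided by its mean over n_random random graphs."""
    rng = np.random.default_rng(seed)
    n, m = adj.shape[0], int(adj.sum())
    reference = [index_fn(random_graph_same_edges(n, m, rng)) for _ in range(n_random)]
    return index_fn(adj) / np.mean(reference)


def clustering(adj):
    """Mean clustering coefficient of the symmetrized graph."""
    a = ((adj + adj.T) > 0).astype(float)
    np.fill_diagonal(a, 0)
    k = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2
    pairs = k * (k - 1) / 2
    c = np.divide(triangles, pairs, out=np.zeros_like(triangles), where=pairs > 0)
    return c.mean()


# Hypothetical 20-node graph made of two disconnected 10-node cliques.
adj = np.kron(np.eye(2, dtype=int), np.ones((10, 10), dtype=int))
np.fill_diagonal(adj, 0)
print(int(adj.sum()), normalized_index(adj, clustering, n_random=100, seed=0))
```

A normalized value above 1 indicates more clustering than expected for a random graph of the same size and density; matching the number of edges is what makes indices comparable across adjacency matrices of different density.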

The signal processing procedure (points 1 to 5 of the previous paragraph) was repeated 50 times to increase the power of the statistical test (ANOVA) used to compare the two modalities for extracting the adjacency matrices.

We computed a two-way ANOVA with each graph index as the dependent variable. The main factors were

the method used for extracting adjacency matrices (METHOD), with two levels:

shuffling procedure,

fixed edge density procedure;

the edge density (EDGE), corresponding to two cases:

Case 1: percentage of edges surviving the shuffling procedure for a significance level of 5%, not corrected. This percentage resulted from the application of the shuffling procedure and was then imposed on the fixed edge density procedure as well, to avoid differences in performance due to different densities;

Case 2: percentage of edges surviving the shuffling procedure for a significance level of 5%, corrected by FDR, obtained with the same procedure described above.
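With two levels per factor and the 50 repetitions treated as repeated measures, each effect of this two-way within-factors ANOVA reduces to a one-sample test on a contrast score, with F(1, n − 1) equal to the squared t statistic. The sketch below computes the three F values; `rm_anova_2x2` is a hypothetical helper, and the synthetic data are not the results of this study. p-values could be obtained with `scipy.stats.f.sf(F, 1, n - 1)` if SciPy is available.

```python
import numpy as np


def rm_anova_2x2(y):
    """Two-way repeated-measures ANOVA for a 2 x 2 within-factors design.

    y: array (n, 2, 2) indexed as [repetition, METHOD level, EDGE level].
    Returns {effect: (F, df1, df2)}. With two levels per factor, each F(1, n-1)
    is the squared one-sample t statistic of the effect's contrast score.
    """
    n = y.shape[0]
    contrasts = {
        "METHOD": (y[:, 0, 0] + y[:, 0, 1] - y[:, 1, 0] - y[:, 1, 1]) / 2,
        "EDGE": (y[:, 0, 0] + y[:, 1, 0] - y[:, 0, 1] - y[:, 1, 1]) / 2,
        "METHOD x EDGE": y[:, 0, 0] - y[:, 0, 1] - y[:, 1, 0] + y[:, 1, 1],
    }
    results = {}
    for name, c in contrasts.items():
        f_value = n * c.mean() ** 2 / c.var(ddof=1)
        results[name] = (f_value, 1, n - 1)
    return results


# Synthetic small-worldness values (hypothetical, NOT the study's data):
# 50 repetitions x METHOD (0 = shuffling, 1 = fixed density) x EDGE (Case 1, Case 2).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 0.1, size=(50, 2, 2))
y[:, 1, :] += 0.5   # fixed density level shifted upward
for name, (f_value, df1, df2) in rm_anova_2x2(y).items():
    print(f"{name}: F({df1},{df2}) = {f_value:.1f}")
```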

To describe how we selected the edge density to be used in the two approaches, we report in Figure

Distribution of the edge density characterizing the adjacency matrices extracted during the different iterations of the connectivity estimation process on simulated data in two cases: Case 1 (a) → percentage of edges surviving the shuffling procedure for a significance level of 5%, not corrected for multiple comparisons; Case 2 (b) → percentage of edges surviving the shuffling procedure for a significance level of 5%, corrected for multiple comparisons by means of FDR.

This first result confirmed the importance of the statistical validation process combined with the correction for multiple comparisons. In fact, only the application of the shuffling procedure with FDR correction made it possible to discard spurious links (obtained in this case on random, uncorrelated signals) at the correct level (below 5%). The edge densities obtained for the shuffling procedure, reported in Figure

The two approaches were statistically compared by means of an ANOVA performed as described in Section

The ANOVA was computed considering the small-worldness index as the dependent variable and the methods used for adjacency matrix extraction (METHOD) and the edge density of the achieved adjacency matrix (EDGE) as within main factors. The main factor METHOD had two levels: shuffling procedure and fixed edge density method. The main factor EDGE had two levels: Case 1 (edge density associated with a significance level of 5%, not corrected for multiple comparisons) and Case 2 (edge density associated with a significance level of 5%, FDR corrected). Results revealed a significant influence of the main factors METHOD (

In Figure

Results of the ANOVA performed on the small-worldness index computed on networks inferred from simulated data, using METHOD and EDGE as within main factors. The diagram shows the mean value of the small-worldness computed on the adjacency matrices extracted by means of the shuffling procedure (blue line) and the fixed edge density method (red line) in Case 1 (edge density as described in Figure

To understand whether the erroneous attribution of small-worldness to the networks achieved by means of the fixed edge density method is mainly due to the clustering coefficient or to the characteristic path length, correlations between the small-worldness index and these two indices were computed for the two edge densities. The results achieved in Case 2 (edge density as in Figure

Scatterplot of small-worldness versus clustering coefficient (a) and small-worldness versus path length (b) for each iteration of the adjacency matrix extraction process computed by means of the fixed edge density method for edge densities corresponding to those achieved in Case 2 (as in Figure

The simulated dataset used as null model for functional connectivity estimation represents an ideal case, because it does not take into account the spatial correlation between neighboring electrodes, which always occurs during an EEG recording. For this reason, we used a second dataset, composed of signals acquired simultaneously from a mannequin head equipped with a cap positioned over a humidified towel. Lacking physiological signals but exhibiting correlation between neighboring electrodes, this dataset represents the null model for connectivity inferred from signals acquired during an EEG experiment. In the second dataset, we randomly selected 20 channels among the 61 acquired (the same number of signals used for the simulated data) and subjected them to the functional connectivity estimation process. Then the corresponding adjacency matrix was extracted by means of the two considered methods, and graph indices such as small-worldness, path length, and clustering coefficient were computed. The indices were normalized by the values obtained from 100 random graphs generated with the same number of connections as the corresponding adjacency matrix. This process was repeated 50 times in order to increase the robustness of the subsequent statistical analysis.

The shuffling procedure was applied for a significance level of 5%, both without correction and with FDR correction. In Figure

Distribution of the edge density characterizing the adjacency matrices extracted during the different iterations of the connectivity estimation process on mannequin data in two cases: Case 1 (a) → percentage of edges surviving the shuffling procedure for a significance level of 5%, not corrected for multiple comparisons; Case 2 (b) → percentage of edges surviving the shuffling procedure for a significance level of 5%, FDR corrected.

The same statistical analysis described in the previous paragraph for simulated data was computed on graph indices extracted from mannequin data networks. In the ANOVA, computed with small-worldness as the dependent variable and the methods used for adjacency matrix extraction (METHOD) and the edge density of the achieved adjacency matrix (EDGE) as within main factors, the main factor METHOD had two levels: shuffling procedure and fixed edge density method. The main factor EDGE had two levels: Case 1 (edge density as in Figure

In Figure

Results of the ANOVA performed on the small-worldness index computed on networks inferred from mannequin data, using METHOD and EDGE as within main factors. The diagram shows the mean value of the small-worldness computed on the adjacency matrices extracted by means of the shuffling procedure (blue line) and the fixed edge density method (red line) in two cases, Case 1 (edge density as in Figure

To understand which index, between the clustering coefficient and the characteristic path length, mainly contributed to the small-worldness of the networks achieved by means of the shuffling procedure and the fixed edge density method, correlations between the small-worldness index and these two indices were computed for the two edge density cases. The results achieved in the case of edge density corresponding to Case 2 (edge density as in Figure

Scatterplot of small-worldness versus clustering coefficient ((a) and (c)) and small-worldness versus path length ((b) and (d)) for each iteration of the adjacency matrix extraction process computed by means of the shuffling procedure (first row) and the fixed edge density method (second row) for edge densities corresponding to those achieved in Case 2 (edge density as in Figure

The strong dependence of graph measures on the number of nodes, the edge density, and the degree of the networks under analysis calls for careful consideration of the modalities used for adjacency matrix extraction [

The results presented in this section allow us to discuss some open problems that affect the application of graph measures to functional connectivity estimates.

The first issue addressed in the present paper is the need to statistically validate the connectivity measures in order to discard the spurious links due to random fluctuations of the signals considered simultaneously in the multivariate [

A second relevant issue in graph theory concerns the modality in which the adjacency matrix is extracted from the connectivity network. As stated in the previous sections, the choice of threshold is crucial for the computation of graph measures because it affects the topographical properties of real networks. In the present study, we compared one of the methods extensively used in graph theory applications for extracting adjacency matrices from connectivity patterns (i.e., the method based on fixing the edge density) with an approach based on the statistical validation of the achieved connectivity patterns by means of a shuffling procedure, to describe the effects of the adjacency matrix extraction modality on the “small-world” properties of the network. The results achieved on simulated data highlighted small-world properties of the analyzed networks, even in random, uncorrelated data, when the fixed edge density method was applied. Such small-worldness was mainly correlated with an increase in the clustering coefficient and disappeared when the shuffling procedure was used. The fixed edge density criterion led to an erroneous diagnosis of small-worldness for the connectivity patterns estimated on simulated data, independently of the chosen edge density. In fact, the simulated data, being uncorrelated, should produce connectivity patterns without any topographical small-world properties. These results lead to two conclusions. The first is that the shuffling procedure does not simply preserve the strongest connections, as demonstrated by the different results obtained with the fixed edge density method, which is based on this criterion. This means that the significance of a link is not merely related to its strength. The second conclusion is that the choice of an empirical threshold can affect the topography of the network so strongly that small-worldness may be erroneously attributed to it.
Thus, statistical validation of connectivity networks, combined with adjustment for multiple comparisons, is necessary to define the significance of each edge within the adjacency matrix, in order to extract graph measures able to describe the real properties of the considered network.

The results achieved on mannequin data showed small-world properties of the networks extracted with both methodologies. In this case, the shuffling procedure could not prevent the mannequin networks from being described as small-world networks, even when corrections for multiple comparisons were applied, but the degree of small-worldness was lower than that achieved by means of the fixed edge density method. In both cases, small-worldness was equally correlated with an increase in the clustering coefficient and with a decrease in the path length. This effect could be explained by the existence of real correlations between electrodes, which can occur in real EEG data due to volume conduction and to the location of the reference [

The present work aims to highlight some erroneous results that can be obtained by applying commonly used approaches for extracting the adjacency matrix from connectivity patterns, and to describe how such procedures can affect the topographical properties of a network inferred by means of graph measures. For this reason, we statistically compared one of the methods extensively used in graph theory applications for extracting adjacency matrices from connectivity patterns (i.e., fixing the edge density) with an approach based on the statistical validation of the achieved connectivity patterns by means of a shuffling procedure. The results achieved on simulated data highlighted the importance of statistically validating connectivity patterns, which, on the one hand, prevents the occurrence of false positives due to random fluctuations of the signals and, on the other hand, allows the extraction of graph measures able to describe the real properties of the considered network. The results achieved on mannequin data showed an effect of the spatial correlations between electrodes and of the location of the reference on the small-worldness index. Such an effect could be mitigated by applying methodologies for the reconstruction of cortical sources.

This work is supported by a grant of Ministero dell’Istruzione, dell’Università e della Ricerca, in a bilateral project between Italy and Hungary and by the European STREP Program—Collaborative Project no. FP7-287320-CONTRAST. Possible inaccuracies of information are under the responsibility of the project team. The text reflects solely the views of its authors. The European Commission is not liable for any use that may be made of the information contained therein.