MIC as an appropriate method to construct the brain functional network.

Using an effective method to measure brain functional connectivity is an important step in studying the brain functional network. The main methods for constructing an undirected brain functional network include correlation coefficient (CF), partial correlation coefficient (PCF), mutual information (MI), wavelet correlation coefficient (WCF), and coherence (CH). In this paper, we demonstrate that the maximal information coefficient (MIC) proposed by Reshef et al. is well suited to constructing a brain functional network because it performs best in comprehensive comparisons of consistency and robustness. Our comparison scheme can also be used to validate possible new functional connectivity measures.


Introduction
Functional connectivity between brain areas is a hotspot in the field of cognitive neuroscience. In brain functional networks, connectivity is often measured using some form of statistical correlation. Brain activity in one region is correlated with activity in another region to quantify the strength and identify statistical dependency. When repeated for every possible pair of regions, the result is a network characterization of the brain's connectivity, in which brain regions represent network nodes and correlation strengths correspond to connection weights [1,2].
To date, the main methods of measuring brain functional connectivity include correlation coefficient (CF), partial correlation coefficient (PCF), mutual information (MI), wavelet correlation coefficient (WCF), and coherence (CH). Correlation analysis is the simplest method for analyzing brain functional connectivity, and it is widely used [3][4][5]. Zalesky et al. [1] recommended using this method to construct a brain functional network. However, contaminations from noise, such as cardiac and blood vessel activities in the brain, could also lead to high correlations [6], and the correlation coefficient only measures linear dependence [7]. Similar to many commonly used statistics, a correlation coefficient is not robust; its value may be misleading if outliers are present [8,9]. Partial correlation refers to the normalized correlation between two time series after each series has been adjusted by regressing out all other time series of network nodes. One attractive feature of this method is that it attempts to distinguish direct from indirect connections [10]. Marrelec et al. [11] advocated partial correlation as a suitable method to construct a brain functional network. Some researchers have used this method to study the default mode network or the difference between patients and controls [12][13][14][15]. However, partial correlation is a type of conditional correlation, so it still cannot measure a nonlinear association [16,17]. Coherence is the spectral representation of correlation in the frequency domain and was proposed by Sun et al. [18]. The expression of correlation in the frequency domain enables researchers to study the time course relationship in a natural and intrinsic manner [19]. This method has recently been used by Chang and Glover [20] to investigate nonstationary effects in resting functional magnetic resonance imaging (fMRI) data. However, coherence provides vague information on the actual cortical areas involved because of the complex relationship between the active brain areas and sensor recordings [21]. Wavelet correlation refers to the correlation between wavelet coefficients, which can be obtained from a discrete wavelet transform (DWT) of the time series. Bullmore et al. [22] studied wavelets and the statistical analysis of fMRI of the human brain. Skidmore et al. [23] used wavelet correlation to construct a brain functional network and identified the differences between healthy subjects and subjects with Parkinson's disease. Mutual information (MI), a method in information theory, quantifies the shared information between two variables and can reflect both linear and nonlinear dependencies [24]. This method is popular for measuring brain functional connectivity [25][26][27].
In 2011, Reshef et al. [28] proposed a new measure named the maximal information coefficient (MIC), which can capture both linear and nonlinear associations between two variables. Because of its outstanding performance in measuring different kinds of dependence, it has been called a correlation for the 21st century [29]. Reshef et al. [30] noted that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation. Since it was proposed, it has been widely used [31][32][33][34][35][36]. Su et al. [37] were the first to use MIC to construct a brain functional network. However, they only demonstrated that the brain functional connectivity calculated using MIC is suitable for classifying healthy subjects and schizophrenia subjects. Although MIC is a novel method in cognitive neuroscience, its rationality and advantages for this purpose have not yet been explained.
This paper demonstrates that MIC is a suitable method to measure brain functional connectivity and illustrates its advantages in constructing a brain functional connectivity network. Thirteen healthy subjects with minimal differences were selected from 75 healthy subjects, and the brain functional network was constructed using MIC, as well as the other methods (CF, PCF, MI, WCF, and CH). Based on the node importance, we compared the consistency and robustness of the methods from different aspects of the network. Compared with the other methods, MIC performed better in terms of consistency and robustness. Although many measures have been proposed to capture the functional connections between brain areas, no previous work has compared their performance. Our work can be used to validate possible new functional connection measures.

Subject Information and Data Preprocessing.
The data used in this paper are available for download at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html. The study comprised 75 healthy samples (ages ranged from 18 to 65). All fMRI data were processed using SPM8 (http://www.fil.ion.ucl.ac.uk/spm/) and DPARSF-V2.0 (http://www.restfmri.net/forum/index.php) [38]. For each subject, we removed the first 10 volume images from the RS-fMRI data for scanner stabilization and subject adaptation to the environment, which left 140 volumes for further analysis. Then, we performed slice timing to correct for the acquisition time delay between slices within the same TR; realignment to the first volume to correct inter-TR head motions was performed, followed by spatial normalization to a standard MNI template and resampling to a voxel size of 3 × 3 × 3 mm³. No spatial smoothing was applied, following previous studies [39][40][41]. Finally, we performed bandpass filtering for each voxel in the frequency range of 0.01-0.08 Hz to reduce low-frequency drift and high-frequency physiological noise. The RS-fMRI data for each subject were checked for head motion. No subject was excluded according to the criterion that the translation and rotation of head motion in any direction were not more than 1.5 mm or 1.5°. To obtain signals for each region, we applied the automated anatomical labeling (AAL) atlas [42] to parcellate the brain into 90 regions of interest (ROIs) (45 per hemisphere). The names of the ROIs and their corresponding abbreviations are listed in Table 2. The time series for each ROI was calculated by averaging the signals of all voxels within that region.
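The ROI averaging step at the end of the preprocessing can be sketched as follows. This is a minimal illustration, not the DPARSF implementation: the function name `roi_time_series`, the array layout, and the toy random data are all assumptions made here, and the atlas is taken to be an integer label volume with 0 for background and 1 to 90 for the regions.

```python
import numpy as np

def roi_time_series(bold, atlas, n_rois=90):
    """Average the voxel signals inside each labelled region.

    bold  : array of shape (x, y, z, t), preprocessed volumes (assumed layout)
    atlas : integer array of shape (x, y, z); 0 = background, 1..n_rois = regions
    Returns an array of shape (t, n_rois), one column per region.
    """
    n_vols = bold.shape[-1]
    voxels = bold.reshape(-1, n_vols)      # (n_voxels, t)
    labels = atlas.reshape(-1)
    series = np.full((n_vols, n_rois), np.nan)
    for roi in range(1, n_rois + 1):
        mask = labels == roi
        if mask.any():                     # leave NaN if a label is absent
            series[:, roi - 1] = voxels[mask].mean(axis=0)
    return series

# Toy data only: 140 volumes (as in the text) on a small hypothetical grid.
rng = np.random.default_rng(0)
atlas = rng.integers(0, 91, size=(12, 12, 12))
bold = rng.standard_normal((12, 12, 12, 140))
ts = roi_time_series(bold, atlas)          # shape (140, 90)
```

Each column of `ts` then plays the role of one ROI time series in the connectivity computations.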

Maximal Information Coefficient.
The MIC, introduced by Reshef et al. [28] in 2011, was used as a measure of association between two random variables $X$ and $Y$. The MIC can capture a wide range of relationships. $\mathrm{MIC}(X, Y)$ is the mutual information $I(X, Y)$ between the random variables $X$ and $Y$ normalized by the minimum entropy $\min\{H(X), H(Y)\}$ of $X$ and $Y$, which can be written as
$$\mathrm{MIC}(X, Y) = \frac{I(X, Y)}{\min\{H(X), H(Y)\}} = \frac{H(X) - H(X \mid Y)}{\min\{H(X), H(Y)\}},$$
where $H(X \mid Y)$ is the conditional entropy, which is the amount of information needed to describe the outcome of $X$ given that the value of $Y$ is known. For a pair of variables $X$ and $Y$, $0 \le \mathrm{MIC}(X, Y) \le 1$, and $\mathrm{MIC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent. The MIC is robust to outliers because the estimations of Shannon entropy and conditional entropy are robust [36,43].
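To make the normalization concrete, the sketch below estimates mutual information divided by the minimum marginal entropy on a single fixed grid. It is only an illustration under stated assumptions: the MIC of Reshef et al. additionally searches over grid resolutions and partitions and takes the maximum, which this simplified version does not do. The function names, the equal-width 8 × 8 grid, and the base-2 logarithm are choices made here.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (base 2) of a probability vector, ignoring zero cells."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def normalized_mi(x, y, bins=8):
    """I(X;Y) / min{H(X), H(Y)} estimated on one equal-width bins x bins grid."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    hx = entropy_bits(pxy.sum(axis=1))       # marginal entropy of X
    hy = entropy_bits(pxy.sum(axis=0))       # marginal entropy of Y
    hxy = entropy_bits(pxy.ravel())          # joint entropy
    mi = max(hx + hy - hxy, 0.0)             # guard against rounding below 0
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0

# Synthetic signals: identical, nonlinearly related, and unrelated.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
noise = rng.uniform(-1.0, 1.0, size=2000)
score_identity = normalized_mi(x, x)
score_parabola = normalized_mi(x, x ** 2)
score_noise = normalized_mi(x, noise)
```

On such data the score for the parabola lies well above the score for unrelated noise even though the linear correlation between `x` and `x ** 2` is close to zero, which is the kind of nonlinear dependence the text refers to.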

Construct Network.
For each subject, we obtained a 90 × 90 dependence matrix by calculating the connection strength between all ROI pairs using one of the six previously described methods (CF, PCF, MIC, MI, WCF, and CH). Details of the definitions and computations of these methods can be found in the references. Figure 1 visualizes the six dependence matrices of a randomly selected sample. Each dependence matrix was thresholded into a binary matrix. By taking each ROI as a node and the functional connectivity as an edge, we obtained a 90 × 90 adjacency matrix for each subject. The adjacency matrix can be defined as
$$a_{ij}(T) = \begin{cases} 1, & |w_{ij}| > T, \\ 0, & \text{otherwise}, \end{cases}$$
where $W = (w_{ij})$ is the connection matrix, $T$ is the threshold value, and $A(T) = (a_{ij}(T))$ is the adjacency matrix under threshold $T$. In other words, if the absolute value of $w_{ij}$ between a pair of brain regions $i$ and $j$ exceeds a given threshold $T$, an edge is constructed to connect the brain regions; otherwise, there is no edge between them. The threshold $T$ is an external parameter that determines the size of the network. If $T$ is too large, we obtain a network with fewer edges, which may lead to a disconnected network. If $T$ is too small, some of the retained connections may be too weak to be significant. To balance these two aspects, 1200 edges were selected as a representative network size. We also obtained results for different network sizes, and the results were insensitive to network size.
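Fixing the network size at 1200 edges amounts to choosing the threshold implicitly for each dependence matrix. A minimal sketch of that step is given below, assuming a symmetric matrix and no ties in connection strength. The function name `adjacency_with_n_edges` and the random toy matrix are illustrative assumptions, not the authors' code.

```python
import numpy as np

def adjacency_with_n_edges(W, n_edges=1200):
    """Keep the n_edges strongest connections (by absolute value) as edges.

    W is a symmetric dependence matrix; only pairs i < j are ranked, so the
    diagonal never produces an edge. Returns the binary adjacency matrix and
    the weakest retained strength, which acts as the effective threshold.
    """
    W = np.asarray(W, dtype=float)
    rows, cols = np.triu_indices_from(W, k=1)
    strengths = np.abs(W[rows, cols])
    order = np.argsort(strengths)[::-1][:n_edges]   # strongest first
    A = np.zeros(W.shape, dtype=int)
    A[rows[order], cols[order]] = 1
    A = A + A.T                                     # undirected network
    threshold = strengths[order[-1]]
    return A, threshold

# Toy example: Pearson correlations of 90 random series with 140 time points.
rng = np.random.default_rng(1)
W = np.corrcoef(rng.standard_normal((140, 90)).T)
A, threshold = adjacency_with_n_edges(W, n_edges=1200)
```

With 90 nodes there are 4005 possible undirected edges, so 1200 edges correspond to a density of roughly 30%.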

Comparison Scheme.
We conduct both intermethod and intersample comparisons to assess consistency and robustness. The important results, such as key nodes, obtained by a good method should not conflict with those obtained by known methods. We therefore conduct intermethod comparisons to see which method is more consistent with the other methods. We call this aspect consistency. Furthermore, a good method should not be sensitive to the objects it is applied to. With 13 subjects of similar physical condition, a good method should obtain similar network properties. This approach represents a type of repeated trial; that is, a good method should always be good, regardless of the experimental targets. We call this aspect robustness.
2.4.1. Node Importance. We use two criteria to evaluate the relative importance of a node in a network [43]. One criterion is the degree centrality (DC), which is proportional to the degree of the node. The other criterion is the Shannon-Parry centrality (SPC). The former is a popular method. The latter is based on the Shannon-Parry measure of a network, and the relative importance of the $i$th node is proportional to $u(i) \cdot v(i)$, where $u = (u(1), u(2), \ldots, u(n))$ and $v = (v(1), v(2), \ldots, v(n))$ are the left and right eigenvectors of the adjacency matrix of the network. The SPC can effectively illustrate the node importance by synthesizing the node properties and network topology structure [43,44].
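Both importance criteria can be computed directly from a binary adjacency matrix. The sketch below is an illustration under assumptions made here: the eigenvectors are taken for the largest eigenvalue, as is standard for the Parry measure, both scores are normalized to sum to one, and the function names are hypothetical. For an undirected network the left and right eigenvectors coincide, so the SPC reduces to the squared entries of the dominant eigenvector.

```python
import numpy as np

def degree_centrality(A):
    """Node degrees normalized to sum to 1 (normalization is an assumption)."""
    deg = np.asarray(A, dtype=float).sum(axis=1)
    return deg / deg.sum()

def dominant_eigenvector(M):
    """Non-negative eigenvector for the eigenvalue with the largest real part."""
    vals, vecs = np.linalg.eig(np.asarray(M, dtype=float))
    k = int(np.argmax(vals.real))
    return np.abs(vecs[:, k].real)

def shannon_parry_centrality(A):
    """Importance proportional to u(i) * v(i), normalized to sum to 1."""
    A = np.asarray(A, dtype=float)
    v = dominant_eigenvector(A)        # right eigenvector
    u = dominant_eigenvector(A.T)      # left eigenvector
    score = u * v
    return score / score.sum()

# Toy network: a star with one hub (node 0) and four leaves.
star = np.zeros((5, 5), dtype=int)
star[0, 1:] = 1
star[1:, 0] = 1
dc = degree_centrality(star)
spc = shannon_parry_centrality(star)
```

On this star graph both criteria give the hub a score of 0.5 and each leaf 0.125; on less regular networks the two criteria generally differ, because the SPC also reflects how well connected a node's neighbours are.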

Comparisons on Consistency and Robustness.
We compare the consistency and robustness of the six methods following the flowchart in Figure 2.

Consistency.
We compared consistency in terms of the top 10% most important nodes (A) and the total Euclidean distance between importance vectors (B). Since there were two importance criteria, DC and SPC, there were four comparisons. The computational results are listed in Tables 3-5. Table 3 shows the comparison of important nodes; for MIC, the important nodes defined by DC included MEG.L, MEG.R, ROL.L, SFGmed.L, SFGmed.R, INS.L, INS.R, PoCG.R, STG.L, and STG.R. When the node importance was defined by the SPC, nearly the same important nodes were obtained (STG.R instead of SMG.R). MIC had the best rank when the node importance criterion was SPC and ranked third when SPC was replaced by DC. In the latter case, the score of MIC (33) was very close to the highest score (34.5).
The comparison of the total Euclidean distances of all nodes is shown in Table 4; MIC had the best rank when the node importance criterion was SPC and ranked second when SPC was replaced by DC. In the latter case, the total distance of MIC was 127.84, which was almost the same as the smallest total distance (126.4). Table 5 collects the ranking information of the consistency comparisons.
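The distance-based consistency comparison (B) can be sketched as follows: each method contributes one importance vector, all pairwise Euclidean distances are computed, and the distances from each method to the others are summed. The function name and the three tiny vectors are made up for illustration and are not the paper's data.

```python
import numpy as np

def total_distance_to_others(importance):
    """Sum of Euclidean distances from each method's vector to all the others.

    importance : dict mapping a method name to its node-importance vector
    Returns a dict with the same keys; a smaller total means the method
    agrees more closely with the rest.
    """
    names = list(importance)
    V = np.stack([np.asarray(importance[m], dtype=float) for m in names])
    # Pairwise distance matrix via broadcasting; the diagonal is zero.
    D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    return {m: float(t) for m, t in zip(names, D.sum(axis=1))}

# Hypothetical two-node example with three methods.
toy = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [6.0, 8.0]}
totals = total_distance_to_others(toy)
most_consistent = min(totals, key=totals.get)
```

Here method `b` sits between the other two and therefore has the smallest total distance.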

Robustness.
Since robustness is an important feature of a method, we conducted six comparisons to examine the robustness of the methods. We calculated the total Euclidean distances between the importance vectors (C) and between the importance ranking vectors (D) of the 13 samples. This gave four comparisons since there were two importance criteria. In addition, we calculated the variation coefficient (VC) of the degrees of the 90 nodes, as well as the voting entropy of all nodes. The results are listed in Tables 6-10 and Figures 3 and 4. As shown in Tables 6 and 7, in the comparisons of the total distance of all nodes, MIC was ranked second. In the comparison of the distance between the importance ranking vectors, MIC was ranked first. The box plot (Figure 3) shows the average VC of all nodes; MIC was ranked second and PCF was ranked first. The results were verified by two-sample t-tests under the confidence level of 95%. In the comparison of the voting entropy of nodes between the 13 samples, MIC performed the best when the top 9-18 (10-20%) nodes were regarded as the important nodes, as shown in Figure 4. MIC passed all the two-sample t-tests under the confidence level of 95%. The ranking information is listed in Table 10. Overall, for the robustness comparisons, MIC was consistently ranked in the top two methods.

Table 3: In Table 3(a), the importance of nodes was defined by degree centrality (DC). In Table 3(b), the importance of nodes was defined by the Shannon-Parry centrality (SPC). Each column includes the important nodes and the score information for a method. In this table, the top 9 (10%) nodes were chosen as the important nodes. If a method obtained a higher score, it was more consistent with all other methods. In Table 3(a), CF, MI, and MIC had relatively higher scores. Although MIC was ranked third, its score was very close to that of CF, which was ranked first. In general, the network constructed by MIC had relatively better consistency compared with other methods in terms of important nodes.

Table 4: In Table 4(a), we used degree centrality (DC) to measure the importance of nodes in the network. In Table 4(b), we used Shannon-Parry centrality (SPC) to measure the importance of nodes in the network. Each method thus obtained a 1 × 90 vector of node importance by averaging each node's importance over the 13 sample networks. The values in the middle of the table are the Euclidean distances between the vectors of different methods. The values in the last row are the sums of the Euclidean distances from one method to the others. A smaller sum indicates that the corresponding method had better consistency.

Figure 2: The flowchart summarizes the process of our comparison scheme on two aspects: consistency and robustness. Following the arrows and branches from top to bottom gives a clear view of our steps, which can be easily realized. We conduct ten comparisons, four for consistency and six for robustness. To make it clear, "voting" is a metaphor. For instance, when we use 6 methods to extract the top 9 important nodes, we think of the 6 methods as "holding" a vote for their top 9 important nodes from the 90 nodes. The same goes for our samples. In the left branch of the consistency part, the six methods have their own important node sets after "voting," but the sets are different; that is to say, the six methods have different "opinions" on important nodes. We give each method a score to decide which method's "opinion" is the best. One method's score is the total votes it receives from other methods, including itself. The method that acquires the most votes pools the other methods' "opinions" together and is considered more reliable. In the first left branch of the robustness part, the 13 samples "vote" for $k$ ($9 \le k \le 18$) important nodes from the 90 nodes. After "voting," the 90 nodes have their voting numbers and voting rates (voting number/sum of votes). We can calculate the entropy according to the probability distribution induced by the voting rates. If the entropy is small, it means that the 13 samples have consensus on important nodes, and the corresponding method is more robust.

Table 6: C: robustness in the importance of all nodes. We used degree centrality (DC) and Shannon-Parry centrality (SPC) to measure the importance of nodes in the network and stacked the importance vectors computed by the same method for different samples as rows, so each method obtained a 13 × 90 matrix. The value in the table is the sum of the Euclidean distances between different row vectors for one method. D: robustness in the ranking of all nodes in importance. We used DC and SPC to measure the importance of nodes in the network and stacked the importance vectors computed by the same method for different samples as rows, so each method obtained a 13 × 90 matrix. We then ranked the nodes within each row, so each method obtained a 13 × 90 rank matrix. The value in the table is the sum of the Euclidean distances between different row vectors of the rank matrix of one method. A smaller sum indicates that the corresponding method had better robustness. In aspect C, MIC is ranked second, with a sum larger only than that of PCF. In aspect D, MIC had the smallest sum of distances, so MIC had better robustness than the other methods.

Table 9: We conducted one-sided two-sample t-tests between each pair of methods under the confidence level of 95%. In this chart we show the value of the test in which the mean of the method in the row was smaller than that of the method in the column. We arranged the methods according to their rank. The events in the symmetric position of the

Figure 4: The horizontal axis is the number of important nodes for every subject according to the Shannon-Parry centrality (SPC). The vertical axis is the entropy as a metric to measure results; smaller entropy indicates better performance. Performance results were compared, and MIC performed best. For statistical tests, see Table 9.
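The robustness quantities described above can be sketched for one method as follows, starting from a 13 × 90 matrix whose rows are the importance vectors of the individual samples. Several details are assumptions made here because the text does not fix them: the entropy uses base-2 logarithms, each pair of rows is counted once in the distance sum, ties are broken by sort order, and all function names are hypothetical.

```python
import numpy as np

def voting_entropy(importance, k):
    """Entropy of the vote distribution when each sample votes for its top-k nodes."""
    n_samples, n_nodes = importance.shape
    top = np.argsort(importance, axis=1)[:, -k:]      # k most important per row
    votes = np.bincount(top.ravel(), minlength=n_nodes)
    rates = votes / votes.sum()                       # voting rates
    p = rates[rates > 0]
    return float(-(p * np.log2(p)).sum())

def total_pairwise_distance(M):
    """Sum of Euclidean distances between all pairs of distinct rows (aspect C)."""
    M = np.asarray(M, dtype=float)
    D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    return float(D[np.triu_indices(len(M), k=1)].sum())

def rank_matrix(importance):
    """Rank of every node within each row (0 = least important), for aspect D."""
    return np.argsort(np.argsort(importance, axis=1), axis=1)

# Two synthetic extremes: perfect agreement between samples, and pure noise.
agree = np.tile(np.arange(90.0), (13, 1))
noisy = np.random.default_rng(2).random((13, 90))
h_agree = voting_entropy(agree, k=9)
h_noisy = voting_entropy(noisy, k=9)
d_agree = total_pairwise_distance(rank_matrix(agree))
```

When all 13 samples agree, the votes concentrate on 9 nodes and the entropy equals log2(9); disagreement spreads the votes over more nodes and raises the entropy, which is why smaller values are read as better robustness.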

Discussions
We compared MIC comprehensively with five existing methods in terms of consistency and robustness. According to the results in Tables 3-5, MIC, CF, and MI are more consistent than WCF, CH, and PCF. The consistency scores of MIC, CF, and MI are very close. Combining the two important-node comparisons under the DC and SPC criteria in Table 3, the total scores of MIC, CF, and MI are 65, 64.5, and 64.5, respectively, so MIC is marginally more consistent than CF and MI. For the comparisons based on total distance, one can see from Table 4 that MIC is also more consistent than CF and MI. Across the four consistency comparisons, MIC was ranked 3, 1, 2, and 1, CF was ranked 1, 3, 1, and 3, and MI was ranked 2, 2, 3, and 2. The sums of the ranks are 7, 8, and 9 for MIC, CF, and MI, respectively. We conclude that MIC was more consistent than CF and MI because its rank sum is the smallest. For the robustness comparison, we compared six aspects. MIC was ranked first and PCF was ranked fifth in the comparisons of the total Euclidean distance between the importance (DC, SPC) ranking vectors and of the voting entropy. In the remaining three comparisons, MIC was ranked second, while PCF was ranked first. PCF is very robust in the comparisons of the total Euclidean distances between the importance (DC, SPC) vectors and of the variation coefficients of degrees. However, this does not mean that PCF is a good method, because it performs the worst in all of the consistency comparisons. In fact, it regresses out the influence of all the other 88 nodes when calculating the correlation coefficient between a pair of nodes. This leads to relatively uniform results that lack discriminability. We refer to Figure 1 for a typical visualization of a PCF matrix.
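The rank-sum argument for consistency can be checked with a few lines of arithmetic. The ranks below are the ones quoted in the text for the four consistency comparisons; the variable names are illustrative.

```python
# Ranks of each method in the four consistency comparisons, as quoted in the text.
ranks = {
    "MIC": [3, 1, 2, 1],
    "CF": [1, 3, 1, 3],
    "MI": [2, 2, 3, 2],
}

# A smaller rank sum means the method was placed higher overall.
rank_sums = {method: sum(r) for method, r in ranks.items()}
best = min(rank_sums, key=rank_sums.get)
```

This reproduces the rank sums of 7, 8, and 9 and selects MIC as the method with the smallest sum.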

Conclusions
In this paper, we demonstrate that MIC can be used to construct a brain functional network. We compared MIC with five other methods (CF, PCF, MI, WCF, and CH) and confirmed that it is suitable for brain functional network construction. In the comprehensive comparisons of consistency and robustness, MIC performs the best, and the results are convincing.