
Graphical models appear well suited for inferring brain connectivity from fMRI data, as they can distinguish between direct and indirect brain connectivity. Nevertheless, biological interpretation requires not only that the multivariate time series are adequately modeled, but also that there is accurate error control of the inferred edges. The PC_{fdr} algorithm, developed by Li and Wang, was designed to provide a computationally efficient means to control the false discovery rate (FDR) of computed edges asymptotically. The original PC_{fdr} algorithm was unable to accommodate a priori knowledge of connectivity and was designed to infer connectivity from a single subject rather than a group of subjects. Here we extend the PC_{fdr} algorithm and propose a multisubject, error-rate-controlled brain connectivity modeling approach that allows incorporation of prior knowledge of connectivity. In simulations, we show that the two proposed extensions can still control the FDR around or below a specified threshold. When the proposed approach is applied to fMRI data in a Parkinson’s disease study, we find robust group evidence of disease-related changes, compensatory changes, and the normalizing effect of L-dopa medication. The proposed method provides a robust, accurate, and practical means of assessing brain connectivity patterns from functional neuroimaging data.

The interaction between macroscopic brain regions has been increasingly recognized as vital for understanding normal brain function and the pathophysiology of many neuropsychiatric diseases. Brain connectivity patterns derived from neuroimaging methods are therefore of great interest, and several recently published reviews have described different modeling methods for inferring brain connectivity from fMRI data [

Graphical models, when applied to functional neuroimaging data, represent brain regions of interest (ROIs) as nodes and the stochastic interactions between ROIs as edges. However, in most graphical model applications outside brain imaging, the primary goal is to create a model that fits the overall multivariate data well, which does not necessarily mean that the model accurately reflects the particular connections between nodes. In applications of graphical models to brain connectivity, by contrast, the neuroscientific interpretation is largely based on the pattern of connections inferred by the model. This places a premium on accurately determining the “inner workings” of the model, such as accounting for the error rate of the edges in the model.

The false discovery rate (FDR) [

Naively controlling the traditional type I and type II error rates at specified levels does not necessarily yield a reasonable FDR, especially in large, sparse networks. For example, consider an undirected network with 40 nodes, in which each node interacts, on average, with 3 other nodes; that is, there are 60 edges in the network. An algorithm with the
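To make the arithmetic of this example concrete, the sketch below approximates the FDR of edge-wise tests on the 40-node, 60-edge network. Because the example is cut off in this copy, the type I error rate (0.05) and detection power (0.9) used here are illustrative assumptions, and the function name `approx_fdr` is hypothetical. The FDR is approximated by the ratio of expected counts rather than computed exactly as E[V/R].

```python
def approx_fdr(n_nodes: int, avg_degree: float, alpha: float, power: float) -> float:
    """Approximate FDR of edge-wise tests on an undirected network.

    Uses the ratio of expected counts, E[false positives] / E[all positives],
    as an approximation of FDR = E[V / R].
    alpha: per-edge type I error rate on truly absent edges
    power: per-edge detection power on true edges (1 - type II error rate)
    """
    n_pairs = n_nodes * (n_nodes - 1) // 2     # candidate edges (780 for 40 nodes)
    n_true = round(n_nodes * avg_degree / 2)    # each edge touches two nodes (60 here)
    n_absent = n_pairs - n_true                 # truly absent edges (720 here)
    false_pos = alpha * n_absent                # expected false discoveries
    true_pos = power * n_true                   # expected true discoveries
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Illustrative (assumed) error rates: 5% type I error, 90% power.
    print(f"approximate FDR = {approx_fdr(40, 3, alpha=0.05, power=0.9):.2f}")
```

With these assumed rates, about 36 false edges accompany about 54 true ones, so roughly 40% of the reported edges would be false: absent edges outnumber true edges 12 to 1, so even a small per-edge error rate accumulates over many more candidates.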

Recent work in the machine learning field has started to investigate controlling the FDR in network structures using a generic Bayesian approach and classical FDR assessment [

Li and Wang proposed a network-learning method that allows the FDR to be controlled asymptotically at the global level. They based their approach on the PC algorithm (named after Peter Spirtes and Clark Glymour), a computationally efficient and asymptotically reliable Bayesian network-learning algorithm. The PC algorithm assesses the (non)existence of an edge in a graph by determining the conditional dependence/independence relationships between nodes […]. Their extension, the PC_{fdr} algorithm, is capable of asymptotically controlling the FDR under prespecified levels […]. The PC_{fdr} algorithm does this by treating the learning of a network as testing the existence of edges, so that FDR control of edges becomes a multiple-testing problem, which has a strong theoretical basis and has been extensively studied by statisticians [
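For Gaussian data, the conditional independence tests on which the PC algorithm relies are commonly implemented with partial correlations and Fisher's z-transform. The sketch below assumes multivariate Gaussian data; the function names `partial_corr` and `ci_test_pvalue` are hypothetical and are not taken from Li and Wang's implementation.

```python
import math

import numpy as np


def partial_corr(data: np.ndarray, i: int, j: int, cond: list[int]) -> float:
    """Partial correlation of columns i and j given the columns listed in `cond`."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)                     # precision (inverse correlation) matrix
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def ci_test_pvalue(data: np.ndarray, i: int, j: int, cond: list[int]) -> float:
    """Two-sided p-value for H0: X_i independent of X_j given X_cond (Gaussian case)."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)            # keep the Fisher transform finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal(2000)
    x1 = 0.8 * x0 + rng.standard_normal(2000)      # chain X0 -> X1 -> X2
    x2 = 0.8 * x1 + rng.standard_normal(2000)
    data = np.column_stack([x0, x1, x2])
    print(ci_test_pvalue(data, 0, 2, []))          # tiny: X0 and X2 are dependent
    print(ci_test_pvalue(data, 0, 2, [1]))         # not small in general: independent given X1
```

In the chain example, X0 and X2 are marginally dependent but conditionally independent given X1, which is exactly the distinction between indirect and direct connectivity that the PC algorithm exploits.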

Besides introducing these recent advances, this paper presents two extensions to the original PC_{fdr} algorithm, the combination of which leads to a multisubject brain connectivity modeling approach incorporating FDR control, _{fdr} algorithm allows for more flexibility in using the method and potentially leads to greater sensitivity in accurately discovering the true brain connectivity.

The second extension to the PC_{fdr} algorithm combines the PC_{fdr} algorithm with a mixed-effect model to deal robustly with intersubject variability. As neuroimaging research typically involves a group of subjects rather than an individual subject, group analysis plays an important role in final biological interpretations. However, compared with the extensive group-level methods available for analysis of amplitude changes in blood-oxygen-level-dependent (BOLD) signals (e.g., Worsley et al. [_{fdr} algorithm (or the extended

Several methods have been proposed to infer group connectivity in neuroimaging. Bayesian model selection [

The major distinguishing feature of the proposed approach compared with the aforementioned approaches is that this data-driven approach aims to control the FDR of the group-level network directly. We demonstrate in simulations that, with a sufficiently large number of subjects, the proposed group-level algorithm is able to reliably recover network structures while still controlling the FDR around prespecified levels. When the proposed approach, referred to as the

Graphical models, such as Bayesian networks, encode conditional independence/dependence relationships among variables graphically with nodes and edges according to the Markov properties [

Since a graphical model is a graphical representation of conditional independence/dependence relationships, the nonadjacency between two nodes is tested by inspecting their conditional independence given all other nodes. As multiple edges are tested simultaneously, FDR-control procedures should be applied to correct for multiple testing.
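A widely used FDR-control procedure for this kind of multiple testing is the Benjamini-Hochberg step-up procedure. The exact FDR procedure referenced later as an algorithm is not legible in this copy, so the sketch below should be read as an assumed, standard choice; the function name `benjamini_hochberg` is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues, q: float) -> np.ndarray:
    """Boolean mask of hypotheses rejected by the BH step-up procedure at level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if m == 0:
        return np.zeros(0, dtype=bool)
    order = np.argsort(p)                           # sort p-values ascending
    thresholds = q * np.arange(1, m + 1) / m        # k * q / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.nonzero(below)[0])        # largest k with p_(k) <= k q / m
        reject[order[: k_max + 1]] = True           # reject the k_max smallest p-values
    return reject


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))            # only the first two are rejected
```

In edge testing, "rejecting" a hypothesis means rejecting conditional independence, that is, keeping the edge. Sorting dominates the cost, so the procedure runs in O(m log m) time for m tested edges.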

Given two among

Given a multivariate probability distribution whose conditional independence relationships can be perfectly encoded as a Bayesian network according to the Markov property, two nodes

Based on Proposition

The initial version of Li and Wang’s [_{fdr} algorithm, was proved to be capable of asymptotically controlling the FDR. Here we present an extension of the PC_{fdr} algorithm that can incorporate _{fdr} algorithm. We name this extension the _{fdr} algorithm can thus be regarded as a special case of the extended algorithm, by setting

according to prior knowledge, the undirected edges

and the FDR level

the edges in

denotes an undirected edge.

conditional independence between

(1) Form an undirected graph

(2) Initialize the maximum

(3) Let depth

(4)

(5)

(6)

(7) Test hypothesis

(8)

(9) Let

(10)

(11) Run the FDR procedure, Algorithm

(12)

(13) Remove these edges from

(14) Update

(15)

(16)

(17)

(18)

(19)

(20)

(21)

(22)

(23) Let

(24)
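The listed steps are only partially legible in this copy. The sketch below shows one simplified reading of a PC_{fdr}-style skeleton search with prior knowledge, under several assumptions: Gaussian data, a partial-correlation test for the hypothesis in Step 7, the Benjamini-Hochberg procedure as the FDR procedure, the maximum p-value of each pair carried across all of its tests, and the FDR procedure run once per depth rather than after every test. Edges known a priori are kept, used as conditioning neighbors, and excluded from the FDR input. All names are hypothetical; this is not Li and Wang's implementation, and it omits edge orientation.

```python
import itertools
import math

import numpy as np


def ci_pvalue(data, i, j, cond):
    """Fisher-z test of X_i independent of X_j given X_cond (Gaussian assumption)."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = math.atanh(r) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up procedure; returns a list of rejection flags."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    k_max = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= q * rank / m:
            k_max = rank
    reject = [False] * m
    for k in order[:k_max]:
        reject[k] = True
    return reject


def skeleton_with_prior(data, q, known_edges=()):
    """Simplified PC_fdr-style skeleton search with prior knowledge (hypothetical sketch).

    known_edges: node pairs assumed present a priori; never tested or removed.
    Every other pair starts connected, is tested, and is kept only if the FDR
    procedure rejects its nonexistence based on its maximum p-value so far.
    """
    n_vars = data.shape[1]
    known = {frozenset(e) for e in known_edges}
    adj = {v: set(range(n_vars)) - {v} for v in range(n_vars)}  # complete graph
    tested = [frozenset(e) for e in itertools.combinations(range(n_vars), 2)
              if frozenset(e) not in known]
    p_max = {e: -1.0 for e in tested}       # largest p-value observed for each pair
    removed = set()
    depth = 0
    while any(len(adj[a] - {b}) >= depth for a in adj for b in adj[a]):
        for e in tested:
            if e in removed:
                continue
            a, b = tuple(e)
            for x, y in ((a, b), (b, a)):
                for cond in itertools.combinations(sorted(adj[x] - {y}), depth):
                    p_max[e] = max(p_max[e], ci_pvalue(data, x, y, cond))
        # FDR procedure over all tested pairs; pairs whose nonexistence is
        # not rejected are removed from the graph.
        reject = bh_reject([p_max[e] for e in tested], q)
        for e, keep in zip(tested, reject):
            if not keep and e not in removed:
                removed.add(e)
                a, b = tuple(e)
                adj[a].discard(b)
                adj[b].discard(a)
        depth += 1
    kept = known | {e for e in tested if e not in removed}
    return {tuple(sorted(e)) for e in kept}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal(2000)
    x1 = 0.8 * x0 + rng.standard_normal(2000)     # chain X0 -> X1 -> X2
    x2 = 0.8 * x1 + rng.standard_normal(2000)
    data = np.column_stack([x0, x1, x2])
    print(skeleton_with_prior(data, q=0.05))                        # usually {(0, 1), (1, 2)}
    print(skeleton_with_prior(data, q=0.05, known_edges=[(0, 2)]))   # (0, 2) kept a priori
```

Passing no known edges reduces the sketch to the plain skeleton search, mirroring how the original PC_{fdr} algorithm is described as a special case of the extended one. Because of the simplifications listed above, its results may differ from those of the original algorithm.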

A heuristic modification, named the

is removed from

Before we present theorems about the asymptotic performance of the

The multivariate probability distribution

The number of vertices is fixed.

Given a fixed significance level of testing conditional independence, the power of detecting conditional dependence approaches 1 at the limit of large sample sizes.

The union of

Assumption (A1) is generally assumed when graphical models are applied, and it restricts the probability distribution

The detection power of the

Assuming (A1), (A2), and (A3), both the

It should be noted that Theorem

The FDR of the

Assuming (A1), (A2), (A3), and (A4), the FDR of the set of edges inferred by the

Theorem

Theorems _{fdr} algorithm, its performance should be very similar. The numerical examples of the PC_{fdr} algorithm in Li and Wang’s [

The detailed proofs of Theorems

The majority of the computational effort in the

The computational complexity of the FDR procedure, Algorithm

the test statistics).

In practice, the

It should be noted that controlling the FDR locally is not equivalent to controlling it globally. For example, if it is known that there is only one connection to test for each node, then controlling the FDR locally degenerates into controlling the pointwise error rate, which cannot control the FDR globally.
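The degenerate case above can be checked numerically. The sketch below assumes m independent tests that are all truly null (for example, 40 nodes paired into 20 disjoint candidate edges, one per node), so any discovery is false and the FDR equals the probability of at least one discovery. With one test per node, local FDR control reduces to rejecting whenever p ≤ q, whereas global control applies the Benjamini-Hochberg procedure to all m p-values at once. The function name and parameter values are hypothetical.

```python
import numpy as np


def local_vs_global_fdr(m: int = 20, q: float = 0.05, n_rep: int = 20000, seed: int = 0):
    """Monte Carlo FDR under the complete null with one test per node.

    Returns (FDR with local control, FDR with global Benjamini-Hochberg control).
    Under the complete null, FDR = P(at least one discovery).
    """
    rng = np.random.default_rng(seed)
    p = rng.uniform(size=(n_rep, m))                # null p-values are Uniform(0, 1)
    # Local control with a single test per node is just the pointwise rule p <= q.
    local = np.mean(np.any(p <= q, axis=1))
    # Global BH rejects something iff some sorted p_(k) <= k q / m.
    thresholds = q * np.arange(1, m + 1) / m
    glob = np.mean(np.any(np.sort(p, axis=1) <= thresholds, axis=1))
    return float(local), float(glob)


if __name__ == "__main__":
    print(local_vs_global_fdr())   # local near 1 - 0.95**20 (about 0.64), global near 0.05
```

The local rule's FDR grows with the number of nodes as 1 - (1 - q)^m, while the global procedure keeps it near q.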

Listgarten and Heckerman [

In this section, we propose another extension to the PC_{fdr} algorithm: from the single subject level to the group level. Assessing group-level activity is done by considering a mixed-effect model (Step 7 of Algorithm _{fdr} algorithm where “g” indicates that it is an extension at the group level. When also incorporating

level

denotes vertices adjacent to

(1) Form an undirected graph

(2) Initialize the maximum

(3) Let depth

(4)

(5)

(6)

(7) Test hypothesis

(8)

(9) Let

(10)

(11) Run the FDR procedure, Algorithm

(12)

(13) Remove these edges from

(14) Update

(15)

(16)

(17)

(18)

(19)

(20)

(21)

(22)

(23) Let

(24)

_{fdr}

algorithm to obtain the

the undirected edges

knowledge, the undirected edges

level

Suppose we have

For clarity, in the following discussion we omit the subscript “

To study the group-level conditional independence relationships, a group-level model should be introduced for

The group model we employ is

Because

Replacing Step 7 of the single-subject PC_{fdr} algorithm (i.e., the intrasubject hypothesis test) with the test of
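The group model itself is not legible in this copy. As an illustration of how the intrasubject test in Step 7 could be replaced by a group-level test, the sketch below uses a common summary-statistics approximation to a random-effects model: each subject's partial correlation is Fisher z-transformed, and a one-sample t-test across subjects checks whether the group mean is zero. This is an assumed stand-in, not necessarily the mixed-effect test used in the gPC_{fdr} algorithm; the function names are hypothetical, and SciPy is assumed to be available.

```python
import math

import numpy as np
from scipy import stats


def fisher_z_partial(data, i, j, cond):
    """Fisher z-transform of the partial correlation of columns i and j given `cond`."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    return math.atanh(min(max(r, -0.999999), 0.999999))


def group_ci_pvalue(subject_data, i, j, cond):
    """Group-level p-value for H0: the mean subject-level z (partial correlation) is zero."""
    z = np.array([fisher_z_partial(d, i, j, cond) for d in subject_data])
    return stats.ttest_1samp(z, popmean=0.0).pvalue


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    subjects = []
    for _ in range(12):
        b = rng.uniform(0.5, 1.0)                  # subject-specific connection strength
        x0 = rng.standard_normal(100)
        x1 = b * x0 + rng.standard_normal(100)
        x2 = b * x1 + rng.standard_normal(100)
        subjects.append(np.column_stack([x0, x1, x2]))
    print(group_ci_pvalue(subjects, 0, 1, []))     # small: consistent group-level edge
    print(group_ci_pvalue(subjects, 0, 2, [1]))    # no group-level direct edge
```

Because the t-test uses the between-subject spread of the z-values, it accounts for both within-subject estimation noise and intersubject variability, which is what motivates a mixed-effect model. Pooling all samples into one data set ignores the intersubject component.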

Here we compare the performances of the proposed _{fdr} algorithm, using time series generated from two dynamic Bayesian networks in Figure

Simulation results for the _{fdr} algorithm, the red solid lines represent the

Figure _{fdr} algorithms can both control the FDR under or around 5%. For both methods, the detection power increases as the sample size increases. However, we can see that the _{fdr} algorithm does. As mentioned earlier in the Introduction, the

The simulations here serve two purposes: first, to verify whether the proposed gPC_{fdr} algorithm for modeling brain connectivity can control the FDR at the group level, and second, to compare the gPC_{fdr} algorithm with the single-subject PC_{fdr} algorithm proposed in [

The simulations were conducted as follows. First, a connectivity network is generated as the group-level model. Individual subject-level networks are then derived from the group-level model by randomly adding or deleting connections with a small probability, and subject-specific data are generated according to the individual subject networks. Next, the network-learning methods, that is, the proposed gPC_{fdr} algorithm, the single-subject PC_{fdr} algorithm applied to the data pooled from all subjects, and the IMaGES algorithm, are applied to the simulated data. Finally, the outputs of the algorithms are compared with the true group-level network to evaluate their accuracy.
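The comparison with the true group-level network is reported in terms of FDR, type I error rate, and detection power. The sketch below shows one way to compute these three quantities from edge sets, assuming that edges are compared as undirected pairs and that the false discovery proportion is taken as 0 when no edge is reported (the usual V/R = 0 convention for R = 0). The function names are hypothetical.

```python
from itertools import combinations


def _as_edge_set(edges):
    """Normalize node pairs to undirected edges (frozensets)."""
    return {frozenset(e) for e in edges}


def edge_metrics(true_edges, learned_edges, n_nodes):
    """Return (FDR, type I error rate, detection power) of a learned skeleton."""
    truth = _as_edge_set(true_edges)
    learned = _as_edge_set(learned_edges)
    all_pairs = {frozenset(p) for p in combinations(range(n_nodes), 2)}
    absent = all_pairs - truth
    false_pos = len(learned - truth)          # reported but absent in the truth
    true_pos = len(learned & truth)           # reported and present in the truth
    fdr = false_pos / len(learned) if learned else 0.0        # V/R, taken as 0 when R = 0
    type1 = false_pos / len(absent) if absent else 0.0         # false positive rate
    power = true_pos / len(truth) if truth else float("nan")   # recall of true edges
    return fdr, type1, power


if __name__ == "__main__":
    print(edge_metrics([(0, 1), (1, 2)], [(1, 0), (0, 2)], n_nodes=4))  # (0.5, 0.25, 0.5)
```

A single run yields a realized false discovery proportion; averaging it over repeated simulations, as done in the thirty repetitions below, estimates the FDR itself.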

The data generation process is as follows.

Randomly generate a directed acyclic graph (DAG) as the group-level network and associate each connection with a coefficient. The DAG is generated by randomly connecting nodes with edges and then orienting the edges according to a random order of the nodes. The connection coefficients are assigned as random samples from the uniform distribution

For each subject, a subject-level network is derived from the group-level network by randomly adding and deleting connections. More specifically, for each of the existing connections, the connection is deleted with probability 0.05, and for each of the absent connections, a connection is added with probability 0.01. The corresponding connection coefficients are randomly sampled from the uniform distribution

Given a subject-level network, the subject-specific data are generated from a Gaussian Bayesian network, with the additional Gaussian noise following the standard Gaussian distribution
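The data generation steps above can be sketched as follows. The deletion probability (0.05), addition probability (0.01), and standard Gaussian noise come from the text. The coefficient range is cut off in this copy, so `coef_range=(0.5, 1.0)` is a placeholder, not the paper's value. The text also does not say how added connections are oriented; here they follow the group-level node ordering so that each subject network stays acyclic (an assumption). All function names are hypothetical.

```python
import numpy as np


def random_group_dag(n_nodes, n_edges, rng, coef_range=(0.5, 1.0)):
    """Random group-level DAG; W[parent, child] holds the connection coefficient."""
    rank = rng.permutation(n_nodes)          # rank[v]: position of node v in a random order
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    W = np.zeros((n_nodes, n_nodes))
    for k in rng.choice(len(pairs), size=n_edges, replace=False):
        i, j = pairs[k]
        # Orient each edge from the earlier to the later node in the random order.
        parent, child = (i, j) if rank[i] < rank[j] else (j, i)
        W[parent, child] = rng.uniform(*coef_range)
    return W, rank


def subject_network(W, rank, rng, p_del=0.05, p_add=0.01, coef_range=(0.5, 1.0)):
    """Delete each group edge w.p. p_del; add each absent edge w.p. p_add."""
    n = W.shape[0]
    Ws = W.copy()
    for i in range(n):
        for j in range(n):
            if rank[i] >= rank[j]:
                continue                     # one direction per pair keeps the graph acyclic
            if W[i, j] != 0 and rng.random() < p_del:
                Ws[i, j] = 0.0
            elif W[i, j] == 0 and rng.random() < p_add:
                Ws[i, j] = rng.uniform(*coef_range)
    return Ws


def sample_gaussian_bn(W, rank, n_samples, rng):
    """Each node = weighted sum of its parents + N(0, 1) noise."""
    X = np.zeros((n_samples, W.shape[0]))
    for v in np.argsort(rank):               # visit nodes in topological order
        X[:, v] = X @ W[:, v] + rng.standard_normal(n_samples)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, rank = random_group_dag(20, 20, rng)  # 20 nodes, two connections per node on average
    data = [sample_gaussian_bn(subject_network(W, rank, rng), rank, 100, rng)
            for _ in range(10)]
    print(len(data), data[0].shape)
```

Twenty edges on 20 nodes correspond to an average of two connections per node, because each edge is counted at both of its endpoints.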

In the first simulation, we compare the performances of the proposed gPC_{fdr} algorithm, the original PC_{fdr} algorithm, and the IMaGES algorithm [_{fdr} algorithm. For reliable assessment, this procedure is repeated thirty times.

Simulation 1: assessing the effects of connection strength on the learned group networks. (a) The group-level network, with 20 nodes and an average of two connections per node. (b) The FDR curves (with standard deviation marked) of the gPC_{fdr} algorithm, the original PC_{fdr} algorithm by pooling all subject data together, and the IMaGES algorithm. (c) The type I error rate curves. (d) The detection power curves. The

Figures _{fdr} algorithm steadily controls the FDR below or around the desired level and accurately makes the inference at the group level. The detection power of the IMaGES algorithm is higher than that of the gPC_{fdr} algorithm, but IMaGES fails to control the FDR under the specified 5% level; its higher detection power comes at the expense of a higher FDR. This is expected, since IMaGES is not specifically designed to control the FDR.

In the second simulation, we test the performances of the algorithms as a function of the number of subjects within the group. The group-level network is the DAG in Figure

Simulation 2: assessing the effects of increasing the number of subjects on the learned group networks. (a) The group-level network, with 20 nodes and an average of two connections per node. (b) The FDR curves (with standard deviation marked) of the proposed _{fdr} algorithm by pooling all subject data together, and the IMaGES algorithm. (c) The type I error rate curves. (d) The detection power curves. The

Figure _{fdr} algorithm is able to keep the FDR below or around the specified level. The detection power gradually increases as the number of subjects increases. When there are more than 15 subjects, the gPC_{fdr} algorithm appears to achieve higher (better) detection power and lower (better) FDR and type I error rate than the IMaGES algorithm. This suggests that when the number of subjects is large enough, the proposed gPC_{fdr} algorithm can jointly address efficiency, accuracy, and intersubject variability. The original PC_{fdr} algorithm applied to data simply pooled across subjects fails to control the FDR, and the resulting FDR does not decrease as the number of subjects increases, probably due to the increasing heterogeneity within the group. To investigate the effect of the number of ROIs, we also repeat the simulations with two networks of 15 and 25 nodes, respectively (results not shown). The results are qualitatively similar to those shown here.

In order to assess the real-world application performance of the proposed method, we apply the g

Three groups were defined: group N for the normal controls, group P_{pre} for the PD patients before medication, and group P_{post} for the PD patients after taking L-dopa medication. For each subject, 100 observations were used in the network modeling. For details of data acquisition and preprocessing, please refer to Palmer et al. [

Brain regions of interest (ROIs).

Full name of brain region | Abbreviation |
---|---|
Left/right lateral cerebellar hemispheres | lCER, rCER |
Left/right globus pallidus | lGLP, rGLP |
Left/right putamen | lPUT, rPUT |
Left/right supplementary motor cortex | lSMA, rSMA |
Left/right thalamus | lTHA, rTHA |
Left/right primary motor cortex | lM1, rM1 |

“l” or “r” in the abbreviations stands for “Left” or “Right,” respectively.

We utilized the two extensions of the PC_{fdr} algorithm and learned the structures of first-order group dynamic Bayesian networks from fMRI data. Because the fMRI BOLD signal can be considered as the convolution of underlying neural activity with a hemodynamic response function, we assumed that there must be a connection from each region at time _{pre}) and after (group P_{post}) medication are compared in Figure _{pre} subjects, the left cerebellum now connects with the right SMA, and the right SMA _{pre} group, presumably as a compensatory mechanism. After medication (P_{post}), the left SMA
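A first-order dynamic Bayesian network relates each ROI's signal at one time point to the signals at the previous time point. One common way to feed such a model to a structure-learning routine is to place each sample next to its one-step-lagged copy, as in the minimal sketch below. The function name `lagged_design` is hypothetical, and the 12-ROI, 100-observation shape simply mirrors the ROI table and the data described in this section.

```python
import numpy as np


def lagged_design(ts: np.ndarray) -> np.ndarray:
    """Stack X(t-1) and X(t) side by side for a first-order dynamic Bayesian network.

    ts: array of shape (T, n_rois). Returns shape (T - 1, 2 * n_rois), where
    columns 0..n_rois-1 hold the ROIs at time t-1 and the remaining columns
    hold the same ROIs at time t.
    """
    return np.hstack([ts[:-1], ts[1:]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = rng.standard_normal((100, 12))       # 100 observations of 12 ROIs
    print(lagged_design(ts).shape)            # (99, 24)
```

In this layout, edges pointing from the lagged columns to the current columns represent connections across one time step, and predefined connections can be supplied to a prior-knowledge-aware learner as known pairs of column indices.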

(a) Learned brain connectivity for the normal group (group N). (b) Learned brain connectivity for the PD group before medication (group P_{pre}). (c) Learned brain connectivity for the PD group after medication (group P_{post}). Here “L” and “R” refer to the left and right sides, respectively. The solid lines are predefined connectivity, and the dashed lines are learned connectivity.

Until now, graphical models used to infer brain connectivity from fMRI data have implicitly relied on the unrealistic assumption that if a model accurately represented the overall activity in several ROIs, the internal connections of such a model would accurately reflect the underlying brain connectivity. The PC_{fdr} algorithm was designed to relax this overly restrictive assumption and to asymptotically control the FDR of network connections inferred from data.

In this paper, we first presented the _{fdr} algorithm, which allows for incorporation of prior knowledge of network structure into the learning process, greatly enhancing its flexibility in practice. The

It is interesting that the

When we compared the _{fdr} algorithm, both of them successfully controlled the FDR under the target threshold in simulations, providing a practical tradeoff between computational complexity and accuracy. However, the _{fdr} algorithm. Incorporating prior knowledge into the PC_{fdr} algorithm therefore enhances inference accuracy and improves the flexibility of the method.

The other extension to the PC_{fdr} algorithm described here is the ability to infer brain connectivity patterns at the group level, with intersubject variance explicitly taken into consideration. As a combination of the PC_{fdr} algorithm and a mixed-effect model, the gPC_{fdr} algorithm takes advantage of both the error-control ability of the PC_{fdr} algorithm and the mixed-effect model's capability of handling intersubject variance. The simulation results suggest that the proposed method was able to accurately discover the underlying group network and steadily control the false discovery rate. Moreover, the gPC_{fdr} algorithm was shown to be much more reliable than simply pooling together the data from all subjects. This may be especially important in disease states and in older subjects. Compared with the IMaGES algorithm, gPC_{fdr} demonstrated better control of the FDR.

As with all group models, a limitation of the proposed gPC_{fdr} algorithm is that it requires a sufficient number of subjects. Data collection is resource intensive in many biomedical applications, and if the number of subjects is insufficient, the gPC_{fdr} algorithm may give unreliable results. Nevertheless, the group extension to the PC_{fdr} algorithm is one attempt to infer brain connectivity using error-rate-controlled exploratory modeling.

When applying the proposed g

To aid reading, we list below the notation frequently used in the proofs.

all the nodes in a graph,

the skeleton of the true underlying directed acyclic graph (DAG),

the event that edge

the value of

a certain vertex set that

the

the value in (

If

For the proof of this lemma, please refer to Li and Wang’s [

If there are

For the proof of this lemma, please refer to Li and Wang’s [

If there is not any true edge in

In the following part of the proof, we assume

Let

and

Given any FDR level

For the proof of this lemma, please refer to Li and Wang’s [

Given any FDR level

The corollary can be easily derived from Lemma

Let _{fdr}-skeleton algorithm stops.

The theorem is proved by comparing the result of the

For a vertex pair

Let us design a virtual algorithm, called

For any vertex pair

When all the true edges in the test set are recovered by the

Let

The

The computational complexity of the FDR procedure, Algorithm

This work was partially supported by the Canadian Institutes of Health Research (CIHR) Grant (CPN-80080 MJM) and the Canadian Natural Sciences and Engineering Research Council (NSERC) Grant (STPGP 365208-08).