A Theoretical Investigation of the Relationship between Structural Equation Modeling and Partial Correlation in Functional MRI Effective Connectivity

An important field of blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) is the investigation of effective connectivity, that is, the actions that a given set of regions exert on one another. We recently proposed a data-driven method based on the partial correlation matrix that could provide some insight regarding the pattern of functional interaction between brain regions as represented by structural equation modeling (SEM). So far, the efficiency of this approach was mostly based on empirical evidence. In this paper, we provide theoretical fundaments explaining why and in what measure structural equation modeling and partial correlations are related. This gives better insight regarding what parts of SEM can be retrieved by partial correlation analysis and what remains inaccessible. We illustrate the different results with real data.


Introduction
Blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) is an imaging technique that allows to dynamically and noninvasively follow metabolic and hemodynamic consequences of brain activity [1,2]. Since Biswal et al. [3], an increasing number of studies have suggested that fMRI data could be used to explore how brain regions interact to perform functional tasks. A key concept in investigation of functional brain interactions is effective connectivity, which has been defined as the influence that regions exert on one another [4].
Path analysis, or structural equation modeling (SEM), has been the major way to examine effective connectivity in fMRI [5][6][7]. Starting from a set of D regions, a model is set a priori that expresses the time course z i (t) of each region as a linear function of the time course of other regions with some coefficients λ i j being constrained to 0, the others are free to vary. λ i j quantifies the strength that region j exerts on region i. Setting an SEM is equivalent to defining a directed graph, where each node stands for a region, a given arrow j → i is present if and only if the corresponding coefficient λ i j is not constrained to zero, and, finally, λ i j represents the intensity of arrow j → i. Once the structural model is completely set, the unconstrained coefficients λ i j are estimated. To this aim, the model covariance matrix Σ, which is a function of the parameters, is compared to the sample covariance matrix S using a discrepancy function that is minimized [8,9]. In fMRI data analysis, the following maximum likelihood function is often used [7]: where tr(·) stands for the standard matrix trace function.
The major flaw of this approach is that it requires the prior definition of a structural model, that is, of regions and arrows, each arrow requiring itself information regarding connection and direction. By contrast, information regarding the functional interactions present within the network of interest is likely to be scarce, since it is often the very reason why an fMRI study of effective connectivity is carried 2 Computational Intelligence and Neuroscience out. This is all the more problematic that the approach does not really provide any clear way to challenge the model or to provide information relative to where or how the model under investigation could be improved. We recently proposed a novel approach to gain insight on effective connectivity. We first showed that, unlike marginal (i.e., regular) correlation, conditional correlation could account for many patterns of interaction as modeled by SEM [10,11]. We then proposed to focus on a specific set of conditional correlations, namely partial correlations [12]. Given a set of D regions, denoted by R, and a variable y i associated to each region i (of which z i (t) mentioned in (1) is a realization), the method estimates the partial correlation of any region pair (i, j) given the set of D − 2 remaining regions, On both real [13] and synthetic data [14], it was observed that a large partial correlation value between two regions was often associated with the presence of an effective connectivity between these regions. However, the reason for such a behavior remained unclear. In the present paper, we further delve into the relationship between SEM and partial correlation in order to better understand why and in what measure partial correlation can extract information that is relevant for effective connectivity analysis. To this aim, we provide a theoretical relationship between SEM and partial correlation through the computation of the inverse covariance matrix (also-called concentration or precision matrix). To illustrate the results so obtained, we use a dataset on which SEM analysis has already been performed and published [7].

From SEM to Partial Correlation
2.1. Bullmore et al. [7] SEM Study. We here quickly recall the essentials of a previous study on which our investigation of partial correlation relies. For more detail, we refer the reader to Bullmore et al. [7]. The study focused on D = 5 left hemispheric cortical regions of interest: the ventral extrastriate cortex (VEC), the prefrontal cortex (PFC), the supplementary motor area (SMA), the inferior frontal gyrus (IFG), and the inferior parietal lobule (IPL). Each region was associated to a time course for a total of five time courses of length T = 96 time samples. The sample marginal and partial correlation matrices corresponding to these time courses are reported in Table 1. The time courses were a group average over the subjects, and the correlation matrix corresponds to the correlations of the averaged time series. A plausible structural model, henceforth referred to as the "theoretically preferred model" (or "TP"), was proposed and is represented in Figure 1(a). Using the correlation matrix of Table 1, a procedure implemented in the LISREL proprietary software package (http://www.ssicentral.com/ lisrel/) computed a so-called "best fit" model from the data, henceforth referred to as such (or "BF") and represented in Figure 1(b). While similar in some ways, the two models had different features: (i) VEC → IPL and SMA → IFG were present in the theoretically preferred model but were not selected in the best fit model; (ii) PFC → IFG and SMA → IPL were absent in the theoretically preferred model but appeared in the best fit model.
We now go back to a different perspective. Indeed, the structure of any SEM entails specific constraints on the covariance matrix, as well as other matrices characteristic of the process, such as the concentration matrix and the marginal and partial correlation matrices.

SEM Modeling.
Generally speaking, a structural model can be defined in matrix form as where y is the D-dimensional variable characterizing the state of each region and e is a temporally independent and identically distributed (i.i.d.) Gaussian noise with diagonal covariance matrix. K = (K i j ) i, j=1,...,D contains the path coefficients. The N time samples (z(t n )) n=1,...,N , where z(t n ) is the signal measured in each of the D regions at time t n , are supposed to be N i.i.d. realizations of y. The matrices corresponding to the theoretically preferred and the best fit models are, respectively, given by (see also Figure 1)

SEM and Covariance.
Classically, we further assume that the noise e of (4) is composed of spatially and temporally  Figure 1: Structural models and path coefficients corresponding to the theoretically preferred (a) and best fit (b) models (from [7]).
independent Gaussian variables with diagonal covariance matrix: Since (4) rereads y = (I − K) −1 e, where I stands for the Ddimensional unit matrix, it is straightforward to show that y is also Gaussian distributed with covariance matrix [15] where "t" stands for matrix transposition. Since K is a function of the path coefficients, so is Σ. This relationship is central to SEM analysis, for most methods rely on comparing the covariance matrix Σ implied by a structural model to the data sample covariance matrix using normal theory maximum likelihood-leading to the discrepancy function of (2)-, generalized least squares, or ordinary least squares [8,9]. Note that, in (2), Σ only appears through its inverse Υ = Σ −1 . Υ is called the concentration, or precision, matrix and it is on this matrix that we will focus to get a better understanding of the data structure.

SEM and Concentration.
Indeed, Υ has intriguing structural properties when related to a structural model. Using (7), this matrix is given by V being a diagonal matrix, the expression for each element Υ i j of the concentration matrix can easily be expanded as Given that K ii = 0, the previous equation yields and, for i / = j, Equation (11) can be used to compute the concentration coefficients corresponding to the TP and BF structural models. For instance, we have for the TP model From this example, we see that two cases can arise. In the first case (e.g., Υ 13 ), the value of the concentration coefficient is equal to zero, not because of the specific numerical values that have been assigned to the path coefficients, but because of the structure of the SEM itself. In the second case (e.g., Υ 12 or Υ 14 ), the concentration coefficient is equal to zero only if the path coefficients are set to certain values (e.g, λ 21 = 0 for Υ 12 ; λ 51 = 0 or λ 54 = 0 for Υ 15 ). For our purpose, the exact values taken by the nonzero Υ i j are of minor importance; we rather focus on the elements that, such as Υ 13 , are structurally equal to zero, that is, that are equal to zero independently of the values taken by the path coefficients. More generally, it can be shown using (11) that Υ i j is identically equal to zero regardless of the numerical values of the path coefficients if and only if the three terms of the right-hand side of (11) are equal to zero, that is, invalid for the pair i-j, thereby leading to Υ ij / = 0 or, equivalently, regions is not directly connected in the structural model or both regions do not jointly point to any common region, the coefficient of partial correlation between these two regions is expected to be structurally equal to zero. On the other hand, if either condition is not satisfied, the corresponding coefficient of partial correlation is not structurally equal to zero (see Figure 2). Turning our attention back to the TP model, we see that, while regions VEC and SMA satisfy both (C 1 ) and (C 2 ) (implying Π 13 = 0), regions VEC and PFC do not satisfy (C 1 ) (since we have VEC → PFC) and regions VEC and IFG do not satisfy (C 2 ) (since we have VEC → IPL←IFG). As a matter of fact, all cases can be found in both the TP and the BF models, as shown in Tables 2 and 3. Using the aforementioned rule, we are able to retrieve the following structural constraints for partial correlation:

SEM and Partial Correlation.
As correlation matrices are often easier to interpret than covariance matrices, we can decide to examine partial correlation matrices rather than concentration matrices. The partial correlation coefficient between two regions i and j, denoted by Π i j , is here defined as a particular conditional correlation coefficient; it is the correlation between these two regions conditioned on the set of remaining regions, that is, There are hence D(D − 1)/2 partial correlation coefficients (10 in our example) that form the D-by-D partial correlation matrix Π = (Π i j ). Π can readily be calculated from Υ through the following relationship [17]: for two distinct regions i and j, and Π ii = 1. Consequently, we have and what has been said about the relationship between the structural model and the structural zeros of the concentration matrix, namely conditions (C 1 ) and (C 2 ), also holds for the partial correlation matrix. Furthermore, since the partial correlation coefficients are correlation coefficients, they are not influenced by any scale effect and remain between −1 and 1; for this reason, they are much easier to analyze and interpret than elements of the concentration matrix.

Validating Partial Correlation Structures
As we saw, a structural model has unique implications in terms of the structural pattern of partial correlation that can be expected from the data. Since the partial correlation matrix is a quantity that can be inferred from the data, we can use partial correlation analysis as a way to validate a structural model by comparing what is expected and what is observed.

Local Hypotheses.
The approach consists of translating the structural hypotheses in terms of partial correlation. Indeed, according to Tables 2 and 3, the two structural models entail different hypotheses in term of partial correlation.
For the theoretically preferred model, we have and, for the best fit model,

Inference.
Assessing the validity of the various hypotheses can be done by first estimating the partial correlation matrix. Inference of Π can be performed in a Bayesian framework using a numerical sampling scheme ( [11,13], see also the appendix). While direct computation of p(Π | z) is rather complex, this technique provides a simple approximation by sampling L (e.g., L = 5000) matrices (Π [l] ) l=1,...,L from p(Π | z). We then quantify the relevance of all hypotheses as follows. First, the probability of a coefficient Π i j to be higher than 0 can be approximated by where "#" stands for the cardinal of a set (i.e., its size). The probability p − i j of a coefficient to be lower than 0 could be approximated in a similar way, but only one of these two If e i j is large and positive, we are more inclined to accept Π i j > 0, while, if it is large and negative, we are more inclined to accept Π i j < 0. Usually, a value of 10 dB can be considered as good evidence in favor of the hypothesis (see Table 4 for some relationships between p + i j and e i j ). We finally take |e i j | as a measure of how Π i j differs from zero and, hence, as a way to quantify the deviation of the data from hypothesis Π i j = 0: values close to zero indicate a coefficient close to zero, while large values suggest a large coefficient value.
Since we here focus on the partial correlation constraints entailed by the structural models, (16) and (17), we only need the corresponding log odd ratios, summarized in Table 5. If all these hypotheses were true, then we would expect the absolute values of all log odd ratios to be lower than 10 dB. While this is the case for the three hypotheses related to the BF model, it is not the case for two of the four hypotheses related to the TP model: according to these results, (H TP2 ) and (H TP4 ) are rather unlikely to be true.

Discussion and Perspectives
In this paper, we further examined how partial correlation could be used to investigate effective connectivity in fMRI. We introduced theoretical fundaments explaining why and in what measure the structure of the partial correlation matrix 6 Computational Intelligence and Neuroscience Table 3: Partial correlation constraints in the TP and BF models (2/2). For each link between regions and each model, examination of whether (C 1 ) and (C 2 ) are satisfied.
Link i-j TP model BF model  can be related to a structural model. More precisely, we showed that, given a structural model, the partial correlation Π i j between i and j is structurally equal to zero if and only if (C 1 ) neither region i nor region j has an effect on each other, and (C 2 ) regions i and j do not jointly influence a third region l; in other words, if and only if none of the following patterns are observed: i ← j, i → j, or i → l ← j for any l. From there, the definition of a structural model entails a unique set of constraints that can be tested from the data, supporting or invalidating the plausibility of the corresponding structural model. When examining the global relevance of partial correlation analysis to the investigation of effective connectivity, we must jointly consider two complementary effects, namely, the theoretical relationship between structural models and partial correlation matrices on the one hand and, on the other hand, the quality of the inference process. From a purely theoretical standpoint, this result shows that partial correlation analysis comes up as a combination of two effects. First, constraints (C 1 ) and (C 2 ) imply that where ¬ stands for the negation. In other words, a zero partial correlation between i and j implies the absence of a direct link between these two regions. Were there only (C 1 ), this implication would be an equivalence and having Π i j / = 0 would imply a direct link between i and j. However, this is not true in general and, more specifically, for any pair of regions for which constraint (C 2 ) is satisfied. Such pairs are not connected but still have a nonzero partial correlation coefficient. As a consequence, all that can be said is that the set of set of pairs of regions with a zero partial correlations is a subset of the sets of pairs not directly connected in the structural model or, equivalently, that the set of pairs of regions connected in the structural model is a subset of the set of pairs of regions with a nonzero partial correlations. These features can easily be related to basic graph theoretic concepts. Condition (C 1 ) states that regions i and j are not neighbors; condition (C 2 ) states that i and j satisfy the so-called Wermuth condition [17]. As a consequence, the partial correlation constraints imposed by a structural model can be read off the graph obtained by adding undirected edges to eliminate all Wermuth configurations (for condition (C 2 )) and transforming all arrows into undirected edges (for condition (C 1 )). Such a graph is called the moral graph associated with the structural model. Depending on how many variables share common parents, the moral graph can be more or less close to the structural graph. For instance, in each of the two models used in this paper, condition (C 2 ) was only met once. Whether this is a general feature of fMRI data or only a characteristic induced by the structure selected remains to be cleared.
Another theoretical issue that needs to be tackled is the fact that having a partial correlation that is not constrained to 0 (e.g., Π 14 for the theoretically preferred model) does not preclude its value to be equal to zero, due to a numerical coincidence. Indeed, (11) shows that specific values of K and V could be selected to induce Υ i j = 0 and, consequently, also Π i j = 0. Even though this event is possible, it should be considered as rather unlikely, unless there is an underlying constraint at stake that forces the coefficient values to respect a certain relationship.
Another, more important issue deals with inference and how confident we can be in the partial correlation estimates and, critically, in the tests that their values are different from zero. The major difference between partial correlation and marginal correlation is that the former is obtained by removing the effect of D − 2 regions as evidenced by (14). Importantly, the partialization process tends to decrease the value of correlation regardless of the exact relationship between the two variables and the conditioning set. Consequently, the values of partial correlation coefficients usually tend to be lower than their marginal counterparts; this is an observation that we have made consistently, and with only few exceptions. Also, as a rule of thumb, the posterior variance associated with a (marginal or partial) correlation coefficient (e.g., Var[Π i j | y] for partial correlation) is roughly a decreasing function of the absolute value of its posterior mean (e.g., E[Π i j | y] for partial correlation). For instance, it is asymptotically (1 − Π 2 i j ) 2 /(N − 1) (which is indeed a decreasing function of Π i j ) for partial correlation and a similar result hold for marginal correlation [15]. A lower mean value therefore also implies a higher variance and, essentially, a bigger difficulty to discriminate a nonzero value from zero.
Altogether, these various factors, both theoretical and inferential, have different consequences on the relationship between the inferred pattern of partial correlation and the underlying structural model. Although we have observed a rather good agreement between expected and inferred patterns so far, in the lack of gold standard, these consequences must be further investigated.
Still, one of the main reasons why partial correlation analysis might become an important tool for the investigation of effective connectivity is that it is, to our knowledge, the only fully exploratory approach. Its key feature is its ability to retrieve local patterns of interaction. Indeed, while the method developed for the estimation of structural parameters, for example, (2), globally assesses the goodness of fit of the whole model and accordingly provides a general measure of it, partial correlation analysis provides a rather local assessment of effective connectivity, since the fact that two regions have a nonzero partial correlation depends on their connection with each other and of a potential connection with a common third region. For instance, in our example, while Bullmore et al. [7] concluded that the data did not contain enough evidence to prefer the BF model over the TP model (global statement), we showed that the TP model entails two partial correlation constraints (Π 24 = 0 and Π 35 = 0) that are rather unlikely to be true in the data (local statements). According to this result, we should discard the BF model or, at least, exert great caution is proportional to the sample covariance matrix, and z t is the temporal mean [21]. Calculation of the posterior probability density function (pdf) of the partial correlation matrix, p(Π | z) cannot be performed in close form from this distribution. To approximate this distribution, we can nevertheless resort to the following sampling scheme [11,13]. For sample l, (1) sample Σ [l] according to its inverse Wishart distribution ( [21], Appendix A); (2) calculate Υ [l] = (Σ [l] ) −1 , and Π [l] from Υ [l] according to (14).
Once a large number L of samples have been drawn following this process, the marginal pdf of a given quantity can be approximated by the frequency histogram obtained from the sample. Likewise, all statistics and estimators can be approximated by their sample counterparts. For instance, One can also compute evidence as explained in the main text.