Comparative Study of Elastic Network Model and Protein Contact Network for Protein Complexes: The Hemoglobin Case

The overall topology and interfacial interactions play key roles in understanding structural and functional principles of protein complexes. Elastic Network Model (ENM) and Protein Contact Network (PCN) are two widely used methods for high throughput investigation of structures and interactions within protein complexes. In this work, the comparative analysis of ENM and PCN relative to hemoglobin (Hb) was taken as case study. We examine four types of structural and dynamical paradigms, namely, conformational change between different states of Hbs, modular analysis, allosteric mechanisms studies, and interface characterization of an Hb. The comparative study shows that ENM has an advantage in studying dynamical properties and protein-protein interfaces, while PCN is better for describing protein structures quantitatively both from local and from global levels. We suggest that the integration of ENM and PCN would give a potential but powerful tool in structural systems biology.


Introduction
Proteins rarely act alone: in the great majority of cases they perform a vast array of biological functions by forming functional complexes [1,2]. The study of protein complexes not only elucidates the molecular mechanism of many diseases [3] but also provides structural information of proteinprotein interactions [4]. With the increasing number of structural data, a lot of regularities have been found for protein complexes based on their topological structures [5]. However, the structural and assembly principles underlying protein complexes organization are not yet fully understood, which poses a great challenge in structural systems biology [6]. A well-studied example of protein complex is hemoglobin (Hb) tetramer, which contains two and two subunits as a dimer of dimer [7]. Hbs exist in three quaternary conformations: the low-affinity (deoxy, ) state and the high-affinity (oxy, ; carbonmonoxy, 2) states. Hbs are never present in cells as monomers. Therefore, Hbs were considered as a sort of 'obliged' allosteric protein complexes and, even thanks to the great amount of both structural and physiological data, attracted a lot of attentions [8][9][10].
Network theory has become a versatile method to study structures and dynamics of biological systems [11][12][13]. As a dynamical-based method introduced by Tirion [14], Elastic Network Model (ENM) allows performing normal mode analysis at network level. Two mostly used ENM methods, Gaussian Network Model (GNM) and Anisotropic Network Model (ANM), were further proposed by Bahar and coworkers [15,16]. ENM is an efficient computational tool to describe the essential vibrational dynamics encoded in the molecular topology [17][18][19][20]. It has been proved that the low-frequency modes of ENM are critical of collective motions [21], while the high-frequency modes can identify hot spots for proteinprotein interactions [22].
The approach of Protein Contact Network (PCN) was proposed by Kannan and Vishveshwara [23] and now has become a new paradigm in protein ontology [24][25][26][27][28]. In a PCN, nodes correspond to , while edges exist if two amino acid residues (nodes) are close to each other under different 2 BioMed Research International cutoffs [29]. Based on this graphical representation, different topological parameters have been developed to describe protein structures and functions from both the global and the local prospective [30][31][32].
Both ENM and PCN offer computationally efficient tools to study the structure and function of protein complexes [33,34], from predicting functionally important residues [35,36], to characterize protein-protein interactions [37,38] and allosteric communication paths [39,40]. Of course, both models have strengths and weaknesses and their comparative study is needed.
In this paper, we have analyzed and compared four applications of ENM and PCN on Hb structures: conformational change characterization, modular analysis, allosteric mechanisms investigation, and interface characterization. Although there are several works reported on the ENM [41][42][43] and PCN [44,45] studies of Hb independently, this work revisits Hb as case study and mainly focuses on the methodology comparison of ENM (specifically GNM and ANM) and PCN.

Gaussian Network Model and Anisotropic Network Model.
GNM [15] describes a protein as a network of connected by springs of uniform force constant if they are located within a cutoff distance (7Å in this study). In GNM, the interaction potential for a protein of residues is [46] where and 0 are the equilibrium and instantaneous distance between residues and , and D is × Kirchhoff matrix, which is written as follows: ( Then, square fluctuations are given by The normal modes are extracted by eigenvalue decomposition: Γ = Λ , where is the orthogonal matrix whose th column is th mode eigenvector. Λ is the diagonal matrix of eigenvalues, . ⟨Δ ⋅ Δ ⟩ can be written in terms of the sum of the contribution of each mode as follows: Thus, the cross-correlation can be calculated by The cross-correlation value ranges from −1 to 1: positive values mean that two residues have correlated motions, while the negative values mean that they have anticorrelated motions.
In ANM [16], the interaction potential for a protein of residues is [46] The motion of the ANM mode of proteins is determined by 3 × 3 Hessian matrix , whose generic element is given as follows: where , , and represent the Cartesian components of residues and is the potential energy of the system. used here is 13Å. Accordingly, ANMs provide the information not only about the amplitudes but also about the direction of residue fluctuations. The similarity between two ANM modes, and V , evaluated for proteins with two different conformations can be quantified in terms of inner product of their eigenvectors [39]; that is, The degree of overlap between th ANM modes and the experimentally observed conformation change Δ of Hbs among different states is quantified by ((Δ ⋅ )/|Δ |). Therefore, the cumulative overlap CO( ) between Δ and the directions spanned by subsets of ANM modes is calculated as follows: The Markov model coupled with GNM was used for exploring the signal transductions of perturbations in proteins [47,48]. The affinity matrix describes the interactions BioMed Research International 3 between residue pairs connected in GNM; its generic element is defined as follows: where is the number of atom-atom contacts between residues and based on a cutoff distance of 4Å and is the number of side-chain atoms in residue . The density of contacts at each node is given by The Markov transition matrix , whose element = −1 , determines the conditional probability of transmitting a signal from residue to residue in one time step. Accordingly, the hitting time for the transfer of a signal from residue to is given by [47] ( , where D is Kirchhoff matrix obtained by GNM. The average hit time for th residue ⟨ ( )⟩ is the average of ( , ) over all starting points . The commute time is defined by the sum of the hitting times in both directions; that is, ( , ) was defined as the corresponding distance, as the weight of the edge between node and in the network.

Protein Contact Networks (PCNs).
Protein Contact Networks (PCNs) provide a coarse-grained representation of protein structure [49], based on coordinates from PDB files: network nodes are the residues, while links exist between nodes whose Euclidean distance (computed with respect to -carbons) is within 4 to 8Å, in order to account only for significant noncovalent intramolecular interactions [24,50,51].
After building up the network, it is possible to quantify its features through the adjacency matrix Ad, whose generic element Ad is 1 if th and th nodes are connected by a link; otherwise it is 0.
The most basic descriptor is the node degree, defined for each node as the number of links involving the node itself: Given a set of vertices , the shortest path sp ,V between two nodes , V ∈ is the minimum number of edges connecting them (Figure 1). Its role is crucial since it has been demonstrated that the lower the network average shortest path (or characteristic length, computed as the average value over the whole number of node pairs), the higher the efficiency of signal transmission through the network [52]. In PCNs the average shortest path describes the protein attitude to allosteric regulation.
The betweenness centrality of a node describes the number of shortest paths passing by it. Given a set of vertices , the betweenness centrality of node ∈ is defined as follows: where V, is the total number of the shortest paths connecting two nodes , V ∈ , whereas V, ( ) represents the number of shortest paths connecting the nodes and V passing on as well. Therefore, high betweenness centrality nodes take part in many shortest paths, so their removal is likely to be noxious for the whole network connectivity. We computed the betweenness centrality by means of the algorithm described in [53]. Closeness centrality describes the general closeness of a node to all other nodes, in terms of length of shortest paths: Closeness centrality of residues in PCNs has been demonstrated to describe conformational transitions occurring in protein response to environmental stimuli through cooperative processes [54]: residues in the active site of enzymes show both high degree and closeness centrality; however, it does not provide any clue about allosteric regulation in the enzyme-substrate binding.
The Guimerà-Amaral cartography [55] provides a framework to classify nodes according to their topological role in the network. It is based on network clustering into nodes groups (clusters). We applied a spectral clustering procedure, previously demonstrated to catch functional modules in protein structures [56].
The spectral clustering algorithm [57] applies to the Laplacian matrix defined as the difference between the adjacency matrix Ad and the degree matrix (a diagonal matrix whose generic element is th node degree). We applied the eigenvalue decomposition to : the spectral clustering decomposition refers to the eigenvector V 2 corresponding to the second minor eigenvalue.
The procedure applies iteratively to get the final desired number of clusters (set by defining the number of iterations); nodes are parted according to the sign of corresponding V 2 components. So, for instance, if it is required to part the network into four clusters, the first partition produces two clusters, whose V 2 components have opposite signs and, successively, both clusters undergo the same procedure, applied to single cluster nodes subset.
We represented the clustering partition in two ways: first, we reported on the ribbon representation residues pertaining to different clusters in different colors, to identify at once clusters on the three-dimensional structure representation. Second, we reported the clustering color map, a matrix whose generic element is colored not in blue if residues corresponding to indices pertain to the same cluster and in blue, background, if corresponding residues do not belong to the same cluster. This representation helps understanding the distribution of clusters along sequence.
After clustering partition, it is possible to compute for each node (residue) the participation coefficient P, defined as follows: is the overall degree of the node, is the node degree in its own cluster (number of links the node is involved into with nodes pertaining to its own cluster).
A complementary descriptor is the intramodule connectivity -score , defined as follows: where and SD are the average value and the standard deviation of the degree extended to the whole network. The descriptor catches the attitude of nodes to preferentially connect with nodes in their own clusters; strongly correlates with node degree, so high residues are mostly responsible for global protein stability. The participation coefficient has been previously demonstrated of a crucial importance in identifying key residues in protein structure with a functional role [43,56,58]; residues with values higher than 0.75 are mostly devoted to the communication between modules (clusters), since they spend more than half of their links with residues pertaining to clusters other than theirs. In other words, signaling pathways between clusters pass by them.
-maps show a peculiar shape ("dentist's chair") for PCNs [58]: high residues show low values, meaning the role of nodes (communication, high , and ) are well separated. We previously reported [35] that in protein-ligand Intramodule connectivity and communication [27] binding shifts from nonnull to null values for residues close to an active site in allosteric proteins.
We computed for each structure profile and -maps. Then, for the two pairs apo-holo forms we report the heat maps of variation on the ribbon structure, so to highlight regions in the protein structure undergoing changes upon ligand binding.
The analysis was performed by means of a purposed software implemented in Matlab environment v 2014a, including functions from Bioinformatics Toolbox. Heat maps of variation (comparison between holo and apo forms), Guimerà-Amaral cartography and clusters onto the protein ribbon representation, have been produced by means of a purposed Python script compiled in the embedded Python environment; for further details and application of the method see [37]. Table 1 sums up PCN descriptors, along with their structural and biological relevance.

Results and Discussion
3.1. ENM Results. GNM and ANM are simple yet effective methods [33]. GNM can only describe the amplitude of residue fluctuations, but ANM can give the direction of the motions. In this section, ANM was used to investigate conformational change between -Hb and -Hb, and describe the dynamical properties of protein-protein interfaces. GNM was employed for the modular analysis of Hbs, which was coupled with Markovian stochastic analysis to study the allosteric mechanisms of Hbs. Expecting for the conformational change, we only chose -Hb to exhibit these investigations. ENM results for other two states of Hbs show similar results, as shown in the supporting information.   For the first mode -Hb, the two chains exhibit different dynamical behavior with two chains, but two dimers of 1 1 and 2 2 show similar global dynamics (the blue line). For the first mode of -Hb, the mean-square fluctuations profile of -chain is very similar to that of the -chain (the red line). Comparing these two structures the fact that chains are more stable and chains are less stable in state than in state emerges.
Overlaps of each ANM mode with the structural difference between and conformations were calculated to detect which individual mode contributes significantly to the structural differences between the results from experimental study and are calculated by (9). Figure 2(c) shows that the transformation from into is favored by the second mode of -Hb with the highest overlap (0.84). In this mode, the global motion involves quaternary changes of two dimers, namely, 1 1 dimer, exhibiting a torsional rotation in an opposite direction with 2 2 dimer (Figure 2(d)). Furthermore, this mode is also coordinated by hinge sites at 1 -1 and 2 -2 interface. Tekpinar and Zheng [42] have previously performed the ENM study of conformation changes from to 2 structures, in which they found the first two modes contribute significantly to the conformational change. Our revisiting is in accordance with their results, because mode 2 observed herein seems like the combination motion of their two modes.

Modular Analysis.
In their recent work, Li et al. [59] developed a new method based on GNM and ANM for dividing a protein into intrinsic dynamics modular analysis. Here, we adopted a much simpler way, just based on the analysis of the GNM lowest mode. Correlation maps for cross-correlation not only describe collective motion but also reflect the symmetry of proteins [36]. To our aim, the correlation map for the first GNM mode was used for the modular analysis of Hb [60]. In the map, red indicates the highly correlated motions, blue represents the anticorrelated motions, and green is for the uncorrelated motions. As shown in Figure 3(a), the correlation map shows that -Hb tetramer is divided into two modules, which correspond to 1 1 dimer and 2 2 dimer. Two red blocks indicate that 1 and 2 move in the same direction with 1 and 2 , respectively. Blue blocks indicate that opposite motions are observed between these dimers.
Although the first GNM mode can only generate two modules, it can provide more dynamical information. After diagonalizing the Kirchhoff matrix, the first eigenvector corresponding to the highest eigenvalue can be derived and interpreted to represent the shape of a mode [61]. Figure 3(b) demonstrates that the shape corresponds to GNM mode of the Hb tetramer. It is easy to see that the shape of 1 1 dimer distributes under zero and 2 2 dimer above zero. Thus, the eigenvectors also partition the structure into two modules. In addition, some hinge sites were predicated at near zero positions, which are Thr41, Ala88, and Pro95 in Chain A, His146 in Chain B, Phe98, Leu105, and Ser138 in Chain C, and His2 and His146 in Chain D. Note that these hinge sites always locate at 1 -2 and 2 -1 interfaces. ENM results for modular analysis of -Hb and 2-Hb are shown in Supplementary  Figure S2.

Allosteric Mechanisms. Communication inside protein
complexes is implicit in collective motions which are inherent to the network topology [62]. Based on this idea, the signalprocessing properties of residues can be investigated by Markovian stochastic analysis coupled with GNM [63,64]. His122, and Leu125 in 1 chain and Ala27, Val109, Cys112, and Gln127 in 1 chain are residues with highest communication abilities. It is worth mentioning that the two chains and two chains have the same profile shapes. The distributions of these residues in 1 chain and 1 chain are also displayed in the three-dimensional representation (Figure 4(c)). It was found that Arg31, Cys104, Val107, and His122 in 1 chain and Cys112 and Gln127 in 1 chain are located at 1 -1 interface. Likely, the same region was also found at 2 -2 interface. ENM results for modular analysis of -Hb and 2-Hb are shown in Supplementary Figure S3.

Interface Characterization.
Protein interfaces are the sites where proteins or subunits physically interact. Identification and characterization of protein interfaces are not only important to understand the structures of protein complexes and protein-protein interactions, but also disease phenotypes [65]. Both GNM and ANM have been used to investigate protein-protein interfaces. Kantarci et al. [66] firstly applied GNM to classify interfaces of p53 core domain into the dimerization interface and crystal interface on the base of interfacial dynamics. Zen et al. [67] extended this method to study the interface of 22 representative dimers. More recently, Soner et al. [68] developed a web server to discriminate obligatory and nonobligatory protein complexes. Although GNM is the most used method to study protein-protein interfaces, we have showed here that ANM is also powerful to explore interfacial dynamics of Hbs.
Two kinds of interfaces have been classified in the Hb tetramer: allosteric sites located at 1 -1 and 2 -2 interfaces, which could be intended as allosteric interfaces. Hinge sites are detected always at 1 -2 and 2 -1 interfaces, providing structural interfaces. The analyses are in accordance with the results in Tekpinar and Zheng [42], which showed that allosteric interfaces are dynamically variable regions but not necessarily structural interfaces. In this section, square 1 subunits in isolated and tetrameric states based on the first 20 ANM modes, while 2 and 2 subunits show similar behavior. Although and subunits are structurally identical, they are different in length and sequence. Accordingly, 1 and 1 subunits show different types of fluctuations, which have also been predicted by the previous molecular dynamics (MD) study [69]. The mobility of 1 -1 and 1 -2 interfacial residues of 1 subunit is reduced in the tetramer. The same happens for the mobilities of 1 -1 and 2 -1 interfacial residues of 1 subunit. Therefore, the flexibilities of residues located at both kinds of interfaces in bound states are lower than in unbound monomers. This kind of dynamical property of interfacial residues has also been detected by the MD simulation [70].
In addition, a similar region (residues 45-57) with high mobility in both isolated states was found, which corresponds to a long loop distributing between two adjacent subunits. The mobility of this region in 1 subunit was reduced, while no reduction was observed in 1 subunit. This may suggest that this long loop in subunits is an allosteric region controlled by interfacial residues.
Among the interface residues, hotspots are defined as residues that have the greatest contribution to the binding energy. The prediction of hotspots is helpful not only to guide drug design but also to understand disease mutations [71]. Based on ENM results, Chennubhotla et al. [72] revealed that hot spots residues show a moderate-high flexibility at global modes. On the other hand, hot spots correlated very well with the residues with high mean-square fluctuations in the highest frequency modes in both GNM [73,74] and ANM [22]. Ozbek et al. [74] have found that hot spots predictions based on the highest, the second and third highest, and the average three and five highest GNM modes show similar accuracies. Our calculation demonstrates that the square fluctuation based on two highest ANM modes is enough to predict the distribution of of Hb-tetramer hot spots. The result for -Hb is shown in Figure 5(c). It is surprising that hot spots have been predicted only at 1 -1 and 2 -2 interfaces: Phe28, Arg31, Phe36, and Val107 in subunits and Arg31, Asn108, Val111, Cys112, Gln127, and Ala129 in subunits. Note that Arg31 and Val107 in subunits and Arg31, Cys112, and Gln127 in subunits are overlapped with allosteric sites. It also proved that allosteric interfaces rather than structural interfaces take part in the complex formation. The hotspots predicted for -and 2-Hbs show small differences but still located at the same interfaces (Supplementary Figure S4).

PCN Results.
In this section, results from the application of PCN method and spectral clustering are reported for the three structures under enquiry. Figures 6 and 7 and Supplementary Figure S5 clearly show that cluster partition satisfactorily matches with chains, yet with some divergences (region pertaining to a chain falling in a cluster mainly composed of residues belonging to a second chain, "whiskers" in the clustering color map). In comparison with ENM, PCN results of three states of Hbs exhibit quite high similarity, even between -and 2-states, as emerged mainly from the distribution of along the ribbon structures (see Figures 8 and 9). maps show the typical profile for PCNs (and not for other real world networks), with most residues having = 0 (only contacts with residues belonging to their own clusters). Residues with > 0 are mostly interesting for protein functionality, since they account for signaling transmission through the protein structure (global property of protein structure). High residues are spotted in the structure and mostly (but not necessarily) placed in the interchain region. In previous works [35,37,51,75], we demonstrated that the participation coefficient addresses the functional role of residues in protein binding and, in general, identifies residues with a key role in protein structural and functional features. maps instruct a cartography, addressing a specific role to residues, as reported in Table 2. Hubs are nodes with > 2.5, while values address the role of nodes to connect different clusters. The Guimerà-Amaral cartography of the three Hbs is reported in Figures 8 and 9 and Supplementary Figure S6, as original form, on maps, and projection on ribbon structures.
Noticeably, in PCNs 6 and 7 nodes are absent and only few 5 nodes are present, all at = 0. In other words, high nodes correspond to residues in charge for protein stability, while nonhub connector nodes are responsible for interdomain (intercluster) communication. Lys60 in 1 , Glu26, His63, Lys66 in 1 , and Leu28, Lys65, Leu68 in 2 in -Hb, whereas Gly24, Lys61, and Leu141 in two chains in      -Hb belong to 5 nodes. It is easy to note that 5 nodes distribute at chains within the protein interior.
As previously stated [58], 4 nodes (nonhub kinless nodes) are crucial for the allosteric signal propagation: their kinless nature poses in the gray zone where residues acting at a global level play, so their role in the protein functionality is central. It was found that Leu91 and Arg92 in 1 , His2, His116, and Ala129 in 1 , His89 in 2 , and Thr38, His116, and Phe118 in -Hb and Trp37, Cys112, Phe122, and Pro124 in -Hb belong to 5 nodes. Except His2, 4 residues were found at all four interfacial regions.

Comparison between ENM and PCN Results.
We finally explicitly superpose the ENMs and PCNs results, in order to better specify key residues and features in allosteric regulation of Hb. Average commute times predict allosteric sites at both protein interior and two interfaces ( 1 -1 and 2 -2 interfaces). In PCN, 4 nodes include allosteric sites at interfaces and 5 nodes include allosteric sites at protein interior. Combined with modular analysis and hot spots prediction, the use of ENM has advantage to classify proteinprotein interfaces.
On the other hand, PCNs analysis relies upon a set of network descriptors to approach the study of protein structures quantitatively. Table 3 reports the Pearson correlation coefficients between mean fluctuations and network descriptors, closeness centrality, betweenness centrality, and participation coefficient.
Betweenness centrality poorly correlates (negatively) with mean fluctuations, while closeness anticorrelates more strongly with mean fluctuations, especially in the more rigid structure ( 2/complex, Figure 10). The hyperbolic shape of the distribution confirms closeness is a general stiffness descriptor for protein structure. This property may indicate that closeness in PCN could provide an additional evidence to detect hinge sites. There is a relatively poor one-to-one correspondence of functional sites obtained between ENM and PCN, and thus the combination of these two approaches would improve the prediction.

Conclusion and Perspectives
ENM and PCN are light yet effective computational methods which simply require the three-dimensional coordinates of atoms in protein structures. In this work, the combination of the ENM and PCN methodologies has provided a plenty of information regarding the dynamic behavior of Hbs. It is noteworthy that the two classes of methods are able to catch the same features without a common, interexchange ground. In comparison with PCN, ENM can find the dominate motion for the conformational change of proteins and detect the dynamics of protein-protein interfaces observed by MD. Except for the topological parameters used in our work, there are more local and global network parameters that can be calculated in PCN to describe protein structures quantitatively. For example, residue centrality as a local network parameter was proposed to identify allosteric sites [76], and coefficient of assortativity as a global network parameter is related to the rates of protein folding [77]. In addition, we have found some correlations between ENM and PCN results. In previous studies [78], the average path lengths are highly correlated with residue fluctuations. Here, we show an additional positive correlation between residue fluctuations predicted by ENM and closeness centrality calculated by PCN. Although the general relationship between dynamical properties and more network parameters is needed to be established, we can conclude that ANM and GNM have advantages in studying dynamical properties and protein-protein interfaces, while PCN is better for describing structures quantitatively from both local and global levels.
In future, the combined description by means of these methods will largely contribute to understanding the dynamic behavior of complexes without heavy computational approaches, such as molecular dynamics (MD). Evidently, MD will anyway provide a very complete and fine description of dynamics, but the combination of lighter methods, such as ENM and PCN, will, for instance, guide MD simulations with well-grounded preliminary results, as preliminary approached in our previous works [79]. On the other hand, the two methods may help understanding the relationship between local fluctuation of residues and protein stability and functionality, being a primer for identifying key residues, responsible for lethal mutations. For example, the first attempt to combine ENM and PCN has been reported to investigate allosteric communication pathways [80]. In our work, we only indicate that ENM and PCN can be applied to four types of structural and dynamical paradigms. More detailed analysis for each case is needed. Although the integration of these two methods is just at the beginning, it would give a potential but powerful tool in structural systems biology.