Transcriptional Protein-Protein Cooperativity in POU/HMG/DNA Complexes Revealed by Normal Mode Analysis

Biomolecular cooperativity is of great scientific interest due to its role in biological processes. Two transcription factors (TFs), Oct-4 and Sox-2, are crucial in transcriptional regulation of embryonic stem cells. In this paper, we analyze how Oct-1 (a similar POU factor) and Sox-2, interact cooperatively at their enhancer binding sites in collective motions. Normal mode analysis (NMA) is implemented to study the collective motions of two complexes with each involving these TFs and an enhancer. The special structure of Oct proteins is analyzed comprehensively, after which each Oct/Sox group is reassembled into two protein pairs. We subsequently propose a segmentation idea to extract the most correlated segments in each pair, using correlations of motion magnitude curves. The median analysis on these correlation values shows the intimacy of subunit POUS (Oct-1) and Sox-2. Using those larger-than-median correlation values, we conduct statistical studies and propose several protein-protein cooperative modes (S and D) coupled with their subtypes. Additional filters are applied and similar results are obtained. A supplementary study on the rotation angle curves reaches an agreement with these modes. Overall, these proposed cooperative modes provide useful information for us to understand the complicated interaction mechanism in the POU/HMG/DNA complexes.


Introduction
Embryonic stem cells (ES cells) possess the pluripotency of differentiating into all the three germ layers (endoderm, mesoderm, and ectoderm), which correspond to hundreds of cell types. These pluripotent stem cells are transcriptionally regulated by a number of transcription factors (TFs) [1]. A specific TF called Oct-4, belonging to the POU class of homeodomain proteins, is regarded as a necessity for maintaining the undifferentiated state of embryonic ES cells. Generally, Oct-4 interacts with other TFs as a group to affect the gene expression of mouse ES cells in early embryo development [2], and Oct-4 coupled with its cofactor Sox-2 (HMG-box domain) is at the center of this group. Botquin and Nishimoto have both proven the cooperative effects of Oct-4 and Sox-2 on the expression of several genes in mouse embryonic ES cells [3,4]. Dailey and Basilico further bring forward the idea that the interaction within the POU/HMG group, especially for groups composed of Oct and Sox proteins, at DNA binding sites is a fundamental mechanism for transcriptional regulation in early embryo development [5].
At the early stage of transcription, TFs bind to specific regulatory DNA regions to cooperatively affect the transcription sites. Enhancers, which act as activators or stimulators for transcription [6], are a major type of regulatory DNA regions. Unlike promoters, enhancers may be located kilobases away from their target genes, but geometrically they are most probably close to the genes due to the supercoil structure of DNA molecules, and thus there can be direct contacts between the enhancer-TF complexes and the transcription sites. Studies on the enhancer-TF complexes are very important for understanding the complicated mechanism of transcriptional regulation.
On the other hand, molecular dynamics are involved in many biological processes [7,8], such as reproduction, regulation of gene expression, and protein interaction. As an indispensable component of gene expression, transcription must undergo a series of dynamical changes of biomolecules. Therefore, studies on dynamics of the aforementioned enhancer-TF complexes would provide a deep insight into their properties and functions in the transcriptional regulation. Specifically, deciphering the roles of Oct and Sox in the interaction mechanism of their enhancer-bounded complexes in the collective dynamics is of great scientific interest. Further, the cooperativity of the two proteins is a major research topic in these studies.
In our work, the dynamics of the POU/HMG group at its enhancer binding sites, referred to as POU/HMG/DNA complexes, are surveyed. Two POU/HMG/DNA complexes, which are DNA-binding portions of a POU factor Oct-1 and an HMG factor Sox-2 bound to an enhancer, are specifically studied from a structural and molecular dynamic view. Normal mode analysis (NMA) is implemented to study the collective or cooperative motions of these POU/HMG/DNA ternary complexes, after which the interaction of the POU and HMG factors at their DNA binding sites in these collective motions is explored. We propose a segmentation idea for the proteins to construct an equal-length-chain comparison and measure the correlation of each protein segment pair using the linear correlation. A statistical analysis on the significantly correlated pairs provides useful information on how these TFs have a synergistic control on enhancer DNAs in transcriptional regulation.

Normal Mode Analysis (NMA)
2.1.1. Introduction. NMA is an efficient method to detect the most cooperative or collective motions (essential modes) of large harmonic oscillating systems. With the constraint that the studied conformations are in the vicinity of the systematic equilibrium, which exists in most harmonic oscillating systems [9], NMA is useful for studying large structural deformations or motions of these systems. The idea is to use harmonic potentials to approximate a multidimensional energy landscape around an energy minimum for a system and to detect the most easily accessible modes on this energy landscape. NMA is broadly used to analyze the structural dynamics of biomolecules.
Specifically, if we describe an -site-system with a position vector , in which the three-dimensional coordinates of each site ( 1 , 1 , 1 ), ( 2 , 2 , 2 ), . . . , ( , , ) are used, we can mathematically expand the potential energy in a second-order Taylor series around the equilibrium conformation 0 [9]. Finally, we obtain a quadratic approximation as follows: Here Δ stands for the systematic structural changes relative to 0 , and is a 3 × 3 Hessian matrix, whose elements have the following form: Subsequently, the kinetic energy is brought in to slightly modify the Hessian to a mass-weighted one. These Hessian matrices contain key structural information for our observed systems. One broadly used construction method for the Hessian matrices is the elastic network models (ENMs) [9][10][11][12], which include the Gaussian network models (GNMs) [11] and anisotropic network models (ANMs) [12] as representatives. When ENM is applied, the equilibrium exploration can be skipped since the starting state is designed for this equilibrium. When constructing the ENM structure, the original system can be transformed into a network with nodes (CGsites) and connecting springs, and a cutoff distance is used to define all the connecting springs [9,10]. Gaussian network model (GNM) selects representatives for substructures in the system, such as using C -atoms for amino acids [9,11], to further lower the computational cost, leading to the potential form shown as (3) ( or represents a CG-site): Similarly, ANM proposes the potential form in (4) and ignores some influences caused by the distance vectors: Each eigenvalue of an above-constructed Hessian matrix denotes the associated systematic energy for the observed system, and its corresponding eigenvector represents the direction of a specific normal mode motion. Among the obtained 3 normal mode directions, the first six are trivial since they all correspond to zero eigenvalues, which means these structural changes have no effect on the systematic potential energy. For the remaining 3 − 6 eigenvectors, we will select a small set that corresponds to small eigenvalues (essential modes) for analysis [9]. In previous research, the first 10∼15 essential modes are chosen by many researchers for their work [13][14][15], and the first 10 are analyzed in our work.  [16] is utilized in our experiments. It is an ENM modelbased method. The implementation of a rotation-translation block approach [16] and an ARPACK library for the sparse matrix data storage and decomposition [17] in the computations of Hessian matrices makes it possible to retain up to 100,000 atoms for each structure. In our work, when calculating the motions using NOMAD-Ref, all atoms in POU/ HMG/DNA ternary complexes are used, while only motions of the POU and HMG proteins are analyzed since only protein-protein interactions in POU/HMG complexes at the DNA binding sites are of interest here.   composed of a POU factor Oct-1 (very similar to Oct-4), an HMG factor Sox-2 and an enhancer element. Figure 1(a) displays the 3D structure of complex 1GT0 and the diagram is produced using UCSF Chimera [19]. In 1GT0, the bounded DNA piece is a fibroblast growth factor 4 enhancer (FGF4) [20]; in 1O4X, the homeobox B1 (Hoxb1) enhancer is bounded by the two TFs [21]. Furthermore, each Oct protein contains two subunits (POUS and POUHD) that are connected by a flexible linker and control DNAs in a bipartite manner [21]. Based on the special structure of Oct proteins, we regard Oct-1 and Sox-2 in each complex as the two protein pairs for further investigation, namely, POUHD and Sox-2 as pair 1 and POUS and Sox-2 as pair 2, both of which are shown in Figure 1(b).

Analysis of Correlative Motions.
After generating the motions of the two POU/HMG/DNA ternary complexes using NMA, we observe how the two protein pairs behave at the enhancer binding sites in these most collective or cooperative motions.
For each protein pair in each ternary complex, we analyze the first 10 obtained essential modes. In each mode, we firstly refine an observed pair at the residue level from a view of motion magnitude. This can be achieved by calculating the motion magnitudes for all the atoms in each protein and subsequently computing the motion magnitude of each residue in this protein by averaging the motion magnitudes of all component atoms (see (5)): Here atoms = 1 ∼ comprise the residue ; ( 0 , 0 , 0 ) and ( , , ) represent the positions of atom in its equilibrium position and in a specific mode, respectively. Therefore, for each mode, we will obtain a motion magnitude curve for each protein in an observed pair, and each curve point corresponds to a residue along the protein sequence (Figures 2(a) and 2(b)).
Next, in each protein pair we observe the potential protein-protein cooperativity in these motions based on the correlations of motion magnitude functions. An effective method to measure the dependence between two quantities is the Pearson product-moment correlation coefficient [22][23][24] also is usually called the correlation coefficient. This coefficient is calculated based on the expected values ( ) and standard deviations ( ) of the two variables (X and Y), as shown in (6): We adopt this correlation coefficient in our studies. However, since each protein has a different length, we investigate the most cooperative/correlated segments among each protein pair in each mode. We introduce a segment length parameter here. For an observed pair of proteins that have different lengths of and , with a specific ( ≤ < ) defined in a mode, we shift one motion magnitude function along the other to find the -length-segments which share the largest absolute correlation value (Figure 2(c)). We could further describe the process as follows: Here 1 and 2 represent motion magnitude functions of the two proteins in an observed pair, in a specific essential mode; Seg and Seg denote -length-segments of 1 and 2 , respectively.
In each modes with a list of values defined for each protein pair, we obtain a series of most cooperative segment pairs having correlation values , where denotes different values and (1∼10) represents different modes. Here we replace by = / for easier illustration and is the shorter length in the observed pair. Since larger absolute value of correlation demonstrates more correlated segments (positively or negatively), we investigate how | | distribute for the two protein pairs in each complex. For each in an observed pair, the median value (8) is extracted and explored. Furthermore, the performances (based oñ) of the two pairs in each complex are compared: Now, we use medians in (8) as a filter and investigate how those | | larger thañ(supposed to be significant) Pair 1 Pair 2 C 1 2 (p 1 = 0.9) C 1 6 (p 1 = 0.5) C 2 2 (p 2 = 0.9) C 2 6 (p 2 = 0.5) distribute. For each protein in an observed pair, we can obtain a logic matrix that reflects this process: Here [ ] gives the matrix that is composed of . We subsequently examine the relationship between the two protein pairs in each complex based on these logic matrices. The idea is to explore that in a single essential mode whether only one significantly correlated segment pair (either in protein pair 1 or pair 2) is involved or both pairs are involved. To balance the segment lengths ( ) used by the two pairs, we take into consideration all the length pairs ( 1 , 2 ) between the two pairs, as presented in Figure 3. Here we use superscripts to distinguish pairs 1 and 2.
To fulfill the aforementioned operation, we conduct several iterations for all the values and combine the results of these iterations. We now take rows in 1 (denoting a specific 1 value) and show how the whole procedure is accomplished. In each iteration, we firstly expand the involved row (identified with a subscript ) into a matrix in (10) and then carry out statistics on the cases where three situations occur: (a) 1 (index 1 )-only the significantly correlated segment pair in pair 1 is detected in a single essential mode with a length pair ( 1 , 2 ), (b) 1 (index 2 )-only the significantly correlated segment pair in pair 2 is detected, and (c) (index )-both pairs 1 and 2 are detected. The statistical analysis is based on logic operations, as shown in (11), which combines all the iterations to derive the final indexes for the two pairs: Computational and Mathematical Methods in Medicine Here "⋅×" means a batch of multiplications of the corresponding elements in two matrices, and sum( ) counts the number of ones (a logic "true" value) in a logic matrix . Indexes 1 , 2 , and separately show three cooperative modes (corresponding to the aforementioned three cases) between the two protein pairs in a POU/HMG/DNA complex. We exhibit some representative cooperative modes in Section 3, where we also list the above-mentioned indexes for the two complexes. Furthermore, to take the signs of the correlations into consideration, we introduce another logic matrix that describes the signs of , as stated in (12). Through combining logic operations of and (13), we can divide the situations ( 1 , 2 , and ) into subtypes (positive and negative), and all these subtypes are analyzed in Section 3: 1, = ( 1 , 1 , . . . , 1 ⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟⏟ ) 6 , To compare the scenarios where different filters are applied, we, respectively, apply the first tertile, the first quartile, and the mean value as filters to investigate the corresponding results. The mean filter can be described as (14), and the quantile filter as (15), where Pr represents probability.
Specifically, the tertile and quartile filters correspond to situations where = 1/3 and = 1/4, respectively. A series of operations are then carried out based on these filters, to reveal how the observed complexes behave in these situations: Finally, to gain a deep insight into the motions of these two complexes, we have also observed the rotation angles of the corresponding protein chains. In the above discussion, we regard residues as basic units in protein sequences, and here we consider the links between each consecutive two residues (Figure 2(b)). The angles between each pair of corresponding links in the original structure and in deformed structures (modes) are studied. We obtain a rotation angle function for each protein of a protein pair in each essential mode. Afterwards, we conduct a similar analysis as aforementioned on these rotation angle functions as a supplementary study. Principal component analysis (PCA) is implemented to reduce the effect of noisy rotation angles. We also investigate the suitability of the Fourier transform for data analysis.

Motion Magnitude Functions.
For each protein in an observed pair of a ternary complex, we calculate the motion magnitude functions (5) for the first 10 essential modes. Figure 4 shows the motion magnitude curves for the two observed protein pairs in 1GT0 for the first essential mode.
After defining a list of values, we calculate the most cooperative/correlated segment pairs among each protein pair in a complex for the 10 essential modes, using the mechanism discussed in Section 2.2.2. Since small values correspond to shorter segment matching, whose results may be trivial due to the high correlation possibilities, we use a set of values starting at 0.5 to 1.0 at a step of 0.1. Table 1 shows the results of correlations for the most cooperative segment pairs among protein pair 1 of 1GT0.
The larger the absolute value of correlation is, the more the two compared segments correlate with each other, either positively or negatively. Now we examine how the absolute correlation values | | distribute, for the two protein pairs in each complex. The values are presented in Figure 5, where parts (a) and (b), respectively, show the values for the two pairs in 1GT0, and parts (d) and (e) show those for 1O4X. We can see from these diagrams that | | becomes larger when gets smaller, and this can also be detected from the median valuẽshown with a pink circle in each box (denoting a specific ). To give a comparison between the performances of the two pairs in each complex, we extract the above-mentioned median values̃for each pair and present them in parts (c) (1GT0) and (f) (1O4X). In diagrams (c) and (f), especially (f), pair 2 presents a higher̃than pair 1, which  to some extent implies that pair 2 may behave as a leading role in the Oct/Sox interactions. Next, we use the above-mentioned medians as a filter and investigate how those | | larger thañ(supposed to be significant) distribute. For each protein in an observed pair, we calculate its logic matrices and (Section 2.2.2), which correspond to (9) and (12), respectively. We subsequently study the logic matrices of the two protein pairs ( 1 and 1 , 2 and 2 ) in each complex, after which we propose several cooperative modes between the two pairs and conduct statistical analysis according to (10) and (11). In detail, these modes include (a) mode 1 (index 1 )-only the significantly correlated segment pair in pair 1 is detected in a single essential mode with a length pair ( 1 , 2 ), (b) mode 1 (index 2 )-only the significantly correlated segment pair in pair 2 is detected, and (c) mode (index )-both pairs 1 and 2 are detected. To visually show the cooperative modes and , we select parts of the results of 1GT0 for 1 = 2 = 0.5 as a display in Figure 6, in which 1 , 2 , and modes are, respectively, presented with the significantly correlated segment pairs colored.
Mode denotes that only one protein pair, either pair 1 ( 1 ) or pair 2 ( 2 ), is significantly involved in a specific collective motion. This indicates that only one subunit, either POUHD or POUS, is significantly involved in the cooperativity with Sox-2 in an essential mode. Mode implies that both subunits are involved in the interactions with Sox-2. Detailed statistical results are reported in Table 2. In this table, cooperative mode occurs more frequently than modes 1 and 2 in the two complexes, which have the tuples of (82, 82, 98) and (85, 85, 95) for the indexes ( 1 , 2 , ), respectively. This implies that, compared with mode 1 or 2 , both subunits of Oct-1 frequently participate in the interactions with Sox-2 at the same time, as mode .
Furthermore, we divide the modes and into subtypes, positive subtype and negative subtype, and their statistics are evaluated using (13) and listed in Table 2 a positive sign of for the significantly correlated segment pair in protein pair 1 or 2, and the negative one ( 1,negative and 2,negative ) indicates a negative sign. In mode , the positive subtype ( positive ) denotes a scenario where both significantly correlated segment pairs in the two protein pairs share the same sign of (+/+ or −/−), and the negative one ( negative ) represents two different signs (+/− or −/+). From Table 2 we notice that, for mode 2 in both complexes 1GT0 and 1O4X, the positive subtype has a lead; for modes 1 and , the negative subtype is in the lead for 1GT0 while the positive one is in a dominant position for 1O4X.
We have also applied the first tertile, the first quartile and the mean value as filters and similarly conducted the statistical analysis as illustrated above. Tables 3, 4, and 5 present the results for these three scenarios, respectively. As shown in these tables, the gap between the occurrence frequencies of mode and mode 1 (or 2 ) becomes larger, and the dominant occurrences of mode are demonstrated. Besides, for modes 1 and , complexes 1GT0 and 1O4X have the opposite subtype distributions, while for mode 2 , they present a similar distribution. Overall, these additional results are consistent with the previous one (the median filter).

Rotation Angle
Functions. Subsequently, we calculate the rotation angle functions for each protein in each complex in the first 10 essential normal modes (described in Section 2.2.2). Figure 7 shows the rotation angle curves of proteins in the two protein pairs of 1GT0 in the first essential mode. Table 2: Statistics on the occurrences of the cooperative modes and , and their subtypes, in the 10 essential modes for all the pairs (p 1 , p 2 ), using the median filter for motion correlations.  Since the rotation angle functions contain a lot of noise, we apply the principal component analysis (PCA) to the 10 Table 4: Statistics on the occurrences of the cooperative modes and , and their subtypes, in the 10 essential modes for all the pairs (p 1 , p 2 ), using the first quartile as a filter for motion correlations.  Table 5: Statistics on the occurrences of the cooperative modes and , and their subtypes, in the 10 essential modes for all the pairs (p 1 , p 2 ), using the mean value as a filter for motion correlations. rotation angle curves of each protein in the two complexes to obtain the first principal component (PC), leading the rotation angle curves ( = 1∼10) of each protein to a single condensed PC curve. We similarly carry out the correlation analysis of the PC curves in each pair. Table 6 shows the statistical results for the two complexes. Intuitively, 1GTO presents the distinct cooperative mode of 1 , where pair 1 shows more significantly correlated segment pairs (with a positive subtype), while mode is the dominant one in 1O4X, where many significantly correlated segment pairs occur in both pairs (with a positive subtype). Now we apply the Fourier transform to analyze these noisy rotation angle values. Simply, the magnitudes of the transformed signals are regarded as our new data. The segmentation and correlation calculation are implemented, after which the statistical analysis is carried out. As an example, we use the first quartile as a filter for the correlations of rotation angle functions. The results are listed in Table 7, where we can see that the negative subtype of each cooperative mode is concealed after the transform. This implies that the Fourier transform may not be a suitable tool for handling these rotation angle values. More efficient strategies should be explored in the future to deal with these data.

Conclusions
In this paper, we performed NMA to study the collective motions of two TFs, Oct-1 and Sox-2, at their enhancer binding sites, aiming to gain an insight into the cooperative manner of these two TFs through the dynamics of their enhancerbounded complexes. Based on the special structure of Oct  Table 7: Statistics on the occurrences of the cooperative modes and , and their subtypes, in the 10 essential modes for all the pairs (p 1 , p 2 ), using the first quartile as a filter for the correlations of rotation angle functions. proteins, we treated an Oct/Sox group as two protein pairs and comparably investigated how these two pairs behave in the collective motions. A segmentation idea was introduced to explore the most correlated segments in each protein pair, according to the correlations of motion magnitude curves (or their segments). A median analysis on these correlations was conducted, which shows the leading role of subunit POUS (pair 2). Furthermore, based on statistics of the correlated segment pairs having a correlation value above the corresponding median, we proposed several motion cooperative modes ( 1 , 2 , and ) and their subtypes (positive or negative). The first tertile, the first quartile, and the mean value provide consistent results. Moreover, the supplementary study on the rotation angle functions presents a consensus about these modes. These proposed modes provide a clue that when binding to different regulatory DNA regions or involved in different collective motions, Oct-1 has a synergistic relationship with Sox-2 either with one of the components, POUS or POUHD, or both of them, POUS and POUHD at the same time. Cooperativity, in protein-DNA [25] and protein-protein [26] interactions, is an important feature in biomolecular interactions. In our work, we carried out a series of studies on the cooperative manner of Oct and Sox at their enhancer binding sites, which are important elements in the transcriptional regulation of embryonic stem cells. This work reveals how the two proteins work together physically and structurally at two specific DNA biding sites. The method developed here can be useful for the analysis of molecular interactions in other protein-protein and protein-DNA complexes.