The Evidence of Cooperative Binding of a Ligand to G4 DNA

Intrinsic constants of the ligand binding with G4 DNA (guanine-rich DNA sequence) using quantitative standards can be convenient providing the assessment for elucidating the possibility of such structures participation in biochemical processes. In the present communication, the hard + soft modelling approach to calculate intrinsic constants of a ligand binding with short DNA molecule, particularly such as G4 DNA, has been proposed. The suggested approach has focused upon the quantitative evaluating of a mutual influence between sites and between bound ligands. The cross-validation between a new hard + soft modelling and conventional stepwise complex formation algorithm has been conducted. A number of simulated examples will illustrate the methodology. The experimental mole-ratio titration of TMPyP4 by G4 DNA [(CG3)2CGC(AG3)2G] has been reexamined. The [(CG3)2CGC(AG3)2G] that folds from a G-rich sequence found in the promoter region of c-kit oncogene can be considered as a molecule with two equivalent mutually influence binding sites.


Introduction
In the past decade, a great research effort around fourstranded guanine-rich DNA sequences known as G4quadruplexes (G4) [1] and small molecule ligand interactions was motivated as both innovative basis for the design of novel inhibitors of telomerase activity [2] and selective fluorescent probes [3]. Recent bioinformatics studies have shown that in the human genome there are approximately 350,000 sequences that can potentially form G4 structures [4]. The current interest in G4 interactions is based on the concept that small molecules binding would stabilize these structures' formation, which causes an inhibition of telomerase activity, which plays a role in the immortal growth of cancer cells. Such small molecules-ligands are promising as novel anticancer drug candidates. Information in huge number of publications would be convenient to systematize using the stability of the complexes formed by the components of the systems in question or by the ability of ligand to bind G4 molecules present in the solution. However, in the present, such systematization is difficult to perform. The lack of reliable, quantitative characteristics for G4-quadruplexes binding properties complicates the development of bioinformatics tools, particularly for the analysis of involvement G4 in the inhibition of telomerase activities in vivo and for the discovery of new anticancer agents. To address this issue, robust binding models are needed to investigate G4-ligand interactions.
The TMPyP4 (5,10,15,20-tetrakis-(1-methyl-4-pyridyl)-21H,23H-porphyrin) is one ligand shown to be capable of binding to the G4-structures. The interaction between TMPyP4 and G4 has been extensively studied since it can inhibit the activity of telomerase upon binding to human telomeric sequences [5]. However, its binding mode is still controversial [6]. This cationic porphyrin contains a large -planar structure and cationic functional groups providing strong -stacking and electrostatic interactions with Gquartets and phosphate groups on the G4 DNA. The variations in the UV-Vis absorption and fluorescence spectra of the binding TMPyP4 ligand are normally the result of a change in the molecular conformation of the ligand in addition to the change in the environmental polarity and strong transition moment dipole coupling between the base pairs of DNA and the ligand. The cationic porphyrin G4 interactions have been studied extensively using a wide variety of spectroscopic techniques and isothermal titration calorimetry (several examples reported previously are summarized in Table S1; see Table S1 in the Supplementary Material available online 2 Journal of Analytical Methods in Chemistry at https://doi.org/10.1155/2017/6780521). The UV-visible optical spectroscopy (UV-VIS), fluorescence resonance energy transfer (FRET), and circular dichroism (CD) via the spectral changes provide crucial information about ligand G4 DNA interactions, such as stoichiometry, stabilization strength, and especially the selectivity for G4 in comparison to ds DNA [7]. However, the detail of the binding mode remains to be completely elucidated. Several models of the binding mode have been proposed to date, especially for intramolecular quadruplexes from human telomeric sequence [8].
The nature of the G4 DNA binding site is unmistakably crucial for understanding the binding stoichiometry which is one of the most important hard modelling parameters. The TMPyP4 planar molecule favours -stacking with Gquartets, including end-stacking and intercalation, but also groove binding, and direct coordination to bases or the phosphate backbone is possible [7]. The cationic properties of TMPyP4 allow electrostatic interactions with the negative backbone of G4 DNA. In addition, the coexisting binding modes have been recently revealed [9]. As the porphyrin concentration increased, TMPyP4 started to stack along the DNA. The central flat part and peripheral N-methylpyridyl moiety of TMPyP4 have a common structure for self-stacking and intercalation; hence the patterns of shifts in UV-Vis and CD spectra were almost identical in the mentioned binding modes [10]. For either intercalation or end-stacking the stoichiometric ratio of ligand/G4 was found to be 2 : 1 in most cases [11]. However for an intramolecular antiparallelparallel hybrid structure [d(AG3(T2AG3)3)], the weak anticooperativity of TMPyP4 accommodation onto two from three nonidentical sites was detected [12]. A complete picture is thus obtained, revealing the coexisting binding modes of intercalation and surface binding [12]. Subsequently, we can more precisely predict binding mode if in addition to the patterns of shifts in UV-Vis and CD spectra the stoichiometry and intrinsic binding constants are determined from experimental data.
The evaluation of intrinsic binding constants which belong to the G4 DNA-ligand-binding site seems to be the first important step toward understanding the ligand exertion of influence on G4 DNA. These interactions are difficult to monitor because of complex structures and topologies of G4-quadruplexes, different sites and potential modes of binding, and the presence of other secondary nucleic acid structures. The problem is related to the high and very dynamic topological plasticity of quadruplexes, and it is very difficult to study the binding of high-affinity ligands without evoking the great topological consequences of the ligands binding that in turn strongly influence the second binding event. Quantitative description of interaction G4 DNA with TMPyP4 has been performed using either the direct fit method [13] or the independent binding sites model [14] or the conventional stepwise complex formation model [15]. All these methods do not take into account the mutual influence (MI) in the course of successive ligand binding. Reasonably in addition to the different classes of binding sites, the cooperativity of interactions should be taken into account [16].
The thermodynamic effect of cooperativity may be due to changes in the macromolecule's conformation or redistribution of the electron density at the site due to the influence through oligomers bonds or by the MI between bound ligands via space [17]. A variety of methods have been developed for computational explanation of cooperative interactions of a ligand with set of binding sites [18]. If these sites are equivalent and independent, relevant experimental data may be analysed toward the thermodynamic binding constant and the number of binding sites in terms of a linear Scatchard plot [19]. Since this presumes a most simple binding model, such an approach is susceptible to substantial misinterpretations in practice [20].
The McGhee and von Hippel model which takes into account the MI between nearest neighbouring bound ligands has been the most widely used [21] in combination with the Scatchard equation for description of small molecules with homogeneous polymers. Sometimes those equations are incorrectly used to describe interactions with G4 and other DNA nonhomogeneous oligomers even though the authors of the model have concluded that the binding of ligands known to have heterogeneous binding sites on the DNA cannot be adequately treated based on the conditional probability approach [21]. Although Scatchard analysis is very much attractive owing to pictorial rendition, it has serious implications of unreliability regarding the shape of a Scatchard plot and it is evident that this method is not appropriate for quantitative evaluation of the binding of TMPyP4 with G4. The Job plots analysis of the ligand binding with the G-quadruplex sites accounting for their MI has been described in [12].
To compare the interpretation of spectrometrically measured data using the canonical stepwise model with that of mutually influencing binding sites, we considered the combination of hard model-based fitting and soft modelfree Singular Value Decomposition (SVD) [22]. The Rank Annihilation Factor Analysis (RAFA) has also aided in unambiguous revealing of the cooperative interactions. This solution should improve our understanding of the mechanism of the G4 folding and binding at a molecular level, which in turn will help in highly efficient and selective telomerase inhibitors discovery.

Description of the Model.
To simulate G4-ligand interactions the model based upon a multiple set of binding sites has been proposed [12]. When binding with G4 DNA is encountered, then each monomeric ligand is assumed to be bound to one of the sites on the DNA oligomer. The cooperativity is the enhancing of binding affinity in comparison with the sole binding, but anticooperativity is the reduction of binding affinity. The secondary interaction is an important source of ligand binding cooperativity. In DNAligand complexes, the arrangement of the ligands decides the total stability of the complex and elucidates that secondary interactions between neighbouring ligands can contribute additional stabilization [23]. To find a sign of cooperative interactions just the influencing of ligand on another ligand Journal of Analytical Methods in Chemistry 3 bound in the close vicinity has been taken into account. Consider intrinsic constants which govern the binding of ligand to a set of sites. In general case, the binding of ligands among vacancies gives 2 possible variants of coordination. The one-dimension matrix of site-specific equilibrium constants K 1-N (K 1-N are constants for a ligand binding to each isolated site 1 ⋅ ⋅ ⋅ ) and matrix Ω of the MI corrections (cooperativity parameters [ 1 2 ⋅ ⋅ ⋅ ]) are constructed in accordance with hypothesis about structure of adducts formed in DNA oligomer saturation by a ligand. The values of K 1-N and i could be found simultaneously as was described previously [24].
We will start here with looking at the simplest of situations for ligand binding, namely, the interaction with bidentate oligomer. The matrix of configurations M has been introduced in order to describe the detailed set of configurations adopted by the macromolecule in microstates. Each line of the matrix M displays a microstates configuration through a sequence of zeroes in those positions where the sites of multidentate macromolecule are not occupied by a ligand, and with a sequence of ones in those positions where the sites are bound to the ligand. In the course of the macromolecule sites saturation, previously bound ligands can affect the binding of the following ligands. For the oligomer containing two nonidentical binding sites, the matrix M * can be written as ] . (1) The product of the nonzero elements in M * provides the column of stability constants of microscopic species here the matrix E is all ones and the same size as M. The estimates of the true nonlinear variables K 1-2 and allow calculating the matrix C DNAL of DNA oligomer species of concentration for the th total concentration of ligand and DNA, correspondingly and .
] . (3) Here This species in conventional stepwise complex formation model is regarded as the 1 : 1 complex. Once we know C DNAL , we can calculate concentrations of free sites and these bound with a ligand.
2.2. The Validation of the Proposed Model. According to Lambert-Beer's law, a spectrometrically measured data matrix A is the product of a concentration matrix C and a matrix S of molar absorptivities of species that contribute to the absorption of light at any wavelength (species pure spectra). The corresponding general matrix equation is given as here is a matrix of residuals. The number of rows in matrix A is equal to the number of studied solutions with different components concentrations. The number of columns in matrix A is equal to the number of wavelengths. To simulate the data matrix A a concentration matrix C = C MonL has been computed in accordance with equilibrium model of twostep ligand binding. The total light absorbance is due to the absorption of the occupied Mon 1 , Mon 2 and unoccupied sites Mon 1 , Mon 2 of oligomer molecules and absorption of free and coordinated ligand . The data set was designed to simulate an interaction ligand with oligomer (TMPyP4 with G-quadruplex DNA), which has two binding sites. Since the rank of data matrix A is equal to the rank of C or S, whichever is smaller, and since the rank of either C or S can be no larger than the total number of spectral species, the rank of A can be no larger than the total number of spectral species, provided the number of wavelengths and the number of studied solutions are both greater than the number of spectral species. For simplicity in the current simulations residuals were put to zero. Thus the rank is the number of singular values of AA that are bigger than machine precision (10 −10 -10 −13 ).

The Least Square (LS) Fitting
Procedure. The fitting procedure should deconvolve the data matrix A into matrices of concentrations C and molar absorptivities of a matrix S. Multiplication of both sides of (5) on a pseudoinverse [22] matrix C + = (C T C) −1 C T , computed with an initial estimate of the required parameters, gives a current estimation S * of the true matrix of molar parameters S. The best set of linear parameters S * for system in question can be calculated explicitly in a linear least-squares calculation by In order to evaluate linear as well as nonlinear parameters for a chemical model such as that described above, a decomposition of the experimental data matrix A is executed under a nonnegativity constraint. The product of S * and C + gives A calc at the current values for nonlinear parameters. 4

Journal of Analytical Methods in Chemistry
Consequently, the iteration formula for refinement of the required nonlinear parameters can be written as where [⋅ ⋅ ⋅ ] >0 is the operator that substituted negative values to zero values (nonnegativity constraint only for spectral profiles was applied); is the residual matrix with unexplained data variance (here is machine precision). Relevant experimental data should be analysed toward the intrinsic association binding constants K 1-2 and intrinsic parameter of cooperativity to obtain a best fit to the data. When optimal nonlinear parameters were found, the pure species spectra have been calculated in accordance with (6). The Levenberg-Marquardt nonlinear LS algorithm is used for refinement of parameters [25]. The iterative procedure described above is used to find the minimum of the sum of squares of deviations of all absorbance data matrix elements which were calculated using theoretical model, from the corresponding matrix considered on the current iterative step, calculated in turn by using (7). Refinement was stopped when the relative difference between consecutive iterations fell below a threshold value. The Hamilton factor was used in the present work to compare matrices by applying the following equation: where Trace is the sum of the diagonal elements. Equation (8) was used to calculate lack of fit (lof). In this case, F exp = A exp ; F calc = A calc . Second, (8) was used to compare the distribution diagrams calculated by application of conventional stepwise chemical model from the ones simulated by the cooperative model. In this case, a concentration matrix F exp = C DNAL was calculated using a model, and a matrix F calc = C calc was calculated in turn by using (3) on the last LS iterative step. In all cases, Matlab-20145 software was used in the calculations. A 1a ). The UV-Vis absorption spectra changes of an aqueous solution in binding reactions of TMPyP4 and G4 in all known cases are typically similar. To simulate data set one have to construct bathochromic red shifts 15-20 nm and hypochromicities at max (60-70%) of the Soret band were observed during coordination TMPyP4 to G4. Spectra S L and S Mon used to build the data sets A exp are shown in Supplementary Material ( Figure S1). In the simplest case, we assume that spectrum S MonL is distorted superposition of S L and S Mon . This situation is a binding without spectral interference. More general case of spectral interference (SI) is the mutual perturbation of the ligand spectra bound in the vicinity.

Cooperative Binding (Data Matrix
Let us consider the model defined by equilibrium constants lg K 1-2 = [6.5 6.5] and = 2.0, governing the formation of a Mon species (two equal sites plus a positive binding cooperativity between sites). Simulated data matrix (see Supplementary Material, Figure S2) A exp should be written as Here elements of the matrix are put to zero; S L is molar absorptivities of free ligand. Given that two sites are equivalent and have no SI the matrix of molar absorptivities of free and occupied sites I-II is S SVD applied for A 1a yields = 3 indicating the three optically active chemical species in the full spectrum and local rang = 2 indicating the species in spectral window of TMPyP4.
To fit the data, in accordance with proposed approach, the conventional noncooperative model of stepwise complex formation first has been tested.
However the LS fitting gives a poor fit to the target matrix A exp . The reproduced A 1a matrix is computed as Here S F = [S DNA S DNAL S DNAL 2 ] ; is lof. Once we analysed simulated data matrix A 1a using a model of independent binding sites, taking into account the existence of 2 sites of 2 types, the best fit was model with two equivalent binding sites lg K 1-2 = [7.04 7.04], and a lof is 0.2%. Figure 1(a) shows distribution diagrams of hypothetical DNA oligomer between species DNA, DNAL, and DNAL 2 . As can be seen from Figure 1(a), the resolved distribution of the species significantly differs from the theoretical one (PE 18.4%). Figure 1(b) shows simulated and resolved pure species spectra. The shape of spectral profiles is due to calculated monomers spectra. The spectrum of species DNA 2 is doubled spectrum of DNA , 2S DNAL = S DNAL 2 . A 1a . Given that spectra S DNA and S L are easily measured experimentally, we can calculate the matrix A .

RAFA of the Data Matrix
With the result of (13), A exp can be divided into two parts, where the known species and DNA only contribute to the one part. The other part A contains linear combination of the unknown spectra. The SVD of matrix A 1a indicated no rank annihilation of A 1a including concentration window of the ligand. This result conforms that spectral species are inaccurately recognized. The difference between theoretical and calculated fractions of free sites and these bound with a ligand, that is, Mon 1 , Mon 2 , Mon 1 , Mon 2 , Mon 2 , also can be seen. However, the concentration of free monomer is calculated correctly. The matrix A may be written in the form: The reduced data matrix A 1a has been obtained by subtracting the contribution of the proposed model species from the complete data set. The rank of the matrix A 1a is less than the rank of the full A 1a matrix = − 1. So in the full spectral range one species is correctly recognized.
The rank annihilation completes the conclusion that sites are indistinguishable monomers. A 1b ). To simulate a data matrix A 1b obeying (9), the same spectra as in the previous example were used (see Figure S1). Concentrations are calculated from the model lg K 1-2 = [6.5 6.5], = 0.3. Fitting the absorbance data matrix using conventional stepwise model without cooperativity gives the lof(A 1b ) ≈ 0.0%. Here the best fit gives model considering 2 sites of 2 different types lg K 1-2 = [6.79 5.16]. The stepwise model provides a good fit to simulated data. Consequently, we have a model ambiguity as a result of the rotational ambiguity. All species fractions quantified very well (see Table 1).

Anticooperative Binding without SI (Data Matrix
RAFA of the Data Matrix A 1b . As can be seen from Table 1, SVD is applied on A 1b , A 1b , and A 1b giving the different result as for the above case of positive cooperativity. One can see the rank annihilation in full spectral range and in a spectral window of ligand. This result suggests that subtracted spectral species are correctly recognized. The rank of the matrix A 1b is less than the rank of the full A 1b matrix = − 1. So in the full spectral range extra one species is correctly recognized. A 1c ). In the previous examples, we have used an assumption that all G4-ligand complexes formed possess the same absorbance spectrum, irrespective of the binding mode. However this assumption is violated in most actual systems, probably in each system with multiple binding sites. We will use SI when binding in each site leading to a change in the spectrum of ligand in the vicinity. Given that bound ligands have SI, an experimental matrix A 1c should obey (12) where the matrix of molar absorptivities is

(3)
Anticooperative binding with SI is concentration matrix calculated from the before used binding model lg K 1-2 = [6.5 6.5], = 2.0. The distribution plot of resolved oligomer species is very similar to these shown on Figure 1. SVD is applied on A 1c yielding four main factors defining absorbance variance ( = 4), and the intrinsic absorption of the ligand is determined by three factors ( = 3). With the application of (13), the rank annihilation was not achieved. This suggests that all forms are calculated with poor accuracy. Application of (14) for total spectral range gives = 3 which means the correct calculation of only the monomer contribution. A 1d ). Consider the binding model lg K 1-2 = [6.5 6.5], = 0.3. The model is used to simulate anticooperative binding with SI. In this example, the model ambiguity is the result of LS fitting. The presence of SI does not affect the shape of oligomer distribution diagram shown on Figure 2. However, the result of RAFA is not the same. The identification of sites identity is ambiguous in this case. As can be seen from Table 1, the result of SVD application on matrices A 1d , A 1d , and A 1d is the same as for these for two types of sites with different spectral characteristics (see below). A 2a ). Consider lastly an oligomer which has different binding sites. Consequently each site should have different affinities for the ligand and may have a different absorbance signal. The model with a strong binding site and weaker binding site lg K 1-2 = [7 6] to simulate the data matrix was used. Simulated data can be decomposed using the model of two equivalent binding sites and anticooperativity lg K 1-2 = [6.74 6.74], = 0.58 with excellent precision (see Table 1). In this case to simulate data use (12), but reproduced data matrix is calculated in accordance with (9). Resolved distribution of G4 oligomer species well reproduce those used for simulation. The rank of a data matrix constructed in accordance with (12) is equal to the rank of C ( = 4). The rank of both A 2a and A 2b in spectral window ( = 3) is less than the number of spectral species in full spectral range because the light absorption by monomer species is zero in this interval. Nevertheless, the rank of A along with A two units less compared with A can be seen due to rotational ambiguity. Moreover we can see the same sequences of ( ), ( ), and ( ) regardless whether or not there is spectral MI. Thus, given example of binding shows inapplicability of RAFA to indicate the fact that sites are different or the same sites bind in anticooperative fashion.

The Binding to G4 DNA with Different Sites (Data Matrix
3.6. The TMPyP4 Binding with Nucleotide Sequence 5 - To illustrate the applicability of the suggested approach to model porphyrin binding with a macromolecule, a porphyrin TMPyP4 binding to 5 -(CG3)2CGC(AG3)2G-3 was selected as experimental example. UV-absorbance experimental data published previously [15] has been used for further numerical treatment. In the original work [15] mole-ratio experiments to study the G4-ligand interaction were performed. When solution of G4 is added to solution of TMPyP4 in buffer pH 7.1, at 25 ∘ C the induced bathochromic red shifts and hypochromicities for the Soret band clearly indicate the DNA-ligand complexation. A stepwise complex formation model has been applied in [15] to interpret the experimental data (matrix A 3e ). Conventional stepwise complex formation algorithm best fits the data matrix which included one strong binding site and one weak binding site (lg K 1-2 = [7.07 5.79]), lof(A 3e ) ≈ 0.25%. However, the same fit gives a model with two equivalent anticooperative binding sites. The calculated intrinsic binding constants K 1-2 and the cooperativity parameter are summarized in Table 2. The calculated distribution diagrams for this mole-ratio spectrometric titration of G4 by TMPyP4 for both models are compared in Figure 2. For this G4 DNA example of binding interaction, the LS fitting discloses a model ambiguity. The values of ( ) we have given in Table 2 equal the values of all these quantities which are given in Table 1 for anticooperative binding with identical sites without SI. Based on this observation, one can suggest a model of anticooperative assembling of TMPyP4 and G4 DNA with identical binding sites without SI. In addition, the calculated pure spectra for species involved in the equilibrium are shown in Figure 2. The spectra are calculated according to the following equation: The spectra associated with the ligand are compared in Figure 2(b). Some difference between Mon 1 and Mon 2 spectra can be seen. But this very little difference is probably due to insufficiency of free sites and free ligand fixed spectra measurements. The calculation of pure absorption monomers spectra is impossible because of rank deficiency after subtracting contribution of free ligand absorbance from full data matrix A 3e (see Table S2 in the Supplementary Material). In summary, the methodology of the absorbance data matrix numerical treatment for G4 DNA/ligand equilibrium system has been successfully developed using a hard modelling matrix method and Rank Annihilation Factor Analysis. Accurate identification of binding sites identity   is the rank of full data matrix A; is the rank of data matrix part covering spectral window of a ligand 340-540 nm. and the mutual influence between bound ligands, which is essential to G4 DNA in vivo function annotation, can be done for sites involved in cooperative ligand binding. Unfortunately the discovery of the sites identity is hindered when sites are involved in anticooperative binding. It is shown that the spectrometrically measured titration curves can be described ambiguously using the model of two different and independent binding sites as well as that of two mutually influencing identical sites in the case of anticooperative binding.

Conflicts of Interest
The author declares that they have no conflicts of interest.