3D-QSAR Study on a Series of VEGFR-2 Kinase Inhibitors: 3-Pyrrole Substituted Indolin-2-Ones Compounds

The vascular endothelial growth factor receptor-2 kinases (VEGFR-2) are attractive targets for the development of anticancer agents. Self-organizing molecular field analysis (SOMFA), a simple three-dimensional quantitative structure-activity relationship (3D-QSAR) method, is used to study the structure-activity correlation of 3-pyrrole substituted indolin-2-one VEGFR-2 inhibitors. The statistical results, cross-validated $r^2_{\mathrm{CV}}$ (0.5267) and non-cross-validated $r^2$ (0.5623), indicate a reliable predictive ability. The contributions of the shape and electrostatic fields are 42.7% and 57.3%, respectively. Analysis of the SOMFA models through shape and electrostatic grids provides useful information for the design and optimization of new 3-pyrrole substituted indolin-2-one based VEGFR-2 inhibitors.


Introduction
Cancer is the second leading cause of death in the world. Researchers have found that receptor tyrosine kinases (RTKs) play an important role in the oncogenic transformation of cells [1]. VEGFR-2 (vascular endothelial growth factor receptor-2), a member of the RTK family, has been widely investigated in the pathogenesis of several disorders [2][3][4][5][6][7]. It is widely distributed not only in vascular endothelial cells but also in some tumor cells, and it plays an important role in VEGF cell signalling and tumor proliferation [8]. Recent research has shown that blockade of VEGFR-2 signalling by small-molecule inhibitors of the kinase domain can inhibit the growth of solid tumors [9][10][11][12]. Therefore, inhibition of VEGFR-2 has become an important research direction in the treatment of cancers [13].
Self-organizing molecular field analysis (SOMFA) [24] is a simple 3D-QSAR technique developed by Robinson et al. The method has similarities to both comparative molecular field analysis (CoMFA) [25] and molecular similarity studies. Like CoMFA, it uses a grid-based approach; however, SOMFA uses only steric and electrostatic maps, which are related to interaction energy maps, so no probe interaction energies need to be evaluated. The weighting of the grid points by mean-centered activity is an important ingredient of the SOMFA procedure. As in the similarity methods, it is the intrinsic molecular properties, such as molecular shape and electrostatic potential, that are used to develop the QSAR models.
A SOMFA model could suggest a method of tackling the all-important alignment problem, which all 3D-QSAR methods face. The inherent simplicity of the method allows the possibility of aligning the training compounds as an integral part of the model derivation process and of aligning prediction compounds to optimize their predicted activities. The purpose of this paper is to describe the application of SOMFA to a series of 3-pyrrole substituted indolin-2-one compounds, a novel class of VEGFR-2 kinase inhibitors. Our main objective is to provide useful structure-activity information through SOMFA analysis and to design novel inhibitors of VEGFR-2 kinases, in the hope that these molecules will be developed into powerful anticancer agents.
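The activity weighting of grid points described above can be illustrated with a minimal sketch. It assumes that each molecule has already been sampled onto the same grid (flattened here into a plain list), that the master grid is the sum of the molecular property grids weighted by mean-centered activity, and that a per-molecule descriptor is the sum of the point-wise products of the molecular grid with the master grid. The function names and toy values are hypothetical and are not taken from the SOMFA2 program.

```python
def master_grid(property_grids, activities):
    """Sum the molecular grids, each weighted by its mean-centered activity."""
    mean_activity = sum(activities) / len(activities)
    master = [0.0] * len(property_grids[0])
    for grid, activity in zip(property_grids, activities):
        weight = activity - mean_activity  # mean-centered activity
        for k, value in enumerate(grid):
            master[k] += weight * value
    return master


def somfa_descriptor(grid, master):
    """Project one molecular grid onto the master grid (sum of products)."""
    return sum(g * m for g, m in zip(grid, master))


# Toy shape grids (1 = inside the molecule, 0 = outside); hypothetical data.
grids = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
pic50 = [7.0, 5.0, 6.0]

master = master_grid(grids, pic50)
descriptors = [somfa_descriptor(g, master) for g in grids]
```

In such a scheme, grid points occupied mainly by the more active molecules acquire positive master-grid values, and the resulting descriptor can then be regressed against activity.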

Materials and Method
2.1. Data Set. The structures and bioactivities of the 3-pyrrole substituted indolin-2-one compounds 1-34 were taken from the papers by Sun et al. [26,27]. They were classified into three groups according to their common parent structure, as shown in Table 1.
The thirty-four compounds were divided into two sets. Compounds 1-25 were used as the training set, while the remaining 9 compounds were used as the test set. The training set was used to build the SOMFA models and the test set was used to evaluate them. The activities shown in Table 1 are expressed as IC50 (M) values, which were converted into pIC50 values. A higher pIC50 value indicates greater inhibitory activity.
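The conversion from IC50 to pIC50 can be sketched as follows, assuming the conventional definition pIC50 = -log10(IC50) with IC50 expressed in mol/L; the text does not state the formula explicitly, and the function name and example value are illustrative only.

```python
import math


def pic50_from_ic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50, assuming pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


# Hypothetical example: a 10 nM inhibitor (1e-8 mol/L).
example_pic50 = pic50_from_ic50(1e-8)
```

Under this definition a tenfold decrease in IC50 raises pIC50 by one unit, which is why higher pIC50 values correspond to more potent inhibitors.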

Molecular Modeling and Alignment.
The 3D structures of all analogues were constructed with ArgusLab 4.01 software [28], running on an AMD Athlon 64 X2 Dual Core Processor 5000+ CPU/Microsoft Win XP platform.
Unless otherwise indicated, default parameters were used. Full geometry optimizations were performed with the MM2 method in the ArgusLab software. The optimized conformations were then superimposed by RMS fitting, with compound 4 as the reference. Two different alignments were selected to define the overlap. The first superposition, in which the energy-minimized molecules are overlaid on a common structure using alignment A, is shown in Figure 1. The second superposition, in which the energy-minimized molecules are overlaid on a common structure using alignment B, is shown in Figure 2.
For all analogues, a search for the final active conformations was also performed by docking into the active site with the eHiTS software [29]. The crystal structure of VEGFR-2 in complex with its inhibitor 00J (PDB entry code 2XIR, resolution: 1.5 Å) was downloaded from the RCSB Protein Data Bank (http://www.rcsb.org/pdb/home/home.do). After inhibitor 00J and the water molecules were removed and all hydrogen atoms were added, the site where 00J binds to the protein was defined as the active binding site.

Journal of Chemistry
The third superposition, which is based on the binding conformations of the molecules and their alignment in the active site of VEGFR-2, is shown in Figure 3.
SOMFA analysis was then performed on each of the three alignments. Using the VEGA software [30,31], atomic charges were assigned to the overlaid analogues by three different semiempirical Hamiltonian methods (AM1, PM3, and MNDO). After calculation, the structures were converted into the CSSR file format, the only file format that the SOMFA2 program accepts for a SOMFA analysis.
For all of the studied compounds, shape and electrostatic potential grids are generated. To combine the predictive power of these two properties into one final model, their individual predictions are merged as a weighted average of the shape-based and electrostatic-potential-based QSARs, using a mixing coefficient ($c_1$) as follows [24]:
$$\mathrm{Activity} = c_1\,\mathrm{Activity}_{\mathrm{shape}} + (1 - c_1)\,\mathrm{Activity}_{\mathrm{ESP}}. \tag{1}$$
Clearly, multiproperty predictions could have been obtained through multiple linear regression.
Using (1) instead gives greater insight into the resultant model by allowing the study of the variation in predictive power with different values of $c_1$.
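The study of predictive power as a function of the mixing coefficient can be sketched as a simple scan. This is an illustrative sketch only: it assumes that shape-based and electrostatic-based predictions are already available for each compound, it scores each mixture by the squared Pearson correlation with the observed activities, and all names and toy values are hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        return 0.0
    return cov * cov / (var_p * var_o)


def mix(shape_pred, esp_pred, c1):
    """Weighted average of the two single-property predictions."""
    return [c1 * s + (1.0 - c1) * e for s, e in zip(shape_pred, esp_pred)]


def scan_c1(shape_pred, esp_pred, observed, steps=100):
    """Return the (c1, r2) pair with the highest r2 on a regular grid in [0, 1]."""
    best_c1, best_r2 = 0.0, -1.0
    for i in range(steps + 1):
        c1 = i / steps
        r2 = r_squared(mix(shape_pred, esp_pred, c1), observed)
        if r2 > best_r2:
            best_c1, best_r2 = c1, r2
    return best_c1, best_r2


# Toy data: the "observed" activities are built with c1 = 0.3 on purpose.
shape_pred = [1.0, 2.0, 3.0, 4.0, 5.0]
esp_pred = [2.0, 1.0, 4.0, 3.0, 6.0]
observed = mix(shape_pred, esp_pred, 0.3)

best_c1, best_r2 = scan_c1(shape_pred, esp_pred, observed)
```

A value of $c_1$ close to 1 would indicate a model dominated by shape, while a value close to 0 would indicate a model dominated by the electrostatic potential.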
With the highest value of $r^2$, the SOMFA models are then derived by partial least squares (PLS), implemented in the SPSS software [34], with cross-validation.
The predictive ability of the model is quantified in terms of $r^2_{\mathrm{CV}}$, which is defined as follows:
$$r^2_{\mathrm{CV}} = 1 - \frac{\mathrm{PRESS}}{\mathrm{SD}}, \tag{2}$$
where $\mathrm{PRESS} = \sum (Y_{\mathrm{pred}} - Y_{\mathrm{actual}})^2$ and $\mathrm{SD} = \sum (Y_{\mathrm{actual}} - Y_{\mathrm{mean}})^2$. SD is the sum of squares of the differences between the observed values and their mean, and PRESS is the prediction error sum of squares. The final models are constructed by a conventional regression analysis with the optimum value of the mixing coefficient ($c_1$) being the one that yields the highest $r^2$ and $r^2_{\mathrm{CV}}$ values according to (2).
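The cross-validated statistic can be illustrated with a leave-one-out sketch. Note the assumptions: ordinary least squares on a single descriptor is used here as a simple stand-in for the PLS procedure in SPSS, leave-one-out is assumed as the cross-validation scheme, and the function names and toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor has zero variance")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def loo_r2_cv(x, y):
    """Leave-one-out r2_CV = 1 - PRESS / SD."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (slope * x[i] + intercept - y[i]) ** 2
    mean_y = sum(y) / len(y)
    sd = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / sd


# Toy descriptor and activity values (hypothetical).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = loo_r2_cv(descriptor, activity)
```

Because each prediction is made for a compound left out of the fit, PRESS is never smaller than the fitted residual sum of squares in this setting, so the cross-validated value is expected to be no higher than the non-cross-validated $r^2$.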

Results and Discussion
SOMFA, a novel 3D-QSAR methodology, is employed for the analysis of the training set, which is composed of 25 compounds with known biological activities. Statistical results of the 27 SOMFA models are listed in Table 2. Among the 27 models, model 20 is the most predictive, with a cross-validated $r^2_{\mathrm{CV}}$ value of 0.5267, a non-cross-validated $r^2$ value of 0.5623, a standard error of 0.5165, and an $F$ value of 41.11, indicating a good statistical correlation and predictive ability. These results also indicate that the SOMFA model is reliable and able to predict the activities of new analogues. According to the value of $c_1$, the contributions of the steric and electrostatic fields are 42.7% and 57.3%, respectively. Consequently, the electrostatic field has a slightly larger effect than the steric field on the bioactivities of the analogues.
Taking alignments into account, we found that the alignment had a profound influence on the result. The model built with the docking-based alignment showed the highest values of $r^2$ and $r^2_{\mathrm{CV}}$ among the three alignments, because it reflects a true pharmacophoric conformation docked in the cavity of the receptor, indicating that it is more reliable and accurate.
Taking charges into account, for alignments B and C we found that the results were less sensitive to the quantum chemistry charges. That is to say, the choice of semiempirical method (AM1, PM3, or MNDO) made only a slight difference.
Taking grid resolution into account, we found that the $r^2$ and $r^2_{\mathrm{CV}}$ values of models 10-18 ranked the correlation and reliability of the models as 0.5 Å > 1.0 Å > 1.5 Å, while for models 1-10 and 18-27 the ranking was 1.0 Å > 0.5 Å > 1.5 Å. A finer grid resolution is known to produce better correlation and reliability. However, if the grid is far too fine, the amount of noise in the data increases and the model becomes inaccurate. In model 20, the 1.0 Å grid resolution produced the best result.
Taking the values of $c_1$ into account, we found that the models built with alignment A showed higher values. That is to say, the contribution of the steric field is four times that of the electrostatic field. This high steric contribution is attributed to the structure-based alignment. For alignments B and C, in contrast, we found that the steric field had almost the same influence as the electrostatic field, indicating that the steric and electrostatic interactions of the molecules could be two equally important factors for the bioactivities of the analogues.
The experimental and predicted activities of training set and test set are reported in Table 3. Figure 4 shows the correlation between experimental and predicted activities of SOMFA model for the training set and test set.
It is known that the best way to validate a 3D-QSAR model is to predict the bioactivities of the compounds in a test set. The SOMFA analysis of the test set, composed of 9 compounds, is reported in Figure 4, which indicates a satisfactory linear correlation and a moderate difference between experimental and predicted activities. From Figure 4, we found that compound 32 had a large residual and thus was classified as an outlier. There may be a complicated relationship between structure and activity in this compound. On the whole, the model performed well in predicting the activity of most of the test compounds. SOMFA calculations for both shape and electrostatic potentials were performed and then combined to obtain an optimal coefficient $c_1 = 0.427$ according to (1), indicating that the electrostatic field contribution is slightly more important. The master grid maps derived from the best model are used to display the contributions of the shape and electrostatic potentials. The master grid maps give a direct visualization of which parts of the compounds differentiate the activities of the compounds in the training set under study. The master grids also offer an interpretation of how to design and then synthesize novel compounds with much higher activities. The shape master grid and electrostatic master grid of the best SOMFA model are shown in Figures 5 and 6, respectively, with compound 4 (sunitinib) as the reference.
Each master grid map is colored in two different colors for favorable and unfavorable effects.In other words, the electrostatic features are red (more positive charge increases activity or more negative charge decreases activity) and blue (more negative charge increases activity or more positive charge decreases activity), and the shape features are red (more steric bulk increases activity) and blue (more steric bulk decreases activity), respectively.
As shown in Figure 5, the shape master grid shows red-colored regions where steric bulk enhances activity and blue-colored regions where steric bulk detracts from activity. We found a high density of blue lattice points surrounding the 6-position of the indolinone ring and the 4-position of the pyrrole ring, which suggests that bulkier substituents in these areas would decrease the bioactivities. We also found a large region of red lattice points near the 2-position and the diethylamine group, which suggests that bulkier substituents in these areas would remarkably increase the bioactivities.
As shown in Figure 6, the electrostatic potential master grid shows red-colored regions where increased positive charge is favorable for bioactivity and blue-colored regions where increased negative charge is favorable for bioactivity. We found a high density of red lattice points surrounding the indolinone ring and the pyrrole ring, suggesting that reducing the electron density there would increase the bioactivities. The diethylamine group, in contrast, is surrounded by blue lattice points, suggesting that negatively charged substituents there would increase the bioactivities.

Conclusion
In this study, 3D-QSAR analysis based on different alignments was carried out to construct a highly accurate and predictive SOMFA model ($r^2_{\mathrm{CV}}$ = 0.5267, $r^2$ = 0.5623). The contributions of the steric and electrostatic fields are 42.7% and 57.3%, respectively, indicating that the electrostatic interaction of the molecules could be a slightly more important factor for the bioactivities of the analogues. The final grid maps help to better interpret the structure-activity relationship of these analogues. Generally, small electropositive substituents (e.g., methyl, ethyl, and aliphatic amines) at the indolinone ring and pyrrole ring increase the activity, and large electronegative substituents (e.g., a benzene ring with electron-withdrawing groups or a pyridine ring) at the diethylamine group increase the activity. The analyses of the SOMFA models may provide useful information for the design of new analogues of sunitinib.

Figure 1: Superposition of compounds using alignment A, based on common structure. (a) Common fragment for superposition; (b) superposition of compounds in training and test sets.

Figure 2: Superposition of compounds using alignment B, based on common structure. (a) Common fragment for superposition; (b) superposition of compounds in training and test sets.

Figure 3: Superposition of compounds using alignment C, based on docking. (a) Thirty-four analogues in the active site of VEGFR-2; (b) superposition of compounds in training and test sets.

Figure 4: Correlation between experimental and predicted activities of SOMFA model for the training set and test set.

Figure 5: The shape master grid with compound 4. Red: steric bulk enhances the activity in this region. Blue: steric bulk detracts from activity in this region.

Figure 6: The electrostatic potential master grid with compound 4. Red: positive charge is favoured in this region or negative charge is disfavoured in this region. Blue: negative charge is favoured in this region or positive charge is disfavoured in this region.

Table 1: Structures of 34 compounds and their VEGFR-2 inhibitory activities.

Table 2: Statistical results of the various SOMFA models.

Table 3: The comparison of experimental and predicted activities of 34 compounds in training set and test set.