2D-QSAR Study of Indolylpyrimidines Derivative as Antibacterial against Pseudomonas aeruginosa and Staphylococcus aureus: A Comparative Approach

A set of 15 indolylpyrimidine derivatives, with antibacterial activities expressed as minimum inhibitory concentration against the gram-negative bacterium Pseudomonas aeruginosa and the gram-positive bacterium Staphylococcus aureus, was selected for 2D quantitative structure-activity relationship (QSAR) analysis. QSAR was performed using a combination of steric, electronic, and topological descriptors. Stepwise regression was used to derive the most significant QSAR equation for predicting the inhibitory activity of this class of molecules. The best QSAR model was further validated by the leave-one-out technique and by randomization trials. A high correlation between experimental and predicted inhibitory values was observed. A comparison of the behavior of indolylpyrimidines against the two microorganisms is discussed.


Introduction
Pseudomonas aeruginosa (PA), a gram-negative pathogen, is known as a major cause of hospital-acquired infection and antimicrobial resistance [1,2]. It is responsible for infections such as nosocomial pneumonia, urinary tract infections, surgical wound infections, and bloodstream infections [3]. The cell walls of gram-positive and gram-negative bacteria differ structurally. Gram-positive bacteria have more peptidoglycan layers than gram-negative bacteria; therefore the cell wall of gram-positive bacteria is thicker than that of gram-negative bacteria.
The gram-negative bacterial cell wall differs from the gram-positive cell wall in having an outer membrane that covers the peptidoglycan layer. This outer membrane is made up of phospholipids, lipoproteins, and lipopolysaccharides. It is negatively charged and helps prevent the bacteria from being phagocytosed by macrophages. It also provides protection from the effects of antibiotics, digestive enzymes, and heavy metals.
Many chemical entities act as antibacterial agents by inhibiting DNA synthesis through blocking enzymes such as DNA gyrase and dihydrofolate reductase, or by inhibiting the enzymes involved in building the peptidoglycan layer of the cell wall [4,5].
The approach of an antibacterial drug to the bacterial cell is initially a surface phenomenon. The walls of gram-positive and gram-negative bacteria resist this surface interaction; therefore the antibacterial activity of a chemical entity differs between gram-positive and gram-negative bacteria. In the present work, we attempt to differentiate the behavior of chemical entities towards gram-positive and gram-negative bacteria using QSAR molecular descriptors that explain surface phenomena. This could also have been expressed using Hammett parameters and pKa values of the compounds as descriptors, but our approach emphasizes calculated surface parameters to express the antibacterial activity. Many equations in the literature express antibacterial activity using Hammett and pKa values, but these equations are of little use in virtual screening of molecular databases for significant antibacterial hits. The objective of the present investigation was to study the usefulness of QSAR in predicting the antibacterial activity of indolylpyrimidine derivatives against Pseudomonas aeruginosa (PA) and Staphylococcus aureus (SA) and to understand how multiple linear regression (MLR) equations can explain the key structural features that correlate with the differential activity against gram-positive and gram-negative strains.
The pharmaceutical importance of pyrimidine compounds lies in the fact that they can be used effectively as analgesic, anti-inflammatory, anticonvulsant, insecticidal, herbicidal, antitubercular, anticancer, and antidiabetic agents. The indole ring is known to exhibit anti-inflammatory, antimicrobial, and antifungal activities [6][7][8][9][10]. The fused ring system of substituted indolylpyrimidines is remarkably effective as an antitumor and antibacterial agent [11,12]. The QSAR method requires data collected from the same laboratory experiment, molecular descriptor selection, QSAR model development, and finally model validation. A QSAR study has predictive ability and can even provide clues to the mechanism of drug-receptor interactions [13,14].

Biological Activity
Very few research articles from a single laboratory are available on indolylpyrimidines as antibacterial agents. The indolylpyrimidines studied here were taken from the work reported by Panda and Chowdary [15], in which in vitro antibacterial activity against Pseudomonas aeruginosa and Staphylococcus aureus was measured at a uniform concentration of 100 µg/mL and the zone of inhibition was reported as a distance in millimeters (mm). The zone of inhibition is directly proportional to potency: the larger the zone in mm, the greater the potency. The distance in mm is used as the dependent variable in the QSAR study. The zone of inhibition against Pseudomonas aeruginosa (PAantibact) varies from 10 mm to 24 mm, while that against Staphylococcus aureus (SAantibact) varies from 13 mm to 31 mm. Figures 1 and 2 give the activity distribution plots for antibacterial activity against Pseudomonas aeruginosa and Staphylococcus aureus, respectively. In these plots, blue columns represent the training set and brown columns represent the test set.

Data Sets
Because the data had to come from experimental studies in a single laboratory, a study reporting exact numerical values was selected for QSAR analysis. According to the variation in substituents at various positions in the set of indolylpyrimidine derivatives, the molecules were divided into a training set and a test set using the sphere exclusion method. The training set comprised 15 derivatives (4a, 4c, 4e, 4g, 4i, 5c, 5f, 5g, 5h, 5j, 6a, 6b, 6d, 6i, and 6j) while the test set comprised 9 derivatives (4d, 4f, 4h, 4j, 5a, 5d, 6c, 6e, 6f, and 6g) of the indolylpyrimidines published by Panda and Chowdary [15], which have been shown to possess antibacterial activity against Pseudomonas aeruginosa. The QSAR model for activity against Staphylococcus aureus has a training set of 14 molecules (4a, 4c, 4d, 4f, 5a, 5b, 5d, 5g, 5h, 6b, 6c, 6f, 6i, 6j) and a test set of 6 molecules (4b, 5c, 5f, 6a, 6d, 6h). The test compounds were selected manually, considering the activity distribution and structural diversity relative to the training set. Chemical structures of the indolylpyrimidines and the corresponding biological data are shown in Table 1.
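The sphere exclusion idea mentioned above can be illustrated with a minimal Python sketch. It assumes one common variant of the method (sphere centres go to the training set, compounds inside an existing sphere go to the test set); the function names, the `radius` parameter, and the descriptor values are hypothetical and are not taken from the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sphere_exclusion(descriptors, radius):
    """Split compounds into training and test sets.

    descriptors: dict mapping compound id -> list of scaled descriptor values.
    radius: dissimilarity threshold (hypothetical tuning parameter).
    Each picked compound becomes a sphere centre (training set); compounds
    that fall inside that sphere are assigned to the test set.
    """
    remaining = list(descriptors)  # dicts keep insertion order
    training, test = [], []
    while remaining:
        centre = remaining.pop(0)
        training.append(centre)
        inside = [c for c in remaining
                  if euclidean(descriptors[centre], descriptors[c]) <= radius]
        test.extend(inside)
        remaining = [c for c in remaining if c not in inside]
    return training, test

# Made-up descriptor values for illustration only (not data from the study).
toy = {
    "A": [0.0, 0.0],
    "B": [0.1, 0.0],
    "C": [2.0, 2.0],
    "D": [2.1, 2.0],
    "E": [5.0, 5.0],
}
train_ids, test_ids = sphere_exclusion(toy, radius=0.5)
print("training:", train_ids)
print("test:", test_ids)
```

A larger radius puts more compounds into the test set, so in practice the radius would be tuned until the desired training-to-test ratio is reached.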

Molecular Modeling
All molecular modeling studies were performed using Schrödinger software running on the Windows platform. The LigPrep module was used to build the 3D structures. The 3D structures were then cleaned up and subjected to energy minimization by the conjugate gradient method using the MMFF force field. The minimization was performed until the RMS gradient reached a value smaller than 0.1 kcal/mol Å. Optimization was continued using the BFGS method until the RMS gradient reached a value smaller than 0.0001 kcal/mol Å. The lowest energy structure of each molecule was used to calculate the molecular descriptors.

Descriptor Generation
Numerical descriptors encode important features of the molecular structure and can be categorized by property type, such as electronic, geometric, hydrophobic, and topological. Various molecular descriptors were calculated, such as molecular weight, dipole moment, partition coefficient (ClogP), surface area descriptors, H-bond donor count, H-bond acceptor count, ionization potential, and electron affinity. Among the generated descriptors, those that explain surface phenomena were retained, as shown in Table 2. Biological activity and descriptor values were scaled to unit variance, as shown in Table 3. Pearson's correlation matrix was used to select suitable descriptors for MLR analysis. The correlation matrix was computed for all descriptors using the "BuildQSAR" module available in the Schrödinger software. Tables 4 and 5 show the correlation matrices for the descriptors used in the resulting models for Pseudomonas aeruginosa and Staphylococcus aureus, respectively. Descriptors correlated above 0.5 were eliminated from the QSAR study; that is, a correlated descriptor that duplicates the meaning of another descriptor was eliminated. ClogP was found to be correlated to FISA by 0.94 and to FOSA by 0.46. The correlation matrix shows that ClogP correlates well with FISA and PSA (Tables 4 and 5). Since ClogP is unable to express the charge of a molecule, descriptors based on solvent accessible surface area, such as FISA and FOSA, were used to overcome this limitation [16,17]. Linear QSAR equations were developed by stepwise addition of terms. Each descriptor was offered as input to the stepwise variable selection method, which selects the descriptors that contribute to the antibacterial activity of the indolylpyrimidine derivatives. To reduce the variation in the biological data, the stepwise equations were generated using autoscaling of the dataset.
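The two preprocessing steps described above, autoscaling and removal of intercorrelated descriptors, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the greedy keep-first filtering order, the function names, and all numerical values are hypothetical, and the sketch does not reproduce the behavior of the software used in the study.

```python
import math

def autoscale(values):
    """Centre a column to zero mean and scale it to unit sample variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(table, cutoff=0.5):
    """Greedy filter (an assumption): keep a descriptor only if its
    absolute correlation with every descriptor kept so far is <= cutoff."""
    kept = []
    for name in table:
        if all(abs(pearson(table[name], table[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Made-up values for illustration only (not data from the study).
table = {
    "FISA":   [10.0, 8.0, 6.5, 4.0, 2.0],
    "ClogP":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "dipole": [2.0, 5.0, 1.0, 5.0, 2.0],
}
kept = drop_correlated(table, cutoff=0.5)
scaled = {name: autoscale(table[name]) for name in kept}
print("retained descriptors:", kept)
```

Because the filter keeps the first descriptor of each correlated pair, listing the surface-area descriptors before ClogP mirrors the preference for FISA over ClogP stated in the text.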
The MLR method can be used only when the number of molecular descriptors is relatively small. The ratio of descriptors to molecules in each QSAR equation was kept at 1 : 5; thus, for a training set of 15 molecules, 3 descriptors were optimal. PCR and PLS could also be used, since they allow more descriptors, but too many descriptors can make a model difficult to interpret. Besides, using several factors (principal components or latent variables) can make model interpretation tedious. Validation is a crucial aspect of any QSAR analysis [18,19]. The predictive power of the selected MLR equation was validated by the leave-one-out (LOO) technique. The QSAR equations were validated by calculating the following statistical parameters: CV $r^2$ ($q^2$), $r^2$, ran $r^2$, and std error (Tables 6(a) and 6(b)). Randomization was performed by randomly shuffling the dependent parameter and then regenerating the equation by the MLR method for the same set of descriptors and training set molecules. The resulting $r^2$ is denoted by ran $r^2$. If ran $r^2$ is consistently equivalent to the $r^2$ value of the QSAR model, then the QSAR equation is spurious. In these statistics, $y_{obs}$, $y_{calc}$, and $y_{mean}$ are the observed, calculated, and mean values, respectively, $n$ is the number of compounds, and $k$ is the number of independent parameters. PRESS denotes the prediction sum of squares. The PRESS value can be used to calculate CV $r^2$ ($q^2$), called the cross-validated $r^2$, which represents the predictive ability of the QSAR equation. This is a good way to validate the predictions of a regression equation. For a predictive model, the CV $r^2$ ($q^2$) value lies between zero and one; a model that predicts worse than the mean activity gives a negative value. To calculate PRESS, each observation is individually omitted once. The remaining $n - 1$ observations are used to calculate a regression equation, and the value of the omitted observation is estimated from it. This is done $n$ times, once for each observation.
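The validation statistics described above can be sketched in plain Python. The sketch assumes the usual textbook definitions, since the equations themselves are not reproduced in the text: $r^2 = 1 - \sum (y_{obs} - y_{calc})^2 / \sum (y_{obs} - y_{mean})^2$, std error $= \sqrt{\sum (y_{obs} - y_{calc})^2 / (n - k - 1)}$, and $q^2 = 1 - \mathrm{PRESS} / \sum (y_{obs} - y_{mean})^2$. All function names and the toy data are hypothetical.

```python
import math
import random

def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(p):
            if idx != col:
                f = A[idx][col] / A[col][col]
                A[idx] = [a - f * b for a, b in zip(A[idx], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def fit_statistics(X, y):
    """Return (r2, std_error) for the model fitted on all compounds."""
    n, k = len(y), len(X[0])
    coef = fit_mlr(X, y)
    mean = sum(y) / n
    ss_res = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / (n - k - 1))

def leave_one_out(X, y):
    """Return (PRESS, q2): each compound is predicted by a model built without it."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean = sum(y) / len(y)
    return press, 1 - press / sum((yi - mean) ** 2 for yi in y)

def randomized_r2(X, y, trials=20, seed=0):
    """Mean r2 over models refitted to randomly shuffled activities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = list(y)
        rng.shuffle(shuffled)
        total += fit_statistics(X, shuffled)[0]
    return total / trials

# Made-up single-descriptor data for illustration only.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
r2, std_err = fit_statistics(X, y)
press, q2 = leave_one_out(X, y)
ran_r2 = randomized_r2(X, y)
print(f"r2={r2:.3f} q2={q2:.3f} PRESS={press:.3f} std_err={std_err:.3f} ran_r2={ran_r2:.3f}")
```

A model whose ran $r^2$ stays well below its $r^2$, and whose $q^2$ stays close to $r^2$, behaves as the text describes for a non-spurious equation.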
The difference between the actual value ($y_{obs}$) and the predicted value ($y_{calc}$) is called the prediction error. The sum of the squared prediction errors is the PRESS value. A compound whose activity cannot be predicted by the generated QSAR equation is said to be an outlier. The structural diversity of these molecules is responsible for their nonpredictability. Equation 1 in Table 6(a) gives outliers, which can be identified by their Z-score value [20]. The Z-score is calculated by the following formula:
$$Z = \frac{x - \bar{x}}{\sigma},$$
where the mean ($\bar{x}$) is first subtracted from every value ($x$) and the mean-shifted values are then divided by the standard deviation ($\sigma$). The Z-score expresses how many standard deviations a value lies above or below the mean of a data set.
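As a small illustration of the Z-score calculation, the following Python sketch standardizes a list of values and flags those beyond a cutoff. The use of the sample standard deviation, the cutoff of 2.0, the function names, and the numbers are all assumptions made for the example.

```python
import statistics

def z_scores(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]

def flag_by_z(labels, values, cutoff=2.0):
    """Return labels whose absolute Z-score exceeds the (hypothetical) cutoff."""
    return [lab for lab, z in zip(labels, z_scores(values)) if abs(z) > cutoff]

# Made-up values for illustration only (not data from the study).
labels = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
values = [10, 11, 12, 11, 10, 12, 11, 30]
flagged = flag_by_z(labels, values)
print("flagged:", flagged)
```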

Results
In developing the QSAR equation for Pseudomonas aeruginosa, the regression indicated dipole moment as the most significant contributor to inhibitory activity (Equation 1, Table 6(a)), since it has the highest correlation with the activity. Adding molecular weight (mol MW) to dipole significantly increases the correlation coefficient from 0.7910 to 0.8587. Variables were selected so as to minimize the mean squared error of prediction.
Similarly, the addition of a third parameter, FISA, increased the correlation coefficient from 0.8587 to 0.8793. Other regression equations were obtained by replacing the third parameter, FISA, with donorHB, PSA, and so forth. Tables 6(a) and 6(b) give the best equations (1 and 2), selected in preference to 1.1 and 2.1, respectively, since the latter have high standard error values and are therefore insignificant.
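The stepwise addition of terms described above can be sketched as a forward selection in which, at each step, the descriptor giving the largest increase in $r^2$ is added. The sketch below uses successive orthogonalization instead of refitting the full regression at each step, which yields the same cumulative $r^2$ values; the descriptor names `d1` to `d3`, the `max_terms` parameter, and the data are hypothetical, and no stopping rule based on an F-test is included.

```python
def centre(col):
    """Subtract the column mean from every value."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward_stepwise(table, y, max_terms=3):
    """Return [(descriptor, cumulative r2), ...] in order of entry."""
    residual = centre(y)
    ss_tot = dot(residual, residual)
    pool = {name: centre(col) for name, col in table.items()}
    steps = []
    while pool and len(steps) < max_terms:
        # Sum of squares each remaining candidate would explain.
        gains = {n: dot(c, residual) ** 2 / dot(c, c)
                 for n, c in pool.items() if dot(c, c) > 1e-12}
        if not gains:
            break
        best = max(gains, key=gains.get)
        chosen = pool.pop(best)
        norm = dot(chosen, chosen)
        beta = dot(chosen, residual) / norm
        residual = [r - beta * c for r, c in zip(residual, chosen)]
        # Remove the chosen direction from the remaining candidates.
        for n in list(pool):
            g = dot(pool[n], chosen) / norm
            pool[n] = [ci - g * ch for ci, ch in zip(pool[n], chosen)]
        steps.append((best, 1 - dot(residual, residual) / ss_tot))
    return steps

# Made-up data for illustration only: y = 3*d1 + d2, d3 is irrelevant.
table = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "d2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "d3": [2.0, 1.0, 2.0, 2.0, 1.0, 2.0],
}
y = [4.0, 5.0, 10.0, 11.0, 16.0, 17.0]
steps = forward_stepwise(table, y, max_terms=2)
for name, r2 in steps:
    print(f"added {name}: cumulative r2 = {r2:.4f}")
```

With a 1 : 5 descriptor-to-molecule ratio, `max_terms` would be set to 3 for a training set of 15 molecules.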
In Equation 1 of Table 6(a), all the descriptors are directly proportional to the activity, which is more highly correlated with dipole than with molecular weight. Similarly, in the QSAR equation developed for activity against Staphylococcus aureus, shown in Table 6(b), PISA is a prominent parameter. Successive addition of WPSA or PSA and the third parameter EA (eV) enhanced the QSAR equation significantly. Here $n$ is the number of molecules in the training set, $r^2$ is the squared correlation coefficient, and the F-test is a variance-related statistical value that compares two models differing by one or more variables. The QSAR equation is considered good if the F-test value is above a threshold value. The statistical quality of the resulting models, as depicted in Table 6(a), is determined by $r^2$, standard error (std error), and the randomization test (ran $r^2$) [21,22].
A data point was considered an outlier if its residual exceeded twice the standard error of estimate of the model. It is noteworthy that all these equations were derived using the entire data set of compounds ($n$ = 28), and the outliers were identified for both QSAR equations. For the QSAR equation of Pseudomonas aeruginosa, the outliers were 5b and 6h, with a Z-score value of 4.938, while the QSAR equation for activity against Staphylococcus aureus had 4e, 4g, 5j, 6e, and 6g as outliers, with a Z-score value of 3.938. The reason these molecules were outliers was investigated. It was found that their descriptor values lay outside the range of descriptor values of the training set molecules [23]. Removal of these outliers improved the statistics of the equations.
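The residual-based outlier rule stated above (residual larger than twice the standard error of estimate) can be written as a short Python check. The standard error is computed here as $\sqrt{\sum (y_{obs} - y_{calc})^2 / (n - k - 1)}$, which is an assumed definition; the function name and all values are hypothetical.

```python
import math

def residual_outliers(labels, observed, predicted, n_descriptors):
    """Flag compounds whose |residual| exceeds twice the standard error of estimate."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    dof = len(observed) - n_descriptors - 1
    std_error = math.sqrt(sum(e * e for e in residuals) / dof)
    flagged = [lab for lab, e in zip(labels, residuals) if abs(e) > 2 * std_error]
    return flagged, std_error

# Made-up observed and predicted activities for illustration only.
labels = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
observed = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 30.0]
predicted = [10.5, 11.5, 14.5, 15.5, 18.5, 19.5, 22.5, 22.0]
flagged, std_error = residual_outliers(labels, observed, predicted, n_descriptors=1)
print("std error:", round(std_error, 3), "outliers:", flagged)
```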
The best measure of reliability of a 2D-QSAR model is a high $q^2$, not just a high $r^2$, which could be a result of overfitting to the data. A value of $q^2$ > 0.5 is generally considered acceptable [24][25][26]. Self-consistency of the derived models was verified using the leave-one-out (LOO) process, and the predictability of each model was assessed using the cross-validated $r^2$, called $q^2$. Figures 3 and 4 show plots of predicted versus experimentally observed inhibitory activity for the training set and test set of indolylpyrimidines against PA, respectively, while Figures 5 and 6 show the corresponding plots for the training set and test set against SA. The plots for the QSAR models of PA and SA show a very good fit, with $r^2$ in the range of 0.87 to 0.80. This indicates that these models can be successfully applied to predict the antibacterial activity of this class of molecules. Randomization studies show ran $r^2$ values of 0.2 and 0.3 for Equation 1 (Table 6(a)) and Equation 2 (Table 6(b)), respectively; thus the equations are not chance correlations. Moreover, the reported QSAR models could be used to predict the antibacterial activity of analogous molecules against PA and SA. The applicability domain of the derived QSAR models is restricted to substituted indolylpyrimidine derivatives.

Discussion
QSAR Equation 1 in Table 6(a), obtained for antibacterial activity against PA, includes dipole, mol MW, and a third descriptor, FISA. Thus the equation combines the size, polar nature, and hydrophilicity of the molecule, as shown in Table 6(a). The QSAR equation obtained for antibacterial activity against SA includes PISA, PSA, and a third descriptor, FOSA. Thus QSAR Equation 2 in Table 6(b) combines hydrophobicity, electron density centers, and weakly polar groups (halogens, P, and S).
Since both activities were measured in the same laboratory and at the same drug concentration, that is, 100 µg/mL, the activities could be compared. Equation 1 in Table 6(a) gives the descriptors for PA activity: dipole, molecular weight, and FISA. In QSAR Equation 1 of Table 6(a), all the descriptors are directly proportional to the activity. Dipole is most highly correlated with the activity, followed by molecular weight. It is observed that the higher the dipole, the higher the PA activity, and the higher the FISA, the higher the PA activity, with molecular weight in the range of 320 to 332. Overall, this equation identifies the polarity of a molecule as the essential characteristic required for antibacterial activity against PA.
In QSAR Equation 2, which was developed for SA activity and is shown in Table 6(b), PISA is a prominent parameter. Successive addition of WPSA and the third parameter EA (eV) enhanced the model significantly. In interpreting SA activity, the following general observations were made for the descriptors. PISA is the π (carbon and attached hydrogen) component of the SASA. The lower the PISA value, the higher the SA activity. A lower PISA value arises from the presence of electronegative groups and corresponds to higher SA activity. A higher PISA value arises from the presence of protons connected to C, N, and O and has more influence at the R2 position. WPSA is the weakly polar component of the SASA (halogens, P, and S). The sulfhydryl group and halogens are not essential but impart moderate activity. Among the halogens, activity at R2 diminishes in the order Br > F in the presence of S. An electron acceptor is an atom with a more positive value of electron affinity, whereas an electron donor atom has a less positive electron affinity. Electron affinity follows the trend of electronegativity. Fluorine (F) has a higher electron affinity than oxygen, and chlorine most strongly attracts extra electrons.
Atoms whose anions are more stable than the neutral atoms have a greater electron affinity. An electron-withdrawing group at R2 has a higher electron affinity, and electron-donating groups have a moderate electron affinity, which seems to be responsible for high and moderate activity, respectively. The activity would be higher for a combination of an anion and a proton donor at the R1 or R2 position, or vice versa. Overall, this equation identifies the hydrophobicity of a molecule as the essential characteristic required for antibacterial activity against SA. Compounds 4b, 4j, 6f, 6h, and 6j were active against both the gram-positive and gram-negative bacterial strains. This can be explained using both QSAR equations. These compounds have high electron affinity and high dipole, along with high values of FISA, which favor activity against PA. They also have a sufficient PISA contribution and low values of WPSA. Any extension of the substituted phenyl by bulky groups would be favorable for activity against SA.
The polar surface area (PSA) descriptor represents drug transport properties such as intestinal absorption and blood-brain barrier penetration [27]. It is the sum of the contributions to the molecular (usually van der Waals) surface area of polar atoms such as oxygen, nitrogen, and their attached hydrogens. The polarity and polarizability of a molecule are well known to be important for describing its physicochemical properties and chemical reactivity. Molecular polarity accounts for chromatographic retention on a polar stationary phase [28]. Most often the dipole moment is used, which reflects only the global polarity of the molecule. Local polarities can be represented by atomic charges in localized regions of the molecule.

Conclusions
The 2D-QSAR study on a series of indolylpyrimidine compounds showed that the presence of functional groups that balance dipole and hydrophobicity would lead to an increase in the activity of indolylpyrimidines against both PA and SA. The regression equations indicate that the presence of polar components weighs more than hydrophobicity at the R2 position of indolylpyrimidines for activity against Pseudomonas aeruginosa, while the absence of protonated electronegative bulky groups at R2 is suitable for antibacterial activity against Staphylococcus aureus.

PA: Pseudomonas aeruginosa
SA: Staphylococcus aureus
MLR: Multiple linear regression
QSAR: Quantitative structure activity relationship
MMFF: Merck molecular force field
RMS: Root mean square
LOO: Leave one out.