A Comparative Study of Two Quantum Chemical Descriptors in Predicting Toxicity of Aliphatic Compounds towards Tetrahymena pyriformis

Quantum chemical parameters such as the LUMO energy, HOMO energy, ionization energy (I), electron affinity (A), chemical potential (μ), hardness (η), electronegativity (χ), philicity (ωα), and electrophilicity (ω) of a series of aliphatic compounds are calculated at the B3LYP/6-31G(d) level of theory. Quantitative structure-activity relationship (QSAR) models are developed for predicting the toxicity (pIGC50) of 13 classes of aliphatic compounds, including 171 electron acceptors and 81 electron donors, towards Tetrahymena pyriformis. The toxicity of these compounds is modeled by multiple linear regression using the molecular descriptor log P (1-octanol/water partition coefficient) in conjunction with one of two quantum chemical descriptors, the electrophilicity (ω) or the energy of the lowest unoccupied molecular orbital (ELUMO). The toxicity-predicting ability of ω is compared with that of ELUMO as a global chemical reactivity descriptor used in addition to log P. The former works marginally better in most cases. The quality of the regression improves slightly when the unit of IGC50 is changed from mg/L to molarity and the racemates and diastereoisomers are removed from the data set.


Introduction
The quantitative structure-activity relationship (QSAR) analysis aims to derive empirical models that relate the activity of chemical compounds to their structure [2]. The underlying assumption is that the chemical structure of a compound implicitly determines its behavior towards biological systems. Appropriate structural or functional descriptors are used to represent the chemical structure, and the analysis results in a mathematical model describing the relationship between the chemical structure and the biological activity. Descriptors of constitutional, geometrical, topological, electrotopological, steric, electrostatic, electronic, and quantum chemical origin have been employed. The main scientific purposes of developing a QSAR model are (1) understanding the mechanism of interaction between compounds and biological systems, (2) gaining information about the dose range for the biological effect of a chemical compound, which in turn can be useful in experimental drug design and toxicity testing, and (3) predicting the activity of new chemical compounds. Further, QSAR models can save the time and experimental resources needed to synthesize and biologically test a large number of compounds, and they offer the possibility of reducing or replacing animal use in research and toxicity testing. Various statistical methods are used in QSAR analysis, including regression analysis, partial least squares, classification trees, and neural networks [3].
For the development of a useful QSAR model, the most important step is to assess the mode of biochemical action of the toxicant on the biological system at the cellular and molecular levels. There are many approaches to evaluating the mechanistic basis of toxicity, among them in vitro tests [4], joint toxicity tests [5], fish acute toxicity syndromes [6], and mechanistic assessment on the basis of structural parameters. The mechanism of toxicity ranges from noncovalent effects to electrophilic ones involving covalent binding with biological macromolecules. Among the various modes of toxic action, the narcotic mechanism involves nonspecific, noncovalent, reversible interactions of the toxicants with cell membranes [7]. Nonpolar narcotics are neutral, nonreactive compounds such as aliphatic alcohols, ketones, and ethers, whose toxic effect is assumed to be determined mainly by their lipid solubility [8]. Polar narcotics are less inert aromatic chemical species, such as phenols and anilines, which usually possess a hydrogen-bond donor group [9]. A large number of QSAR studies of acute toxicity have been reported in the literature [10]. Many authors [11-15] have reported quantitative relationships between toxicity and hydrophobicity, in which hydrophobicity is represented by the octanol-water partition coefficient (log Poct) or the octanol-water distribution coefficient (log Doct). These relationships are assumed to represent a "baseline effect": no completely soluble, nonvolatile chemical compound can exhibit a toxicity lower than that predicted by them. Schultz et al. [16] have investigated the toxicity of a large data set of 500 aliphatic chemicals towards the protozoan Tetrahymena pyriformis, in terms of their IGC50 values, using the octanol-water partition coefficient. Other authors [17] have reported that dimyristoyl phosphatidylcholine-water partition coefficients give a better statistical fit than octanol-water partition coefficients in QSARs for the inhibition of T. pyriformis population growth by nonpolar narcotics, polar narcotics, and esters. Roberts and Costello [18] have developed QSAR models for predicting the toxicity of 18 nonpolar and polar narcotics to the fish Poecilia reticulata using log Poct (octanol-water partition coefficient) and log PMW (membrane-water partition coefficient). Freidig and Hermens [19] have reported QSAR models for predicting toxicity to Poecilia reticulata (14-day LC50) and Pimephales promelas (4-day LC50). They developed separate one-parameter QSAR models for a group of narcotics and for reactive compounds, using log Poct as the descriptor for the narcotics and an electronic descriptor for the reactive compounds. The response-surface approach has been widely used to develop mechanistically interpretable QSAR models for toxicity. Its basic premise is that toxic action depends on biouptake and bioavailability as well as on the electrophilic reactivity of the toxicant at an active site. Researchers have employed log Poct or log Doct as the descriptor encoding biouptake and bioavailability, and the energy of the lowest unoccupied molecular orbital (ELUMO) or the maximum acceptor superdelocalisability (Amax) as the descriptor encoding electrophilic reactivity. This approach has been applied to different species, including the bacterium Vibrio fischeri [20], the protozoan Tetrahymena pyriformis [21, 22], the yeast Saccharomyces cerevisiae [23], the mould Aspergillus nidulans [24], the algae Scenedesmus obliquus [25] and Chlorella vulgaris [24], the plant Cucumis sativus [26, 27], and mice [24]. The response-surface approach has been extended with indicator variables and other parameters to improve the statistical fit of the models [28, 29]. Our group has carried out toxicity analyses of diverse classes of systems using reactivity/selectivity descriptors based on conceptual density functional theory (DFT), such as electronegativity, hardness, and electrophilicity. It has been shown that toxicity values calculated using various conceptual DFT descriptors, especially the global and local electrophilicities, correlate well with the corresponding experimental toxicity values for a wide variety of chlorinated aromatic compounds, such as polychlorinated biphenyls (PCBs), polychlorinated dibenzofurans (PCDFs), polychlorinated dibenzo-p-dioxins (PCDDs), and chlorophenols (CPs), as well as for arsenic derivatives and several aliphatic and aromatic toxic molecules [30-39]. In an earlier study, we reported an atom-counting and electrophilicity-based QSTR (quantitative structure-toxicity relationship) protocol for predicting the toxicity of aliphatic compounds towards the protozoan Tetrahymena pyriformis [40]. In the present work, we develop QSAR models for the toxicity of several classes of aliphatic compounds using quantum chemical descriptors together with the molecular descriptor log P. We compare two quantum chemical parameters, the electrophilicity index (ω) and the energy of the lowest unoccupied molecular orbital (ELUMO), as toxicity-predicting descriptors towards Tetrahymena pyriformis. In particular, we check whether ω is a marginally better toxicity-predicting descriptor than ELUMO when used in addition to log P, a descriptor encoding hydrophobicity.

Computational Method
All geometries are optimized using the GAUSSIAN 03 suite of programs [41]. The hybrid B3LYP functional, which combines Becke's exchange functional [42] with the correlation functional of Lee, Yang, and Parr [43], is used with the 6-31G(d) basis set to optimize all the molecules studied in the present work. Frequency analysis is performed on the optimized structures at the same level of theory, and no imaginary frequencies are found, confirming that each optimized structure is a true minimum. The quantum chemical descriptors, namely electron affinity, ionization potential, chemical potential, hardness, and electrophilicity, are calculated directly from the orbital energies of the optimized geometries.
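As a minimal sketch of the kind of job described above, the Python snippet below assembles a Gaussian input deck for a B3LYP/6-31G(d) optimization followed by a frequency calculation, and checks a list of computed frequencies for imaginary modes. It assumes Gaussian's standard input layout (route section, title, charge and multiplicity, Cartesian coordinates, each section closed by a blank line); the helper names `make_gaussian_input` and `has_imaginary_mode`, the placeholder geometry, and the frequency values are hypothetical and are not taken from the data set.

```python
# Hypothetical helper: writes a Gaussian input deck for a B3LYP/6-31G(d)
# geometry optimization followed by a frequency calculation.
def make_gaussian_input(title, charge, multiplicity, atoms):
    """atoms: list of (element, x, y, z) tuples, Cartesian coordinates in angstrom."""
    lines = [
        "#P B3LYP/6-31G(d) Opt Freq",   # route section: method/basis and job types
        "",                              # blank line terminates the route section
        title,                           # free-text title section
        "",                              # blank line terminates the title section
        f"{charge} {multiplicity}",      # molecular charge and spin multiplicity
    ]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                     # blank line terminates the molecule specification
    return "\n".join(lines) + "\n"


def has_imaginary_mode(frequencies_cm1):
    """Gaussian reports imaginary modes as negative frequencies; any such mode
    means the stationary point is not a minimum."""
    return any(f < 0.0 for f in frequencies_cm1)


# Placeholder geometry (roughly water), not a molecule from the data set.
atoms = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)]
deck = make_gaussian_input("placeholder molecule", 0, 1, atoms)
print(deck)
print(has_imaginary_mode([1650.0, 3800.0, 3900.0]))  # illustrative values -> False
```

Because Gaussian prints imaginary frequencies as negative numbers, the minimum check reduces to testing whether any reported value is below zero.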

Theoretical Background
3.1. Quantum Chemical Descriptors. The electrophilicity index (ω) [44-46] is defined as a measure of the energy lowering due to the maximal flow of electrons from a donor to an acceptor system and is given as

$$\omega = \frac{\mu^{2}}{2\eta} \tag{1}$$

where μ and η are the chemical potential [47] and the hardness [48], respectively. In the finite-difference approximation, the chemical potential and hardness can be expressed in terms of the ionization energy (I) and the electron affinity (A) as

$$\mu = -\chi = -\frac{I + A}{2}, \qquad \eta = I - A \tag{2}$$

Using Koopmans' approximation, I and A can be expressed in terms of the energies of the highest occupied (EHOMO) and the lowest unoccupied (ELUMO) molecular orbitals as

$$I = -E_{\mathrm{HOMO}}, \qquad A = -E_{\mathrm{LUMO}} \tag{3}$$

The condensed Fukui functions for nucleophilic, electrophilic, and radical attack, respectively, are defined as

$$f_k^{+} = q_k(N+1) - q_k(N), \qquad f_k^{-} = q_k(N) - q_k(N-1), \qquad f_k^{0} = \frac{q_k(N+1) - q_k(N-1)}{2} \tag{4}$$

where $q_k(N)$ is the electronic population of atom k in the N-electron molecule. The philicity at any atomic site k is defined as [49]

$$\omega_k^{\alpha} = \omega\, f_k^{\alpha} \tag{5}$$

where α = +, −, and 0 refer to the local philic quantities describing nucleophilic, electrophilic, and radical attacks, respectively.
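The descriptor definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: orbital energies are taken in hartree (the unit used for ELUMO in Tables 1 and 2) and converted to eV, the hardness follows η = I − A (some authors use η = (I − A)/2, which doubles ω), and the function names, orbital energies, and atomic populations in the example are hypothetical.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree in eV (rounded CODATA value)


def global_descriptors(e_homo_au, e_lumo_au):
    """Global reactivity descriptors (eV) from frontier orbital energies (hartree).

    Koopmans' approximation: I = -E_HOMO, A = -E_LUMO.
    Assumed conventions: mu = -(I + A)/2 and eta = I - A.
    """
    i = -e_homo_au * HARTREE_TO_EV        # ionization energy
    a = -e_lumo_au * HARTREE_TO_EV        # electron affinity
    mu = -(i + a) / 2.0                   # chemical potential
    eta = i - a                           # hardness
    omega = mu ** 2 / (2.0 * eta)         # electrophilicity index
    return {"I": i, "A": a, "mu": mu, "chi": -mu, "eta": eta, "omega": omega}


def condensed_fukui(q_n, q_n_plus_1, q_n_minus_1):
    """Condensed Fukui functions from per-atom electron populations of the
    N-, (N+1)- and (N-1)-electron systems (typically at the N-electron geometry)."""
    f_plus = [qp - q for q, qp in zip(q_n, q_n_plus_1)]                    # nucleophilic attack
    f_minus = [q - qm for q, qm in zip(q_n, q_n_minus_1)]                  # electrophilic attack
    f_zero = [(qp - qm) / 2.0 for qp, qm in zip(q_n_plus_1, q_n_minus_1)]  # radical attack
    return f_plus, f_minus, f_zero


def philicity(omega, fukui):
    """Local philicity omega_k^alpha = omega * f_k^alpha for every atom k."""
    return [omega * f for f in fukui]


# Hypothetical inputs, chosen only to exercise the functions.
desc = global_descriptors(e_homo_au=-0.25, e_lumo_au=0.02)
f_plus, f_minus, f_zero = condensed_fukui([8.2, 0.9, 0.9], [8.6, 1.2, 1.2], [7.6, 0.7, 0.7])
print(desc["omega"], philicity(desc["omega"], f_plus))
```

Because each Fukui variant describes the redistribution of exactly one electron, its values summed over all atoms equal 1, which is a convenient sanity check on the atomic populations.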

3.2. Regression Analysis.
Regression analysis is a statistical method in which the functional dependence of a dependent variable on a set of independent variables is determined. In linear regression analysis, this dependence has a linear form, which can be expressed as

$$\hat{Y} = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p + b$$

where a1, a2, ..., ap are the regression coefficients, b is the intercept, X1, X2, ..., Xp are the independent variables, and Ŷ represents the values of the dependent variable expected from the regression model.
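A least-squares fit of the linear model above can be sketched without external libraries by solving the normal equations. The function names and the (log P, ω) values below are hypothetical; the responses are generated from a known plane so that the recovered coefficients can be checked. For larger or ill-conditioned descriptor sets, a QR- or SVD-based solver would be numerically preferable to the normal equations.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y_hat = a1*x1 + ... + ap*xp + b.

    X: list of n rows, each holding p descriptor values (e.g. [log P, omega]);
    y: list of n responses (e.g. pIGC50). Returns ([a1, ..., ap], b).
    """
    n, p = len(X), len(X[0])
    Z = [list(row) + [1.0] for row in X]      # extra column of ones for the intercept
    k = p + 1
    # Normal equations: (Z^T Z) beta = Z^T y
    M = [[sum(Z[i][r] * Z[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    v = [sum(Z[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (v[r] - sum(M[r][c] * beta[c] for c in range(r + 1, k))) / M[r][r]
    return beta[:p], beta[p]


def predict(X, coefs, intercept):
    """Evaluate y_hat for each row of descriptor values."""
    return [sum(a * x for a, x in zip(coefs, row)) + intercept for row in X]


# Hypothetical (log P, omega) pairs; y lies exactly on a known plane,
# so the fit should recover a1 = 0.7, a2 = 1.2, b = -1.5.
X = [[1.0, 0.5], [2.0, 1.5], [3.0, 0.8], [4.0, 2.2], [0.5, 1.0]]
y = [0.7 * x1 + 1.2 * x2 - 1.5 for x1, x2 in X]
coefs, intercept = fit_linear_regression(X, y)
print(coefs, intercept)
```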
The above equation represents a p-dimensional hyperplane in the (p + 1)-dimensional space spanned by the independent variables and the dependent variable, where p is the number of independent variables in the equation. This regression equation can be used to predict values of the dependent variable from the values of the independent variables. The quality of the statistical fit is assessed using the Pearson correlation coefficient (r) (for regression with a single independent variable) or the coefficient of determination (R²), which have the following mathematical forms:

$$r = \frac{\sum (X - X_{\mathrm{mean}})(Y - Y_{\mathrm{mean}})}{\sqrt{\sum (X - X_{\mathrm{mean}})^{2}\,\sum (Y - Y_{\mathrm{mean}})^{2}}}, \qquad R^{2} = \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$

where TSS is the total sum of squares, Σ(Y − Ymean)², with N − 1 degrees of freedom; ESS is the explained sum of squares, Σ(Ŷ − Ymean)², with p degrees of freedom; and RSS is the residual sum of squares, Σ(Y − Ŷ)², with N − p − 1 degrees of freedom. Y is the observed value of the dependent variable, Ŷ is the value predicted by the regression model, Ymean is the mean of the dependent variable, N is the number of observations, and p is the number of independent variables in the model. Because TSS = ESS + RSS for a least-squares fit with an intercept, an R² value greater than 0.5 means that the variance explained by the model (ESS) is larger than the unexplained variance (RSS). The regression equation is considered efficient when R² is close to 1. The value of R² depends on the number of independent variables in the equation and on the size of the data sample. When a new variable is added to the regression equation, R² increases or remains the same, even if the added variable does not meaningfully reduce the unexplained variance of the dependent variable. Therefore, another statistical parameter, the adjusted R², is used:

$$R^{2}_{\mathrm{adj}} = 1 - (1 - R^{2})\,\frac{N - 1}{N - p - 1}$$

where N is the sample size and p is the number of independent variables. The value of R²adj decreases if a variable added to the equation does not sufficiently reduce the unexplained variance.

Table: The general regression equations obtained by using one-parameter and two-parameter models for all the aliphatic acceptors and donors (after removing the racemates and the diastereomers from the data set and changing the unit of IGC50 from mg/L to molarity).
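The two goodness-of-fit measures can be computed directly from observed and predicted values, as in the sketch below. The function names are illustrative, and the numbers are arbitrary values rather than results from this study; they show how the same R² is penalized more strongly as p grows.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, R^2 = 1 - RSS/TSS."""
    y_mean = sum(y) / len(y)
    tss = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    return 1.0 - rss / tss


def adjusted_r_squared(r2, n, p):
    """R^2_adj = 1 - (1 - R^2)(N - 1)/(N - p - 1) for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]   # arbitrary illustrative values
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y_obs, y_fit)
print(r2, adjusted_r_squared(r2, n=5, p=1), adjusted_r_squared(r2, n=5, p=2))
```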
The uncertainty in the model is represented by the standard error of estimate, s:

$$s = \sqrt{\mathrm{RMS}}, \qquad \mathrm{RMS} = \frac{\mathrm{RSS}}{N - p - 1}$$

where RMS is the residual mean square. The standard error of estimate reflects the dispersion of the observed values of the dependent variable about the regression line. Larger values of s indicate a worse statistical fit and less reliable predictions. The statistical significance of a regression equation can be assessed by means of the Fisher (F) value,

$$F = \frac{\mathrm{EMS}}{\mathrm{RMS}}$$

where EMS is the explained mean square, ESS/p. A regression equation is considered statistically significant if the observed F value is greater than the tabulated value for the chosen significance level and the corresponding degrees of freedom of F, which are p and N − p − 1.
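Similarly, s and F follow from the sums of squares and their degrees of freedom. In this sketch, `fit_statistics` is a hypothetical helper and the inputs are arbitrary values, not a least-squares fit from this study.

```python
import math


def fit_statistics(y, y_hat, p):
    """Standard error of estimate s and Fisher F for a model with p predictors."""
    n = len(y)
    y_mean = sum(y) / n
    ess = sum((fi - y_mean) ** 2 for fi in y_hat)           # explained sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    rms = rss / (n - p - 1)                                 # residual mean square
    ems = ess / p                                           # explained mean square
    s = math.sqrt(rms)                                      # standard error of estimate
    f = ems / rms                                           # Fisher F value
    return s, f, (p, n - p - 1)                             # F degrees of freedom


s, f, dof = fit_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], p=1)
print(s, f, dof)
```

The observed F is then compared with the tabulated critical value for (p, N − p − 1) degrees of freedom; if SciPy is available, `scipy.stats.f.ppf(0.95, p, N - p - 1)` returns the critical value at the 5% significance level.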
A reliable and transparent regression analysis must satisfy certain basic assumptions, which can be briefly enumerated as follows:
(1) The observations of the response variable are independent of one another.
(2) The relationship between the dependent and the independent variable(s) is linear.
(3) The residuals (observed minus predicted values of the dependent variable) follow a normal distribution.
(4) The variance of the residuals is constant for all values of the independent variables.
(5) The independent variables do not show multicollinearity (a high level of intercorrelation) or redundancy.
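Assumption (5) can be checked for a two-descriptor model such as (log P, ω) with the Pearson correlation between the two descriptors and the corresponding variance inflation factor (VIF). The sketch below is illustrative: the function names and descriptor values are hypothetical, and the often-quoted thresholds (VIF above about 5 or 10) are rules of thumb rather than criteria used in this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_descriptors(x1, x2):
    """Variance inflation factor for a model with exactly two descriptors.

    Both descriptors share the same VIF, 1 / (1 - r^2); perfectly collinear
    descriptors (r = +/-1) make it infinite and raise ZeroDivisionError here.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical descriptor values for five compounds.
log_p = [0.3, 1.1, 1.8, 2.5, 3.2]
omega = [0.9, 0.7, 1.0, 0.8, 1.1]
print(pearson_r(log_p, omega), vif_two_descriptors(log_p, omega))
```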

Results and Discussion
The quantum chemical descriptors, namely the LUMO energy, HOMO energy, ionization energy, electron affinity, chemical potential, hardness, philicity, and electrophilicity of a series of aliphatic compounds, are calculated from the optimized geometries using (1)-(5) (see Table S1). The toxicity values calculated from the general one- and two-parameter regression equations for all the acceptors and donors, along with the experimentally observed toxicity values, are given in Table S3 and Table S4 (see Supplementary Materials available online at doi: 10.1155/2010/545087). Although the two-parameter equations employing log P and either of the electronic descriptors (ω or ELUMO) show a slightly better correlation than the one-parameter models, the overall toxicity predictability of these equations is poor, as is evident from the R²adj values and the calculated toxicity values. It is therefore evident that these generalized equations cannot be used as model equations for accurately predicting the toxicities of the aliphatic compounds.
To obtain better predictability and correlation, a stepwise regression analysis is performed by taking each class of chemical compounds separately. The experimentally observed and calculated toxicity values (pIGC50), along with the various descriptors, are presented in Tables 1 and 2 for the sets of electron acceptors and electron donors, respectively. The corresponding one-parameter regression equations (log P, ω, or ELUMO) and two-parameter regression equations ((log P, ω) and (log P, ELUMO)) are shown in Table 3. As is evident from Table 3, the one-parameter regression equation based on ELUMO alone does not show any meaningful correlation between the experimental and calculated toxicity values. The regression equations based on ω give better correlation coefficients than those based on ELUMO for all the electron acceptors and electron donors, except for the unsaturated alcohols. However, the adjusted R² value is less than 0.70 for diols, acetylenic alcohols, unsaturated alcohols, and amines. For all the electron-donor aliphatic compounds except the amino alcohols, the R²adj values are negligible. Notably, the one-parameter regression equations that use log P as the independent variable show a clearly better overall correlation than those using the electronic descriptors ω and ELUMO. This result is expected, since the hydrophobicity and lipophilicity of the chemical compounds mainly govern their toxic actions at the cellular and molecular levels. However, taken as a whole, the stepwise one-parameter regression analysis shows that neither a global electrophilicity descriptor (ELUMO or ω) nor a hydrophobicity descriptor (log P) alone is sufficient to model the toxicity of these compounds with high predictive power.
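The per-class procedure can be sketched as grouping compounds by chemical class and fitting a one-parameter model within each group, so that descriptors can be ranked by r² class by class. This is a simplified illustration: it fits each descriptor separately rather than reproducing any particular stepwise selection software, and the records, class labels, field names, and function names are hypothetical.

```python
from collections import defaultdict


def simple_fit(x, y):
    """One-parameter least-squares fit y = a*x + b; returns (a, b, r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b, sxy ** 2 / (sxx * syy)   # r^2 equals R^2 for a one-parameter model


def per_class_r2(records, descriptor):
    """Fit pIGC50 against one descriptor separately within each chemical class."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        xs, ys = groups[rec["class"]]
        xs.append(rec[descriptor])
        ys.append(rec["pIGC50"])
    return {cls: simple_fit(xs, ys)[2] for cls, (xs, ys) in groups.items()}


# Hypothetical records; "group_1" is built to lie exactly on pIGC50 = 0.8*logP - 1.
records = [
    {"class": "group_1", "logP": 0.5, "omega": 0.40, "pIGC50": -0.6},
    {"class": "group_1", "logP": 1.0, "omega": 0.55, "pIGC50": -0.2},
    {"class": "group_1", "logP": 2.0, "omega": 0.45, "pIGC50": 0.6},
    {"class": "group_2", "logP": 0.2, "omega": 0.90, "pIGC50": 0.1},
    {"class": "group_2", "logP": 1.1, "omega": 1.20, "pIGC50": 0.5},
    {"class": "group_2", "logP": 1.5, "omega": 1.00, "pIGC50": 0.4},
]
for name in ("logP", "omega"):
    print(name, per_class_r2(records, name))
```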
To improve the predictability of the regression equations and to assess the relative usefulness of the two quantum chemical descriptors, a two-parameter regression analysis is performed. The results indicate an overall better correlation between the experimental and calculated toxicity values, and the correlation is slightly better when ω and log P are used in the regression equation than when ELUMO and log P are used, except in the case of the amino alcohols. The calculated toxicity values (pIGC50), along with the experimental values, for all 13 groups of aliphatic compounds studied are reported in Tables 1 and 2.
These results suggest that the electrophilicity index (ω) is a marginally better chemical reactivity descriptor than ELUMO in most cases. Either of them may therefore be recommended, together with log P, for toxicity prediction. However, whether this pattern holds in general needs further validation, for example by considering a wider variety of chemical toxicants. Although a mechanistic basis for the toxic action may be inferred from the descriptors used, toxicity predictions based on these model relationships should be used with caution.
As suggested by the referee, we change the unit of IGC50 from mg/L (as used in [1, 16]) to molarity and remove from the data set all the racemates and diastereoisomers (which were also included in those references). The resulting regression equations are provided in Tables 4 and 5, and the plots of calculated versus observed pIGC50 values are presented in Figures 3 and 4. For the individual groups, the correlation improves in most cases. The overall correlation improves for both electron donors and acceptors, and the overall conclusion remains the same. We therefore suggest that log P and ω be used to predict the toxicity of aliphatic electron donors and acceptors towards Tetrahymena pyriformis.
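The unit change amounts to dividing the mass concentration by the molar mass before taking the logarithm. The sketch below assumes pIGC50 = −log10(IGC50) with IGC50 in mol/L; the function names, the record fields (including the boolean flags `racemate` and `diastereomer`), and all numerical values are hypothetical.

```python
import math


def pigc50_molar(igc50_mg_per_l, molar_mass_g_per_mol):
    """pIGC50 on a molar basis, assuming pIGC50 = -log10(IGC50 in mol/L)."""
    igc50_mol_per_l = igc50_mg_per_l / 1000.0 / molar_mass_g_per_mol  # mg/L -> g/L -> mol/L
    return -math.log10(igc50_mol_per_l)


def drop_stereo_mixtures(records):
    """Keep only records not flagged as racemates or diastereomer mixtures
    (the 'racemate' and 'diastereomer' flags are hypothetical fields)."""
    return [r for r in records if not r.get("racemate") and not r.get("diastereomer")]


# Hypothetical entries, for illustration only.
data = [
    {"name": "compound_A", "igc50_mg_l": 460.7, "mw": 46.07},
    {"name": "compound_B", "igc50_mg_l": 120.0, "mw": 88.15, "racemate": True},
    {"name": "compound_C", "igc50_mg_l": 55.0, "mw": 102.17},
]
for rec in drop_stereo_mixtures(data):
    print(rec["name"], round(pigc50_molar(rec["igc50_mg_l"], rec["mw"]), 3))
```

Because the conversion divides by the molar mass, two compounds with the same IGC50 in mg/L no longer share the same pIGC50: the heavier one receives the larger value, which changes the ordering that the regression has to reproduce.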

Conclusions
The toxicity of the aliphatic compounds considered in this study cannot be completely explained by hydrophobicity and lipophilicity alone. QSAR model equations with improved toxicity predictability can be developed by taking the electrophilic character of the molecular system into account in addition to its hydrophobicity. The "response surface" model proposed by earlier authors has mostly used ELUMO as the global descriptor of electrophilic reactivity. The results of this study show that the electrophilicity index (ω) and ELUMO are comparably capable of describing the contribution of chemical reactivity to the toxicity of aliphatic compounds, with ω being a marginally more efficient descriptor for toxicity prediction. Better QSAR models are obtained by removing the racemates and the diastereoisomers from the data set and by changing the unit of IGC50 from mg/L to molarity, as suggested by the referee.

Figure 1: Observed and calculated pIGC50 values (a) using the electrophilicity index (ω) and log P descriptors and (b) using the ELUMO and log P descriptors in a two-parameter regression model, for the complete set of aliphatic acceptors.

Figure 2: Observed and calculated pIGC50 values (a) using the electrophilicity index (ω) and log P descriptors and (b) using the ELUMO and log P descriptors in a two-parameter regression model, for the complete set of aliphatic donors.

Figure 3: Observed and calculated pIGC50 values (a) using the electrophilicity index (ω) and log P descriptors and (b) using the ELUMO and log P descriptors in a two-parameter regression model, for the complete set of aliphatic acceptors (after removing the racemates and the diastereomers from the data set and changing the unit of IGC50 from mg/L to molarity).

Figure 4: Observed and calculated pIGC50 values (a) using the electrophilicity index (ω) and log P descriptors and (b) using the ELUMO and log P descriptors in a two-parameter regression model, for the complete set of aliphatic donors (after removing the racemates and the diastereomers from the data set and changing the unit of IGC50 from mg/L to molarity).

Table 1: Electrophilicity (ω, eV), energy of the lowest unoccupied molecular orbital (ELUMO, au), log P, and observed and calculated pIGC50 values for the complete set of aliphatic acceptor compounds towards Tetrahymena pyriformis.

Table 2: Electrophilicity (ω, eV), energy of the lowest unoccupied molecular orbital (ELUMO, au), log P, and observed and calculated pIGC50 values for the complete set of aliphatic donor compounds towards Tetrahymena pyriformis.

Table 3: Regression models for different groups of aliphatic compounds for estimating their toxicity towards Tetrahymena pyriformis.