Investigation of Antileishmanial Activities of Acridines Derivatives against Promastigotes and Amastigotes Form of Parasites Using Quantitative Structure Activity Relationship Analysis

In a search of newer and potent antileishmanial (against promastigotes and amastigotes form of parasites) drug, a series of 60 variously substituted acridines derivatives were subjected to a quantitative structure activity relationship (QSAR) analysis for studying, interpreting, and predicting activities and designing new compounds by using multiple linear regression and artificial neural network (ANN) methods. The used descriptors were computed with Gaussian 03, ACD/ChemSketch, Marvin Sketch, and ChemOffice programs. The QSAR models developed were validated according to the principles set up by the Organisation for Economic Co-operation and Development (OECD). The principal component analysis (PCA) has been used to select descriptors that show a high correlation with activities. The univariate partitioning (UP) method was used to divide the dataset into training and test sets. The multiple linear regression (MLR) method showed a correlation coefficient of 0.850 and 0.814 for antileishmanial activities against promastigotes and amastigotes forms of parasites, respectively. Internal and external validations were used to determine the statistical quality of QSAR of the two MLR models. The artificial neural network (ANN) method, considering the relevant descriptors obtained from the MLR, showed a correlation coefficient of 0.933 and 0.918 with 7-3-1 and 6-3-1 ANN models architecture for antileishmanial activities against promastigotes and amastigotes forms of parasites, respectively. The applicability domain of MLR models was investigated using simple and leverage approaches to detect outliers and outsides compounds. The effects of different descriptors in the activities were described and used to study and design new compounds with higher activities compared to the existing ones.


Introduction
For hundreds of years, Leishmaniasis, a disease caused by a number of species of protozoan parasites belonging to the genus Leishmania, is recognized as an important public health problem throughout the world [1][2][3]. Currently, the leishmaniases are considered to be endemic in 88 countries and, according to World Health Organization (WHO) [4], twelve million people are infected, with about two to three million new cases each year, and 350 million people are under risk of infection; it is a major public health problem particularly in Latin America, Africa, and Asia [5][6][7][8][9]. To date, no vaccine against any clinical form of Leishmaniasis has been commercialized and treatment relies only on chemotherapy, which has been based on the use of pentavalent antimonial drugs. Other medications, such as pentamidine and amphotericin B, have been used as alternative drugs for resistant parasites. With the emergence of some resistant 2 Advances in Physical Chemistry strains, the toxicity of current drugs, severe side effects, and high cost and/or restricted therapeutic spectrum, a need for development of new and safer drugs is warranted [2,3,10]. A great number of natural and synthetic compounds have been tested in the past years in antileishmanial assays. Their structures are diverse and often contain nitrogen heterocycles such as quinolines, pyrimidines, acridines, phenothiazines, and indoles [11][12][13].
Many experiments have been performed with the compounds bearing the heterocyclic ring structures to explore their effectiveness against Leishmania. These studies suggested their similar pharmacophoric feature of the heterocyclic scaffold as a potential target for drug discovery of antileishmanial drugs [14].
Leishmania parasites exist in two forms, one is promastigotes and the other is amastigotes. The promastigotes are flagellated and found in sand fly, while the amastigotes are ovoid and nonflagellated form of Leishmania [15]. Antileishmanial activity is performed against promastigotes and then amastigotes forms of parasites. Heterocyclic system may also be formed by fusion with other rings, either carbocyclic or heterocyclic.
Since their discovery in the 1880s, acridines family have demonstrated a broad spectrum of pharmacological properties [16]. First employed as antibacterial agents during the beginning of the twentieth century [17]. They have been rapidly revealing interesting antiproliferative activities against both protozoa and tumor cells [18,19]. Consequently, they have been extensively used in antiparasitic chemotherapy and a wide range of new acridines derivatives have been synthesized and successfully assessed for their antileishmanial activities [20,21].
In order to open a new way in antileishmanial drug research, a series of sixty acridines derivatives were synthesized [22][23][24] and studied for their antileishmanial (against promastigotes and amastigotes form of parasites) activities. The aim of this study was to develop a QSAR model able to correlate the structural features of the acridines derivatives with their biological activities.
In general, the QSAR methods are based on the assumption that the activity of a certain chemical compound related to its structure through a certain mathematical algorithm. This relationship can be used in the prediction, interpretation, and assessment of new compounds with desired activities reducing and rationalizing time, efforts, and cost of synthesis and new product development.
The basic assumption to drive a QSAR model is presented due to a mathematical function of the chemical properties which is related to the effect (activity). Therefore, the effect is like the function " " of the chemical properties " ": = ( ). To find this algorithm, we use a number of chemical compounds with known values of the studied effect ( ). For each chemical compound, we calculate a series of parameters (called chemical descriptors). Then, we find an algorithm that provides a quite accurate value, similar to the real experimental value. The final step is to check if the obtained algorithm is able to predict the activity values for other chemicals not used to build up the model (external validation).
Indeed, it is very important to generate a model which worked not only for the chemical substances used within the training set but also for other similar chemicals. Consequently, the challenge is to define the correct statistical properties of the model.

Materials and Methods
The current QSAR study investigates prediction and interpretation of the studied compounds and was also used for designing new proposed compounds by using linear and nonlinear methods. It consists of four stages: selection of dataset and generation of molecular descriptors, descriptive analysis, statistical analysis (prediction and evaluation), and suggestion of novel compounds.
A flow chart for the development of the QSAR model along with the various validation methods used in this work is demonstrated in Figure 1.

Molecular Descriptors Generation.
A wide variety of molecular descriptors was calculated using Gaussian 03, ACD/ChemSketch, Marvin Sketch, and ChemOffice programs [25][26][27][28] to predict the correlation between these descriptors for the studied molecules with their antileishmanial activities and to develop linear (multiple linear regression (MLR)) and nonlinear (artificial neural network (ANN)) models. Tables 3 and 4 show the selected descriptors (using the PCA method; see more in descriptive analysis results) to be used in this study.

Descriptive Analysis.
In this stage, the principal component analysis (PCA) was used to determine the nonlinearity and nonmulticollinearity among variables (descriptors) and to select descriptors that correlate with the activity. After that, the univariate partitioning (UP) method was used to form dissimilar clusters of compounds, to which the query compounds would be compared for determining the degree of similarity and dividing the dataset into training and test sets. Internal and external validation of QSAR models Novel compounds with suggested activities N ∘ · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · pIC 50 In order to propose models and to evaluate quantitatively the physicochemical effects of the substituents on the activities of molecules, we submitted the data matrix constituted obviously from the used variables (descriptors) corresponding to the dataset molecules to a descendant MLR analysis and to an ANN. We use the coefficients , 2 , 2 adj , MSE, and value to select the best regression performance [29], where is the correlation coefficient; 2 is the coefficient of determination; 2 adj is the coefficient adjusted for degrees of freedom. Mean squared error (MSE) is the standard error of the coefficient of each descriptor and of the global model, which gives an indication of the valid inclusion of a descriptor in a QSAR model. value is the probability ( ) of Fisher statistics ( ), which gives an indication of the probability that a QSAR is a chance correlation.

Statistical Analysis (Models Development and Evaluation
In order to assess the significance of the models and accurate prediction ability for new compounds: (i) we use an internal validation procedure (leave-oneout cross validation), whereby one compound is removed and the rebuilt model with the remaining molecules is used to predict the response of the eliminated compound. This one is then returned and a second is removed, and the cycle is repeated, and so on, until all compounds have been removed one by one, and an overall correlation coefficient cv is computed; (ii) after the model is built, an external prediction is necessary. In this one, the obtained model was used to predict the activities of a test set comprising compounds that are similar to those not used in the training set. This is usually performed by splitting a dataset into a training set and a test set, typically in a 1/5 ratio. (iii) a model is valid only within its training domain and new molecules must be considered as belonging to the domain before the model is applied (OECD Principle 3 [31]). Without applicability domain (AD), 4 Advances in Physical Chemistry Table 1: Chemical structure and antileishmanial activities of studied compounds (see Figure 8).  each model can predict the activity of any compound, even with a completely different structure from those included in the study. Therefore, the AD is a tool to find out compounds that are outside the applicability domain of the built QSAR model and it detects outliers present in the training set compounds. There are several methods for defining the applicability domain (AD) of QSAR models [32], but the most common one is determining the leverage values ℎ (ℎ = ( ) −1 ( = 1, 2, . . . , )) for each compound, where is the descriptor row-vector of query compound, is − 1 matrix of model descriptor values for training set compounds, and the superscript refers to the transpose of matrix/vector [32,33]. In this study, we use Williams plot; in this plot, the applicability domain is established inside a squared area within standard deviation ± (in this study = 3; "three-sigma rule" [34]) and a leverage threshold ℎ * (ℎ * = 3 * ( + 1)/ ) [35], where is the number of training set compounds; is the number of model descriptors. The leverage (ℎ) greater than the warning leverage (ℎ * ) suggested that the compound was very influential on the model [36]. The results of the leverage approach were compared with that of the simple approach introduced by Roy et al. [37].

Multiple Linear Regression (MLR).
The descendent multiple linear regression (MLR) analysis, based on the elimination of aberrant descriptors (one by one) until a valid model (including the critical probability: value < 0.05 for all descriptors and the model complete), was employed to find a linear model of the activity of interest, which takes the following form: where is the studied activity, which is the dependent variable. 0 is the intercept of the equation. are the molecular descriptors. are the coefficients of those descriptors. This method is one of the most popular methods of QSAR/QSPR thanks to its simplicity in operation, reproducibility, and ability to allow easy interpretation of the features used. The important advantage of the linear regression analysis is that it is highly transparent; therefore, the algorithm is available and predictions can be made easily [38]. It has served also to select the descriptors used as the input parameters in the artificial neural network (ANN).  Figure 2). The input layer formed by a number of neurons equal to the number of descriptors obtained in the multiple linear regression models and the output layer represents the calculated activity values. For determination of the number of hidden neurons in the hidden layer, where all calculations of parameter optimization of neural networks are made, a parameter has been proposed. The parameter plays a major role in determining the best artificial neural network architecture [39,40]; is defined as follows: = number of data points in the training set sum of the number of connections in the ANN .
In order to avoid overfitting or underfitting, it is recommended that the value of should be between 1.00 and 3.00; if < 1, the network simply memorizes the data, whereas if > 3, the network is not able to generalize [41].

Software Packages Used in Our QSAR Development Study.
There are various free and commercial software available for QSAR development. These include specialized software for drawing chemical structures, generating 3D structures, calculating chemical descriptors, and developing QSAR models. The software packages used in our works are represented in Table 2.

Principal Component Analysis (PCA).
PCA is a useful statistical technique for summarizing all the information encoded in the structures of the compounds. It is very helpful for understanding the distribution of the compounds. This is an essentially descriptive statistical method, which aims to extract the maximum of information contained in the dataset compounds [42,43]. In this work, the PCA is used to overview the examined compounds for similarities and dissimilarities and to select descriptors that show a high  correlation with the response activity; this one gives extra weight because it will be more effective at prediction [44]. Tables 3 and 4 present the descriptors with a correlation coefficient with the activity higher, in absolute value, than 0.1. The absence of any serious multicollinearity between the descriptors present in the model was confirmed by the correlation matrix.

Univariate Partitioning (UP).
The aim of the UP was the recognition of groups of objects based on their similarity; this method is based on the criterion of partitioning proposed by Fisher [45]. In this study, the division of the dataset into training and test sets has been performed. In this one, from each obtained cluster, one compound for the training set was selected randomly to be used as test set compound. The partitioning results are given in Table 5.     (i) For the two models, P value is lower than 0.0001; it means that we would be taking a lower than 0.01% risk in assuming that the null hypothesis (no effect of the explanatory variables) is wrong and that the regressions equations have statistical significance. Therefore, we can conclude with confidence that the selected variables do bring a significant amount of information.
(ii) Higher value of 2 and 2 adj and lower mean squared error (MSE) indicate that the two proposed models are predictive and reliable.
(iii) Our obtained models were validated internally by the leave-one-out cross validation technique; the cross validation coefficient 2 cv for the two models was determined based on the predictive ability of the model. The value of 2 cv is higher than 0.5 ( 2 cv = 0.536 for the MLR pIC 50 (PRO) model and 2 cv = 0.525 for the MLR pIC 50 (AMA) model); indicating good predictability of the model.
(iv) True predictive power of these models is to test their ability to predict perfectly pIC 50 of compounds from an external test set (compounds that were not used for the developed model); pIC 50 of the remaining set of 10 compounds are deduced from the quantitative models proposed with the compounds used in training set by MLR. These models will be able to predict the activities of test set molecules in agreement with the experimentally determined value. The observed and calculated pIC 50 values are given in Table 6. The predictive capacity of the models was judged, the higher value of 2 test ( 2 test = 0.660 for the MLR pIC 50 (PRO) model and 2 test = 0.718 for the MLR pIC 50 (AMA) model) indicates the improved predictability of the model.
(v) Xternal Validation Plus indicates the absence of systematic errors in the model and a moderate performance of prediction quality of a QSAR model based on proposed MAE-based criteria (Table S2).
(vi) The values of calculated activities from (3) and (4) are given in Table S1  (i) The number of H-Bond acceptors (NHA) has a negative sign in the model, which suggests that increased activity can be achieved by decreasing the number  of heteroatoms (nitrogen or oxygen atoms), mostly the nitrogen ones for decreasing the ratio of nitrogen (N%) having also a negative sign in the model.
(ii) The electrophilicity index has a negative sign in the model, which suggests that increased activity can be achieved by decreasing the electrophilicity of the acridine derivatives (a high value of electrophilicity describes a good electrophile, while a small value of electrophilicity describes a good nucleophile).
(iii) The dipole moment has a positive sign in the model, which suggests that increased activity can be achieved by increasing the polarity of the acridine derivatives.
(iv) The number of H-Bond donors (NHD) has a positive sign in the model, which suggests that increased activity can be achieved by increasing of the heteroatom with one or more hydrogen atoms.
(v) The molecular topological index (MTI) coefficient (topological parameter) has a negative sign in the model, which suggests that increased activity can be achieved by increasing the flexibility of the substituent side chain.
(vi) The total valence connectivity (TVC) (topological parameter) has a positive sign in the model, which suggests that increased activity can be achieved by increasing in branching of the acridine derivatives, because the overall connectivity increases with both molecule size and complexity, as expressed in branching and cyclicity of molecular skeleton [46]. (i) Henry's law constant ( H ) has a negative sign in the model, which suggests that increased activity can be achieved by decreasing solubility; consequently, by decreasing the polarity, H = / ( : solubility; : pressure).
(ii) The melting point ( ) has a negative sign in the model, which suggests that increased activity can be achieved by decreasing the polarity and the branching of molecule (increasing branching makes the molecule less compact. As the surface area of the molecule decreases, it will become more compact and thus easier to pack).
(iii) The highest occupied molecular orbital energy HOMO (negative values) has a positive sign in the model, which suggests that the higher of HOMO , the weaker donating electron ability, is showing the fact that the nucleophilic reaction occurs more easily and the activity of the compound is higher.
(iv) The critical temperature (CT) has a positive sign in the model, which suggests that increased activity can be achieved by increasing the critical temperature.
(v) The number of H-Bond acceptors (NHA) has a positive sign in the model, which suggests that increased activity can be achieved by increasing the number of heteroatoms (nitrogen or oxygen atoms).
(vi) The number of H-Bond donors (NHD) has a positive sign in the model, which suggests that increased activity can be achieved by increasing of the heteroatoms with one or more hydrogen atoms.
Comparing the importance of each descriptor on pIC 50 of acridines, we must know the standardized coefficient and the -test values of them in the MLR equations. The bigger the absolute value of the -test value is, the greater the influence of the descriptor is.
This meant that the -test values of , , NHD, and TVC are both larger than those of the other descriptors, which indicates that, in this model, the influence of these descriptors on activity is stronger than that of the others. It shows also the importance of electrophilicity index in the prediction of pIC 50 PRO. Moreover, the -test value of NHD is larger than that of the other descriptors, which indicates that the influence of this descriptor, in this model, on activity is stronger than that of the others.
In the conclusion, these results illustrate that, to increase the antileishmanial activity against promastigotes parasites, we will decrease the electrophilicity and increase the branching, the polarity, and the number of hydrogen atoms attached in the heteroatom of the acridine derivatives. Moreover, to increase the antileishmanial activity against amastigotes parasites, we will decrease the solubility, the polarity, and the branching and increase the electrophilicity and the number of heteroatoms mostly attached in the hydrogen atoms and the critical temperature of the acridine derivatives.
(i) Detection of Outliners: Applicability Domain (AD). The applicability domain (AD) of these models was evaluated by leverage analysis expressed as Williams plot (Figures 4 and  5), in which the standardized residuals ( ) and the leverage threshold values (ℎ * = 0.585 and 0.553 for pIC 50 (PRO) and pIC 50 (AMA), resp.) were plotted. Any new value of predicted pIC 50 data must be considered reliable only for those compounds that fall within this AD on which the model was constructed.
From these figures, it is obvious that there is no response outlier in training set and no response outside in test set. Only one chemical is identified as outside for MLR model for pIC 50 PRO and two chemicals for MLR model for pIC 50 AMA; these outsides compounds are given as follows.
For pIC 50 PRO. Compound number 64 in test set has higher leverage which is greater than ℎ * value of 0.585 and all the compounds have a standard deviation into ± interval ( = 3).
For pIC 50 AMA. Compounds numbers 5 and 6 in test set have a standard deviation value greater than the ± interval ( = 3).
These results are confirmed using the simple approach (Tables S3 and S4), the chemical number 8 is identified as outlier and the chemical number 64 is identified as outside for MLR model for pIC 50 PRO, and any chemical is identified as outside or outlier for MLR model for pIC 50 AMA.
These erroneous predictions could probably be attributed to wrong experimental data or to the structure of these outsides; Cl, F, or NMe 2 substitutes the tree compounds at the para position of the phenyl ring ( Figure 6); maybe the selected descriptors do not pay much attention to these special substructures. The predictions of tree compounds are extrapolations of the model, but fortunately they are all "good leverage" chemicals.

Artificial Neural Network (ANN).
In this study, we used a feed forward network with two layers, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer ( Figure 2). The architecture of the artificial neural network used in this work was 7-3-1 and = 1.82 for pIC 50 (PRO) and 6-3-1 and = 2 for pIC 50 (AMA).
The output layer represents the calculated activity values (predicted pIC 50 ). The calculation result and the performance of established models are recorded in the output layer.
To justify the predictive quality of models, total data are distributed randomly into three groups. The first group (70% of the total data) used to drive the system. The second group (15% of the total data) will be used to validate the network; and the remaining 15% that did not participate in the learning models will be used as an independent test of network generalization. The distribution of the total data into training, validation, and test sets is shown in Table 7.
The correlation between the experimental and calculated values using the artificial neural network models is highly significant, as illustrated in Figure 7 and as indicated by better and 2 and the small MSE values for all three phases: training, validation, and test ( Table 7). The predicted activities calculated with the artificial neural network and the observed values are given in Table S5.
The results obtained by MLR and ANN are very sufficient to conclude the performance of the models. A comparison of the quality of the statistical terms of these models shows that the ANN has substantially better predictive capability. ANN was able to establish a more satisfactory relationship between the molecular descriptors and the activity of the studied compounds compared to MLR; but the most negative side of this method is the fact that it is poorly transparent, whereas transparency of MLR approach gives the most interpretable results and gives a good explanation of activities with descriptors. Consequently, we can design new compounds with improved values of activity compared to the studied compounds using the MLR models. Taking into account the above results, we added suitable substitutions and then we moved to calculate their activities using the proposed models equations ( (3) and (4)). Therefore, the suggested models will reduce the time and cost of synthesis as well as the determination of the antileishmanial activities against promastigotes and amastigotes forms of parasites for the acridine derivatives.

Design New Compounds with Higher Antileishmanial
Activities. According to the above discussions, the MLR models could be applied to other acridines derivatives according to Table 1 and could add further knowledge in the improvement of new way in antileishmanial drug research. If we develop a new compound with better values than the existing ones, it may give rise to the development of more active compounds than those currently in use.
In this way, we carried out structural modification starting from compounds having the highest pIC      Figure 8 by the same methods as well as pIC 50 values theoretically predicted by the MLR models are listed in Table 8.
From the predicted activities, it has been observed that the designed compounds have higher pIC 50 values than the existing compounds in the case of the 60 studied compounds ( Table 1).

Conclusion
Following the five principles of the Organisation for Economic Co-operation and Development (OECD) for the validation of QSAR models, two different modelling methods, multiple linear regression (MLR) and artificial neural network (ANN), were used in the construction of QSAR models for the antileishmanial activities against two forms of parasites (promastigotes and amastigotes) of acridines derivatives. The accuracy and predictability of the proposed models were proven by the comparison of key statistical terms of models. The good results obtained with the internal and external validations show that the proposed models in this paper are able to predict activities with a great performance and that the selected descriptors are pertinent. The applicability domain (AD) of the MLR model was defined. The resulting models have shown that we have established a relationship between some descriptors and the activities in satisfactory manners. The ANN results have substantially better predictive capability than the MLR, but the latter gives the most important interpretable results.
The obtained results show that, to increase antileishmanial activity against promastigotes parasites, we will increase electrophilicity and decrease branching, polarity, and number of hydrogen atoms attached in the heteroatom of the acridine derivatives. Moreover, to increase antileishmanial activity against amastigotes parasites, we will increase solubility, polarity, and branching and decrease electrophilicity and number of heteroatoms mostly attached in the hydrogen atoms and critical temperature of acridine derivatives. Comparing -test and standardized coefficient values of descriptors indicates that the influences of the electrophilicity index ( ) on pIC 50 (PRO) and of the number of H-Bond donors (NHD) on pIC 50 (AMA) are stronger than those of the others.
The most important finding from this research is that we have designed and proposed new compounds with higher values of activities compared to existing ones by adding suitable substituents and calculating their activity using the regression equations. Consequently, the proposed models will reduce the time and cost of synthesis as well as the determination of the antileishmanial activities against promastigotes and amastigotes forms of parasites of acridine derivatives.