Molecular Modeling Studies of Thiophenyl C-Aryl Glucoside SGLT2 Inhibitors as Potential Antidiabetic Agents

A QSAR study on thiophenyl derivatives as SGLT2 inhibitors as potential antidiabetic agents was performed with thirty-three compounds. Comparison of the obtained results indicated the superiority of the genetic algorithm over the simulated annealing and stepwise forward-backward variable method for feature selection. The best 2D QSAR model showed satisfactory statistical parameters for the data set (r 2 = 0.8499, q 2 = 0.8267, and pred_r 2 = 0.7729) with four descriptors describing the nature of substituent groups and the environment of the substitution site. Evaluation of the model implied that electron-rich substitution position improves the inhibitory activity. The good predictive 3D-QSAR models by k-nearest neighbor (kNN) method for molecular field analysis (MFA) have cross-validated coefficient q 2 value of 0.7663 and predicted r 2 value of 0.7386. The results have showed that thiophenyl groups are necessary for activity and halogen, bulky, and less bulky groups in thiophenyl nucleus enhanced the biological activity. These studies are promising for the development of novel SGLT2 inhibitor, which may have potent antidiabetic activity.


Introduction
One of the main features of diabetes is the elevation of blood sugar with its deleterious consequences in a variety of tissues [1]. Thus, control of the plasma glucose level is of utmost importance in the treatment of this disease. In recent years, the idea that affecting glucose absorption in the intestine and/or the glucose reabsorption in the kidney might be a possible way to control the sugar level has evolved. Diabetes comprises a group of metabolic disorders characterised by chronic hyperglycaemia with disorders in the metabolism of carbohydrate, fat, and protein that result in defects in secretion and action of insulin [2]. Dysfunction and failure of various organs, especially the eyes, kidneys, nerves, and heart, and the blood vessels are the usual complications of diabetes [3,4]. Diabetes is mainly divided into four main types including insulin-dependent diabetes mellitus (type 1), non-insulin-dependent diabetes mellitus (type 2), gestational diabetes, and other specific types [5]. Diabetes mellitus type 2 (T2DM) accounts for almost 90% of diabetes cases, with the property of insulin resistance and beta-cell dysfunction 2 International Journal of Medicinal Chemistry in the brush border membrane of mature enterocytes in the small intestine, where it absorbs dietary glucose and galactose from the gut lumen [13]. SGLT2, expressed exclusively in the kidney, is located in the S1 segment of the proximal convoluted tubule of the kidney. It is a low-affinity, highcapacity cotransporter and is responsible for 90% of renal glucose reabsorption [14]. Several therapeutic agents are available for monotherapy or for combination therapy with different mechanisms to treat diabetics, such as metformin, rosiglitazone, sitagliptin, acarbose, and glimepiride [15]. The obvious need for new approaches to treat patients with uncontrolled T2DM has prompted continuous exploration of alternative targets involved in maintenance of glucose homeostasis. Several SGLT2 inhibitors have been reported as undergoing clinical trials. Phlorizin [16], 3-(benzo[b]furan-5-yl)-2 ,6 -dihydroxy-4 -methylpropiophenone-2 -O--Dglucopyranoside [17], sergliflozin [18], and remogliflozin [19] are O-glycosides and show strong inhibition of SGLT2. They also demonstrate efficacy in vivo when administered orally in rats or mice. They induce a glucosuric response, the result of the blockade of renal glucose reabsorption, and consequently lead to reduction of the blood glucose level and improvement of insulin sensitivity [20]. Quantitative structure-activity relationship (QSAR) studies can be utilized to predict eye irritation potential as an alternative in silico method, just as it has been used successfully to predict several other toxicological endpoints for some time [21]. Hence, in continuation to our efforts  in developing QSAR studies for angiotensin II AT 1 receptor, antitubercular agents, antimalarial activity, antimicrobial activity, antibacterial activity, COX inhibitors, and so forth.
In this study, we have taken thiophenyl derivatives for performing 2D and 3D quantitative structural-activity relationship analysis and calculations in order to understand their stereoelectronic properties. Genetic algorithm (GA), simulated annealing (SA), and stepwise forward-backward variable selection methods have been employed for selection of relevant descriptors. The obtained results provide further insight into some beneficial information in structural modifications to design new potential SGLT2 inhibitors. Moreover, new compounds with high predictive activities were designed.

Data Set.
The biological data set was chosen from a series of thirty-three thiophenyl derivatives as SGLT2 inhibitors as potential antidiabetic agents reported by Lee et al. [68]. The biological activity values [IC 50 (nM)] reported in nanomolar units were converted to their molar units pIC 50 and subsequently used as the dependent variable for the QSAR analysis. The converted to pIC 50 for the QSAR analysis along with the structure of the compounds in the series are listed in Table 1 (marked with asterisk). The test compounds were selected manually such that the structural diversity and wide range of activity in the data set were included. In this paper, a series of thiophenyl compounds with substitutions at X and R position of thiophenyl moiety are subjected to examining the relationships between structural modifications and activities against hSGLT2 inhibitors with the help of QSAR modeling.

Computational Details.
All the computational studies were performed by V-life MDS (Molecular Design Suite) 3.5 software supplied by V-life Sciences Technologies Pvt. Ltd., Pune, India [69]. The sketched structures were used for the calculation of 2D molecular descriptors using QSAR module of Molecular Design Suite software. Each compound was subjected to energy minimization and batch optimization using Merck Molecular Force Field (MMFF), fixing Root Mean Square Gradients (RMS) to 0.01 kcal/molÅ [70].
The sphere exclusion method [71] was adopted total set of inhibitors was divided randomly into a training set (26 compounds) for generation of QSAR models and a test set (7 compounds) for validation of the developed model. This random division of data set will be done through several cycles in order to get the best QSAR model. This study will help in rational drug designing of these derivatives as SGLT2 inhibitors for the eradication of T2DM. The unicolumn statistics of the training and test sets are reported in Table 2. The maximum and minimum value in training and test set were compared in a way that (1) the maximum value of pIC 50 of test set should be less than or equal to maximum value of pIC 50 of training set, (2) the minimum value of pIC 50 of test set should be higher than or equal to minimum value of pIC 50 of training set.

Calculating 2D
Descriptors. In the current approach, an attempt has been taken to understand the structural and physicochemical requirements of a set of hSGLT2 inhibitors by the help of regression 2D quantitative structureactivity relationship (2D QSAR). The energy-minimized geometry was used for the calculation of the various 2D descriptors such as topological, shape and geometrical, and physicochemical parameters such as individual (H-Acceptor count, H-Donor count, XlogP, retention index (Chi), element count, estate numbers, estate contribution, and alignmentindependent descriptors were used as predictor variables), as they were found to be appropriate for the development of models. A considerable number of the 265 physicochemical parameters, 300 alignment type parameters, and 99 atoms types count descriptors calculations were done using the V-life, MDS 3.5. The preprocessing of the independent variables (i.e., 2D descriptors) was done by removing the invariable (constant column) which resulted in a total of 216 molecular descriptors to be used for QSAR analysis. The various alignment-independent descriptors were also calculated. In this study to calculate AI descriptors, we have used the following attributes: 2 (double bonded atom), 3 (triple bonded atom), C, N, O, S, H, F, Cl, Br, and I, and the distance range of 0-7. The QSAR models were built with the consideration of the applicability of the descriptor for the activity. Various types of physicochemical descriptors have been calculated which are shown in the data sheet (Table 3).
International Journal of Medicinal Chemistry 3     [72] where a template structure is defined and used as a basis for alignment of a set of molecules, and a reference molecule is chosen on which the other molecules of the data set get aligned considering the chosen template. In the present study, we aligned the database by fitting all of the compounds on most active compound 16 (Figure 1(a)) as an alignment template using a common substructure with the V-life MDS routine database alignment. The superimposition of all molecules based on minimizing root mean square deviation (RMSD) is shown in Figure 1(b). The steric, electrostatic, and hydrophobic fields were calculated at each lattice intersection of a regularly spaced grid of 2.0Å. Methyl probe of charge +1 with 10.0 kcal/mole electrostatic and 30.0 kcal/mole steric and hydrophobic cutoff was used for fields generation. This resulted in calculation of 4500 field descriptors (1500 for each steric, electrostatic, and hydrophobic which theoretically form a continuum) for all the compounds in separate columns (Table 3).

External Validation for 2D QSAR Models.
The QSAR models were assessed by the number of cross-validated 2 by leave-one-out method [73] ( 2 ), cross-validated standard error ( 2 se), predicted 2 for external test set (pred 2 ), and standard error for predicted 2 (pred 2 se). The internal cross-validated predictability ( 2 ) was evaluated by the equation given below: where ,̂are the actual and predicted activity of the th molecule in the training set, respectively, and mean is the average activity of all molecules in the training set. For external validation, activity of each molecule in the test set was predicted using the model generated from the training set. The pred 2 value is calculated as follows: where ,̂are the actual and predicted activity of the th molecule in the test set, respectively, and mean is the average activity of all molecules in the training set. Both summations are over all molecules in the test set. Thus, the pred 2 value is indicative of the predictive power of the current model based on the external test set.

Evaluation of the Quantitative of Models.
Among several search algorithms, stepwise (SW) forward variable selection (1) 2 > 0.5,

Results and Discussion
QSAR study was performed on thiophenyl C-aryl glucoside derivatives for their SGLT2 inhibitors as potential antidiabetic agents. Comparison of the obtained results indicated the superiority of the genetic algorithm over the simulated annealing (SA) and stepwise forward-backward variable method for feature selection: The significant model with 2 = 0.8499 was considered, as model-1 showed an internal predictive power ( 2 = 0.8267) and a predictivity for the external test set (pred 2 = 0.7729) of about 77%. The -test = 45.897 shows the statistical significance of 99.99% of the model which means that the probability of failure of the model is 1 in 10000. In addition, the randomization test shows confidence of ∼99.9% that the generated model is not random and hence it is chosen as the QSAR model. Genetic algorithm-PLS model indicates the positive contribution of SsCH 3 count, and SaaSE-index showed that increase in the values of these descriptors is beneficial for the SGLT2 inhibitors. From the above model, it is clear that the descriptor T C Cl 1 contributes positively to the SGLT2 inhibitors activity, which corresponds to count of number of carbon atoms separated from any chlorine atom by 1-bond distance. Thus, the presence of chloro substituents (like in compounds 1-18) would increase the activity. Descriptor SsCH 3 count indicates that increase in methyl group of R position may lead to an increase in the activity. Its positive value suggests that increasing the number of such carbons will lead to better SGLT2 inhibitors. This suggests that substituents such as methyl would increase the activity. The above results are in close agreement with the experimental observations, where compounds 10, 15, 16, 17, and 28-33 at the R position produce SGLT2 inhibitors. Molecules with negative coefficient LUMO energy can accept electrons more easily than those having high LUMO energy. The SaaSE-index (∼24%) shows the sulphur atom connected with two aromatic bonds in the molecule and is inversely proportional to the activity. This further suggests that the increase in electronegative atom environment adjacent to indicated sulphur atom would result in increase in the activity. In addition, in agreement with QSAR model, presence of more bulky and hydrophilic substituents like methoxy at ring R led to an increase in SGLT2 potency. On the other hand, more lipophilic halogens like chloro in this X position retain the SGLT2 inhibition activities. The residuals (observedpredicted activity) were found to be minimal and are presented in Table 4. The statistical results of best model and the correlation matrix between the physicochemical parameters and the biological activity for model-1 are presented in Table 5. The contribution chart of selected descriptors is represented in Figure 1(c). Also, the graph for observed versus predicted activity for the series is plotted in Figure 1(d) which shows good correlation.
Molecular fields are the steric, electrostatic, and hydrophobic interaction energies which are used to develop a model for 3D QSAR. In this study, 3D QSAR models were generated by kNN-MFA in conjunction with genetic algorithms (GA), simulated annealing (SA), and stepwise (SW) forward-backward selection methods:  These points suggested the significance of electrostatic properties as indicated in the ranges in parentheses for maximum SGLT2 inhibitory activity. The negative value for E 184 means that electron-withdrawing substituents in this region are favorable and would increase SGLT2 inhibitory activity, as shown by the presence of chlorine group in the active compounds. Therefore, less steric and more steric substituents were preferred at the position of generated data points S 1044 and S 931, respectively, for enhancing the biological activity of thiophenyl pharmacophore. Two data points generated at the position of R around thiophenyl nucleus were steric points S 1044 and S 931 which indicates that less steric or less bulky substituents are favorable on this site. On the other hand, less electronegative groups such as hydroxyl and nitro. The electrostatic blue ball model around R positions of the thiophenyl suggested the electron-withdrawing groups on this position benefited potency; this may be the reason why compounds with double bonds at R positions had higher potencies than other compounds. In addition, a red contour near the position suggested the electron-withdrawing substituent would increase the activity. Therefore, the -OH at R position resulted in significant increased activity. Electronwithdrawing nature of the electronegative chloro atom does contribute to the SGLT2 inhibitory activity of the molecule. The graph for observed versus predicted activity for the series is plotted in Figure 1(f) which shows good correlation.  The residuals (observed-predicted activity) were found to be minimal and are presented in Table 4: training = 26, test = 7, 2 = 0.7254, 2 se = 0.3795, -test = 31.8965, and pred 2 = 0.6938.
3D data points generated, which contribute to SA kNN-MFA QSAR model, are shown in Figure 1(g). The external predictability of the above 3D QSAR model using the test set was determined by Pred 2 , which is 0.6938. The points generated in SA kNN-MFA 3D QSAR model are E 1139 (0.1545, 0.1545) and S 646 (−0.3001, −0.3001), that is, electrostatic and steric interaction fields at lattice points 1139 and S 646, respectively. Positive values for E 1139 show that electrondonating groups on the thiophenyl ring increase biological activity of compounds. On the other hand, thiophenyl ring might be substituted with either electron-withdrawing or electron-donating groups without loss of activity. 3D QSAR studies showed requirement of steric group at R position. The graph for observed versus predicted activity for the series is plotted in Figure 1(h). The residuals (observed-predicted activity) were found to be minimal and are presented in Table 4: training = 26, test = 7, 2 = 0.7359, 2 se = 0.3562,test = 30.8743, and pred 2 = 0.6743.
The 3D model using the SW-kNN-MFA analysis method 2 was found to be 0.7359 which suggests that the model could be useful for predicting SGLT2 inhibitor for such thiophenyl derivatives. These points suggested the significance and requirement of electrostatic and steric and hydrophobic properties as mentioned in the ranges for structure-activity relationship and biological activity of thiophenyl analogues. It shows a positive contribution towards the activity, which indicates that H 1153 hydrophobic substitution at R position favors activity. Figure 1(i) showed the hydrophobic contour the presence of a yellow contour covering the R positions of the ring indicating that hydrophobic substituents may be well tolerated in that region. The R position of the thiophenyl was surrounded by a yellow contour which also indicated that hydrophobic groups at this position may increase activity. Less bulky substituents are tolerated at thiophenyl ring meaning that increasing size of the groups substituted in these regions reduces SGLT2 inhibitory activity, since S 190 have negative values. Positive values for the hydrophobic 1153 lattice points around ring indicate that SGLT2 inhibitory activity could be increased by substituting more hydrophobic groups in these regions. Similarly, the positive values of electrostatic descriptors suggested the requirement of electropositive or sterically bulky groups at the position of generated data point E 1090 around thiophenyl pharmacophore for maximum activity. The electrostatic E 1090 data point generated which indicates sterically bulky groups such as benzene, methyl, ethyl, and isopropyl was required at R position. The graph for observed versus predicted activity for the series is plotted in Figure 1(j). The residuals (observed-predicted activity) were found to be minimal and are presented in Table 4.

Conclusions
QSAR study was performed on thiophenyl C-aryl glucoside derivatives for their SGLT2 inhibitors as potential antidiabetic agents. Genetic algorithms (GA), simulated annealing (SA), and stepwise (SW) forward-backward selection methods have been employed for selection of relevant descriptors. Comparison of the obtained results indicated the superiority of the genetic algorithm over the stepwise method for feature selection. 2D QSAR further revealed that a specific group or type of descriptor is not sufficient to capture the true factors responsible for the activity in the set of inhibitor compounds. This study also revealed that SsCH 3 count, along with LUMO energy and SaaSE-index, forms a powerful tool to improve a QSAR model. This study used T C Cl 1 to investigate whether a similarity based set generation method would lead to better understanding of the QSAR models. The 2D and 3D QSAR suggested the presence of negative steric potential at R position in thiophenyl nucleus, that is, R position in thiophenyl nucleus should acquire less steric or less bulky substituents are favorable as well as according to models. The constructed 3D QSAR models and structureactivity relationship (SAR) analyses of the compounds used in the study suggested that an electronegative and bulky substituent at R position and one less bulky substituent at X, R position of thiophenyl analogs are required to design novel SGLT2 inhibitors. We emphasize in this study that the E 1139 electron-donating groups on the thiophenyl ring increase biological activity of compounds. Electronwithdrawing groups are highly favorable. Furthermore, the kNN-MFA maps offered enough information to understand the structure-activity relationship and identified structural features influencing the inhibitory activity. The correlation of the results obtained from 3D QSAR study successfully explored the primitive structure-activity relationship. The findings can be quite useful to aid the designing of novel antidiabetic agents with high predicted potent activity.