Design of Thymidine Analogues Targeting Thymidilate Kinase of Mycobacterium tuberculosis

We design here new nanomolar antituberculotics, inhibitors of Mycobacterium tuberculosis thymidine monophosphate kinase (TMPKmt), by means of structure-based molecular design. 3D models of TMPKmt-inhibitor complexes have been prepared from the crystal structure of TMPKmt cocrystallized with the natural substrate deoxythymidine monophosphate (dTMP) (1GSI) for a training set of 15 thymidine analogues (TMDs) with known activity to prepare a QSAR model of interaction establishing a correlation between the free energy of complexation and the biological activity. Subsequent validation of the predictability of the model has been performed with a 3D QSAR pharmacophore generation. The structural information derived from the model served to design new subnanomolar thymidine analogues. From molecular modeling investigations, the agreement between free energy of complexation (ΔΔG com) and K i values explains 94% of the TMPKmt inhibition (pK i = −0.2924ΔΔG com + 3.234; R 2 = 0.94) by variation of the computed ΔΔG com and 92% for the pharmacophore (PH4) model (pK i = 1.0206 × pK i pred − 0.0832, R 2 = 0.92). The analysis of contributions from active site residues suggested substitution at the 5-position of pyrimidine ring and various groups at the 5′-position of the ribose. The best inhibitor reached a predicted K i of 0.155 nM. The computational approach through the combined use of molecular modeling and PH4 pharmacophore is helpful in targeted drug design, providing valuable information for the synthesis and prediction of activity of novel antituberculotic agents.


Introduction
A substantial number of the influenza A subtype H1N1 death cases reported by WHO occurred in patients with chronic respiratory conditions shedding light on possible impact of influenza on active tuberculosis (TB) patients [1]. Tuberculosis kills more than 2 million people [2] and infects around 2 billion worldwide [3] with more than 9 million cases annually [4]. According to WHO in the second millennium decade (2020), over 1 billion people will be newly infected and 36 million will die from TB [5,6] making it a leading cause of mortality as infectious disease. In this regard, the millennium development goal (MDG) to halve by 2015 TB mortality relative to the 1990 level is problematic [7]. First, mortality in comparison with the 1990 level is slightly decreasing but not in Africa [2]. Secondly, the increased occurrence of MDR and XDR-TB strains is disqualifying the current 40-year-old and long-term DOTS drugs: isoniazid, rifampicin, pyrazinamide, 2 Tuberculosis Research and Treatment and ethambutol [2]. Accordingly the need of new lowcost and short-term anti-TB therapies is more than urgent regardless those currently in the preclinical or early clinical phase since most of them are improvements on existing antimicrobial compounds with nonnegligible susceptibility to resistance.
The development of new drugs is entirely subordinated to the fulfillment of a particular profile: potency and rapid action short-term DOTS efficiency against MDR-TB safer than existing treatment coadministrable with anti-retrovirals easy to use in the field-action against latent as well as active forms [8]. Another requirement is the increased efficacy on targets relevant to dormancy phase and preventing from "nonproliferation and back resuscitation into growth phase" [3].
In the forthcoming of the availability of appropriated new targets, attention is given to enzymes catalyzing vital processes such as NAD supply [9] or ATP-dependent phosphorylation [10]. TMPKmt is the last specific enzyme for the synthesis of dTTP catalyzing dTMP conversion into dTDP using ATP as phosphoryl donor making its inhibitors potential anti-TB drugs targeting DNA replication [11]. So far the most potent thymidine-like inhibitors reported are in low micromolar range (3.5-5 M) activity. The lack of leads for TMPKmt inhibitors and the availability of X-ray crystals structure of the enzyme bound to substrate open the gate to the design of new antitubercular agents with this scaffold [12].
The high-resolution 3D structures of TMPKmt-dTMP complexes ( = 4.5 M) have been released at 1.6Å-1GSI [13] and 1.9Å-1G3U [14]. The first one with the highest resolution is displayed as 2D interaction view in Figure 1 depicting the interactions of thymine pyrimidine ring: stacking with Phe70 ( = 3.7Å) and cation-with Arg95 ( = 5.6Å). Hydrogen bonds with Arg74 and Asn100 keep the orientation of the pyrimidine ring while the ribose ring's hydroxyl is bonded to Asp9. The orientation of Arg95 is also kept by its HB to one of the phosphate oxygen atoms; this polar group interacts with Tyr39 and the Mg 2+ ion.
On the basis of these interactions, dTMP analogues have been reported by removing the monophosphate counterpart and Br substitution on the pyrimidine ring resulted in keeping the potency of the substrate [15] opening an important gate since the polarity of the dTMP analogues degrades their ADME profile due to their inability to cross cell-membrane [16]. Focusing on nucleosides instead of nucleotides with a hydroxyl group in place of OPO 3 2− for dTMP resulted in a sixfold drop of activity but a simple bromine substitution in the 5-position of the pyrimidine ring kept the initial potency (TMD4, = 5 M) [15]. This position is not suitable for other halogens F-212 M, Cl-10 M, and I-33 M neither for hydroxyls OH-270 M, CH 2 OH-820 M nor alkyl chains CH 3 CH 2 -1140 M as repulsion was expected in the case of groups larger than CH 3 [15]. No phenyl ring has been tried in that position for this nucleoside but a sevenfold decrease in potency is observed in the case of dTMP with a benzyl (C 6 H 5 CH 2 -28 M) [16]. The stabilizing feature at the 5-position of the pyrimidine ring is not purely hydrophobic, and an orientational part has to be considered. For this we explore here the possibility to design analogues with different small groups in place of the above cited groups bearing both requirements.
The 2 position on the ribose ring has been occupied by hydroxyl and halogens with no specific improvement of potency neither. Various substitutions in 3 did not improve the potency. Differently the 5 -position is the second enriching one after that on the pyrimidine ring. Few substitutions have been made beside OH : N 3 , NHCOCH 3 , NH 2 , and halogens usually to keep the interactions involving the phosphate group in dTMP analogues. More recently attempts to replace the ribose ring with a phenyl [17] with no real improvement followed by a bicyclic sugar derivatives [18] reaching 3.5 M and later by a spacer ended by acyclic nucleoside analogues exploring edge to face interaction between the naphthyl group and Tyr39 has led to 0.27 M potency [19]. A Computer-assisted combinatorial design of bicyclic thymidine analogs as inhibitors of TMPKmt by Frecer et al. [20] identified submicromolar concentration range inhibitors with favorable ADME profile.
In the study reported herein the thymidine scaffold is kept with substituted small size group at position 5 of the phenyl ring, 2 , 3 on the ribose ring and finally at 5 -position taking in to account the structure of the training set (TS). A QSAR model of interaction with TMPKmt was built from a TS of 15 TMDs starting from the above mentioned 1GSI.pdb 3D structure to compute the free energy of complexation taking into account the interaction energy, the solvation free energy, numeric solution of the Poisson-Boltzmann scheme and finally the conformational entropy variation of the inhibitor upon binding. The predictability of the model was further crossed with a PH4 3D QSAR one used to screen a library of thymidine analogues (TMAs) for subnanomolar range analogues. The identified hits from the complexation QSAR equation finally reach a predicted activity in the picomolar concentration range.

Training and Validation
Sets. The training and validation sets of thymidine analogues inhibitors of TMPKmt used in this study were selected from the literature [15][16][17]. The inhibitory potencies of these derivatives cover sufficiently broad range of activity to allow a reliable QSAR model to be built (5 ≤ exp ≤ 1900 M).

Model
Building. Molecular models of the enzymeinhibitor complexes ( : ), free TMPKmt ( ) and inhibitors ( ) were prepared from high-resolution crystal structure of the reference complex containing the deoxythymidine monophosphate TMPKmt : dTMP [13] (Protein Data Bank [21] entry code 1GSI at a resolution of 1.6Å) using Insight-II molecular modeling program [22]. The structures of the and : complexes were considered to be at a pH of 7 with neutral N-and C-terminal residues and all protonizable and ionizable residues charged. No crystallographic water molecule was included into the model. The inhibitors were built into the reference structure complex by in situ replacing 3.5 Arg 95 Figure 1: Interaction of dTMP with active site residues of TMPKmt.
of the derivatized R-group of the dTMP moiety (scaffold). An exhaustive conformational search over all rotatable bonds of the replacing function group, coupled with careful gradual energy minimization of the modified inhibitor and the TMPKmt active site residues located in the vicinity of the inhibitor (within 5Å distance), was employed to identify the low-energy bound conformations of the modified inhibitor. The resulting low-energy structures of the : complexes were then carefully refined by minimization of the whole complex. This procedure has been successfully used for model building of viral and protozoal protease-inhibitor complexes and design of peptidomimetic and hydroxynaphthoic inhibitors [23][24][25][26][27].

Molecular Mechanics.
Simulations of the models of inhibitors, TMPKmt, and their complexes were carried out with all-atom representation using atomic and charge parameters of the class II consistent force field CFF91 [28]. A dielectric constant of 4 was used for all molecular mechanics (MM) calculations in order to take into account the dielectric shielding effect in proteins. Minimizations of the : complexes, free and were carried out by relaxing the structures gradually, starting with added hydrogen atoms, continued with residue side chain heavy atoms and followed by the protein backbone relaxation. In all the geometry optimizations, a sufficient number of steepest descent and conjugate gradient iterative cycles were used with the convergence criterion for the average gradient set to 0.01 kcal⋅mol −1 ⋅ A −1 .

Conformational Search. Free inhibitor conformations
were derived from their bound conformations in the : complexes by gradual relaxation to the nearest local energy minimum. Then a Monte Carlo search (with an upper limit of 50,000 iterations) for low-energy conformations over all rotatable bonds except those in the rings was carried out using Cerius 2 molecular modeling package [29]. Two hundred unique conformations were generated for each inhibitor by randomly varying torsion angles of the last accepted conformer by ±15 ∘ at 5000 K followed by subsequent energy minimization. During the minimization, a dielectric constant = 80 was used to account approximately for the dielectric screening effect of hydration upon the generated conformers. The conformer with the lowest total energy was selected and reminimized at = 4.

Solvation Gibbs Free
Energies. The electrostatic component of solvation Gibbs free energy that incorporates also the effects of ionic strength through the solution of nonlinear Poisson-Boltzmann equation [30,31] was computed by the DelPhi module in Discovery Studio [32]. The program treats the solvent as a continuous medium of high dielectric constant ( = 80) and the solute as a cavity with low dielectric ( = 4) with boundaries linked to the solute's molecular surface, which encloses the solute's atomic charges. The program uses a finite difference method to numerically solve for the molecular electrostatic potential and reaction field around the solute. DelPhi calculations were carried out on a (235×235×235) cubic lattice grid for the : complexes and free and (65 × 65 × 65) grid for the free with full coulombic boundary conditions. Two subsequent focusing steps led in both cases to a similar final resolution of about 0.3Å per grid unit at 70% filling of the grid by the solute. Physiological ionic strength of 0.145 mol⋅dm −3 , atomic partial charges and radii defined in the CFF91 parameter set [28] and a probe sphere radius of 1.4Å were used. The electrostatic component of the solvation Gibbs free energy was calculated as the reaction field energy [30,[33][34][35].
2.6. Entropic Term. The vibrational entropy change during the inhibitor binding to the was calculated by normal mode analysis of the inhibitor vibrations using a simplified method of Fischer et al. [36,37]. In this approach, vibrational analysis of the inhibitor bound at the active site of a "frozen" receptor ( ) and of the low-energy conformer of the free inhibitor is computed for fully minimized structures using Discover [22] and It has been shown previously that for small and relatively stiff ligands this method gives a good approximation of the vibrational entropy change of the fully flexible system, that is, including the degrees of freedom of the protein receptor [36,37]. The vib { } term accounts for vibrational motions of the free inhibitor and represents an indicator of conformational flexibility of the molecule. Namely, low frequency vibrations, which correspond to collective motions of a number of atoms with larger amplitudes, that is, conformational changes, contribute most to this term. Relative values of ΔΔ vib with respect to a reference inhibitor were used to compensate partially for the restricted flexibility of . Although enthalpic contribution to binding affinity is important, the enthalpy/entropy balance is acting more and more as descriptor of selectivity bringing drug optimization to a multidimensional approach [38]. value can thus be predicted from the complexation GFE Δ com = −RTln assuming the following equilibrium: where {} aq indicates solvated species. The standard GFE change of reaction (1) can be derived by molecular simulations of the complex and the free reactants: In this work, we approximate the exact values of standard GFE for larger systems such as enzyme-inhibitor complexes by the expression [25,26]: where MM { : } stands for the molecular mechanics total energy of the complex (including bonding and nonbonding contributions), and sol { : } is the solvation GFE and trv { : } the entropic term: composed of a sum of contributions arising from translational, rotational, and vibrational motions of : . Assuming that the trans and rot terms for the free and the complex : are approximately equal, we obtain where tran { } and rot { } describe the translational and rotational entropy terms of the free inhibitor, and Δ vib represents the vibrational entropy change upon the complex formation.
Comparison between different inhibitors was done via relative changes in the complexation GFE with respect to a reference inhibitor, ref , assuming ideal gas behaviour for the rotational and translational motions of the inhibitors: The evaluation of relative changes is preferable as it is expected to lead to partial cancellation of errors caused by the approximate nature of the molecular mechanics method as well as solvent and entropic effects description.

Interaction Energy.
To calculate MM interaction energy ( int ) between enzyme residues and the inhibitor, a protocol available in Discovery Studio [32] that computes the nonbonded interactions (van der Waals and electrostatic terms) between defined sets of atoms was used. The calculations were performed using the CFF91 force field [28] with a dielectric constant of (5).
2.9. Pharmacophore Generation. Bound conformations of inhibitors taken from the models of : complexes were used for building of 3D QSAR pharmacophore by means of Catalyst HypoGen algorithm [39] implemented in Discovery Studio [32]. The top scoring pharmacophore hypothesis was built up in three steps (constructive, subtractive, and optimization steps) from the set of most active inhibitors. Inactive molecules served for definition of the excluded volume. The maximum number of five features allowed by the HypoGen algorithm was selected based on the TMD scaffold and substituents during the pharmacophore generation, namely: hydrophobic aromatic (HYdAr), hydrophobic aliphatic (HYd), hydrogen bond donor, (HBD), hydrogenbond acceptor (HBA), and ring aromatic (Ar). Adjustable parameters of the protocol were kept at their default values except the uncertainty on the activity, which was set to 1.5 instead of 3. This last choice to bring the uncertainty interval on experimental activity from the large ⟨ /3, 3 × ⟩ to a relatively narrow ⟨2 × /3, 3 × /2⟩ taking account in this way of the accuracy and homogeneity of the measured inhibitory activities since they are coming from the same work in the same laboratory. During the generation of 10 pharmacophores, the number of missing features was set to 0 and the best one was selected.

Results and Discussion
A training set of 15 TMDs and validation set of 6 TMVs were selected from 3 series of compounds with measured activities from the same laboratory [14][15][16]. They are listed in Table 1 and their experimental activity (5-1900 M) covers a range sufficiently large to build a reliable QSAR model. (2) was computed for the complexes from in situ modification of dTMP as described in Material and Methods. Table 2 lists the Gibbs free energy and its components. The ΔΔ comp reflects the mutual affinity between the enzyme and the inhibitor. Since it is computed from simulations in an approximate way, the consistency of the binding model is evaluated through a regression analysis leading to linear correlation with experimental activity . The statistical data of these regressions are illustrated in Figure 3 and listed in Table 3. From the regression equation, the computed ΔΔ comp for a novel compound similar to TMDs is used to predict its enzyme inhibitory activity pred provided they share the same binding mode. This process usually can narrow the filter to new lead compounds and save time compared with traditional synthesis approach. From the complexation model structures the computed breakdown of the contribution of  Figure 4. It comes out clearly from TMD4 to TMA12 that except Tyr39 all the residues involved in strong interaction with TMD4 remain in close contact with TMA12 ( Figures 2 and 5(c)) with a noticeable increase of the contributions to int for Phe70, Pro37, Arg74, and Tyr165.   f ΔΔ comp is the relative Gibbs free energy change related to the enzyme-inhibitor complex formation: ΔΔ comp ≅ ΔΔ MM + ΔΔ solv − ΔΔ vib . g exp is the experimental TMPKmt inhibition constant obtained from [15][16][17]. h Ratio of predicted and experimental inhibition constants pre / exp ⋅ pre was predicted from computed ΔΔ comp using the regression equation for TMPKmt shown in Table 3.   cation-interaction with Arg95. This last interaction is the main driving force of the substrate-like orientation of TMDs. The binding mode of dTMP scaffold as reported from X-ray structures is reproduced by our model based on thymidine scaffold making enrichment of interactions by new substitutions straightforward provided the accuracy of our predictive equation is sufficiently stable to cover the nanomolar range with the assumption that structurally similar ligands bind in a similar fashion. A quick analysis of the training set shows that halogen in the 5-position on pyrimidine ring is not the main feature for potency confirmed by the relatively low activity of TMD2-3. At the other ends in 5 -position only N3 bring substantial increase of potency. These two positions are of great importance for activity as indicated by the pharmacophore features in Figure 6. It is easy to appreciate the weight of each feature since CH3 in 5-position in a majority of inhibitors did not improve the inhibitory potency resulting in a 5-fold decrease in TMD15 (27 M) compared with Br in TMD4 (5 M). The cation-

Pharmacophore
Model of Inhibitory Activity. The 3D QSAR PH4 generation process follows three main steps, the constructive, the subtractive, and the optimization steps. The constructive phase of HypoGen automatically selected as lead compounds (5 ≤ ≤ 7 M) for which (5 × 1.5 − /1.5 > 0) TMD4 and TMD5 using these top two to generate all the starting PH4 features and retaining only those fitting the remaining leads. The subtractive phase in which inactive compounds (log( )-log(5) > 3.5) are used to remove features that map more than 50% of them retained representatives of all the selected five features. In the optimization phase using the simulated annealing algorithm the highest scoring PH4s are retained according to their probability function-based cost. A total of 10 hypotheses were generated all displaying four features. The costs range  . This difference is a major quality indicator of the PH4 predictability; Δ > 70 corresponds to an excellent chance or a probability higher than 90% that the model represents a true correlation [32]. To be statistically significant the hypothesis has to be as closer as possible to the fixed and as further as possible from the null cost, for this set of 10 hypotheses Δ ≥ 82.2 attests the quality of the model. The standard indicators like the root-mean-square deviations (RMSD) and the correlation coefficients ( 2 ) range from 1.126 to 2.036 and from 0.96 to 0.88, respectively. Due to the closed values for the whole set of PH4s, the first hypothesis (Hypo1) has been retained for further analysis.
The randomization validation (Fisher method) of the PH4 model by CatScramble algorithm in Catalyst has been carried out from 49 random runs according to the 98% confidence level selected creating each time 10 valid hypotheses. Finally none of them was as predictive as the 10 lowest listed in Table 5 Table 4 (see also Figure 5), the most active ones reaching picomolar concentration range.

Substitution at the 5 End (R 4 ).
Keeping all the remaining moiety unchanged, the 5 hydroxyl group in the training set has been replaced with various (TMA1-6, TMA18) polar groups among which the carboxylate COO − performed the best but with a fivefold decrease of potency ( = 24 nM) from TMD4. As we can see from Figure 5(b), the carboxylate interacts through two HB with Arg95 and Tyr39 bringing substantial affinity towards TMPKmt. The former OH was not involved in any HB (Figure 2). About this additional HB with Tyr39 it is interesting to point out, among the few differences of the binding sites of TMPKmt and TMPKh (the human corresponding enzyme-PDB code 1E99) where Arg14, Tyr39 and Asn100 are replaced in the last one by Ser20, Arg45 and Gly102, respectively [41], the crucial opportunity to exploit this structural and functional difference (interaction with Tyr39) to design selective inhibitors for TMPKmt. Recent attempts of extension by replacement of the carboxylate COO − with other groups resulting in 5 -modified thymidines are promising [42].

Substitution at the 3 Position of the Ribose Ring (R 3 ).
In order to improve the predicted potency for TMA12, 16 and 17 replacement of OH by NH 2 on the ribose ring does not result in any increase in predicted inhibitory activity, the best one TMA37 = 1 nM being sixfold less potent than TMA12. Any other design modification, for example replacement of ribose by a cyclopentane, led to a decrease of potency (TMA19-27 and TMA41-47). Figures 5(b) and 5(c) show the main interactions supporting the affinity of TMA12 in comparison with those coming out from our QSAR model for TMDs. The Connolly surface presented in Figure 5(d) suggests further design opportunities such as bicyclic 2 ,3 -ring inhibitors of TMPKmt [20] and extension of the carboxylate end to derivatives carrying a naphtholactam or naphthosultam moiety at position 4 of a (Z)-butenyl chain [43]. The design of this class of inhibitors requires appropriate training set available ( : 0.42-0.75 M). We are building the QSAR model and new analogues will be released in due course [44].

Conclusions
The interaction model built based on complexation methodology and validated by a PH4 analysis provided structural information helpful in the design of new analogues. They are structurally close to reported dTMP analogues. Here a carboxylate in place of 5 -position hydroxyl increased their inhibitory potency reaching low to subnanomolar range (1-0.155 nM). One of the most potent new analogue TMA12 is HB bonded to the selective Tyr39 making the best of TMAs a promising set for synthesis and evaluation. A more systematic search through screening of combinatorial library and subsequent evaluation on enzymatic assay will lead to more potent analogues.