Because they catalyze almost all the chemical reactions in cells with extremely high selectivity and efficiency, enzymes play vitally important roles in the life of an organism and hence have become frequent targets for drug design. An essential step in developing enzyme-targeting drugs is to identify drug-enzyme interactions in cells. Doing this by experimental techniques alone is both time-consuming and costly. Although some computational methods have been developed for this purpose based on knowledge of the three-dimensional structure of the enzyme, their use is quite limited because the three-dimensional structures of many enzymes are still unknown. Here, we report a sequence-based predictor, called “iEzy-Drug,” in which each drug compound was formulated by a molecular fingerprint with 258 feature components, each enzyme by Chou’s pseudo amino acid composition generated by incorporating sequential evolution information and physicochemical features derived from its sequence, and the prediction engine was operated by the fuzzy
Enzymes are biomacromolecules that catalyze almost all the chemical reactions essential for the life of a cell [
A schematic drawing to illustrate how Chou’s distorted key theory can be used to develop peptide drugs against HIV/AIDS. (a) A peptide fits and binds well to the active site of HIV protease right before it is cleaved by the enzyme. (b) After its scissile bond is modified, the peptide can no longer be cleaved, although it can still bind tightly to the active site. Such a modified peptide, or “distorted key,” automatically becomes an inhibitor candidate against HIV protease.
To develop enzyme-targeting drugs, an essential step is to identify drug-enzyme interaction in cellular networking [
Therefore, it would save us a lot of time and money if we could identify the interactions between enzymes and drugs before carrying out any intense study in this regard. In view of this, the present study was initiated in an attempt to develop a computational method based on the sequence-derived features that can be used to predict the drug-enzyme interactions in cellular networking.
As summarized in a comprehensive review [
The data used in this study were collected from Kyoto Encyclopedia of Genes and Genomes (KEGG) [
Since each of the samples in the current network system contains an enzyme (protein) and a drug, a combination of the following two approaches was adopted to represent the enzyme-drug pair samples.
First, for each of the drugs concerned, we can obtain a MOL file from the KEGG database [
In order to capture as much useful information from a molecular fingerprint as possible, we can also convert the above 256-digit hexadecimal string into a 1024-bit binary vector (each hexadecimal digit expands to four bits), which is a digital sequence consisting only of 0 and 1, and consider two different digital signal characteristics for the digital sequence as follows.
Suppose that
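The hexadecimal-to-binary conversion mentioned above can be sketched as follows. This is only a minimal illustration: the function name `hex_to_bits` and the most-significant-bit-first ordering within each hexadecimal digit are assumptions, not details given in the text, and the two signal characteristics derived afterwards are not reproduced here.

```python
from typing import List


def hex_to_bits(hex_string: str) -> List[int]:
    """Expand a hexadecimal fingerprint into a list of 0/1 integers.

    Each hexadecimal digit yields four bits, so a 256-digit string
    gives a 1024-element vector. The bit order within a digit
    (most significant bit first) is an assumption of this sketch.
    """
    bits: List[int] = []
    for ch in hex_string.strip():
        value = int(ch, 16)  # raises ValueError on a non-hex character
        bits.extend((value >> shift) & 1 for shift in (3, 2, 1, 0))
    return bits


# Toy input, not a real fingerprint: "a" -> 1010, "3" -> 0011
print(hex_to_bits("a3"))  # [1, 0, 1, 0, 0, 0, 1, 1]
```

A vector produced this way can then be treated as a digital sequence from which further numerical characteristics are computed.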
The sequences of the enzymes involved in this study are given in Online Supporting Information S2. Now the problem is how to effectively represent these enzyme sequences for the current study. Generally speaking, there are two kinds of approaches to formulate enzyme sequences: the sequential model and the nonsequential or discrete model [
To incorporate as much useful information as possible from an enzyme sample, we are to approach this problem from three different angles, followed by incorporating the feature elements thus obtained into the general form of PseAAC of (
According to Schäffer et al. [
Therefore, based on the grey system theory and (
Substituting the elements in (
In other words, in this study (
For the convenience of the later formulation, let us use
To optimize the prediction results, different weights were usually tested for each of the elements in (
The
Fuzzy
Next, let us briefly introduce how to use the fuzzy
Supposing that
The quantitative definitions for the aforementioned
If there is a tie between
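For readers who prefer code to equations, a fuzzy nearest-neighbor classifier can be sketched as below. This follows the commonly used Keller-style formulation with crisp training labels and inverse-distance weighting, which is an assumption here and may differ in detail from the definitions used in this paper. The names `fuzzy_knn`, `k`, and `m` are hypothetical, as are the Euclidean distance and the tie-breaking by label order.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fuzzy_knn(
    query: Sequence[float],
    samples: List[Sequence[float]],
    labels: List[int],
    k: int = 3,
    m: float = 2.0,
) -> Tuple[int, Dict[int, float]]:
    """Return (predicted_label, class_memberships) for one query vector.

    Assumptions of this sketch: Euclidean distance, crisp (0/1) training
    memberships, and neighbor weights proportional to 1 / d^(2/(m-1)).
    """
    if m <= 1.0:
        raise ValueError("the fuzzy strength parameter m must exceed 1")
    # Keep the k training samples closest to the query.
    neighbors = sorted(
        ((math.dist(query, x), y) for x, y in zip(samples, labels)),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # avoids division by zero when the query equals a sample
    memberships = {c: 0.0 for c in sorted(set(labels))}
    total = 0.0
    for dist, label in neighbors:
        weight = 1.0 / (dist ** exponent + eps)
        memberships[label] += weight
        total += weight
    for c in memberships:
        memberships[c] /= total  # normalize so memberships sum to 1
    # Ties are resolved by label order here; this is an arbitrary choice.
    predicted = max(memberships, key=memberships.get)
    return predicted, memberships


# Toy data: label 1 = interactive pair, label 0 = noninteractive pair.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = [1, 1, 0, 0]
print(fuzzy_knn((0.2, 0.1), train_x, train_y, k=3))
```

In practice the query and training vectors would be the combined drug and enzyme feature vectors, and `k` and `m` would be tuned on the benchmark dataset.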
The predictor thus established is called iEzy-Drug, where “i” stands for “identify” and “Ezy-Drug” for the interaction between enzyme and drug. To provide an intuitive overall picture, a flowchart is provided in Figure
A flowchart to show the operation process of the iEzy-Drug predictor. See the text for further explanation.
In the literature, the following equation set is often used for examining the performance quality of a predictor:
To most biologists, however, the four metrics as formulated in (
It is obvious from (
The relations between the symbols in (
Now it is easy to see from (
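As a concrete illustration of the four metrics, the sketch below computes Sn, Sp, Acc, and MCC from the conventional confusion-matrix counts. The TP/TN/FP/FN notation and the function name `binary_metrics` are assumptions of this example and are not the symbols used in the equations of this paper; the counts in the usage line are made up for illustration.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, and MCC.

    tp/fn count interactive pairs predicted correctly/incorrectly;
    tn/fp count noninteractive pairs predicted correctly/incorrectly.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal count is zero; 0.0 is used by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Sn and Sp describe the two classes separately, whereas Acc and MCC summarize overall performance; MCC is the more informative of the two when the classes are unbalanced.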
Properly examining prediction quality is key to developing a new predictor and estimating its potential application value. Generally speaking, the following three cross-validation methods are often used to examine the effectiveness of a predictor in practical application: independent dataset test, subsampling or
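The jackknife (leave-one-out) procedure can be sketched in a few lines. Each sample is in turn removed from the dataset, the predictor is trained on the remaining samples, and the removed sample is then predicted. The helper names `jackknife_accuracy` and `nearest_neighbor` are hypothetical, and the plain 1-nearest-neighbor rule is only a stand-in for the actual prediction engine.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[List[Vector], List[int], Vector], int]


def nearest_neighbor(train_x: List[Vector], train_y: List[int], query: Vector) -> int:
    """Stand-in predictor: return the label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))
    return train_y[best]


def jackknife_accuracy(samples: List[Vector], labels: List[int], predict: Predictor) -> float:
    """Leave each sample out once, predict it from the rest, and return
    the fraction of samples predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy data, for illustration only.
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
ys = [1, 1, 1, 0, 0, 0]
print(jackknife_accuracy(xs, ys, nearest_neighbor))  # 1.0
```

Because every sample is left out exactly once, the result is unique for a given dataset, unlike subsampling, whose outcome depends on how the data happen to be split.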
A 3D plot to show how the parameter in (
The success rates thus obtained by the jackknife test in identifying interactive enzyme-drug pairs or noninteractive enzyme-drug pairs on the benchmark dataset
The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drug pairs for the benchmark dataset
Method | Acc | Sn | Sp | MCC |
---|---|---|---|---|
iEzy-Drug<sup>a</sup> | | | | 80.39% |
NN predictor<sup>b</sup> | 85.48% | N/A | N/A | N/A |

<sup>b</sup>See [
To provide a graphical illustration to show the performance of the current binary classifier iEzy-Drug as its discrimination threshold is varied, a 2D plot, called Receiver Operating Characteristic (ROC) curve [
A plot for the ROC curve to quantitatively show the performance of the iEzy-Drug predictor.
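To make the construction of such a curve concrete, the sketch below sweeps the discrimination threshold over a set of prediction scores and records the false positive rate and true positive rate at each step; the area under the curve is then estimated by the trapezoidal rule. The function names and the toy scores are assumptions of this example. For a fuzzy classifier, the membership degree of the interactive class could serve as the score, which is also an assumption rather than a detail stated in the text.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points as the threshold is lowered from high to low.

    labels use 1 for interactive (positive) and 0 for noninteractive pairs.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples sharing this score so that ties give one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores, for illustration only.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```

An area of 1.0 corresponds to perfect separation of the two classes, and 0.5 to a random guess.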
A semiscreenshot showing the top page of the iEzy-Drug web server. Its website address is at
iEzy-Drug can remarkably improve the prediction quality for two reasons. First, it introduces 2D molecular fingerprints to represent the drug samples (see Online Supporting Information S3 for the detailed fingerprint expressions of the drugs listed in Online Supporting Information S1). Second, it uses PseAAC to incorporate the sequence-derived features of enzymes that are essential for identifying enzyme-drug interactions in cellular networking.
To enhance the value of its practical applications, a web server for iEzy-Drug has been established and is freely accessible at
For the convenience of the vast majority of biologists and pharmaceutical scientists, here we provide a step-by-step guide showing how users can easily get the desired results from the web server without needing to follow the complicated mathematical equations, which are presented in this paper for the sake of completeness in describing how the predictor was developed.
The authors would like to thank the three anonymous reviewers, whose constructive comments were very helpful for strengthening the presentation of the paper. This work was supported by grants from the National Natural Science Foundation of China (no. 31260273), the Jiangxi Provincial Foreign Scientific and Technological Cooperation Project (no. 20120BDH80023), the Natural Science Foundation of Jiangxi Province, China (no. 2010GZS0122, 20122BAB201020), the Department of Education of Jiangxi Province (GJJ12490), the LuoDi plan of the Department of Education of Jiangxi Province (KJLD12083), and the Jiangxi Provincial Foundation for Leaders of Disciplines in Science (20113BCB22008). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.