Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.


Introduction
Traditional Chinese medicine (TCM) prescriptions, as a representative of drug combinations, are generally used in combination with multiple medicinal herbs in a certain dose to exert therapeutic effects [1,2]. In clinical practice, the TCM prescription Xiaoxuming decoction (XXMD) has been an effective prescription for the treatment of stroke for over 1200 years and has been recorded in Beiji Qianjin Yaofang (Essential Prescriptions Worth a Thousand Gold for Emergencies) in Tang Dynasty [3]. XXMD consists of twelve herbs, which are presented in Table 1. In previous studies, it has been shown that prophylactic administration of XXMD for 5 days prior to surgery protected rats against ischemiareperfusion-induced brain injury [4]. However, due to the complex composition in TCM prescriptions, it is difficult to conduct in-depth studies on the underlying pharmacological mechanisms of XXMD. Therefore, it is essential to clarify the compatibility of various herbs in TCM prescriptions with sufficient approaches. To further explore the therapeutic roles 2 BioMed Research International of XXMD, phenotypic-based drug discovery technology was employed in this study. Regardless of the molecular mechanism of action during the initial stages of drug discovery, phenotypic-based screening, as opposed to target-based screening, has provided new impetus to improve the probability of success in drug discovery [5,6]. Thus, identifying the components in XXMD by phenotypic-based screening will contribute to elucidating the mechanism of therapeutic effects of XXMD. There is no doubt that traditional analytic techniques, such as experimental screening methods, take up time and resources. To solve these issues, and to enhance screening efficiency, virtual screening (VS) methods, thereby taking machine learning as an example, have been widely adopted [7][8][9]. However, studies that focus on classification predictions towards phenotypic screening to evaluate and identify classic TCM prescriptions and discovering neuroprotective agents against ischemic stroke are limited.
Ischemic stroke, which is the result of an insufficient blood supply leading to dysfunction of the brain tissues [10], is mainly due to the occurrence of atherosclerosis and thrombosis in the arteries supplying blood to the brain [11]. Tissue plasminogen activator (tPA) is the only thrombolytic agent that is approved by the US Food and Drug Administration (FDA) [12]. However, when a thrombus is destroyed, blood containing a large amount of reactive oxygen species (ROS) will perfuse into the ischemic site, thereby causing cerebral ischemia-reperfusion (I/R) injury. As the most essential ingredient of ROS, H 2 O 2 is produced in the body's oxidative metabolism and is considered a messenger of intracellular signaling cascades [13]. Large-scale production of H 2 O 2 will damage the structure and function of biological membranes and organelles (such as mitochondria) in neuronal cells [14,15]. Moreover, the excessive level of oxidative stress caused by I/R injury results in a vicious cycle of mitochondrial dysfunction, calcium overload, excitatory glutamate excess release, and lipid peroxidation following cerebral I/R injury [16]. This will lead to an imbalance of neuronal cell homeostasis, which subsequently aggravates I/R injury. Thus, to facilitate cerebral tissue repair, identifying novel therapies is of utmost importance.
In this study, a workflow for the classification models, model validations, and their application to virtual screening of neuroprotective agents is presented in Figure 1. Two data sets containing 263 and 116 neuroprotectants were constructed, respectively, and categorized into a training set and test set. The data set containing 263 compounds was used to construct hypoxia-induced neuronal injury models (NIN models), while the data set containing 116 compounds was used to construct H 2 O 2 -induced neuronal injury models (NHN models). Next, data of four single classification models (AB, kNN, CT, and RF) and molecular fingerprint descriptors were integrated to construct stacked naïve Bayesian (s-NB) models. The predictive power of the models was evaluated by five times cross-validation of the training set and validation of the test set. The final two optimal stacked NB classification models, s-NB-1-LPFP4 and s-NB-2-LCFP6, were used for classification of the phenotypic-based active ingredient combinations and screen potential neuroprotectants in XXMD. Furthermore, to verify the results of the two optimal Bayesian classification models, sodium dithionite (Na 2 S 2 O 4 ) and hydrogen peroxide (H 2 O 2 ) were used to induce chemical hypoxia and oxidative damage, which mimicked the hypoxic phenotype and reperfused phenotype in SH-SY5Y cells [17,18].  and PubChem Compound Database (https://www.ncbi.nlm .nih.gov/pccompound/). A total of 1858 compounds were collected. After removal of duplicate compounds, which were found in more than one herb, 1484 compounds were selected for further studies. Information of XXMD and the twelve herbs obtained from database sources is presented in Table 1.

Collection and Preparation of Training Set and Test
Set. Compounds that were defined as active in the two phenotypes were collected from the ChEMBL database [19,20], using IC 50 ≤10 M in nerve cells as the selection criterion. After eliminating duplicate structures, active datasets were constituted with corresponding neuroprotective compounds. It is noteworthy that compounds collected from the ChEMBL database did not overlap with compounds in the XXMD dataset. Corresponding decoys (defined as inactive) were automatically generated by the DUD-E online database with a ratio of 4:1 to active compounds [21]. Morgan fingerprints (4096 bits, radius = 2) were generated using the RDKit package in Python and were then fed into a t-distributed stochastic neighbor embedding (t-SNE) algorithm and principal component analysis (PCA) to obtain 2-dimensional representations [22][23][24]. As shown in the t-SNE plot (Figures 2(a) and 2(b)), remarkable colocalization of several active compound clusters was shown, which was well separated from inactive compounds. It should be noted that different active compound clusters may represent different chemotypes, which would require further investigation in the future. Results from PCA plots are presented in Figures 2(c) and 2(d).
Considering the sparseness nature of Morgan fingerprints, it is reasonable that the explained variance ratios of PC1 and PC2 are not high. However, when using the top two PCs, some active compounds can still be distinguished from inactive compounds. Both the active and inactive data sets were randomly distributed into a training set and test set with a ratio of 3:1 as shown in Table 2. In all data sets, neuroprotective compounds and decoys were, respectively, marked as "1" and "0." Prior to the calculation of molecular descriptors, all compounds required the addition of hydrogen atoms, deprotonation of strong acids, protonation of strong bases, generation of a valid 3D conformation, and energy minimization. Detailed information on the training sets and test sets is presented in Tables S1-S4.

Molecular
Descriptors. Molecular descriptors are the basis for the combination of machine learning [25]. Therefore, we used Discovery Studio 2016 (DS 2016) [26] and MOE 2014.9 software [27] to calculate three sets of twodimensional (2D) descriptors to describe each compound. A total of 256 descriptors calculated by DS 2016 were constituted as the first descriptor set. In addition, the second descriptor set was composed of 185 descriptors calculated

Molecular Descriptors Selection.
In this study, Pearson correlation analysis was performed to identify descriptors that were highly correlated with activity [28]. Firstly, descriptors in which values appeared in high frequency of more than 50% were eliminated. Secondly, descriptors in which correlation coefficients had an activity of less than 0.1 were excluded [29]. If the absolute value of the correlation coefficients between two descriptors was higher than 0.9, the descriptor possessing a lower correlation coefficient with activity was deleted. Finally, the remaining descriptors were used for building models. Detailed information on the descriptors is presented in Table S5.

Methods for Model Building.
We followed the methods of Fang et al. (2016) [6]. In this study, five different machine learning methods, Adaboost (AB), nearest neighbors ( NN), classification tree (CT), random forest (RF), and naïve Bayesian (NB), were used for computational processes. The construction of models was performed using Orange Canvas 3.4.1 [30] (AB, NN, CT, and RF) and DS 2016 (NB). In addition, four models (AB, NN, CT, and RF) exported two probabilities (positive and negative probability) as well as estimated target values (1 or 0).

Single Classifier Model
(1) Adaboost. Adaboost (AB) is an iterative algorithm that was designed to get the weighted sum of classifiers by using the lifting method [31,32]. It tends to tweak the subsequent weak learners in favor of reducing misclassification caused by previous classifiers [33]. Weight coefficients of the N samples in the algorithm were distributed to the same value initialized to 1/N. Next, this subsection was used to train the classifier and calculate the weights of the misclassified samples as the weighted error rate.
(2) Nearest Neighbors. The k nearest neighbors (kNN) algorithm is a nonparametric learning method for classification and is regression-based on the closest training sample in the feature space [34,35]. The feature selection, the number of nearest neighbors K, and the shape of the distance weighting function determine the performance of the K-NN model. In this method, each molecule is removed from the training set, and the activity value is predicted to have no inverse distance weighted average activity of the most similar molecules. In this study, K was optimized (K=1-10).
(3) Classification Tree. Classification tree (CT), a method commonly used in data mining, is designed to illustrate the structure of any particular field [36]. The C4.5 tree in Orange is designed to build classification trees from a set of training data by splitting criterion called normalized information gain [37,38]. The attribute with the highest normalized information gain is selected to make the decision [6]. The parameters in Orange were adopted using the default settings.
(4) Random Forest. Random forest (RF) is an ensemble learning method that stacks multiple decision trees to produce consensus predictions for each tree [39,40]. RF randomly divides the data in the training set to build individual trees. The arbitrary node of the trees is drawn from the best subset of total descriptors and is selected. Random decision forests correct for the habit of decision trees of overfitting their training set.

Stacked Naïve Bayesian Classification Model.
To improve the prediction reliability of a single model, consistent scoring and data fusion are beneficial. In general, stacked models reduce unreliable predicted noise hidden by single classification model [6,41]. In this study, the probability values calculated by four models were quoted as new descriptors, and the prediction results were integrated with NB models. Stacked-NB classification (s-NB) models were constructed and validated by DS 2016 [29]. The learning process generated a large set of Boolean features from the input descriptors. The weights calculated for each feature using a Laplacian-adjusted probability estimate were summed to provide a probability estimate, which was a relative predictor for the possibility of that sample being from the good subset [42,43].

Performance Parameters Applied for Model Evaluation.
We followed the methods of Fang et al. (2016) [6]. In this study, the quality of the models was assessed by 5- Furthermore, s-NB models measure the accuracy of prediction by calculating the receiver operating characteristic (ROC) and area under the curve (AUC) of the training set and the test set. The ROC curve is a comprehensive indicator that reflects the model sensitivity and specific continuous variables [44]. The ROC curve is based on the true positive rate as the ordinate and the false positive rate as the abscissa.   [6]. The MTT assay was used to assess cell survival. SH-SY5Y cells (5 x 10 3 cells/well) were seeded in 96-well plates in 100 L of medium per well and were cultured for 20 hours. When the cell density was roughly 80%, cells were treated with medium containing different concentrations of test compounds for 2 hours; then 200 M H 2 O 2 or 8 mM Na 2 S 2 O 4 was separately added and cells were incubated for 22 hours. MTT reagent (final concentration 1.0 mg/ml) was added 2 hours before the end of the incubation. After 3 hours of incubation, the medium was replaced with 100 L of dimethyl sulfoxide (DMSO) and the absorption of the solubilized formazan was measured at 570 nm using a microplate reader (Spectra Max M5, Molecular Devices, USA). Cell survival was normalized to the control group, for which the cell survival was set to 100%.

Statistical
Methods. Data are presented as the mean ± standard deviation. Statistical analysis was performed using the SPSS statistical package (Version 16.0; SPSS, Chicago, IL, USA) program, and the significance of each group was verified with one-way analysis of variance (ANOVA), followed by a Tukey's multiple comparison post hoc test. P < 0.05 was considered significant.

Performance of Classification Models.
In this study, all single classification models were constructed based on AB, K-NN, CT, and RF algorithms and three descriptor sets. Subsequently, 5-fold cross-validation and test set validation were used to further evaluate the predictive power, as shown in Table 3. Among 12 NIN models, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross validation and test set, respectively. The best single classifier was RF-c1, which was developed by random forest using DS MOE 2D descriptors. Regarding the 12 NHN models, the best performance was also achieved by the RF algorithm, with an average MCC value of 0.857±0.026 and 0.715±0.043 from 5-fold cross-validation and test set, respectively. The best single classifier was RF-c2, which was developed by RF stacked with a DS MOE 2D descriptor set.
The RF algorithm was more applicable than other algorithms for the classification of compounds and the prediction of neuroprotective compounds against hypoxic injury and oxidative damage. The average MCC values of each algorithm were calculated with three descriptor combinations (detailed information is presented in Figure S1). The results showed that improving the diversity of descriptors increased the predictability of the models. Therefore, classification models constructed by the DS MOE 2D descriptor combination were selected for further studies.
Based on the results presented above, probability values calculated by four models were integrated into a stacked Bayesian model. Molecular fingerprint descriptors were also integrated into s-NB classification models as replenishment descriptors (detailed information is presented in Table S6). To further compare the predictive performance of the single classification models and the s-NB classification models, the MCC values and AUC values were calculated via a ROC plot. As shown in Figure 3, AUC values of the stacked Bayesian model s-NB-1 were 0.932 and 0.913 for the training set and the test set, respectively, and were higher than the corresponding AUC value calculated before replenishing fingerprint descriptors. For the NHN model, the verification obtained the same result. As shown in Figure 4, s-NB-1-LPFP6 (MCC = 0.993) revealed superior predictive performance for the training set when compared with four single models (MCC ranged from 0.737 to 0.830); the same results were obtained by verification of the test set. Similarly, the MCC values of the s-NB-2-LPFP6 model were 0.968 and 0874 for the training set and the test set, respectively, and were higher than the values of the corresponding single model MCC.

Virtual Screening of Neuroprotective Agents from XXMD.
In this study, we aimed to preliminarily clarify the functional mechanisms of XXMD and to identify potential neuroprotective agents against I/R injury; therefore, virtual screening of the compositions in the XXMD database was performed based on the two optimal classification models (s-NB-1-LPFP6 and s-NB-2-LPFP6). In total, 658 compounds were predicted to be active against ischemia/hypoxia-induced neurotoxicity. Similarly, 615 compounds were predicted to be active against H 2 O 2 -induced neurotoxicity by s-NB-2-LPFP6. A total of 398 compounds were ranked by Bayesian scoring EstPGood (0 ≤ EstPGood ≤ 1) (detailed information is presented in Table S7).    Figure 5 shows that each Chinese medicinal herb in XXMD contained antihypoxia and anti-H 2 O 2 damage compounds. Most were predicted to exert antihypoxia and anti-H 2 O 2 effects. It could be critical for XXMD to play a therapeutic role in I/R-induced brain injury. Flavonoid glycosides received a higher score in the antihypoxia phenotype, such as narcissoside, rutin, lsoquercitrin, hirsutrin, quercetin derivatives, and kaempferol derivatives. It has been suggested that flavonoid glycoside compounds were the major components in XXMD which acted against hypoxic phenotype in I/Rinduced brain injury. Similarly, for the NHN model, alkaloids and sterol compounds in baikal skullcap root and ginseng displayed predictive activity in the anti-H 2 O 2 phenotypes, such as pancratistatin, menisarine, fangchinoline, normenisarine, and other sterols.
In addition, 398 compounds were clustered into 5 groups by FCFP 6 fingerprint with the cluster ligands module in DS 2016. Clustering is based on the root-mean-square (RMS) difference of the Tanimoto distance for fingerprinting. For each cluster, scaffold novelty as well as probability output was considered. Finally, 10 compounds (Table 4) were identified from the XXMD database for cell-based neuroprotective assays.
In addition, to estimate the model's ability to extrapolate, the Dice similarity between 10 compounds and two predefined sets of compounds against hypoxia-induced and H 2 O 2 -induced injury were calculated by generating Morgan fingerprint using RDKit package in Python. As shown in Table 4, most of the 10 compounds are structurally similar to the representative compounds of the two predefined sets. The results also verify the prediction reliability of the two optimal classification models 3.3. Cell-Based Neuroprotective Assay Results. Cell-based neuroprotective assay results are presented in Table S8. Most of compounds showed a good dose-response relationship at different concentrations. Figure 6

Discussion
In this study, compounds in XXMD were analyzed by stacked naïve Bayesian models. The results demonstrated that the existence of the 12 TCMs contained in XXMD was valuable. The synergistic effects in TCM prescriptions could be considered as the material basis for the therapeutic efficacy. Preliminary assay results suggested that a machine learning algorithm, such as combination of Bayesian models, may be feasible to predict neuroprotective compounds and preliminarily demonstrated the pharmacological mechanisms of TCM. The results suggest broadening the selection of stroke treatment methods and further demonstrated the feasibility of applying computer-assisted drug discovery to the analysis of TCM prescriptions.
Through in silico prediction studies, in this study, baicalein was identified as a potent neuroprotective agent, with effects on hypoxic and oxidative damage phenotypes. As a natural phenolic flavonoid compound, baicalein has attracted increased attention for antioxidant and antiinflammatory efficacy [44]. However, studies on the efficacy of baicalein against cerebral I/R injury are limited. The validation of baicalein against two damage phenotypes supports its potential usage in ischemic stroke therapy. Moreover, prim-O-glucosylcimifugin (POG) has the highest content of chromone of Saposhnikovia divaricate (Turcz) Schischk (Fangfeng). However, no reports are available on the antioxidation effect of POG or its therapeutic effect on I/R-induced brain injury. In our study, we first identified the potential activity of POG against oxidant damage and hypoxia injury. These findings were subsequently verified by cell-based neuroprotective assays, which indicated that the established virtual screening pipeline could identify novel neuroprotective agents in TCM.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Additional Points
Machine learning models were used to demonstrate the pharmacological mechanisms of classic traditional Chinese medicine prescriptions. Single classifiers and molecular fingerprint descriptors were integrated to improve the predictive power of naïve Bayesian models. The stacked naïve Bayesian models were applied to predict neuroprotective compounds for ischemic stroke. Two compounds gained using above protocols were confirmed by cell-based assays.

Conflicts of Interest
The authors declare that there are no conflicts of interest.