Prescription Function Prediction Using Topic Model and Multilabel Classifiers

Determining a prescription's function is one of the challenging problems in Traditional Chinese Medicine (TCM). In past decades, TCM has been widely researched with various computational methods, but none has focused on predicting the function of a new prescription. In this study, we present two methods for this task. The first is based on a novel supervised topic model named Label-Prescription-Herb (LPH), which incorporates herb-herb compatibility rules into the learning process. The second is based on multilabel classifiers built on TFIDF features and herbal attribute features. Experiments reveal that both methods perform well, but the multilabel classifiers slightly outperform the LPH-based method. The prediction results can provide valuable information for new prescription discovery before clinical testing.


Introduction
Traditional Chinese Medicine (TCM) is a unique medical knowledge system in China and has become a popular complementary treatment in Western countries. There are currently 100,000 formulae derived from continuous clinical records. A formula is a prescription that has been validated by pharmacology and clinical practice. Researchers have made great efforts to study and use these formulae to discover new prescriptions hidden in formula data [1]. To discover a new prescription for disease treatment, researchers have to analyze the efficacy of related herbs and combine several herbs in proper proportions according to TCM theory. The function of the new prescription then has to be proved through repeated clinical tests, which require substantial manpower and material resources. If a new prescription's function could be predicted computationally beforehand, the results would provide a valuable reference for subsequent clinical practice.
It has been found that data mining approaches play critical roles in TCM-related topics, such as new drug discovery [1], syndrome differentiation [2][3][4], herbal combinational rule mining [5,6], symptom name normalization [7], intelligent diagnosis [8], and treatment pattern mining [9]. Most previous research has focused on relationship mining, such as herb-symptom relationships [8,10,11] and herb-herb relationships [6]. Wang et al. [6] created a herbal network to represent herb-herb correlations. Chen et al. [8] detected patterns between herbs and symptoms using a tripartite information network. Recently, more and more researchers have adopted topic models to mine the correlations between TCM objects. Lin et al. [10] proposed a symptom-herb-therapies-diagnosis topic model to diagnose disease and administer appropriate drugs and treatments given a patient's symptoms. Zhang et al. [4] proposed a Symptom-Herb-Diagnosis Topic (SHDT) model to extract multiple relationships among symptoms, herb combinations, and diagnoses from large-scale CM clinical data; the model was useful for discovering common TCM diagnosis and treatment patterns. Jiang et al. [11] applied Linked LDA to extract herb-symptom patterns. Yao et al. [9] employed Labeled LDA (Labeled Latent Dirichlet Allocation) to mine treatment patterns in TCM clinical cases, but the mining results were not satisfactory. Unlike these studies, we focus on predicting prescription functions through topic detection and incorporate compatibility rule mining into the topic model.
In TCM theory, a prescription's function is mainly affected by three factors: the attributes of its herbs, the compatibility rules of paired herbs, and the dosages. Based on this, we present two methods to predict a prescription's function. The first method is based on topic modeling: a novel topic model named LPH (Label-Prescription-Herb) incorporates the results of compatibility rule mining into the learning process and automatically learns the posterior distribution of each herb in a prescription conditioned on the prescription's label set (function set). The second method is based on feature extraction and multilabel classifiers: we extract a feature vector for each prescription from its herbal attributes and TFIDF (Term Frequency-Inverse Document Frequency) features and then validate the method with several popular and competitive classifiers.
The rest of the paper is organized as follows. Section 2 presents the detailed steps of our methods for prescription function prediction. Section 3 analyzes and discusses our experimental results. Finally, Section 4 provides conclusions and future work.

Methods
The framework of our methods is shown in Figure 1, and the details are presented in the following subsections.
The herb dataset and formula dataset are extracted from our project CKCEST (Chinese Knowledge Center for Engineering Science and Technology, http://zcy.ckcest.cn/tcm/). In the first method, we mine compatibility rules from the formula dataset and incorporate the results into the learning process of topic modeling. The objective of topic modeling is to learn the "topic-word" (function-herb) structure with supervision. A prescription's most likely labels can then be inferred by thresholding its posterior probability over function labels. In the second method, we treat the prediction task as a multiclass, multilabel classification problem: we extract a feature space based on TFIDF weighting and herbal attributes and then train multilabel classification models on these features.

Prediction Based on Topic Model.
In this section, we propose a supervised topic model named Label-Prescription-Herb (LPH) to mine treatment patterns in the herbs of the formula dataset. Although a prescription consists of two or more individual herbs, some of these herbs act as pairs in treatment, so we first introduce our method for mining compatibility rules.

Compatibility Rule Mining.
In TCM theory, compatibility refers to the combination of two or more herbs based on the clinical setting and the properties of the herbs [12]. The efficacy of a single herb is usually limited, but when two herbs are used together, their interaction can make them more effective than either herb alone in treating disease; in this case, we say that the two herbs have a compatibility rule. In China, many compatibility rules have been accumulated from ancient times to the modern period. However, the 917 existing herb pairs in the Chinese Paired Herb Database are inadequate for our prediction task, so computational methods are needed to discover more pairs. Herbs that are frequently used together are more likely to be paired herbs. We therefore propose a method based on support degree [13] and dependency for mining the compatibility rule between herbs $h_i$ and $h_j$, which consists of the following steps:
Step 1. Compute the support of the pair:
$$\text{support} = P(h_i, h_j). \tag{1}$$
Step 2.
Step 3. Combine the support and dependency attributes into a correlation score, Cor (Equation (3)).
Step 4. Rank all possible herb pairs according to their Cor values.
Here, support denotes the joint probability that herbs $h_i$ and $h_j$ occur together. In Step 3, we combine the support attribute, $P(h_i, h_j)$, with the dependency attribute, the ratio of $P(h_i, h_j)$ to $P(h_i)P(h_j)$. Note that we remove Glycyrrhizae Radix from the mining results, because analyzing its compatibility with other herbs is uninformative: it is used merely to reduce or moderate the side effects of all the herbs in a prescription.
Evidence-Based Complementary and Alternative Medicine 3
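The mining steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: probabilities are estimated as relative frequencies over prescriptions, the dependency value is the ratio $P(h_i, h_j) / (P(h_i)P(h_j))$ described above, and because the form of Equation (3) is not given here, the sketch assumes a simple weighted sum, Cor = a × support + b × dependency, with hypothetical parameters `a` and `b` both set to 0.5.

```python
from collections import Counter
from itertools import combinations

# Glycyrrhizae Radix is excluded, as in the text, because it mainly moderates other herbs.
EXCLUDED_HERBS = {"Glycyrrhizae Radix"}


def mine_compatibility_rules(prescriptions, a=0.5, b=0.5, top_n=1500):
    """Rank herb pairs by an ASSUMED score Cor = a * support + b * dependency.

    prescriptions: list of iterables of herb names.
    Returns a list of ((herb_i, herb_j), cor) sorted by decreasing cor.
    """
    num_prescriptions = len(prescriptions)
    single_counts = Counter()  # number of prescriptions containing each herb
    pair_counts = Counter()    # number of prescriptions containing each herb pair
    for herbs in prescriptions:
        herbs = sorted(set(herbs) - EXCLUDED_HERBS)
        single_counts.update(herbs)
        pair_counts.update(combinations(herbs, 2))  # pairs come out in sorted order

    scored = []
    for (h_i, h_j), n_ij in pair_counts.items():
        support = n_ij / num_prescriptions          # Step 1: P(h_i, h_j)
        p_i = single_counts[h_i] / num_prescriptions
        p_j = single_counts[h_j] / num_prescriptions
        dependency = support / (p_i * p_j)          # ratio P(h_i, h_j) / (P(h_i) P(h_j))
        cor = a * support + b * dependency          # Step 3 (assumed form of Eq. (3))
        scored.append(((h_i, h_j), cor))

    scored.sort(key=lambda item: item[1], reverse=True)  # Step 4: rank by Cor
    return scored[:top_n]
```

Because the dependency ratio is not bounded by 1, it dominates a plain weighted sum of this kind; a faithful implementation would need to match whatever scaling Equation (3) actually uses.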

Topic Model Description on TCM.
LDA (Latent Dirichlet Allocation) is a completely unsupervised method that models each document as a mixture of topics [14]. The model outputs a discrete probability distribution over words for each topic and a discrete distribution over topics for each document. However, LDA is not appropriate for multilabeled corpora because the topics it learns have no direct correspondence with the label set. A simple solution is to assign a document's words to its labels rather than to a latent and possibly less interpretable semantic space, as done in Labeled LDA [15] and Partially Labeled LDA [16].
By analogy with the relationship among documents, topics, and words, we treat herbs as "words." A prescription (formula) is a bag of herbs, which we treat as a structured "document," and a prescription's function corresponds to a "topic." We therefore employ topic models to mine the latent relationships between function labels and herbs. For our prediction task, the topic model should incorporate supervision by constraining the model to use only those "topics" that correspond to a prescription's label set. Since herb combinations also affect a prescription's function, we model the role of herb pairs in the topic learning process.
We first define some notation. Let each prescription $d$ be represented by a tuple consisting of a list of herbs, $H^{(d)} = \{h_1, h_2, \dots, h_{N_d}\}$, and a list of binary topic presence/absence indicators, $\Lambda^{(d)} = (l_1, \dots, l_K)$, where each $h_i \in \{1, \dots, M\}$ and each $l_k \in \{0, 1\}$. Here, $N_d$ is the prescription length, $M$ is the total number of herbs extracted from the formula dataset, and $K$ is the total number of function labels. We set the number of functions in our model to the number of unique labels, $K$.

LPH Model.
To incorporate compatibility rules into the topic model, we introduce a variable $c_{ij}$ that indicates whether herb $h_i$ has a compatibility rule with herb $h_j$. If $c_{ij} = 1$, then $h_i$ and $h_j$ are paired herbs; otherwise, each herb is generated from the distribution associated with its own function label. The graphical model of LPH is shown in Figure 2, where $\beta_k$ is the parameter vector of the multinomial distribution for the $k$th function label, $\gamma$ is the prior parameter of $c_{ij}$, $\vec{\alpha}$ contains the parameters of the Dirichlet topic prior, $\eta$ contains the parameters of the herb prior, and $\Phi_k$ is the label prior for function $k$. The generative process of LPH is as follows:
(1) For each function $k \in \{1, \dots, K\}$, generate $\beta_k$ from a Dirichlet distribution with prior parameter $\eta$, that is, $\beta_k \sim \mathrm{Dir}(\eta)$.
(3) For each herb $h_i$, $i \in \{1, \dots, N_d\}$:
During step (2)(b), the label projection matrix $L^{(d)}$ is used to project the Dirichlet prior vector $\vec{\alpha} = (\alpha_1, \dots, \alpha_K)$ into a lower-dimensional vector $\vec{\alpha}^{(d)}$. For instance, suppose $K = 6$ and a prescription $d$ has labels $\Lambda^{(d)} = (0, 0, 0, 1, 1, 0)$; then
$$L^{(d)} = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$
The $i$th row of $L^{(d)}$ has an entry of 1 in column $j$ if and only if the $i$th label of prescription $d$ is function $j$, and 0 otherwise. The function mixture $\theta^{(d)}$ is then drawn from a Dirichlet distribution with parameters $\vec{\alpha}^{(d)} = L^{(d)} \times \vec{\alpha} = (\alpha_4, \alpha_5)^{\top}$. During step (3)(a), when $c_{ij}$ for herb $h_i$ is observed from the compatibility rule mining results, the prior parameter $\gamma$ is decoupled from the rest of the model. As in Labeled LDA, for prescription $d$ we restrict $\theta^{(d)}$ to be defined only over the topics corresponding to its labels $\Lambda^{(d)}$. This restriction ensures that all topic assignments are limited to the prescription's labels.
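To make the label projection concrete, the following Python sketch builds the projection matrix from a binary label vector, projects a Dirichlet prior vector onto the active labels, and draws a restricted function mixture. It is an illustration under our own assumptions (plain lists instead of a matrix library, 0-based indices, and the standard Gamma-normalization method for Dirichlet sampling), not the authors' code; the function names are hypothetical.

```python
import random


def projection_matrix(labels):
    """Build L^(d): row i has a 1 in column j iff the i-th active label of d is function j."""
    num_functions = len(labels)
    active = [k for k, on in enumerate(labels) if on]
    return [[1 if j == k else 0 for j in range(num_functions)] for k in active]


def project_prior(matrix, alpha):
    """Compute alpha^(d) = L^(d) x alpha as a plain matrix-vector product."""
    return [sum(l_ij * a_j for l_ij, a_j in zip(row, alpha)) for row in matrix]


def sample_dirichlet(params, rng=random):
    """Draw theta ~ Dir(params) by normalizing independent Gamma(param, 1) draws."""
    draws = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(draws)
    return [x / total for x in draws]


labels = (0, 0, 0, 1, 1, 0)
L = projection_matrix(labels)
alpha_d = project_prior(L, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # picks components 4 and 5
theta_d = sample_dirichlet(alpha_d, random.Random(0))       # mixture over the 2 labels only
```

With the labels (0, 0, 0, 1, 1, 0) from the example, the active 0-based indices are 3 and 4, so the projected prior consists of the 4th and 5th components of the prior vector, matching $(\alpha_4, \alpha_5)$.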

Learning and Inference.
Exact inference for LPH is intractable, so approximate inference is required. We use collapsed Gibbs sampling [17] to estimate the probability that a function label is assigned to herb $h_i$ in a prescription. We first initialize the states of the Markov chain randomly; we then compute the conditional distributions $p(f_i = k \mid \mathbf{f}_{-i})$ in (5) and $p(f_{(i,j)} = k \mid \mathbf{f}_{-i,-j})$ in (6), where $\mathbf{f}_{-i}$ denotes the function label assignments of all herbs except $h_i$, and $\mathbf{f}_{-i,-j}$ denotes those of all herbs except $h_i$ and $h_j$.
In (5), $n^{(h_i)}_{-i,k}$ is the number of times herb $h_i$ is assigned to function $k$, $n^{(\cdot)}_{-i,k}$ is the total number of herbs assigned to function $k$, $n^{(d)}_{-i,k}$ is the number of times herbs in prescription $d$ are assigned to function $k$, and $n^{(d)}_{-i,\cdot}$ is the number of herbs in $d$. All counts exclude the current assignment. In (6), the counts exclude the current assignments of both $h_i$ and $h_j$. Note that once a herb pair $(h_i, h_j)$ is assigned to function $k$, both $h_i$ and $h_j$ are assigned to that function simultaneously.
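The exact forms of (5) and (6) are not reproduced in this text, so the sketch below assumes the standard Labeled LDA collapsed Gibbs update built from the counts defined above, restricted to the prescription's labels. For a herb pair, it uses the exact joint collapsed probability of assigning two distinct herbs to the same function under a Dirichlet-multinomial model; treat this as a stand-in for (6), not as the authors' equation. All function names are hypothetical.

```python
import random


def function_weights(herb, doc_labels, n_herb_func, n_func, n_doc_func, alpha, eta):
    """Unnormalized p(f_i = k | f_-i) for each k in the prescription's label set.

    Counts must already exclude the current assignment of `herb`.
    n_herb_func[h][k]: times herb h is assigned to function k
    n_func[k]:         total herbs assigned to function k
    n_doc_func[k]:     herbs of this prescription assigned to function k
    The prescription-length denominator is the same for every k, so it is dropped.
    """
    num_herbs = len(n_herb_func)
    return [
        (n_herb_func[herb][k] + eta) / (n_func[k] + num_herbs * eta)
        * (n_doc_func[k] + alpha[k])
        for k in doc_labels
    ]


def pair_weights(h_i, h_j, doc_labels, n_herb_func, n_func, n_doc_func, alpha, eta):
    """Unnormalized probability of assigning two distinct herbs jointly to function k."""
    num_herbs = len(n_herb_func)
    weights = []
    for k in doc_labels:
        first = (n_herb_func[h_i][k] + eta) / (n_func[k] + num_herbs * eta)
        # After h_i joins k, function k holds one more herb (h_j's own count is unchanged).
        second = (n_herb_func[h_j][k] + eta) / (n_func[k] + 1 + num_herbs * eta)
        doc = (n_doc_func[k] + alpha[k]) * (n_doc_func[k] + 1 + alpha[k])
        weights.append(first * second * doc)
    return weights


def sample_label(doc_labels, weights, rng=random):
    """Draw one label in proportion to its unnormalized weight."""
    return rng.choices(doc_labels, weights=weights, k=1)[0]
```

Restricting the loop to `doc_labels` is what enforces the Labeled LDA constraint that herbs can only be assigned to the prescription's own functions.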
After the Gibbs sampling iterations, we estimate the function-herb multinomial distribution $\beta_k$ and the prescription function mixture $\theta^{(d)}$ from the sampled assignments, with a separate case for paired herbs ($c_{ij} = 1$).
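A minimal sketch of the usual point estimates computed from the final counts, assuming the standard Labeled LDA smoothing by the herb prior `eta` and the topic prior `alpha`; the separate estimate for paired herbs is not modeled here, and the function names are hypothetical.

```python
def estimate_beta(n_herb_func, n_func, eta):
    """beta[k][h] = (n[h][k] + eta) / (n[k] + M * eta): one distribution over herbs per function."""
    num_herbs = len(n_herb_func)
    return [
        [(n_herb_func[h][k] + eta) / (n_func[k] + num_herbs * eta) for h in range(num_herbs)]
        for k in range(len(n_func))
    ]


def estimate_theta(n_doc_func, labels, alpha):
    """theta[k] = (n_d[k] + alpha[k]) / sum over labels of (n_d[k'] + alpha[k']); zero off the label set."""
    active = [k for k, on in enumerate(labels) if on]
    denom = sum(n_doc_func[k] + alpha[k] for k in active)
    return [(n_doc_func[k] + alpha[k]) / denom if on else 0.0 for k, on in enumerate(labels)]
```

Each row of `beta` sums to 1 only if `n_func[k]` equals the column total of `n_herb_func`, which holds when both are maintained by the same sampler.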

Function Prediction.
In multilabel prediction, inferring the best label set for an unlabeled prescription at test time is more complex: in principle, it requires assessing every possible function label assignment and returning the one with the highest posterior probability, and there are $2^K$ such assignments. In this paper, we instead compute the conditional probability of each function label (topic) given a new prescription using Bayes' rule (9), and infer the prescription's most probable labels by suitably thresholding this posterior probability. Suppose a new prescription consists of the herbs $H^{(d)} = \{h_1, h_2, \dots, h_{N_d}\}$; then $P(f \mid H^{(d)})$ is calculated as
$$P(f \mid H^{(d)}) = \frac{P(H^{(d)} \mid f)\,P(f)}{P(H^{(d)})}. \tag{9}$$
To simplify the calculation, $P(H^{(d)})$ is treated as a constant, so that
$$P(f \mid H^{(d)}) \propto P(H^{(d)} \mid f)\,P(f).$$
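Bayes' rule leaves $P(H \mid f)$ to be specified. The sketch below assumes, as in naive Bayes, that herbs are conditionally independent given the function, so $P(H \mid f)$ is the product of the function-herb probabilities, and it keeps every function whose unnormalized score exceeds a threshold. The `prior` list, the `beta` matrix, and the default threshold of 1e-8 are illustrative inputs; scores are compared in log space to avoid numerical underflow for long prescriptions, and all entries of `beta` must be positive (smoothing by the herb prior guarantees this).

```python
import math


def predict_functions(herbs, beta, prior, epsilon=1e-8):
    """Return the function indices f whose score P(f) * prod_h beta[f][h] exceeds epsilon.

    herbs: herb indices of the new prescription
    beta:  beta[f][h], estimated function-herb probabilities (all > 0)
    prior: prior[f], prior probability of each function (all > 0)
    """
    log_eps = math.log(epsilon)
    selected = []
    for f, p_f in enumerate(prior):
        log_score = math.log(p_f) + sum(math.log(beta[f][h]) for h in herbs)
        if log_score > log_eps:
            selected.append(f)
    return selected
```

Because the score is a product over all herbs, it shrinks with prescription length, which is one reason the threshold has to be tuned empirically rather than fixed at a value such as 0.5.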

Feature Extraction.
In this section, we adopt the TFIDF method and herbal attributes to extract a prescription's features.

TFIDF Features.
TFIDF is often used as a weighting factor in information retrieval and text mining. In TCM, some herbs that appear frequently, such as Glycyrrhizae Radix, tend to have little influence on a prescription's function. In this work, we employ TFIDF to reflect the importance of a herb to a prescription in a collection. A prescription is treated as a "document," and its herbs are treated as "terms." We define $\mathrm{TF}(h_i) = \mathrm{freq}(h_i)$, the frequency of $h_i$, and $\mathrm{IDF}(h_i) = \log(D / \mathrm{df}(h_i))$, where $D$ is the number of prescriptions and $\mathrm{df}(h_i) = |\{d : h_i \in d\}|$ is the number of prescriptions containing herb $h_i$. The TFIDF feature for herb $h_i$ is then
$$\mathrm{TFIDF}(h_i) = \mathrm{TF}(h_i) \times \mathrm{IDF}(h_i).$$
Based on this, we represent a prescription by the TFIDF feature vector $\vec{p} = (p_1, p_2, \dots, p_M)$, where $p_i = \mathrm{TFIDF}(h_i)$ if the prescription contains herb $h_i$ and 0 otherwise, and $M$ is the total number of unique herbs. However, a prescription lists each herb only once, so it carries no occurrence counts from which to compute $\mathrm{freq}(h_i)$. To solve this problem, we use the herb's dosage as its initial weight. Dosage reflects the importance of a herb in a prescription, but it must be standardized first because different herbs have different usual dosages. For instance, the usual dosage of Pseudoginseng is 3–9 g, while that of Dioscoreae Rhizoma is 15–30 g, so the raw dosages of different herbs are not directly comparable. For each prescription, we therefore standardize each herb's dosage before TFIDF weighting:
$$x_i' = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}},$$
where $x_i$ is the actual dosage of herb $h_i$ in the prescription, $x_i^{\max}$ is its maximum usual dosage, and $x_i^{\min}$ is its minimum usual dosage. Table 1 shows an example of dosage standardization for the prescription "Ma Huang Tang." The standardization preserves the order of the original data; that is, if a herb has a higher dose in prescription $A$ than in prescription $B$, it keeps that order after standardization. Then, $\mathrm{freq}(h_i)$ is calculated from the standardized dosages.
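The dosage standardization and TFIDF weighting can be sketched as below. This is a minimal illustration, assuming that TF is simply the standardized dosage (the exact formula for the frequency is not given here) and that the corpus includes the prescription being encoded; the usual-dosage ranges for Pseudoginseng and Dioscoreae Rhizoma come from the text, while the function names and toy corpus are hypothetical.

```python
import math


def standardize_dosage(dose, usual_min, usual_max):
    """Min-max standardization of a dosage (g) against the herb's usual range."""
    return (dose - usual_min) / (usual_max - usual_min)


def tfidf_vector(dosages, usual_ranges, corpus, vocabulary):
    """TFIDF features for one prescription.

    dosages:      {herb: dosage in g} for the prescription
    usual_ranges: {herb: (min g, max g)}
    corpus:       list of herb-name sets, one per prescription (must include this one)
    vocabulary:   ordered list of all unique herbs (the M feature dimensions)
    Assumption: TF(h) is taken to be the standardized dosage itself.
    """
    num_prescriptions = len(corpus)
    features = []
    for herb in vocabulary:
        if herb not in dosages:
            features.append(0.0)  # herb absent from this prescription
            continue
        low, high = usual_ranges[herb]
        tf = standardize_dosage(dosages[herb], low, high)
        doc_freq = sum(1 for herbs in corpus if herb in herbs)
        features.append(tf * math.log(num_prescriptions / doc_freq))
    return features


corpus = [
    {"Pseudoginseng", "Dioscoreae Rhizoma"},
    {"Dioscoreae Rhizoma", "Glycyrrhizae Radix"},
    {"Glycyrrhizae Radix"},
    {"Glycyrrhizae Radix", "Pseudoginseng"},
]
vocab = ["Pseudoginseng", "Dioscoreae Rhizoma", "Glycyrrhizae Radix"]
ranges = {"Pseudoginseng": (3, 9), "Dioscoreae Rhizoma": (15, 30)}
vec = tfidf_vector({"Pseudoginseng": 6, "Dioscoreae Rhizoma": 20}, ranges, corpus, vocab)
```

Here 6 g of Pseudoginseng standardizes to 0.5 and 20 g of Dioscoreae Rhizoma to 1/3, so the two herbs become comparable despite their different usual ranges. Note that min-max standardization maps a dose at the minimum usual dosage to 0, so such a herb would contribute no TFIDF weight under this assumption.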

Attribute Features.
Each herb is described by three attributes, "channel tropism," "nature & flavor," and "efficiency," each expressed with a set of standard terms. For instance, "nature" refers to the temperature characteristics of a herb, such as "cold," "hot," and "warm," and "flavor" refers to its taste property, such as "sour," "bitter," and "sweet." For each prescription, we sort the herbs by $\mathrm{freq}(h_i)$ and select the top two herbs to represent the prescription. For each herb, we collect 9 attributes in "nature & flavor," 12 attributes in "channel tropism," and 46 attributes in "efficiency," giving 67 attributes per herb. The attribute feature vector of a prescription is then $\vec{V} = (V_1, V_2, \dots, V_{134})$, the concatenation of the attribute vectors of its two selected herbs, where $V_j$ is 1 if the herb has feature $j$ and 0 otherwise. Some qualified attributes, such as "slightly bitter" and "slightly hot," are quantified as 0.5.
We treat our prediction task as a multilabel classification problem: given a training set of prescriptions with multiple function labels, predict the set of labels for each prescription in the test set. Based on the above features, we train several one-vs-rest classifiers to test our method: SVM (Support Vector Machine), Adaboost, and Bayes Network, which are popular, competitive baselines widely used in previous work [18].
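The attribute encoding can be sketched as follows. The real vocabulary has 67 terms per herb (9 + 12 + 46), giving 134 features for the top two herbs; the four-term vocabulary and herb names below are hypothetical placeholders, and mapping a "slightly X" term to 0.5 on the slot for X is our assumption about how the 0.5 quantification is applied.

```python
def herb_attribute_vector(terms, vocabulary):
    """Encode one herb's attribute terms as a 0 / 0.5 / 1 vector over `vocabulary`."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for term in terms:
        weight = 1.0
        if term.startswith("slightly "):  # assumed: "slightly X" -> 0.5 on feature X
            term = term[len("slightly "):]
            weight = 0.5
        if term in index:
            vec[index[term]] = max(vec[index[term]], weight)
    return vec


def prescription_attribute_vector(herb_weights, herb_terms, vocabulary, top_k=2):
    """Concatenate the attribute vectors of the top_k herbs ranked by weight freq(h)."""
    ranked = sorted(herb_weights, key=herb_weights.get, reverse=True)[:top_k]
    features = []
    for slot in range(top_k):
        if slot < len(ranked):
            features.extend(herb_attribute_vector(herb_terms[ranked[slot]], vocabulary))
        else:
            features.extend([0.0] * len(vocabulary))  # pad prescriptions with fewer herbs
    return features


vocab = ["cold", "warm", "bitter", "sweet"]  # hypothetical stand-in for the 67 real terms
terms = {"A": ["cold", "bitter"], "B": ["warm", "slightly bitter"], "C": ["sweet"]}
features = prescription_attribute_vector({"A": 0.9, "B": 0.4, "C": 0.7}, terms, vocab)
```

With a 67-term vocabulary and `top_k=2`, the output has 134 entries, matching the attribute dimension reported in the setup.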

Setup.
In the compatibility rule mining step, our method returned the top-N herb pairs ranked by their Cor values, and these pairs determined the value of the compatibility indicator during topic modeling. The two parameters in (3) were both set to 0.5 after repeated experiments.
In the topic model-based method, we set the number of topics to the number of function labels, 20. The number of unique herbs extracted from the 3055 formulae was 972. We set the Dirichlet topic prior to α = 50/K (that is, 2.5 for K = 20), the herb prior to η = 0.1, and the number of Gibbs sampling iterations to 500.
In the multilabel classifier-based method, we combined the TFIDF feature space and the attribute features to represent a formula. The dimension of the TFIDF feature space $\vec{p}$ was set to 972, the number of unique herbs, and the dimension of the attribute features $\vec{V}$ was 134, so each formula was represented by a 1106-dimensional feature vector. We evaluated several classifiers (SVM, Adaboost, and Bayes Network) using 4-fold cross-validation on the 3055 formulae.
We designed five experiments, (a)–(e), for our prediction task. In the topic model-based experiments, a function label $f$ was returned when its posterior probability $P(f \mid H^{(d)})$ exceeded the threshold $\varepsilon$ (condition (15)). For experiments (c)–(e), the corresponding feature vectors were generated and used as inputs to the classifiers. We tuned the cost parameter C shared by the SVMs and set it to 10. The "TFIDF + attributes" features are denoted as $\vec{p} \cup \vec{V}$. The prediction was treated as a 20-class, multilabel classification problem, and each test was performed 10 times to obtain the average performance. We scored each method with Precision, Recall, and Micro-F1, defined as follows:
$$\text{Precision} = \frac{\text{total number of correct labels predicted by a method}}{\text{total number of labels predicted by a method}},$$
$$\text{Recall} = \frac{\text{total number of correct labels predicted by a method}}{\text{total number of real labels}},$$
$$\text{Micro-F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
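These micro-averaged measures pool label counts over all test prescriptions before dividing. A short sketch, with predictions and ground truth given as one set of function labels per prescription (the function name is hypothetical):

```python
def micro_scores(predicted, actual):
    """Micro-averaged Precision, Recall, and F1 over parallel lists of label sets."""
    correct = sum(len(p & a) for p, a in zip(predicted, actual))
    n_predicted = sum(len(p) for p in predicted)
    n_actual = sum(len(a) for a in actual)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_actual if n_actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two prescriptions: one extra wrong label predicted, one true label missed.
precision, recall, f1 = micro_scores([{1, 2}, {3}], [{1}, {3, 4}])
```

Because the counts are pooled, prescriptions with many labels weigh more in the micro-averaged scores than in a per-prescription (macro) average.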

Compatibility Rule Mining.
We use the Precision@N metric to evaluate the effectiveness of our method and to determine the number of returned herb pairs. Precision@N is the ratio of correct pairs to the N returned pairs, where a returned pair is counted as correct if experts confirm that its two herbs have a compatibility rule. The experimental results are shown in Table 4. When more than 1500 pairs are returned, the number of correct pairs no longer increases noticeably, so the top 1500 herb pairs are returned in our experiment. The mining results are visualized in Figure 3, where each vertex represents a herb and an edge connects two herbs that have a compatibility rule. As Figure 3 shows, one herb can have compatibility rules with several other herbs. For instance, Ginseng Radix can be combined with Atractylodis Macrocephalae Rhizoma, Zingiberis Rhizoma, Dioscoreae Rhizoma, Angelicae Sinensis Radix, or Cervi Cornu Pantotrichum to achieve different treatment effects. These results show that efficient mining algorithms can uncover latent compatibility rules that may be useful to TCM practitioners for further study.
Tables 5 and 6 show 4 topics detected by the LPH model, and Table 7 shows 2 topics detected by the Labeled LDA model; each topic lists its top 20 herbs. As shown in Tables 5 and 6, most of the top 20 herbs have functions related to the corresponding topic, but several detected herbs do not, such as Plantaginis Semen in the "clearing heat" topic and Glycyrrhizae Radix in the "relieving uneasiness of mind" topic. Although Plantaginis Semen has a low posterior probability and no direct correspondence to the topic, it is an important component of some prescriptions that have the corresponding function.
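Precision@N as defined above can be computed with a few lines of Python. In this sketch, pairs are compared as unordered sets, which is our assumption since a compatibility rule between two herbs has no direction; the function name is hypothetical.

```python
def precision_at_n(ranked_pairs, expert_pairs, n):
    """Fraction of the top-n ranked herb pairs that experts confirm as compatible."""
    confirmed = {frozenset(pair) for pair in expert_pairs}  # order-insensitive pairs
    top = ranked_pairs[:n]
    hits = sum(1 for pair in top if frozenset(pair) in confirmed)
    return hits / len(top) if top else 0.0
```

Evaluating this function for increasing N and watching where the number of hits stops growing mirrors how the cutoff of 1500 pairs was chosen.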
Glycyrrhizae Radix is detected in most topics because it is used in many formulae to regulate the actions of the other herbs. Note that Glycyrrhizae Radix is removed from the compatibility rule mining results. Similar results can be found in the other topics. Most herbs that are not strongly correlated with a topic (marked by rectangles) have low probability. Herb pairs tend to show a stronger correlation with the corresponding topics than single herbs, such as Ginseng Radix and Atractylodis Macrocephalae Rhizoma in the "relieving uneasiness of mind" topic and Atractylodis Macrocephalae Rhizoma and Angelicae Sinensis Radix in the "dispelling internal cold" topic; the coordination of two herbs can enhance therapeutic effects. In addition, many herbs are inactive in the corresponding topic on their own but become active in combination with other herbs, such as Paeoniae Radix Alba and Szechwan Lovage Rhizome in the "dispelling internal cold" topic. In contrast, Labeled LDA does not discover such combinations of interacting herbs (see Table 7).

Function Prediction.
When using the LPH model for multilabel classification, we must determine the threshold ε in (15). However, there is no principled way to choose the optimal threshold automatically, so we report results for different thresholds (see Table 8). Table 9 shows the classification performance. Comparing the two methods, the multilabel classifiers perform slightly better than the topic model-based methods. As shown in Table 8, the threshold strongly influences the classification results; we take ε = $10^{-8}$, which gives the best prediction performance. With this optimal ε, LPH substantially outperforms Labeled LDA on Micro-F1.
These results demonstrate that incorporating compatibility rules into the topic model can improve prediction accuracy. The recall of both models is unsatisfactory, because the posterior probability highlights the most probable function labels but neglects the others.
From Table 9, we notice that performance is poor when only TFIDF features are used. Herbal attributes give better predictive ability than TFIDF features, which indicates that "channel tropism," "nature & flavor," and "efficiency" carry valuable information for function prediction, consistent with TCM theory. The combined features outperform either feature space alone, and SVM produces the highest Micro-F1 among the classifiers on the "TFIDF + attributes" feature space.

Discussion.
The compatibility rule mining results show that our method can effectively discover herb pairs with compatibility rules. The method is not meant to model TCM reality perfectly, but to serve as a tool for TCM practitioners: it can indicate herbs that are likely to be used together for special therapeutic effects and help researchers direct further study.
The topic discovery results show that it is feasible to employ a supervised topic model to predict the function of a new prescription, and that incorporating compatibility rules into topic modeling improves prediction accuracy. The results are better than those of Labeled LDA because the effect of a herb pair is more explicit than that of a single herb, which helps predict a new prescription's function.
Both proposed methods can provide valuable information for new prescription discovery before clinical testing [16], but each has its limitations. The method based on multilabel classifiers involves complicated, tedious feature extraction steps, such as dosage standardization and attribute quantification, while the LPH topic model cannot choose the optimal threshold automatically. Although the SVM classifier and the LPH model perform best within their respective approaches, the results are still not fully satisfactory. In future work, we may combine the two methods to improve prediction accuracy.

Conclusions
This paper has presented two methods for prescription function prediction. In the first method, we employ a novel supervised topic model named LPH to calculate a prescription's most likely function labels. In the second method, we extract a feature space based on TFIDF weighting and herbal attributes and use these features to build multilabel classifiers. Results on real-world datasets show the effectiveness of our methods, which can provide valuable information for new prescription discovery.
When doctors write a prescription, they should follow the principle of "Jun," "Chen," "Zuo," and "Shi," which plays a significant role in determining a prescription's function. In future work, we plan to analyze the components of a prescription based on its herbal attributes and dosage information; that is, the herbs in a prescription could be clustered into four classes by data mining algorithms. The results may further improve the accuracy of our prediction task.

Conflicts of Interest
The authors declare that they have no conflicts of interest.