The Common Prescription Patterns Based on the Hierarchical Clustering of Herb-Pairs Efficacies

Prescription patterns are rules or regularities used to generate, recognize, or judge a prescription. Most existing studies have focused on the specific prescription patterns for diverse diseases or syndromes, while little attention has been paid to the common patterns, which reflect the global view of the regularities of prescriptions. In this paper, we designed a method, CPPM, to find the common prescription patterns. CPPM is based on the hierarchical clustering of herb-pair efficacies (HPEs). First, the HPEs were hierarchically clustered; second, the individual herbs were labeled by the HPECs (the clusters of HPEs); third, the prescription patterns were extracted from the combinations of HPECs; finally, the common patterns were recognized statistically. The results showed that HPEs have a hierarchical clustering structure. When the clustering level is 2, that is, when the HPEs are classified into two clusters, the common prescription patterns are obvious. Among 332 candidate prescriptions, 319 follow the common patterns. The patterns can be described as follows: if a prescription contains herbs of one cluster ($C_1$), it is very likely to contain herbs of the other cluster ($C_2$), whereas a prescription that contains herbs of $C_2$ may have no herbs of $C_1$. Finally, we discuss how the common patterns coincide mathematically with the Blood-Qi theory.


Introduction
The processes of diagnosing syndromes and prescribing in TCM (traditional Chinese medicine) are empirical. It is essential to minimize the uncertainty caused by human factors by finding the invariant TCM patterns, namely, syndrome patterns and prescription patterns.
As far as the first kind of TCM pattern, the syndrome pattern, is concerned, several models have been proposed, such as SEM (structural equation modeling) to explore the diagnosis of suboptimal health status [1] and syndrome diagnostic models for chronic gastritis [2] or tuberculosis [3].
Here we studied the second kind of TCM patterns, prescription patterns, which are rules or regularities used to generate, recognize, or judge prescriptions. In TCM, the herbs in one prescription are not organized randomly, but according to a set of principles for the therapeutic purposes of mutual enhancement, mutual assistance, mutual restraint, or mutual antagonism [4]. The prescription patterns would reflect some of the principles in a formalized way.
Before discussing the prescription patterns in further detail, we first describe the TCM data relationships used in this paper. There are three forms of TCM data: individual herbs (herbs), herb-pairs, and prescriptions. An herb-pair is composed of two herbs combined for the purposes of mutual enhancement, mutual assistance, mutual restraint, or mutual antagonism [5]; a prescription is composed of several herbs. Each herb has some TCM-defined properties, including five fundamental natures (cold, cool, neutral, etc.), seven flavors (sour, bitter, sweet, etc.), and twelve meridians (liver, heart, spleen, etc.); these properties were standardized by the Chinese government in 2005 [6] and in 2010 [7]. Each herb-pair is associated with some herb-pair efficacies (HPEs), which are also a kind of TCM-defined property.
The existing prescription patterns vary from method to method. With clustering algorithms, the patterns take the form of specific groups of herbs for stroke [8] or for gout and hyperuricemia [9], of a latent tree for "disharmony between liver and spleen," which is a TCM-defined symptom [10], or of several flat groups of herbs [4,11,12] with different treatment functions. With genetic algorithms, the patterns are core groups of herbs for lung cancer [12]. With factor analysis, the patterns are 7 groups of herbs for insomnia [13]. With association analysis, the patterns are combinations of herbal properties that vary with the treatment purpose [14] and some herb combinations for psoriasis vulgaris [15]. Except for the patterns composed of properties of individual herbs [14], most prescription patterns are core herbs for a certain disease or syndrome.
Evidence-Based Complementary and Alternative Medicine
Up to now, much attention has been paid to the specific patterns for diverse diseases or syndromes, while little attention has been paid to the common patterns of all prescriptions. A specific pattern is suitable for generating or recognizing a prescription for a certain disease, while the common patterns are important when judging the feasibility of a prescription at a high, overall level. Common features of complex systems are ubiquitous, such as the small-world [16] and scale-free [17] properties found in social, biological, and information systems [18]. Usually, the knowledge of a few entities of a complex system does not straightforwardly lead to a description of the overall system [19]. The prescriptions of TCM form a complex system, and the specific pattern of a single disease cannot reflect the global view of the regularities of prescriptions. So it is necessary to explore the common prescription patterns.
Exploring the common prescription patterns raises some questions: Are prescriptions characterized by any common features? Is it possible to extract mathematical expressions of the common features for all prescriptions? We discuss the possibility of answering these questions from two aspects.
(1) Methodology. Prescriptions are one kind of complex system. In fact, most complex systems have a nested hierarchical structure; that is, the elements of the system can be partitioned into clusters, which in turn can be partitioned into subclusters, and so on up to a certain level. In biological taxonomy, individuals are grouped into species, species into genera, genera into families, and so on. A hierarchical clustering structure reflects both the differences and the common features of a complex system. The clustering at the top of the hierarchical tree is the 1-clustering, where all elements are in one cluster; the clustering at the bottom is the element-clustering, where every element is in its own cluster. The higher clusterings reflect commonality and the lower clusterings reflect diversity. If the prescription-related properties show a hierarchical clustering structure, the herbal relations in the prescriptions can be represented by that structure. Consequently, the commonality of the prescription patterns can be explored at the higher clustering levels. Hierarchical clustering methods yield hierarchical representations rather than flat ones [20].
(2) Data Foundation. More than 100,000 herbal prescription records were accumulated from 180 BC to 1904 [21], so there are plenty of data to support statistical analysis of the prescription patterns. In addition, the prescription patterns can be extracted from the latent relations in the herb-pair data. The herb-pair based method is one of six methods of selecting herbs for a prescription [22], in which several herb-pairs are selected to form a prescription [23]. So herb-pairs and prescriptions are expected to be related.
In this paper, we mined the common prescription patterns from the herb-pair data by hierarchical clustering methods and summarized a mathematical expression of the common patterns.

Data
Description. An herb-pair is composed of two herbs and provides some synergistic efficacy in vivo. In this study, 697 herb-pairs were collected directly from two reputable TCM books [24,25]. One book [25] has been printed four times and has been used as the data source in research on herb-pairs [5]; the other [24] was supported by Shandong Academy of Chinese Medicine, Shandong University of Traditional Chinese Medicine, and its affiliated hospitals, and it is a reference book for TCM graduate students. The data in both books are TCM clinical records, and every herb-pair and its efficacy are recorded in explicit form. In total, 697 herb-pairs with 376 herbs and 32 HPEs were collected; all HPEs are listed in Table 1.
The 332 prescriptions were collected from two popular books: Treatise on Febrile Diseases [26] and a collection [21]. The first is a classical ancient book in TCM. The latter is a popular current textbook in TCM higher education, one of a series of textbooks supported by the State Administration of TCM of the People's Republic of China. The prescriptions are frequently used and have been verified in clinical TCM practice.
The symbols used in this paper are as follows. Let $H = \{h_1, h_2, h_3, \ldots\}$ be the set of individual herbs. Let $E = \{e_1, e_2, e_3, \ldots, e_{32}\}$ be the set of HPEs. An herb-pair of $h_i$ and $h_j$ with efficacy $e_k$ is denoted by a triplet $(h_i, h_j, e_k)$, where $e_k$ belongs to $E$. Let $P$ be the set of prescriptions, where $p \in P$ is a prescription. Each prescription is composed of a set of herbs, $p = \{h \mid h \in H\}$.

Similarity Matrix of HPEs.
The Jaccard similarity coefficient is a statistic used for comparing the similarity and diversity of finite sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets. Given two sets $A$ and $B$, their Jaccard similarity coefficient is

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|},$$

and $0 \le J(A, B) \le 1$. The larger the value is, the more similar the two sets are. Obviously, $J(A, B) = J(B, A)$ and $J(A, A) = 1$. The Jaccard coefficient was used to compare the similarity of any two HPEs. An HPE is represented by the set of individual herbs that make up the herb-pairs with that HPE. The number of HPEs is 32, so the similarities between these efficacies can be represented as a similarity matrix $S_{32 \times 32}$.
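As an illustration of how such a similarity matrix could be assembled, the following minimal Python sketch builds the herbal set of each HPE from herb-pair triplets and computes the pairwise Jaccard coefficients. The function names and the toy herb-pairs are hypothetical and are not taken from the collected data; returning 1.0 for two empty sets is a convention assumed here.

```python
def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


def hpe_herb_sets(herb_pairs):
    """Map each HPE to the set of herbs appearing in its herb-pairs."""
    sets = {}
    for h_i, h_j, efficacy in herb_pairs:
        sets.setdefault(efficacy, set()).update((h_i, h_j))
    return sets


def similarity_matrix(herb_pairs):
    """Return the sorted HPE labels and their pairwise Jaccard matrix."""
    sets = hpe_herb_sets(herb_pairs)
    labels = sorted(sets)
    matrix = [[jaccard(sets[a], sets[b]) for b in labels] for a in labels]
    return labels, matrix


# Hypothetical toy herb-pairs as (herb, herb, efficacy) triplets.
toy_pairs = [("h1", "h2", "e1"), ("h3", "h4", "e1"), ("h1", "h5", "e2")]
labels, S = similarity_matrix(toy_pairs)
print(labels)
print(S)
```

In this toy input the two herbal sets have four and two herbs and share one, so the off-diagonal entry is 1/5.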
For example, consider a number of herb-pairs, each with two herbs and one efficacy, in which the herbal set of $e_1$ contains four herbs and the herbal set of $e_2$ contains two. The Jaccard coefficient between $e_1$ and $e_2$ is $J(e_1, e_2) = 0.2$; that is, the two sets share one herb out of five distinct herbs. The HPE similarity matrix of this example is

$$S = \begin{pmatrix} 1 & 0.2 \\ 0.2 & 1 \end{pmatrix}.$$

Solving the Problem.
In this section, a method of finding the common prescription patterns (CPPM) is proposed. The basic idea of CPPM is to use the clusters of HPEs to represent the common patterns of prescriptions. The methodological possibility was elaborated in Section 1. The higher clustering of HPEs reflects commonality and the lower clustering reflects individuality.

An Example
Here is an example to show how to extract the common patterns and the specific patterns of prescriptions at the different levels of the HPE hierarchical clustering. The patterns at the different clustering levels of HPEs are listed in Table 2. It is easy to recognize the common patterns at the 2-clustering and the 1-clustering; it is also easy to recognize the core-set at the 5-clustering. The patterns at the 5-clustering are so diverse that every prescription has a different pattern. Most existing research on TCM prescription patterns was conducted at this level, which is why those studies focus on recognizing the core herbs. Our study was conducted at a higher level to seek the common patterns.

CPPM.
The inputs and outputs of CPPM are as follows.
Inputs are as follows:
(i) Similarity matrix of HPEs.
(ii) The candidate prescriptions.
(iii) The granularity level of hierarchical clustering, $k$.
Outputs are as follows:
(ii) Common prescription patterns: combinations of HPECs and their POs.
The metrics PO and OO are used to evaluate the support of the results and are defined in Section 2.5. The steps of CPPM are as follows.
Step 1 (get clustering of HPEs). A hierarchical clustering algorithm, such as the Ward algorithm or the BIRCH algorithm, is applied to the similarity matrix of HPEs. The $k$-clustering of HPEs is then obtained, in which the HPEs are grouped into $k$ clusters. A cluster of HPEs is denoted by HPEC, and each HPE belongs to a specific HPEC.
Step 2 (label the herbs by HPEC). An herb may make up several herb-pairs with different HPEs, so there are two substeps. (1) The most frequent HPE of the herb is taken as its dominating HPE. (2) The HPEC that contains the dominating HPE is assigned to the herb as its label. Thus, every herb has an HPEC.
Step 3 (get prescription patterns). For each prescription, replace its herbs by the corresponding HPECs to get the HPEC sequence of the prescription, and then take the HPEC sequence without repeats as the pattern of the prescription. Some herbs of the prescriptions cannot be located in the herb-pairs. If the input of CPPM is N prescriptions, N patterns are obtained in this step.
Step 4 (find the common patterns of prescriptions). The metric PO of each pattern is calculated, and the frequent patterns with higher values of PO are selected as the common prescription patterns.
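Steps 2 to 4 can be sketched in a few lines of Python, assuming that Step 1 has already produced a mapping from each HPE to its HPEC. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: all names and the toy data are hypothetical, a pattern is represented as an unordered set of HPECs, herbs that cannot be located in the herb-pairs are skipped, ties between equally frequent HPEs are broken by first occurrence, PO is taken as the fraction of inputted prescriptions sharing a pattern, and the cut-off `min_po` is an invented parameter.

```python
from collections import Counter


def label_herbs(herb_pairs, hpe_to_cluster):
    """Step 2: label each herb with the HPEC of its most frequent HPE."""
    counts = {}
    for h_i, h_j, efficacy in herb_pairs:
        if efficacy not in hpe_to_cluster:
            continue  # efficacy excluded from the clustering
        for herb in (h_i, h_j):
            counts.setdefault(herb, Counter())[efficacy] += 1
    return {
        herb: hpe_to_cluster[c.most_common(1)[0][0]]
        for herb, c in counts.items()
    }


def prescription_pattern(prescription, herb_label):
    """Step 3: distinct HPECs of a prescription (unlabeled herbs skipped)."""
    return frozenset(herb_label[h] for h in prescription if h in herb_label)


def common_patterns(prescriptions, herb_label, min_po):
    """Step 4: keep the patterns whose PO is at least min_po."""
    tally = Counter(prescription_pattern(p, herb_label) for p in prescriptions)
    total = len(prescriptions)
    return {pat: n / total for pat, n in tally.items() if n / total >= min_po}


# Hypothetical toy data.
toy_pairs = [
    ("a", "b", "e1"), ("a", "c", "e1"),
    ("c", "d", "e2"), ("c", "e", "e2"), ("d", "e", "e2"),
]
toy_clusters = {"e1": "C1", "e2": "C2"}
toy_prescriptions = [["a", "d"], ["d", "e"], ["b", "c", "x"], ["c"], ["a", "c"]]

herb_label = label_herbs(toy_pairs, toy_clusters)
result = common_patterns(toy_prescriptions, herb_label, min_po=0.5)
print(herb_label)
print(result)
```

With this toy input, three of the five prescriptions map to the pattern containing both clusters, so only that pattern passes the 0.5 cut-off.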

Levels of Granularity.
By inputting different values of the parameter $k$ in the 1st step of CPPM, patterns at different levels of granularity can be output.
At the top level ($k = 1$), all HPEs belong to one common HPEC in the 1st step, all herbs also belong to that HPEC in the 2nd step, and the herbs of the prescriptions are projected to the common HPEC in the 3rd step. The common HPEC is the common pattern, but it is obviously meaningless. At the bottom level ($k = |E|$, the number of HPEs), every HPE belongs to a different HPEC, and the herbs are projected to individual HPEs in the 3rd step. Consequently, the prescription patterns are numerous, and it is hard to find the common prescription patterns in the 4th step.
It is not easy to determine the level of granularity. A good granularity level should give a meaningful clustering, one that coincides with a certain biological mechanism of action or a TCM theory. Here we chose the level manually, as stated in Section 4.
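The effect of the granularity level can be illustrated with a small agglomerative clustering sketch. It uses average linkage on a similarity matrix as a simple stand-in, not the Ward or BIRCH algorithms named in Step 1, and the 4×4 matrix below is hypothetical toy data rather than the HPE similarity matrix of this study.

```python
def agglomerate(sim, k):
    """Merge the most similar clusters (average linkage) until k remain.

    sim is a symmetric similarity matrix given as a list of lists; the
    result is a list of clusters, each a sorted list of row indices.
    """
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of elements")
    clusters = [[i] for i in range(n)]

    def avg_sim(a, b):
        # Mean similarity over all cross-cluster element pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        candidates = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        x, y = max(candidates, key=lambda xy: avg_sim(clusters[xy[0]], clusters[xy[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical similarity matrix for four efficacies.
toy_sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
for level in (1, 2, 4):
    print(level, agglomerate(toy_sim, level))
```

At level 1 every element falls into a single cluster and at level 4 every element stands alone, which mirrors the two uninformative extremes described above; the intermediate level separates the two groups of mutually similar elements.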

Overlap Coefficient.
There are two datasets from different data sources in CPPM: herb-pairs and prescriptions. Through the mapping technique, a connection is established between herb-pairs and prescriptions. The size of the intersection of the two sets reflects their consistency and completeness. Let $|X|$ be the size of a set $X$. Let $H_{\text{herb-pair}}$ be the set of distinct herbs in the herb-pairs, and let $H_{\text{prescription}}$ be the set of distinct herbs in the candidate prescriptions. The overlap coefficient (OO) evaluates the consistency and completeness of the two sets; a larger value of OO is better. Consider

$$\mathrm{OO}_{\text{without-repeating}} = \frac{|H_{\text{prescription}} \cap H_{\text{herb-pair}}|}{\min\{|H_{\text{prescription}}|, |H_{\text{herb-pair}}|\}}. \quad (4)$$
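A minimal Python sketch of the overlap coefficient without repeating is given here for illustration. The function name and the toy herb sets are hypothetical, and raising an error for an empty set is a choice made in this sketch, not something specified in the text.

```python
def overlap_coefficient(herbs_a, herbs_b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(herbs_a), set(herbs_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both herb sets must be non-empty")
    return len(a & b) / smaller


# Hypothetical toy sets of distinct herbs.
herb_pair_herbs = {"h1", "h2", "h3", "h4"}
prescription_herbs = {"h2", "h3", "h9"}
oo = overlap_coefficient(prescription_herbs, herb_pair_herbs)
print(oo)
```

Dividing by the smaller set means the coefficient reaches 1 whenever one set is fully contained in the other, regardless of how much larger the other set is.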

Probability of Occurrence.
To identify the commonality of the prescription patterns, we designed the probability of occurrence (PO). Let $N_{\text{pattern}}$ be the number of prescriptions with a certain pattern and $|P|$ the number of all inputted prescriptions. The PO of a certain pattern is defined as

$$\mathrm{PO} = \frac{N_{\text{pattern}}}{|P|}.$$

Herb-Pair Efficacy Size.
The size of an herb-pair efficacy is the number of herb-pairs with that efficacy. The sizes of the different efficacies are not the same (Table 1). The efficacy $e_{23}$ has quite a large number of herb-pairs, while few herb-pairs provide $e_{13}$ and $e_{29}$. The smaller herb-pair efficacies are unsuitable for statistical analysis, so we deleted the efficacies whose size is smaller than 12. The threshold is 12 because $e_{18}$ is one of the specific objects of observation; the reason is given in Section 4. Our previous results showed that $e_1$ and $e_{11}$ are two common herb-pair efficacies [27], so they were omitted too. Consequently, the candidate herb-pair efficacies are marked with "√" in Table 1.
The clusters at the 2-clustering level are listed in Table 1. Because the main goal of this paper is to find the common patterns, and significant results appeared when the clustering level is 2, the 2-clustering is discussed here.

Classification of Herbs.
Here we counted the number of herb-pairs whose two herbs belong to the same cluster, $\mathrm{HPE}_1$ or $\mathrm{HPE}_2$. The detailed results are listed in Table 1. There are very few exceptional cases in which the two herbs are in one cluster while the herb-pair they compose provides an efficacy of the other cluster. So the statistics show that the 2-clustering at the level of HPEs is consistent with that at the level of individual herbs. Thus the 3rd step is validated.

Overlap Coefficient.
The number of herbs in the sets $\mathrm{HPE}_1$ and $\mathrm{HPE}_2$, that is, the candidate herbs at the herb-pair level, is $|H_{\text{herb-pair}}| = 330$. The 332 prescriptions are composed of 320 herbs, so $|H_{\text{prescription}}| = 320$. The size of the intersection of the herbs in both herb-pairs and prescriptions is $|H_{\text{herb-pair}} \cap H_{\text{prescription}}| = 192$. So OO = 192/320 = 0.6.
Counted with repeating, the inputted prescriptions contain 2057 herbs in total, of which 1677 could be located in $H_{\text{herb-pair}}$. The value of the overlap coefficient with repeating is therefore 1677/2057 ≈ 0.815.
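The two reported values can be re-derived from the counts given above. The short check below uses only those counts; the variable names are hypothetical, and treating the with-repeating coefficient as located occurrences divided by all occurrences is an assumption inferred from the reported figures.

```python
# Counts reported in the text.
herbs_in_pairs = 330          # distinct candidate herbs in the herb-pairs
herbs_in_prescriptions = 320  # distinct herbs in the 332 prescriptions
shared_herbs = 192            # distinct herbs found in both sources

oo_without_repeating = shared_herbs / min(herbs_in_prescriptions, herbs_in_pairs)

occurrences_total = 2057      # herb occurrences over all prescriptions
occurrences_located = 1677    # occurrences whose herb appears in the herb-pairs

# Assumed reading of the with-repeating coefficient.
oo_with_repeating = occurrences_located / occurrences_total

print(round(oo_without_repeating, 3), round(oo_with_repeating, 3))
```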

Prescription Patterns.
To get the prescription patterns, we projected the herbs of the prescriptions to the two clusters, $\mathrm{HPE}_1$ and $\mathrm{HPE}_2$.

Discussion
This section is open to debate. Nevertheless, in terms of mathematical matching, we found that the common prescription patterns are consistent with the Blood-Qi theory in TCM. The Blood-Regulating efficacies, $e_{19}$ and $e_{20}$, are two of the therapeutic efficacies of $\mathrm{HPE}_1$, and the Qi-Regulating efficacies, $e_{16}$ [...] In the Blood-Qi theory of TCM, Qi stagnation leads to blood stasis, whereas blood stasis does not always cause Qi stagnation. So the formalization of the theory is [...] The two formulas (f.1) and (f.3) have exactly the same form. So the common prescription patterns coincide with the Blood-Qi theory in TCM.

Conclusions
The prescriptions of TCM form a complex system. Usually, the knowledge of a few entities (prescriptions) of a complex system does not straightforwardly lead to a description of the overall system, so the common prescription patterns can provide a global view of the regularities of prescriptions. The method proposed in this paper, CPPM, finds the common prescription patterns based on the hierarchical clustering of herb-pair efficacies. The method was applied to the 697 herb-pairs and the 332 prescriptions. The statistical results showed that when the granularity level of the hierarchical clustering is 2, the common patterns are obvious. The common patterns can be described as follows: if a prescription contains herbs of one cluster ($\mathrm{HPE}_1$), it is very likely to contain herbs of the other cluster ($\mathrm{HPE}_2$), whereas a prescription that contains herbs of $\mathrm{HPE}_2$ may have no herbs of $\mathrm{HPE}_1$. The formalizations of the common patterns and of the Blood-Qi theory showed mathematical consistency. Given the common patterns, if the herbs of a new, unknown prescription do not follow them, the prescription is very likely to be incorrect or inappropriate. Thus the common patterns reflect a kind of prescription regularity and can also be used to judge the appropriateness of a new prescription at a high level.