An Analysis of the Clinical Medication Rules of Traditional Chinese Medicine for Polycystic Ovary Syndrome Based on Data Mining

Objective: The aim of the present study was to investigate the rules and characteristics of the clinical administration of traditional Chinese medicine (TCM) in the treatment of polycystic ovary syndrome (PCOS) using data mining methods. Methods: Medical cases of well-known contemporary TCM doctors treating PCOS were collected from the China National Knowledge Infrastructure, Chinese Biomedical Literature Service System, Wanfang, Chinese Scientific Journals Database, and PubMed; the data were then characterized, and a standardized database of medical cases was built. This database was used to (1) count the frequency of syndrome types and of the herbs used in the medical cases and (2) analyze the herbs using association rules and systematic clustering. Results: A total of 330 papers were included, involving 382 patients and a total of 1,427 consultations. The most common syndrome type was kidney deficiency; phlegm stasis was the core pathological product and causative factor. A total of 364 herbs were used. Among them, 22 herbs were used >300 times, including Danggui (Angelicae Sinensis Radix), Tusizi (Semen Cuscutae), Fuling (Poria), Xiangfu (Nutgrass Galingale Rhizome), and Baizhu (Atractylodis Macrocephalae Rhizoma). Additionally, 22 binomial associations were obtained from the analysis of association rules; five clustering formulae were obtained via the analysis of high-frequency drug clusters; and 27 core combinations were obtained by k-means clustering of the formulae. Conclusion: In the treatment of PCOS, TCM is primarily employed as a combination approach involving tonifying the kidneys, strengthening the spleen, eliminating dampness and dissolving phlegm, activating blood circulation, and resolving blood stasis. The core prescription is primarily a compound intervention based on the Cangfu Daotan pill, Liuwei Dihuang pill, and Taohong Siwu decoction.


Introduction
Polycystic ovary syndrome (PCOS), a common gynecological disorder, affects approximately 8-13% of women of reproductive age [1]. It is characterized by reproductive disorders, endocrine abnormalities, metabolic disorders, and mental health problems [2]; it is also one of the main causes of menstrual disorders and infertility in women and has become a focus of gynecological endocrinology and reproductive medicine in recent years. Current treatment strategies for PCOS mainly include oral medications (short-acting combination oral contraceptives, progestins, insulin sensitizers, and ovulation induction agents), in vitro fertilization-embryo transfer, and surgery. However, long-term clinical practice suggests that these strategies have certain limitations, for example, high recurrence rates, invasiveness, unsatisfactory long-term results, and various side effects [3]. Recently, alternative therapies have gradually come into focus as a more favorable option for PCOS treatment [4,5].
Traditional Chinese medicine (TCM) has unique advantages in the treatment of PCOS, and recent studies have shown that TCM can regulate the reproductive endocrine function of patients with PCOS as a whole, improve insulin resistance, promote ovulation, increase pregnancy rates, and improve various clinical symptoms, all with fewer adverse reactions and good safety [6][7][8][9]. However, due to the diverse clinical manifestations of PCOS and the complex nature of the disease, there is no unified or broadly recognized consensus on its diagnosis and treatment in the context of TCM.
The present paper examines the TCM syndrome differentiation and treatment rules for PCOS through the systematic mining and study of modern medical cases in which TCM masters treated PCOS; this is of great significance for gaining a deeper understanding of the disease and for improving the TCM differentiation and treatment system for the condition.

Data Sources.
Five major medical databases (China National Knowledge Infrastructure, China Biology Medicine (CBM) disc, Wanfang, Chinese Scientific Journals Database, and PubMed) were searched electronically. A combination of subject terms and basic searches was applied, and the search period ran from the inception of each database to January 31, 2020.
The Chinese searches included the following: (title: "Duonang Luancao Zonghezheng" or "PCOS") and (title: "Yanan" or "Jingyan" or "Yanfang" or "Gean" or "Anli"). The English search terms were as follows: ("polycystic ovary syndrome" (Medical Subject Headings terms)) and ("Chinese phytotherapy" (Text Word)). A total of 857 relevant articles were retrieved. After duplicate removal, initial screening, rescreening, and supplementation, a total of 330 articles (329 papers in Chinese and 1 paper in English) were included according to the literature selection criteria. The specific retrieval and screening processes are shown in Figure 1.
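The screening counts reported in Figure 1 can be cross-checked with simple arithmetic. The short Python sketch below only restates the numbers given in the flowchart; the variable names are illustrative and do not come from the original study.

```python
# Counts as reported in the screening flowchart (Figure 1).
retrieved = 857                    # records from the five databases
duplicates_removed = 184           # identical records deleted with EndNote
excluded_on_title_abstract = 193   # reviews, case reports, clinical/experimental studies
excluded_on_full_text = 153        # no medical cases, or abstract only
supplemented = 3                   # records added by cross-supplementation

after_dedup = retrieved - duplicates_removed
after_primary = after_dedup - excluded_on_title_abstract
after_rescreen = after_primary - excluded_on_full_text
included = after_rescreen + supplemented

print(after_dedup, after_primary, after_rescreen, included)
```

Each intermediate value matches the corresponding box in the flowchart (673, 480, 327, and 330).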

Inclusion Criteria.
The inclusion criteria for articles focusing on TCM cases were as follows: (1) academic experience, cases cited in medical treatises, and medical talks; (2) the Western medicine diagnosis of the case included PCOS; (3) the medical records of the first visit comprised symptoms, tongue and pulse, and prescriptions; and (4) if there were any changes in symptoms, syndrome type, treatment, prescription, or other aspects of a medical case between the first and second visit, the medical case of the second visit was recorded independently.

Exclusion Criteria.
The exclusion criteria pertaining to case study articles were as follows: (1) literature reviews, case reports, animal experiments, and clinical studies; (2) cases lacking the necessary information (symptoms, tongue and pulse, and prescriptions); (3) cases using only acupuncture, auricular points, diet therapy, or ointment for diagnosis and treatment; (4) for cases published repeatedly, only the first published medical case was retained; and (5) conference papers that included only abstracts but not the full text.

Diagnostic Criteria of PCOS.
A confirmed diagnosis of PCOS was made in accordance with the revised diagnostic criteria set at the 2003 Rotterdam consensus workshop [10]: (1) sporadic ovulation or anovulation; (2) clinical symptoms (including hirsutism and/or acne) or biochemical evidence of hyperandrogenism (total testosterone >3.5 mmol/L); (3) polycystic ovarian changes detected on ultrasound: ≥10-12 follicles with a diameter of 2-9 mm in each ovary and/or an ovarian volume of ≥10 mL. Patients meeting either of the above criteria were diagnosed with PCOS.

Data Preprocessing and Normalization.
A total of 330 articles, comprising 382 medical cases and 1,427 visits resulting in effective prescriptions, were recorded in Excel 2016 to establish a PCOS prescription database. To ensure accuracy, the data were entered independently by two people, and the results were cross-checked after entry was completed. Standardized TCM terminology was adopted by referring to Zhongyi Linchuang Shuju Wajue Yanjiu Shuju Guifanhua Biaozhun [11]. The syndrome types were standardized by referring to Zhongyi Zhenduanxue [12] (TCM Diagnostics) and Zhongyi Fukexue [13] (Gynecology of TCM), and the herbs were standardized with reference to the 2015 edition of Zhonghua Renmin Gongheguo Yaodian [14] and Zhongyaoxue [15]. The following examples illustrate the rules: (1) aliases and common names of herbs, e.g., Qizi was classified as Gouqizi (Lycii Fructus) and Xianlingpi as Yinyanghuo (Epimedii Folium); (2) herbs with a processing name or a name of origin, e.g., Quan Danggui and Danggui Shen were classified as Danggui (Angelicae Sinensis Radix), and Chuan Niuxi was classified as Niuxi (Achyranthis Bidentatae Radix). The data normalization process attempted to follow the original intention of the doctors involved in the cases.
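The alias normalization described above can be illustrated with a small lookup table. The following Python sketch is an assumption about how such a step might be implemented, not the procedure the authors used; the mapping contains only the examples named in the text, and the function names are hypothetical.

```python
# Alias -> standard herb name. Only the examples from the text are listed;
# a real table would be built from the pharmacopoeia and textbook references.
ALIASES = {
    "Qizi": "Gouqizi",
    "Xianlingpi": "Yinyanghuo",
    "Quan Danggui": "Danggui",
    "Danggui Shen": "Danggui",
    "Chuan Niuxi": "Niuxi",
}


def normalize_herb(name: str) -> str:
    """Return the standard herb name, or the input unchanged if no alias is known."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return ALIASES.get(cleaned, cleaned)


def normalize_prescription(herbs):
    """Normalize every herb and drop duplicates, keeping first-seen order."""
    seen = {}
    for herb in herbs:
        seen.setdefault(normalize_herb(herb), None)
    return list(seen)


print(normalize_prescription(["Quan Danggui", "Danggui Shen", "Qizi", "Fuling"]))
```

Unknown names pass through unchanged, which keeps the step conservative: a herb is only renamed when the table says so.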

The Data Analysis Platform.
The data were analyzed using the data association analysis platform (XMiner v.1.0) of the Medcase V3.8 data recording and processing/mining system platform [16,17], and the syndrome types and drug frequencies of the medical records were analyzed using frequency statistics. Reasonable support and confidence values were set according to the number of prescriptions included and a preliminary reading of the relevant parameters.
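As a rough illustration of the frequency statistics step, the Python sketch below counts how many prescriptions contain each herb and keeps those at or above a threshold. It is a minimal sketch on invented toy prescriptions, not output from the Medcase platform; the function names and the threshold are assumptions.

```python
from collections import Counter


def herb_frequencies(prescriptions):
    """Count in how many prescriptions each herb appears (once per prescription)."""
    counts = Counter()
    for herbs in prescriptions:
        counts.update(set(herbs))
    return counts


def high_frequency(counts, threshold):
    """Return (herb, count) pairs with count >= threshold, most frequent first."""
    return [(herb, c) for herb, c in counts.most_common() if c >= threshold]


# Toy prescriptions, invented for illustration only.
toy = [
    ["Danggui", "Tusizi", "Fuling"],
    ["Danggui", "Xiangfu"],
    ["Danggui", "Tusizi", "Baizhu"],
]
counts = herb_frequencies(toy)
print(high_frequency(counts, threshold=2))
```

The same counting applies to syndrome types if each record lists its syndrome labels instead of herbs.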
Association rule mining (an in-silico screening process) was applied to investigate the regularity of herbal compatibility in the prescriptions used in the reported studies. An association rule has the form left-hand side (LHS) ⇒ right-hand side (RHS), where LHS and RHS are sets of items; the rule states that the RHS is likely to occur whenever the LHS occurs [18]. The Apriori algorithm was used to extract the significant associations from all possible combinations of items in the main dataset [19]. Three evaluation metrics (support, confidence, and lift) are critical in describing the power and significance of the rules generated by association rule mining [20]. Specifically, support is the frequency of the rule occurrence in the total dataset, measuring whether an association between the LHS and the RHS happens by chance. Confidence is the frequency of the rule occurrence among the cases in the dataset that fulfil the LHS of the rule, thus representing the reliability of the association. Lift is the ratio of the observed support to the support expected if the LHS and the RHS were independent; a value >1 indicates a dependency between the occurrences of the two item sets [21]. Additionally, the association rule method was used to analyze the regularity of the herbal compatibility of the prescriptions and core herbs in the included medical records.
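The three metrics described above can be written compactly. The notation here is introduced only for illustration and is not taken from the original paper: X stands for the LHS, Y for the RHS, N for the total number of prescriptions, and n(·) for the number of prescriptions containing a given item set.

```latex
\[
\operatorname{supp}(X \Rightarrow Y) = \frac{n(X \cup Y)}{N}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{n(X \cup Y)}{n(X)}, \qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{conf}(X \Rightarrow Y)}{n(Y)/N}
= \frac{N \, n(X \cup Y)}{n(X)\, n(Y)}
\]
```

As a hypothetical worked example, if N = 1000, n(X) = 250, n(Y) = 400, and n(X ∪ Y) = 200, then support is 0.20, confidence is 0.80, and lift is 0.80 / 0.40 = 2.0, so Y appears twice as often alongside X as independence would predict.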
Next, the core combinations and new prescriptions were obtained by setting reasonable correlation and penalty degrees via the variable clustering method. In the present study, k-means cluster analysis was considered appropriate, as the variables were quantitative at the interval or ratio level rather than binary or counts. To avoid unreliable results arising from omitted variable bias, the authors included all the attributes, including the meridian tropism, five properties, and five tastes, and investigated the therapeutic preferences of the candidate clusters. Moreover, the authors compared the analysis results from different permutations of the initial center values to ensure an appropriate number of clusters; this was used to further assess the reliability of the given solution [22].
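The idea of comparing k-means solutions from different initial centers can be sketched in plain Python. This is a generic Lloyd's-algorithm sketch under stated assumptions, not the authors' implementation: the feature vectors are invented toy points, and the function names, seeds, and iteration cap are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's algorithm; returns (centers, labels, inertia)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: total squared distance of each point to its assigned center.
    inertia = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia


def best_of_restarts(points, k, seeds):
    """Run k-means from several initial centers and keep the lowest inertia."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda result: result[2])


# Toy feature vectors, invented for illustration only.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels, inertia = best_of_restarts(toy_points, k=2, seeds=range(5))
print(labels, round(inertia, 4))
```

Keeping the run with the lowest inertia is one common way to reduce sensitivity to initialization; comparing inertia across several values of k is a separate step that this sketch does not perform.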

Analysis of Structural Diagram.
In the structural diagrams, single-item, single-row, single-directional relationships were shown as green boxes, while orange boxes represented single-item multirow single-directional or single-item multicolumn bidirectional relationships. The structural diagram of the association rule loci in the drug sets can be taken as an example to explain the analysis rules in detail: a double-row bidirectional relationship between two drugs mostly indicated that they are a common clinical drug pair; meanwhile, a double-row bidirectional relationship among three drugs suggested that the three drugs were mostly used clinically as a triplet. Additionally, a single-item multirow bidirectional relationship between two drugs mostly indicated that the two drugs appeared in clinical pairs in chronological order, with the drug at the tail of the arrow appearing first and the drug at the head of the arrow appearing second. Moreover, the more often a drug was pointed to by an arrow, the more often it was used in herbal compatibility. The visual diagrams for other item sets, such as symptoms and TCM pathology, were analyzed in the same way as the drug sets.

High-Frequency Syndrome Types.
A total of 67 syndrome types were involved in 1,427 medical cases; among these, syndrome types with a frequency of >50 were phlegm stagnation, phlegm stasis, blood stasis stagnation, liver Qi stagnation, kidney deficiencies, and spleen and kidney deficiencies, accounting for 35.95% of the total (see Table 1).

Figure 1: Literature retrieval and screening process. Records retrieved from five major medical databases (CNKI, Wanfang, VIP, CBM, and PubMed), n = 857; identical records deleted with EndNote, n = 184; records remaining after deduplication, n = 673; literature reviews, case reports, and clinical and experimental studies deleted after reading titles and abstracts, n = 193; records remaining after primary screening, n = 480 (Chinese, n = 464; English, n = 16); records deleted after reading the full text (no medical cases, or abstract only without medical information and complete article), n = 153; records remaining after rescreening, n = 327 (Chinese, n = 326; English, n = 1); records supplemented, n = 3; records finally included, n = 330 (Chinese, n = 329; English, n = 1).

Association Rules within a Syndrome Set.
The association rule analysis of the main syndromes of PCOS was performed using the Apriori algorithm, with settings of support = 0.7%, confidence = 46%, and lift >1.
A total of 18 binomial association rules were obtained (see Table 3), and a structural diagram of the association rule loci for the high-frequency syndrome types was established (see Figure 2).

Association Rules within the Herb Set.
The Apriori algorithm was used to analyze the association rules between prescriptions and herbs in the TCM treatment of PCOS, with settings of support = 12%, confidence = 65%, and lift >1. As a result, 22 binomial association rules were obtained (see Table 4) and used to form the structural diagram of the association rule loci for the high-frequency herbs (see Figure 3).
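To make the thresholds concrete, the Python sketch below enumerates one-herb-to-one-herb (binomial) rules and filters them by support, confidence, and lift. It is a brute-force pairwise enumeration rather than a full Apriori implementation, and it is not the XMiner procedure; the toy prescriptions and the function name are invented for illustration.

```python
from itertools import permutations


def binomial_rules(prescriptions, min_support, min_confidence, min_lift=1.0):
    """Return (lhs, rhs, support, confidence, lift) for rules passing all thresholds."""
    baskets = [set(p) for p in prescriptions]
    n = len(baskets)
    herbs = sorted(set().union(*baskets))
    count = {h: sum(h in b for b in baskets) for h in herbs}
    rules = []
    for lhs, rhs in permutations(herbs, 2):
        both = sum(lhs in b and rhs in b for b in baskets)
        support = both / n
        if support < min_support:
            continue  # pair is too rare to consider
        confidence = both / count[lhs]
        lift = confidence / (count[rhs] / n)
        if confidence >= min_confidence and lift > min_lift:
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Toy prescriptions, invented for illustration only.
toy = [
    ["Fuling", "Baizhu", "Danggui"],
    ["Fuling", "Baizhu"],
    ["Fuling", "Danggui"],
    ["Tusizi", "Danggui"],
    ["Fuling", "Baizhu", "Tusizi"],
]
rules = binomial_rules(toy, min_support=0.12, min_confidence=0.65)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

In this toy data only the Baizhu/Fuling pair survives in both directions; the rule from Danggui to Fuling, for example, passes the confidence threshold but is rejected because its lift is below 1.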

High-Frequency Drug System Clustering Results.
The systematic clustering method was applied to further analyze the 51 high-frequency herbs (frequency ≥ 100) that were used to treat PCOS. Five clusters were selected from the clustering results and presented as a cluster analysis tree diagram (see Figure 4). Meanwhile, the corresponding syndromes of each cluster prescription were summarized based on clinical experience (see Table 5).

The k-Means Clustering Results of Prescription.
In the drug cluster analysis, 27 clusters were obtained when k = 27 and an inertia of 3310.6332 was used as a benchmark (see Table 6).

Discussion
The term "polycystic ovary syndrome" does not exist in TCM; however, according to the condition's clinical symptoms, it can be classified under the terms "amenorrhea," "infertility," "delayed menstruation," and "scanty menstruation." The "areolae" of "phlegm with blood stasis, reversible areolae" (as recorded in Danxi Xinfa) is presumed to be the earliest description of this disease in TCM.
Distinct from Western medicine, ancient TCM texts record that deficiencies of Qi, blood, yin, and yang in the Five Viscera (heart, liver, spleen, lung, and kidney) are involved in the pathogenesis of PCOS. As proposed by TCM physicians, Qi is not only the energy of life but also the driving force behind the biological activities of the human body, mind, and spirit. Blood is another vital substance in TCM and is in charge of providing nutrients. Deficiency or stagnant movement of Qi and blood induces phlegm, dampness, and blood stasis, which linger in and obstruct the uterus, consequently blocking the route of menstrual blood discharge and causing polycystic-like pathological changes of the ovaries.
Additionally, according to TCM theory, the kidney is vital to Qi and blood; it produces essence and nourishes and maintains the physiological functions of the ovaries. Insufficient kidney essence leads to declined and weakened ovarian function, resulting in ovarian dystrophy and various symptoms, including failure to have a smooth menstrual flow because follicles have difficulty maturing or are not properly discharged. Similarly, the spleen provides nutrition for the body's Qi and blood production and circulation. Spleen insufficiency leads to poor sources of menstrual blood and difficulty in draining menstrual blood due to the poor circulation of Qi and blood, resulting in a disordered menstrual cycle (the major symptom of PCOS). Additionally, liver Qi plays an important role in the development of PCOS according to TCM theory. Stagnation of liver Qi means poor circulation of liver Qi, which induces the accumulation of condensed pathological metabolites such as phlegm, dampness, and blood stasis, further leading to insufficient menstrual blood sources as well as difficulty in draining menstrual blood.
Thus, it can be seen that "dysfunction of kidney, liver, and spleen" and "pathogenic metabolites such as phlegm, dampness, and blood stasis" interact with each other, resulting in PCOS [9]. This is consistent with the findings of the present study, in which the frequency analysis of high-frequency syndrome types shows that the common syndrome types of PCOS include phlegm stagnation, phlegm stasis, blood stasis stagnation, liver Qi stagnation, kidney deficiencies, and spleen and kidney deficiencies. Among these, kidney deficiency is the most important; this is in line with the classical theories, i.e., "menstruation comes from (the) kidney(s)" and "(the) kidney(s) (dominate) reproduction, which is the root of innate endowment." The results of the association rule analysis indicated a high correlation between Qi deficiency with blood stasis and phlegm stagnation, Qi stagnation, dampness blockage and phlegm stasis, lower energizer dampness-heat, and blood stasis stagnation. Phlegm and blood stasis are suggested as the most important pathological products and pathogenic factors of PCOS; this coincides with the theory of "phlegm mixed with blood stasis, reverse ke-nang," proposed by Zhu Danxi.
It is worth noting that TCM, with its diverse bioactivities [23][24][25][26], has played a significant role in the treatment of refractory diseases such as PCOS. Therefore, the authors of the present study applied a frequency analysis to identify the high-frequency herbs used to treat PCOS in the published clinical cases included in this study.
The results showed that the top 10 high-frequency herbs were Danggui (Angelicae Sinensis Radix), Tusizi (Semen Cuscutae) [...] were used to tonify the essence and blood as well as to activate the blood to regulate menstruation. Tusizi (Semen Cuscutae) was used to reinforce kidney essence and warm the kidney yang. Fuling (Poria), Baizhu (Atractylodis Macrocephalae Rhizoma), and Gancao (Glycyrrhizae Radix et Rhizoma) were used to fortify the spleen, replenish Qi, and drain dampness. Xiangfu (Nutgrass Galingale Rhizome) was used to soothe the liver, regulate Qi, and help resolve depression. Danshen (Salviae Miltiorrhizae Radix et Rhizoma) was applied to activate the blood and resolve its stasis as well as to regulate menstruation. This corresponds with the pathogenesis of deficiency, dampness, and stasis. It is also consistent with the physiological characteristics "blood is the base of the female," "kidneys dominate reproduction, and the spleen controls digestion, which are considered the root cause of innate endowment and postnatal constitution." Additionally, the results of pharmacological experiments further validated that the aforementioned herbs could exert protective effects on PCOS through multiple pathways. For example, a report has shown that an aqueous extract of Danggui (Angelicae Sinensis Radix) had a beneficial effect in a rat model of PCOS and that the underlying mechanism was partly related to the JAK2/STAT3 signaling pathway mediated by interleukin-6 [27]. The total flavones of Tusizi (Semen Cuscutae) could improve PCOS by regulating the secretion of estrogen and androgen as well as by affecting the hypothalamic-pituitary-ovary axis pathway [28]. The results of the association rules in the drug sets showed that drug combinations with a confidence level of >70% included the following: Fupenzi (Rubi Fructus) ⟶ [...]

Evidence-Based Complementary and Alternative Medicine
[...] Sinensis Radix), suggesting that it was the most commonly used herb in cases of PCOS. This is consistent with the results related to the drug frequency distribution. Danggui (Angelicae Sinensis Radix) is a key medicine for nourishing and activating the blood as well as for regulating menstruation. According to Jingyue quanshu·bencaozheng, "the taste is sweet and heavy, so it can nourish the blood, (and) its (smell is) light and pungent, so it can move blood (. . .) there is moving in the nourishing, nourishing in the moving, as the Qi medicine in the blood, as well as the panacea medicine in the blood."
A total of 5 clustered prescriptions were obtained by systematic clustering. According to the syndrome type inferred from the prescription, clustering prescription 1 has the function of reinforcing the liver and kidneys and warming the kidney yang, which is suitable for kidney yang deficiency syndrome. Clustering prescription 2 has the function of nourishing the kidney yin and nourishing and activating the blood, which is suitable for blood stasis in the case of kidney-deficiency syndrome. Clustering prescription 3 has the function of activating the blood and resolving blood stasis and tonifying the blood to regulate menstruation, which is suitable for blood stasis and amenorrhea syndrome. Clustering prescription 4 has the function of fortifying the spleen and replenishing the Qi as well as drying dampness to resolve phlegm, which is suitable for spleen deficiency with phlegm and dampness syndrome. Clustering prescription 5 has the function of warming the yang and resolving phlegm as well as resolving stasis and dredging collaterals, which is suitable for kidney deficiency with phlegm stasis syndrome.
The k-means clustering analysis of the formulations yielded 27 types of core prescriptions. Among them, clustering prescriptions 1 and 13 both include Cangzhu (Atractylodis Rhizoma), Fuling (Poria), Chenpi (Citri Reticulatae Pericarpium), and Xiangfu (Nutgrass Galingale Rhizome); their main effects are strengthening the spleen, drying dampness, and resolving phlegm, and they treat spleen deficiency with dampness encumbrance syndrome. According to the drug composition, they are presumed to be modifications (additions and subtractions) of the Cangfu Daotan pill.
Most patients with PCOS are obese, as per Zhulin nvke zhengzhi: "the body hypertrophy has phlegm and Qi defciency, to a few months to begin menstruation, appropriate to take shape more phlegm defciency, to several months and the originator, appropriate to take Cangfu Liujun decoction, as well as Cangfu Daotan pill." Clustering prescription 7 includes Banxia (Pinelliae Rhizoma), Dannanxing (Arisaematis Cum Bile), Chenpi (Citri Reticulatae Pericarpium), and Xiangfu (Nutgrass Galingale Rhizome), and its main efficacy relates to dampness and resolving phlegm; accordingly, it is useful in the treatment of phlegm stagnation syndrome.
According to the drug composition of the prescription, it is speculated to be a modification (addition and subtraction) of the Erchen decoction. [...] (Carthami Flos). Their primary efficacy relates to activating the blood and resolving blood stasis; as such, they are used in the treatment of blood stasis stagnation. According to knowledge of the prescription sources, it is speculated that they are modifications (additions and subtractions) of the Taohong Siwu decoction.

Conclusion
The present data mining study analyzed published medical cases of PCOS from well-known TCM doctors to explore their experience with PCOS syndrome types, treatment methods, and medications using different analysis methods. The results showed that the common syndrome types include phlegm stagnation, phlegm stasis, blood stasis stagnation, liver Qi stagnation, kidney deficiencies, and spleen and kidney deficiencies; among these, kidney deficiency is the core syndrome type, and phlegm and blood stasis are the core pathological products and pathogenic factors. Commonly used medicines for treating these conditions include Danggui ([...] The main efficacy of the high-frequency drug system clustering prescriptions includes reinforcing the liver and kidney, warming the kidney yang, nourishing the kidney yin, nourishing and activating the blood, activating the blood and resolving blood stasis, tonifying the blood to regulate menstruation, fortifying the spleen and replenishing the Qi, drying dampness to resolve phlegm, warming the yang and resolving phlegm, resolving stasis, and dredging collaterals. The treatment methods typically combine several approaches, such as tonifying the kidneys, strengthening the spleen, resolving dampness, activating the blood, and resolving stasis. Most of the core prescriptions were TCM compound formula interventions based primarily on the combination of the Cangfu Daotan pill, Liuwei Dihuang pill, and Taohong Siwu decoction. Therefore, this study may provide guidance for the formation of a consensus among TCM experts on the clinical diagnosis and treatment of PCOS and on the relevant medication use.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.