Tripterygium wilfordii Hook. f. Preparations for Rheumatoid Arthritis: An Overview of Systematic Reviews

Objectives To summarize the quantity and quality of evidence for using Tripterygium wilfordii Hook. f. (TwHF) preparations in patients with rheumatoid arthritis (RA) and to identify the reasons for disparities in that evidence by comprehensively appraising the related systematic reviews (SRs). Methods We performed an overview of the evidence for the effectiveness and safety of TwHF preparations for patients with RA. We searched seven literature databases from inception to July 15, 2021. We included SRs of TwHF preparations in the treatment of RA. Four tools were used to evaluate the reporting quality, methodological quality, risk of bias, and certainty of evidence of the included SRs: PRISMA, AMSTAR-2, ROBIS, and the GRADE approach, respectively. Results We included 27 SRs (with 385 studies and 33,888 participants) in this overview. AMSTAR-2 showed that 19 SRs had critically low methodological quality and the remaining 8 had low methodological quality. The rate of overlap was 68.31% (263/385), and the corrected covered area (CCA) was 0.53, which indicated a slight degree of overlap. Based on the ROBIS assessment, all 27 SRs were rated as low risk in phase 1; in phase 2, 1 SR was rated as low risk in domain 1, 9 SRs in domain 2, 16 SRs in domain 3, and 16 SRs in domain 4; in phase 3, 7 SRs were rated as low risk. Among the 27 PRISMA items, 15 were reported with over 70% compliance; the reporting quality of 16 SRs was rated as “fair” and that of 11 as “good.” Using the GRADE assessment, moderate-quality evidence was found for 5 outcomes and low-quality evidence for 5 outcomes. Conclusion The use of TwHF preparations for the treatment of RA may be clinically effective according to the moderate-quality evidence. Methodological issues, risk of bias, and reporting deficiencies still need to be addressed. Good-quality SRs and further randomized clinical trials that focus on clinically important outcomes are needed.


Introduction
Rheumatoid arthritis (RA) is the most common autoimmune inflammatory arthritis in adults, with a prevalence of 0.5-1.0% in the general population [1,2]. A recent meta-analysis found that the global prevalence of RA was 460 per 100,000 population in the period 1980 to 2019 [3]. RA is characterized by progressive symmetric arthritis with chronic joint inflammation, synovial hyperplasia, and systemic manifestations [4]. The most common symptoms reported by people with RA are arthralgia, swelling, redness, and limited range of motion [5,6]. Without adequate treatment, RA can lead to severe joint deformity and disability, affecting patients' quality of life and ability to work [7]. Complications associated with RA lead to high morbidity and rising mortality [7,8]. Significant progress has been made in understanding the genetic predisposition and environmental factors involved in the onset and progression of RA, emphasizing its heterogeneity [9].
Treatment algorithms for patients with RA involve measuring disease activity with composite indices, and the treatment target is the maintenance of remission or low disease activity, the prevention of joint destruction and deformity, and the improvement of joint function [10]. The 2021 American College of Rheumatology Guideline for the Treatment of Rheumatoid Arthritis addresses disease-modifying antirheumatic drugs (DMARDs) as the treatment for patients with RA [11]. The newest guideline recommends conventional synthetic (cs) DMARDs (e.g., methotrexate (MTX), leflunomide, and sulfasalazine) as the first line of therapy for RA and makes several recommendations against the use of glucocorticoid therapy [12]. Although the prospects for most patients are now favorable, many still do not respond to current therapies. Adverse effects (e.g., immunosuppression, bone marrow dysfunction, interstitial lung disease, liver damage, hyperglycemia, and hypertension) occur in RA patients receiving long-term medication [13,14], and the cost of treating RA has also risen strikingly, largely as a consequence of the biologic therapies [15]. Accordingly, there are still unmet needs for patients who do not achieve remission and who continue to worsen despite treatment. Hence, patients often seek complementary therapies. The popularity of complementary and alternative medicine (CAM) in the management of RA has grown considerably over the past decade, attracting the interest of both patients and the research community [16,17]. Among the CAM approaches, botanical extracts are an effective option against RA symptoms owing to several anti-inflammatory, palliative, and antiarthritic properties. Tripterygium wilfordii Hook. f. (TwHF) is a traditional Chinese herb that is widely used in the treatment of RA in China [18,19] due to its anti-inflammatory and immunosuppressive effects.
Several TwHF preparations and patented preparations derived from TwHF extracts are clinically available, including Tripterygium wilfordii tablets (TWTs), Tripterygium wilfordii glycosides tablets (TWGTs), and Tripterygium hypoglaucum Hutch tablets. In randomized controlled clinical trials, both TWTs and TWGTs exhibited efficacy similar to MTX, as well as enhanced efficacy when the tablets were combined with MTX in patients with RA [20,21]. Biochemical and pharmacokinetic studies found that triptolide (TP) and celastrol are two of the most bioactive, yet toxic, constituents identified in TwHF preparations [22]. Triptolide is regarded as one of the most potent systemic anti-inflammatory and immunoregulating natural products [23]. Previous reviews have summarized the extensively studied mechanisms associated with the significant therapeutic effects of TP and celastrol against T helper cell-mediated immune diseases, including RA [24,25]. Emerging evidence suggests that TP suppresses inflammatory responses by attenuating MAPK/NF-κB activation and inhibiting downstream responses [24,25]. Several studies have demonstrated that the therapeutic effect of TwHF preparations may depend on the immune balance of Th17 cells and Tregs, the regulation of the proportion between CD4+ and CD8+ T cells, and the differentiation of dendritic cells [26,27]. A large number of individual trials and systematic reviews (SRs)/meta-analyses (MAs) of TwHF preparations in the treatment of RA have been published. However, the results and quality of the SRs have been mixed. As an increasingly popular form of evidence synthesis, an overview of SRs/MAs uses explicit and systematic methods to extract and analyze results across important outcomes from multiple SRs/MAs on related research questions [28,29].
Thus, we conducted an overview of SRs/MAs about TwHF preparations in the treatment of RA to inform healthcare decision-makers and address new questions that were not reported in the included SRs/MAs.

Methods
We adhered to two guidelines for conducting an overview: the Cochrane Handbook [30] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [31]. Literature search and selection, data extraction, and quality evaluation were completed independently by two authors (Huimin Li and Simin Xu). All discrepancies were resolved by first consulting an experienced third reviewer (Xing Liao) and then reaching consensus among all authors. We referred to two published overviews of good quality in this step [32,33].

Protocol and Registration.
We registered our protocol in the International Platform of Registered Systematic Review and Meta-Analysis Protocols (DOI number is 10.37766/inplasy2021.8.0081).
There was no need for ethical approval.

Search Strategy.
We searched the following literature databases using the terms TwHF, RA, and systematic review/meta-analysis from inception to July 13, 2021: The Cochrane Library, PubMed, Embase, VIP database, China National Knowledge Infrastructure, CBM, and WanFang. The details of the literature search strategy are presented in Appendix A.

Inclusion and Exclusion Criteria.
We selected related SRs meeting the following inclusion criteria: (a) SRs of randomized controlled trials (RCTs) or other research designs; (b) the participants were diagnosed with RA by common criteria, or it was clearly stated that the population of the SR was RA patients; (c) the comparisons were any type of TwHF extract, with or without standard treatment, used as the treatment for RA versus standard treatments, such as drug therapy, routine activities, no therapy, placebo, and other treatment; (d) the outcomes included clinical, physiological, or caregiver-reported outcomes; patient-reported outcomes; and adverse effects. Only SRs published in English or Chinese were included. SRs with unavailable full text were excluded.

Data Management and Data Collection.
We used the literature manager NoteExpress (V3.5.0.9054) to perform literature selection. We first removed duplicates and screened titles and abstracts for potentially relevant SRs. Full texts of potentially eligible SRs were downloaded and assessed against the inclusion and exclusion criteria. We applied a predesigned form to extract related information from each eligible SR: general information (e.g., the publication year, title, first author, country, and language); review characteristics (e.g., literature database, number of included studies and participants, diagnosis criteria, interventions and comparisons, meta-analysis, quality assessment tool, and outcomes); and the main conclusion.

Assessment of Methodological Quality.
We used the tool Assessing the Methodological Quality of Systematic Reviews 2 (AMSTAR-2) [34] to estimate the methodological quality of all included SRs; the tool provides guidance for rating the overall confidence in the results of a review. The AMSTAR-2 covers four domains: preparation for review, search for and selection of primary studies, data coding and reporting, and data synthesis. It contains 16 items, seven of which are critical (items 2, 4, 7, 9, 11, 13, and 15) because they can critically influence the validity of an SR and its conclusion. Each item could be answered with one of three options: "yes," indicating high quality; "partial yes," indicating partial compliance; or "no," indicating poor quality.
The overall rating depends on weaknesses in the critical items (items 2, 4, 7, 9, 11, 13, and 15). The rating is divided into four categories depending on the number of critical flaws and/or noncritical weaknesses: "high" means no or one noncritical weakness; "moderate" means more than one noncritical weakness but no critical flaws; "low" means one critical flaw with or without noncritical weaknesses; and "critically low" means more than one critical flaw with or without noncritical weaknesses.
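The AMSTAR-2 rating rule described above can be sketched as a small function. This is only an illustrative sketch, not the authors' procedure: the function name and the choice to count only a "no" answer as a flaw or weakness (so that "partial yes" is not penalized) are assumptions made for the example.

```python
# Critical items as listed in the text (AMSTAR-2 items 2, 4, 7, 9, 11, 13, 15).
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def amstar2_overall_rating(answers):
    """Return the overall AMSTAR-2 confidence rating for one review.

    answers: dict mapping item number (1-16) to "yes", "partial yes" or "no".
    Assumption (not stated in the text): only "no" counts as a flaw/weakness.
    """
    flawed = {item for item, answer in answers.items() if answer == "no"}
    critical_flaws = len(flawed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - CRITICAL_ITEMS)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical review: no protocol (item 2) and no list of excluded studies (item 7).
example = {item: "yes" for item in range(1, 17)}
example[2] = "no"
example[7] = "no"
print(amstar2_overall_rating(example))
```

Under these assumptions, a review with flaws in two critical items, as in the hypothetical example, falls into the "critically low" category regardless of its noncritical items.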

Assessment of Risk of Bias.
We also evaluated the risk of bias of each included SR/MA using the ROBIS tool [35], which assesses whether an SR is at risk of bias based on its methods and conduct. ROBIS comprises three phases: (a) assess relevance (optional), (b) identify concerns with the SR process, and (c) judge the risk of bias of the SR. Phase one, which is optional, assesses relevance. Phase two includes four domains formed by 21 signaling questions and aims to identify concerns with the review process. Phase three, with three signaling questions, focuses on judging the risk of bias of the SR. All signaling questions were answered as "yes," "probably yes," "probably no," "no," or "no information." Based on the answers to the signaling questions in each domain, each domain is assigned a risk of bias grade. If all signaling questions of phase 3 were answered as "yes," the SR was judged as "low risk." If any signaling question of phase 3 was answered as "probably no" or "no," the SR was assessed as "high risk." If the information provided was insufficient to judge, the SR was rated as "unclear risk." After completing phase three, a summary judgment (high, low, or unclear) regarding the risk of bias of the SR was rendered.
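The phase 3 decision rule described above can be written as a short sketch. The function name is hypothetical, and one point goes beyond the text: the text only mentions "yes" for a low-risk judgment, so treating "probably yes" like "yes" is an assumption of this example.

```python
ALLOWED_ANSWERS = {"yes", "probably yes", "probably no", "no", "no information"}


def robis_phase3_judgement(answers):
    """Summarize the phase 3 signaling questions into a risk-of-bias judgment.

    answers: list of answers to the phase 3 signaling questions.
    Assumption: "probably yes" is treated like "yes".
    """
    unknown = set(answers) - ALLOWED_ANSWERS
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")

    # Any negative answer leads to a high-risk judgment.
    if any(answer in {"no", "probably no"} for answer in answers):
        return "high risk"
    # Without negative answers, missing information makes the judgment unclear.
    if any(answer == "no information" for answer in answers):
        return "unclear risk"
    return "low risk"


print(robis_phase3_judgement(["yes", "yes", "probably no"]))
```

Checking for negative answers before missing information mirrors the order of the rule in the text: a single "no" or "probably no" is enough for "high risk," whatever the other answers are.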

Assessment of Reporting Quality.
We applied the checklist Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [36] to appraise the reporting quality of each SR/MA. PRISMA consists of seven main domains: title, abstract, introduction, methods, results, discussion, and funding. It comprises 27 items and a four-phase flow diagram, which focus on the reporting of methods and results in SRs/MAs. Each item was answered as "yes," "no," or "partially reported." For the purposes of statistical analysis, we judged whether an SR fully reported what was required by PRISMA and scored each item with 1 point (fully reported), 0.5 points (partially reported), or 0 points (not reported). The sum of all item scores was divided by the maximum possible score and expressed as a percentage to assess the reporting quality of each SR. The reporting quality of an SR was rated from its PRISMA score percentage as very poor (<30%), poor (30-50%), fair (50-70%), good (70-90%), or excellent (>90%).
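The scoring scheme above is simple arithmetic and can be sketched as follows. The function names are hypothetical, and because the text does not say how scores falling exactly on 30%, 50%, 70%, or 90% are handled, the boundary treatment below (lower bound inclusive, 90% still "good") is an assumption.

```python
# Points per answer, as described in the text.
POINTS = {"yes": 1.0, "partially reported": 0.5, "no": 0.0}


def prisma_percentage(answers):
    """Percentage of the maximum PRISMA score (1 point per item)."""
    total = sum(POINTS[answer] for answer in answers)
    return 100.0 * total / len(answers)


def reporting_quality(percentage):
    """Map a PRISMA score percentage to the category used in the text.

    Assumption: lower bounds are inclusive; "excellent" requires > 90%.
    """
    if percentage < 30:
        return "very poor"
    if percentage < 50:
        return "poor"
    if percentage < 70:
        return "fair"
    if percentage <= 90:
        return "good"
    return "excellent"


# Hypothetical SR: 20 items fully reported, 4 partially reported, 3 not reported.
answers = ["yes"] * 20 + ["partially reported"] * 4 + ["no"] * 3
score = prisma_percentage(answers)
print(round(score, 2), reporting_quality(score))
```

In the hypothetical example, the SR earns 22 of 27 possible points, which places it in the "good" band.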

Assessment of Quality of Evidence.
The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) [37] approach was used to assess and report the certainty of evidence for the clinically important outcomes of interest in the current overview. In the GRADE system, five factors for rating down the quality of evidence were considered: risk of bias (also called "study limitations"), inconsistency, indirectness, imprecision, and publication bias. The quality of evidence for each outcome was judged as "high," "moderate," "low," or "very low."

Data Synthesis and Presentation.
We narratively described the characteristics of the included SRs and the efficacy and safety of TwHF preparations for RA in this overview. We used tables and figures to summarize the results of all SRs/MAs as well as the appraisal results from AMSTAR-2, PRISMA, and ROBIS. We generated the evidence profile and summary of findings table with the aid of the GRADEpro GDT online software (https://www.gradeworkinggroup.org/).
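As a rough illustration of how the five GRADE downgrading factors named in the quality-of-evidence assessment translate into a final rating, the sketch below starts a body of RCT evidence at "high" and lowers it one level per serious concern (two per very serious concern). The function name and the dictionary-based interface are hypothetical, and real GRADE judgments involve reviewer discretion that a lookup like this cannot capture.

```python
LEVELS = ["very low", "low", "moderate", "high"]
FACTORS = (
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
)


def grade_certainty(downgrades, start="high"):
    """Return the certainty level after applying downgrades.

    downgrades: dict mapping a factor name to the number of levels to
    subtract (1 = serious concern, 2 = very serious concern).
    start: starting level; evidence from RCTs starts at "high".
    """
    unknown = set(downgrades) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown GRADE factors: {sorted(unknown)}")
    index = LEVELS.index(start) - sum(downgrades.values())
    # The rating cannot fall below "very low".
    return LEVELS[max(index, 0)]


# Hypothetical outcome with serious risk of bias and serious imprecision.
print(grade_certainty({"risk of bias": 1, "imprecision": 1}))
```

This makes explicit why an outcome with two serious concerns ends up as low-quality evidence, and why no outcome can be rated high once any factor is downgraded.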

Results on SRs/MAs Search and Selection.
The initial search strategy yielded 280 records from the selected databases. After removal of 42 duplicates, 238 records were screened based on title and abstract. Afterward, fifty-six articles were read in full text, of which 27 SRs [20,21, were included in the current overview. The list of excluded reviews is recorded in Appendix A. The PRISMA diagram for the process of screening and selecting SRs is displayed in Figure 1.
The outcomes reported by the 27 SRs covered tender joint count (TJC), swollen joint count (SJC), morning stiffness (MS), grip strength (GS), erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), rheumatoid factor (RF), American College of Rheumatology (ACR), adverse events (AEs), interleukin 1 (IL-1), interleukin 4 (IL-4), interleukin 6 (IL-6), interleukin 10 (IL-10), tumor necrosis factor-alpha (TNF-α), 15-m walking time (15Mwt), 15/20-m walking time (15/20Mwt), tenderness score, physician-rated and patient-rated overall assessments, X-ray score, radiological changes of joints, withdrawal rate related to adverse reactions, joint symptoms, disease activity score, cyclic citrullinated peptide (CCP), mean grip strength, analgesic onset time (AOT), short form 36 health questionnaire (SF-36), health assessment questionnaire, traditional Chinese medicine symptom score of the joint swelling, and painful joint count. Among the 25 SRs that aimed to evaluate both efficacy and safety of TwHF preparations, only 12 SRs [20, 40-45, 49, 50, 53, 55, 56] reported AEs. The quality assessment tools for the original studies varied among the 27 SRs: 17 employed the Cochrane risk of bias tool, 9 adopted the Jadad score, and the remaining 1 used an unknown tool. Out of the 27 [38,58,60] were no effect and 1 [39]) was harmful. The detailed characteristics of the SRs are presented in Table 1.

Note to Table 1: the conclusions of the SRs were classified into five categories by referring to another evidence mapping study [63]. Inconclusive: reported that the results differed across or within reviews due to conflicting results or limitations of individual studies. No effect: reported that there is no difference between intervention and comparator. Harmful: reported clearly a harmful effect. Probably beneficial: did not report firm benefits despite the reported positive treatment effect. Beneficial: reported a clear beneficial effect without major concerns regarding the supporting evidence.

Evidence-Based Complementary and Alternative Medicine

Results on Review Quality Assessment
Table 2 presents the results of the methodological quality of the 27 included SRs/MAs assessed by the AMSTAR-2. Out of the 27 included SRs, the quality of 20 SRs was rated critically low since they had more than one critical weakness (items 2, 4, 7, 9, 11, 13, and 15). Severe limitations existed in item 2, item 3, item 7, item 10, and item 16 (percentage of items with "yes" < 50%).
The methodological quality appraised by the AMSTAR-2 for the 27 SRs can be summarized as follows: 92.6% of the 27 SRs did not explicitly state that the review methods were established before conducting the review or justify significant deviations from the protocol (item 2); 91.49% did not provide a list of excluded studies or justify the exclusions (item 7); 96.3% did not explain the selection of the study designs for inclusion in the review (item 3); 81.49% did not use a comprehensive literature search strategy (item 4); 66.67% did not report any potential sources of conflicts of interest (item 16); and 66.67% described the included studies insufficiently (item 8).

Risk of Bias.
The ROBIS was used to assess the risk of bias of each SR, the results of which are presented in Appendix B. All 27 SRs were judged to have a low risk of bias in phase 1 (assessing relevance). Regarding phase 2, across all 27 SRs, the individual bias domains at the highest risk of bias were domain 1 (protocol and eligibility criteria, 26/27, 96.30%) and domain 2 (methods to identify and select studies, 18/27, 66.67%). Specific areas of concern in these two domains were the lack of information about publication of an SR protocol, language restrictions, choice of literature databases, and searches for gray literature. Eleven (40.74%) SRs were at high risk of bias for both domain 3 (collection and study appraisal) and domain 4 (synthesis and findings). Seven (25.92%) SRs were rated as low risk of bias in phase 3 (risk of bias in the review). Overall, 20 of the 27 SRs were rated as "high risk," and the remaining 7 SRs were rated as "low risk." Reviews with a high risk of bias mainly had problems with the completeness of the search for relevant studies, inadequate reporting of the protocol, and the lack of an explicit method to select studies.

Reporting Quality.
The results of the PRISMA assessment are presented in Appendix B. Of the 27 items, 12 items had adherence greater than 70% in most of the included SRs; however, five items were reported by only one SR, and four items had no adherence. The sections on rationale, objectives, eligibility criteria, title, introduction, study characteristics, and results of individual studies were well reported by all included SRs, but reporting in other sections was still inadequate. The main reporting deficiencies were five items with adherence lower than 5%: whether a protocol exists or is registered (item 5, percentage of items with "yes," 3.7%); certainty assessment (item 15, yes = 3.7%); search strategy (item 7, yes = 3.7%); structured summary (item 2, yes = 3.7%); and certainty assessment (item 22, yes = 0%). Additionally, only one SR [45] mentioned the study protocol and the protocol registration number. Finally, the reporting quality of 16 SRs was rated as "fair," and that of 11 as "good."

Evidence Quality of Outcomes.
The information about the efficacy and safety of TwHF preparations for RA from the included SRs is summarized in Table 3. Ten of the 27 SRs that selected rheumatoid factor as the primary outcome suggested that patients with RA who received TwHF preparations had better outcomes than their counterparts who were treated with DMARDs. Eighteen of the 27 SRs (66.66%) reported that both tender joint count and swollen joint count were significantly reduced in the TwHF preparations group. As for the ACR (20/50/70), 7 of the 27 SRs (25.92%) reported that it was significantly improved in the TwHF preparations group. As for the levels of ESR and CRP, 18 of the 27 SRs (66.66%) reported that both were significantly reduced following treatment with TwHF preparations, while one SR reported no statistically significant difference for ESR. Among the 15 included SRs that reported morning stiffness (MS), 8 SRs reported that MS was significantly reduced in the TwHF preparations group. Combination therapy with TwHF preparations and other treatment significantly decreased the duration of morning stiffness; reduced the tender joint count and swollen joint count; improved ACR (20/50/70), ESR, CRP, and RF; and lowered the level of TNF-α. The most common AEs with TwHF preparations were gastrointestinal discomfort, menstruation disorders, amenorrhea, decreased sperm motility, liver function damage, and skin diseases.

Note to Table 3: the risk in the intervention group (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval; RR: risk ratio; OR: odds ratio; MD: mean difference. GRADE Working Group grades of evidence: HIGH quality: further research is very unlikely to change our confidence in the estimate of effect; MODERATE quality: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate; LOW quality: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate; VERY LOW quality: we are very uncertain about the estimate. a: downgraded due to risk of bias; b: downgraded due to publication bias; c: downgraded due to inconsistency and imprecision.

Overall Quality of the Evidence.
The details of the GRADE summary of findings are described in Table 3. We only rated the body of evidence for main outcomes that were pooled based on RCTs using the GRADE system. Nineteen SRs involving 5 main outcomes related to the effects of TwHF preparations for RA were analyzed. Based on the GRADE approach, moderate quality of evidence was found in 5 outcomes of the included SRs, whereas 5 outcomes were rated as low quality, and 2 outcomes were very low quality. No outcome with high-quality evidence was found in the current overview. Risk of bias (n = 13) was the most common downgrading factor, followed by inconsistency (n = 2), imprecision (n = 6), publication bias (n = 5), and indirectness (n = 9). The reasons for downgrading the level of evidence were the poor methodological quality, imprecision of the results, and small sample sizes of the relevant trials. For the majority of outcomes, the reason for downgrading was the small number of participants: the number of participants included in the SR did not reach the optimal information size, so the quality of evidence was downgraded for imprecision. For nearly half of the outcomes, the effect estimates could not provide a convincing explanation for differences in results across studies, owing to statistically significant heterogeneity. Some of the outcomes had publication bias because of the incomplete literature search, which was already identified by AMSTAR-2 and ROBIS.

Discussion
Overviews are most frequently employed where multiple systematic reviews already exist on similar or related topics, and they aim to systematically bring together, appraise, and synthesize the results of related systematic reviews [67]. Although an increasing number of SRs/MAs have been published on TwHF preparations for RA, the quality of those SRs/MAs taken together has not been assessed until now.
Thus, there is a need to systematically bring together, appraise, and synthesize the results of related systematic reviews in an overview of this issue.

Summary of Main Findings.
Tripterygium wilfordii Hook. f. (TwHF, also known as Thunder God Vine or Lei Gong Teng) is one of the most representative traditional Chinese herbs with therapeutic potential and has been broadly studied by scientists. In spite of some occasional, but severe, adverse effects (which may be harmful to the liver, kidneys, reproductive tissues, and immune tissues [64]) found in clinical practice, the use of TwHF preparations has not declined, owing to their significant efficacy against diseases. In the current overview, the 27 included SRs on TwHF preparations were published from 2013 to 2021. Of the 27 included SRs, 26 drew positive conclusions about TwHF preparations for RA; however, none of the review authors drew a firm conclusion, owing to the small sample size of the included RCTs or their low methodological quality.
Though adverse events caused by TwHF preparations were shown not to be significantly different from those caused by immunosuppressive agents, there is an urgent need to improve the prevention and management of patients' tolerance and to monitor the administration of TwHF preparations in clinical practice [65]. In addition, TwHF preparations should not be used in RA patients with liver or kidney insufficiency or in those planning to have children, in view of their liver, kidney, and reproductive toxicity. We reclassified and examined the 385 primary studies included in the 27 SRs. We calculated the percentage of primary studies included in more than one SR and the corrected covered area (CCA), which is a measure of the degree of overlap [68]. The rate of overlap was 68.31% (263/385) and the CCA was 0.53, which indicated a slight degree of overlap.
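For readers unfamiliar with the CCA, the sketch below shows how it is commonly computed from a citation matrix: CCA = (N - r) / (r × c - r), where N is the total number of inclusions counted across all reviews (with double counting), r is the number of distinct primary studies, and c is the number of reviews. The function names are hypothetical, and the interpretation bands (0-5% slight, 6-10% moderate, 11-15% high, >15% very high) are the commonly used thresholds, stated here as an assumption because the text does not list them. The total number of inclusions for this overview is not reported in the text, so the example uses toy numbers only.

```python
def corrected_covered_area(n_inclusions, n_primary_studies, n_reviews):
    """CCA = (N - r) / (r * c - r), returned as a proportion.

    n_inclusions: N, inclusions summed over all reviews (double counting kept).
    n_primary_studies: r, number of distinct primary studies.
    n_reviews: c, number of reviews (must be at least 2).
    """
    if n_reviews < 2:
        raise ValueError("overlap is only defined for two or more reviews")
    r, c = n_primary_studies, n_reviews
    return (n_inclusions - r) / (r * c - r)


def overlap_category(cca_percent):
    """Interpret a CCA expressed in percent (assumed conventional bands)."""
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


# Toy example: 4 reviews covering 10 distinct trials with 13 inclusions in total.
toy_cca = corrected_covered_area(n_inclusions=13, n_primary_studies=10, n_reviews=4)
print(f"{100 * toy_cca:.1f}%")
```

Subtracting r in both numerator and denominator discounts the first appearance of each primary study, so only repeated inclusions count as overlap.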
There are two possible reasons for the overlap. One is that SRs in the TCM research area often address a broad research question; for instance, the majority of SRs investigated TwHF preparations versus conventional medicine on different outcomes, leading to more primary studies being included in an SR. The other is that some SR authors reported that the quality of the published SRs was poor and that it was necessary to perform a new SR rather than an updated one. The quality of the SRs and the evidence quality of the outcomes in this overview are generally discouraging, on the basis of the evaluation with AMSTAR-2, ROBIS, PRISMA, and GRADE, implying that there is a huge disparity between the included SRs/MAs and the real world.
Thus, in view of these limitations, the trustworthiness of the evidence for TwHF preparations for RA is weakened. Consequently, caution is needed when recommending TwHF preparations as a complementary or even alternative treatment for patients with RA. The current overview has four main findings. First, the methodological quality of all the SRs was rated as critically low or low by the AMSTAR-2 tool, and the following deficiencies existed: 1) selective reporting bias arose from the lack of an SR protocol or of protocol registration for the included SRs, which affected their thoroughness; 2) confidence in the results was reduced by decreased transparency, due to the omission of lists of excluded studies with explanations; and 3) the reliability of the conclusions and their impact on different users of reviews were affected by missing disclosure of potential financial conflicts of interest or the authors' conflicts of interest. Second, a high risk of bias evaluated by the ROBIS tool was found in the literature search, study selection, data synthesis method, and the explanation in the discussion among the included SRs, which made the current evidence unreliable. Third, the assessment of the included SRs' adherence to the PRISMA statement found incomplete reporting of the literature search strategies, the literature screening processes, the additional analyses, and the sources of funding, which decreased the trustworthiness of the findings. When information is absent or ambiguous in the reporting, SR users cannot implement the findings of SRs in clinical practice. Lastly, the results of the GRADE assessment in this overview revealed moderate-quality evidence on some outcomes that TwHF preparations have potential effects for patients with RA. Low-quality evidence reduced confidence in the evidence and created uncertainty about the trade-offs when recommending TwHF preparations as an intervention for RA.
In regard to the safety of TwHF preparations for RA, 11 SRs reported that the combined therapy increased clinical efficacy significantly when compared with Western medicine alone, whereas four SRs found no difference between the two groups. In the current overview, there is no high-quality evidence, and most of the outcomes were rated as low or very low quality. Evidence quality was downgraded due to study limitations, inconsistency, and publication bias. The publication bias in most of the included SRs was mostly caused by the small number of included RCTs, which had small sample sizes and positive results, and this may lead to overestimation of the effect size.
Moreover, most of the original studies of TwHF preparations in treating RA have major limitations, including lack of allocation concealment, subjective outcomes without blinding, loss to follow-up, and no intention-to-treat analysis, which biased the estimates of the treatment effect and affected the confidence in the estimate of effect in SRs. Study heterogeneity prevented meaningful meta-analysis due to the various evaluation criteria for the assessment of clinical effectiveness and the different treatment courses across studies. Only one SR [61] conducted subgroup analysis based on the different treatment courses.

Implications for Future Clinical Practice and Research.
According to our results, TwHF preparations may be effective for RA patients, which is consistent with a related previous overview [68]. However, their administration should be monitored because of their adverse effects. TwHF preparations are likely to improve physical function and quality of life in patients with RA, not just laboratory outcomes. More than half of the included SRs (66%) showed a significant decrease in swollen joint count and tender joint count in the TwHF preparations group, 48.14% for morning stiffness, and only 26% for ACR (20/50/70). However, as mentioned above, we should consider the inadequacy of the available evidence and be cautious when recommending TwHF preparations as a treatment for RA patients.
The quality of a systematic review depends on the quality of the original research. Therefore, well-designed primary studies should be carried out in the future. The composite outcome total effective rate was used as a primary outcome with a simple rate calculation formula in most studies, whereas relief of joint pain is the internationally recognized outcome. To produce accepted efficacy evidence for TwHF preparations in the treatment of RA, future studies should select well-recognized outcomes and related measurements that are recommended by expert consensus or by international guidelines [66]. When evaluating the effects of TwHF preparations, we should consider not only the laboratory outcomes and physician-reported outcomes but also patient-reported outcomes (such as quality of life), so that the efficacy of TwHF preparations in the treatment of RA can be evaluated comprehensively. Additionally, none of the included SRs mentioned follow-up. Considering that RA is a progressive disease with a long disease course, future studies should attach importance to the follow-up period to further assess the long-term efficacy of TwHF preparations for treating RA as well as to fully monitor their toxicity.
Last but not least, we strongly recommend authors of future SRs conduct and report SRs adhering to the AMSTAR-2 tool, ROBIS tool, and PRISMA statement.

Strengths and Limitations.
To the best of our knowledge, this is the first systematic overview to explore the evidence on TwHF preparations for RA by using the AMSTAR-2, ROBIS, PRISMA, and GRADE. The current overview presents the quality of the SRs/MAs and of the body of evidence across outcomes, which may be helpful for research on and clinical practice with TwHF preparations in treating RA. However, several limitations of this overview should be taken into account. We only searched for SRs in English and Chinese, which might produce publication bias. Although there are overlapping studies across the included SRs, we did not remove duplicate data and duplicate studies. As we did not aim to resynthesize the data to evaluate the efficacy of the intervention, the overlap is unlikely to have an impact on the conclusion. The author team members may have had their own subjective views during the evaluation, which could result in bias and influence the research findings. Finally, out of the 27 included SRs, 26 from Chinese researchers supported the use of TwHF preparations for RA, whereas one SR from British researchers disapproved of the use of TwHF preparations, which may be judged as a certain ethical bias.

Conclusion
TwHF preparations may be a complementary and alternative treatment for RA; however, they must be used carefully and monitored for their potentially severe toxicity. The quality of published SRs/MAs is unsatisfactory; hence, further standardized and rigorous SRs/MAs and RCTs are warranted to provide strong evidence for definitive conclusions.