Blinding Measured: A Systematic Review of Randomized Controlled Trials of Acupuncture

Background. There is no agreement among researchers on viable controls for acupuncture treatment, and assessment of the effectiveness of blinding and its interpretation are rare. Purpose. To systematically assess the effectiveness of blinding (EOB) in reported acupuncture trials; to explore results of RCTs using a quantitative measure of EOB. Data Sources. A systematic review of published sham RCTs that assessed blinding. Study Selection. Five hundred and ninety studies were reviewed, and 54 studies (4783 subjects) were included. Data Extraction. The number of patients who guessed their treatment identity was extracted from each study. Variables with possible influence on blinding were identified. Data Synthesis. The blinding index was calculated for each study. Based on blinding indexes, studies were grouped into one of the nine blinding scenarios. Individual study characteristics were explored for potential association with EOB. Limitations. There is a possibility of publication or reporting bias. Conclusions. The most common scenario was that the subjects believed they received verum acupuncture regardless of the actual treatment received, and overall the subject blinding in the acupuncture studies was satisfactory, with 61% of studies classified as adequately blinded. Objectively calculated blinding data may offer meaningful and systematic ways to further interpret the findings of RCTs.


Introduction
The presence of a viable control is important for any research study, allowing for valid comparison to the condition of interest. Randomized controlled trials (RCTs) are the primary way that many areas of research examine the comparative effectiveness of verum and control conditions. Patient blinding in these studies is essential for gathering reliable results that can be expanded upon. However, blinding success in nonpharmacologic interventions, such as acupuncture, can be difficult to assess, and results may not be applicable from one study to the next, due to differences in control methods and in how blinding success is determined, if it is considered at all. The CONSORT 2010 Statement on blinding suggests that it is important to include how blinding was attempted, but there is no mention in the statement of how to assess whether blinding was successful [1]. Without knowing if the specific nonpharmacologic blinding techniques used are valid overall, the information collected may or may not be reliable. We believe it is imperative that blinding success be assessed toward better understanding of the reliability of results.
Evidence-Based Complementary and Alternative Medicine
With respect to acupuncture research specifically, developing a control procedure that is physiologically inert and indistinguishable from true treatment has proven to be a challenge due to the very nature of traditional acupuncture. It has resulted in a variety of control methods that up until now have not been compared in terms of blinding success [2][3][4][5]. The traditional techniques of acupuncture (penetrating needles at acupoints for different organ systems and deqi) are the verum conditions of most acupuncture studies. Control conditions include penetrating needles at "nonpoints" and "wrong points," commercially developed nonpenetrating devices, homemade nonpenetrating needles, and toothpicks or cocktail sticks [6][7][8][9]. "Nonpoints" are points not used for any purpose in traditional acupuncture. For the "wrong points," sham acupuncture is done at points thought to affect a different body system than the one targeted by the verum condition. However, the "wrong point" method may cause a physiological effect similar to that of acupuncture and, therefore, may be more appropriately considered to be verum than sham acupuncture [10].
To date, there is no standard or universally accepted sham procedure for acupuncture research and no quantitative comparison of blinding between the above sham methods. This may contribute to why there is a discrepancy between the clinically recognized effectiveness of acupuncture and the relative lack of research supporting it [11]. Methodological progress for blinding characteristics, including the amount of disclosure to study participants, the variables to be collected, the analytic design, and the interpretation strategy, is needed in validation studies of sham control procedures [12].
The present meta-analysis systematically examines the status of blinding in sham acupuncture RCTs via a numerical measure, the blinding index. Our primary aim is to empirically evaluate the validity (via effectiveness of blinding) of sham control techniques in order to quantitatively assess blinding across available studies. We hope to determine the reliability of the results of studies that used different sham techniques, so that we may learn which sham methods are most useful for future acupuncture research. We also believe that this systematic review is only a first step toward more valid, quantitative assessment of blinding practices in acupuncture research. Our methods can be further extended as a model to assess the blinding techniques of other nonpharmacologic treatments.

Data Sources and Searches.
PubMed, Embase, and Web of Knowledge databases were searched for scientific articles using the keywords "acupuncture," "sham acupuncture," or "sham procedure." A revised search was also performed with Ovid Medline using the keywords "acupuncture," "sham acupuncture," or "placebo acupuncture." Eligible studies were randomized controlled trials in humans that were published in English between 1985 and 2011. Our search was not limited by patient diagnosis or by the part of the body where acupuncture was administered.

Study Selection.
A study was considered eligible for initial screening if the authors stated that they evaluated blinding. One of the authors (Moroz, Freed, or Tiedemann) determined if the study reported data on effectiveness of blinding (EOB) by specifically asking participants if they thought the treatment used verum (V, real needle used or needle penetrated the skin) or sham (S) needling techniques, with or without an option to say they did not know (DK). If the blinding evaluation data were not included or were unclear, the authors were contacted twice by e-mail and/or telephone and asked for additional blinding evaluation data or clarification. Studies whose authors did not respond or responded but no longer had access to the original data were excluded. Studies that used a credibility questionnaire, asking patients to choose whether or not they had received treatment based on the principles of Chinese medicine or another type of acupuncture, were excluded as well. In these studies, the patients' ability to distinguish between verum and sham needling was not directly addressed and the questionnaire could be misinterpreted. If EOB was measured multiple times within the same study, the data collected at the end of the study or the data collected from the insertion (as opposed to sensation of deqi or needle withdrawal) arm of the study were used. In one work that reported results of two separate studies using different acupuncture points, one on the patients' back and the other on the upper extremity, the blinding index calculations were done independently [58].

Data Extraction.
For the studies that were included in this meta-analysis, the number of patients who responded as V, S, or DK was extracted from each trial. Additionally, the following variables were extracted from each of the included studies (we hypothesized a priori that these may be associated with EOB): year of publication, subject only or staff and subject blinding, time of blinding assessment, assessment of deqi, type of sham control device used, use of penetrating or nonpenetrating sham control, patient diagnosis, and number of days without acupuncture experience prior to participation. Research staff members that were blinded were those involved in data analysis and interpretation, not those administering acupuncture.

Data Synthesis and Analysis.
Statistical analyses were carried out using the blinding index (BI) in order to objectively assess EOB [63,64]. The blinding index estimates the degree of potential unblinding beyond chance for each arm in a given study by counting the excess number of correct guesses. Blinding index values are always between −1 and 1, where 1 corresponds to all correct guesses, whereas −1 corresponds to all incorrect or opposite guesses. If 50% of patient responses are correct and 50% are incorrect, then BI = 0; this is indicative of random guessing and thus is an ideal blinding scenario. Another plausible scenario indicating effective blinding is that patients tend to believe they received active treatment regardless of actual treatment received, which may reflect patients' wish to receive active intervention. In this case, the blinding index will have a positive value in the active treatment arm and a negative value in the sham treatment arm; this scenario is denoted later as unblinded/opposite.
[Figure fragment: 106 randomized controlled studies were found using the keywords "acupuncture," "sham acupuncture," or "sham procedure."]
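The exact formula for the blinding index is not given in the text, so the following is only a minimal sketch. It assumes the per-arm index is the proportion of correct guesses minus the proportion of incorrect guesses, with DK responses counted in the denominator. This form is consistent with the properties described above (range −1 to 1, and 0 for an even split). The function name and arguments are hypothetical.

```python
def blinding_index(correct: int, incorrect: int, dont_know: int = 0) -> float:
    """Per-arm blinding index under the assumed form (correct - incorrect) / total.

    In the verum arm, a "correct" guess is V; in the sham arm, a "correct" guess is S.
    DK responses count toward the total but toward neither guess category.
    """
    total = correct + incorrect + dont_know
    if total == 0:
        raise ValueError("at least one response is required")
    return (correct - incorrect) / total


# Hypothetical arm: 30 correct guesses, 10 incorrect guesses, 10 DK responses.
example_bi = blinding_index(30, 10, 10)
print(example_bi)
```

Under this assumed form, DK responses pull the index toward 0, which matches the interpretation of 0 as random guessing.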
Verum and sham acupuncture groups were each assigned a separate blinding index value (VBI and SBI, respectively). Based on the calculated blinding index value combinations for the two treatment arms, nine possible blinding scenarios were proposed (Table 1). For classification purposes, we decided to use a blinding index cutoff of 0.2 to label each arm as unblinded, random, or opposite [13,64,65]. (Remark: this cutoff value was based on authors' consensus and used as a general tool for classification and explanation; it should not be interpreted as an absolute indication of blinding effectiveness.) Individual variables hypothesized to potentially impact EOB were compared by their average VBI and SBI scores, weighted by sample size. The weighted averages were used to determine the overall blinding index scenario for each variable. Blinding scenarios were also compared to the overall outcome of each study. Finally, we looked for patterns of possible association of study design factors with EOB based on blinding index values and scenarios. The factors included were based on data extraction criteria, sample size, timing of blinding assessment, blinded parties, sensation assessed, subject's status, subject's experience, and sham device used.
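The classification and weighting steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It uses the cutoff of 0.2 mentioned in the limitations, and it assumes the boundaries are inclusive (BI ≥ 0.2 is "unblinded", BI ≤ −0.2 is "opposite"), which the text does not specify. All function names are hypothetical.

```python
from typing import Iterable, Tuple

CUTOFF = 0.2  # consensus cutoff reported by the authors; boundary handling is assumed


def classify_arm(bi: float, cutoff: float = CUTOFF) -> str:
    """Label one arm as unblinded, random, or opposite (inclusive boundaries assumed)."""
    if bi >= cutoff:
        return "unblinded"
    if bi <= -cutoff:
        return "opposite"
    return "random"


def scenario(vbi: float, sbi: float, cutoff: float = CUTOFF) -> str:
    """Combine the verum and sham labels into one of the nine scenarios."""
    return f"{classify_arm(vbi, cutoff)}/{classify_arm(sbi, cutoff)}"


def weighted_mean_bi(values: Iterable[Tuple[float, int]]) -> float:
    """Sample-size-weighted average of (blinding index, arm sample size) pairs."""
    pairs = list(values)
    total_n = sum(n for _, n in pairs)
    if total_n == 0:
        raise ValueError("total sample size must be positive")
    return sum(bi * n for bi, n in pairs) / total_n


# Weighted averages reported for the whole review: 0.34 (verum) and -0.20 (sham).
print(scenario(0.34, -0.20))
```

Three labels per arm across two arms give the nine possible scenarios.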

Data Search.
Using our search inclusion criteria, 590 peer-reviewed journal articles were found, with 186 of these reporting blinding data in RCTs. 133 studies were excluded from the review, most often due to a lack of patient guess of treatment allocation. One article reported two distinct studies that were included separately [58]. Fifty-four studies were included in our final analysis, with a total of 4783 patients (Figure 1).
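One reading of these counts that reproduces the final total is shown below. This is an interpretation rather than something the text states explicitly: it assumes that the 133 exclusions apply to the 186 articles reporting blinding data, and that the article reporting two studies contributes one extra study. The variable names are hypothetical.

```python
# Counts taken from the paragraph above.
articles_with_blinding_data = 186
articles_excluded = 133
extra_studies_from_two_study_article = 1  # reference [58] reports two distinct studies

# Assumed reading: remaining articles, plus one extra study from the two-study article.
remaining_articles = articles_with_blinding_data - articles_excluded
included_studies = remaining_articles + extra_studies_from_two_study_article

print(remaining_articles, included_studies)
```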

Blinding Index Calculations.
The blinding index values (point and interval estimates) computed from all 54 studies can be found in Figure 2. The average weighted blinding index values for the entire review were 0.34 for verum and −0.20 for sham groups, respectively. Overall, a correct guess is quite common in the verum arm, and opposite guess is not uncommon in the sham arm.
After grouping studies into the nine possible blinding scenarios based on the blinding index (Table 1), 33 out of 54 (61%) of the studies might be adequately blinded. Of these, 70% (23/33) had a positive outcome reported overall; similarly, 62% (13/21) of the studies with less ideal blinding had an overall positive outcome (Table 3). The unblinded/opposite scenario (for V and S, respectively) was the most common, with 46% of the studies belonging to this scenario, followed by unblinded/random with 22%.
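The reported percentages can be checked directly from the counts given above. The short sketch below only re-derives figures already stated in the text; the helper name `pct` is hypothetical.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_studies = 54
adequately_blinded = 33
less_ideal = total_studies - adequately_blinded  # the 21 studies with less ideal blinding

share_adequate = pct(adequately_blinded, total_studies)   # share of studies adequately blinded
positive_given_adequate = pct(23, adequately_blinded)     # positive outcome, adequately blinded
positive_given_less_ideal = pct(13, less_ideal)           # positive outcome, less ideal blinding

print(share_adequate, positive_given_adequate, positive_given_less_ideal)
```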

Design Characteristics and Effectiveness of Blinding.
The variables hypothesized to affect blinding were compared by their average VBI and SBI values, and blinding scenarios in Table 2.
Of the 54 studies, 22 studies used a commercially developed sham control device, 14 studies used a custom-made sham control device, 12 used penetrating sham control, and 6 used a toothpick or cocktail stick. According to their averaged blinding index scenarios, all of the sham control devices with the exception of custom devices seemed to be effective in blinding the subjects, with the penetrating sham controls providing relatively more effective blinding.
In looking at the penetrating versus nonpenetrating dichotomy in a more direct way, the nonpenetrating group was unblinded/opposite, the mixed penetrating/nonpenetrating group was unblinded/opposite, and the penetrating group was random/random. All of these scenarios indicated effective blinding.
Studies with a greater number of subjects had greater unblinding in the verum group. Measurements assessed later tended to have more ideal effectiveness of blinding than measurements assessed immediately. Interestingly, when only the subjects, not staff, were blinded, there was a higher tendency in both arms to believe they received the verum treatment. The deqi group had slightly better EOB than the insertion/puncture group, and the symptomatic group had more ideal EOB than the healthy group.
Twenty-four studies used acupuncture-naïve subjects, and 19 used subjects with prior acupuncture experience. Eleven of the studies were excluded from this section of the review due to unknown prior experience or mixed experience of the subjects. According to their averaged blinding index scenarios, both groups were unblinded/random.

Discussion
This systematic review of 54 randomized controlled acupuncture studies showed that overall 61% of the studies (as a conservative estimate) meeting our inclusion/exclusion criteria were effectively blinded. The most common scenario encountered was unblinded in the acupuncture group and opposite guess in the sham acupuncture group, which could indeed be interpreted as "well-blinded." In this scenario there may be a psychological phenomenon of "wishful thinking." A majority of people, in both the verum and sham groups, guessed that they received real acupuncture. Thus, guesses are inflated towards real acupuncture in both study arms. It is also possible that once a needle is administered, a subject believes it is real acupuncture, or subjects may not know what to expect, creating a similar trend, or there is a strong placebo effect.
A similar pattern emerges when looking at the V and S groups individually; the average VBI was "unblinded" and the average SBI was "opposite." Is it possible for one not to know when a needle is penetrating his or her skin? Perhaps the answer is "no," given that unblinded V may mean that subjects know when their skin is being penetrated by a needle and thus increases the chance a subject chooses the V acupuncture group over the S acupuncture group upon questioning. Is it possible for one to know when a needle is not penetrating their skin? The answer seems to be "no" again; opposite guesses in S may indicate that subjects are not able to tell if they are not being penetrated by a needle, and thus are truly guessing.
Most sham control devices with the exception of custom devices were effective in blinding the subjects (Table 2). Since there was a great diversity within the custom sham group, a more in-depth, case-by-case analysis of each device could be performed, but at this time there does not seem to be compelling evidence supporting the use of custom sham devices. Even though commercial sham devices and toothpick/cocktail stick devices appear to provide effective blinding, the penetrating sham controls provided relatively more effective blinding.
By comparing study blinding and study outcomes, the majority of studies reported positive outcomes, regardless of the degree of guess correctness. This leads us to believe that there is no obvious association between EOB and reported study outcomes. The current literature provides conflicting evidence so the direction of bias may be specific to context or treatment [6-9, 14-49, 51-62] or random.
Exploration of individual variables and their possible effect on EOB indicated that some design characteristics such as larger sample size, symptomatic subjects, and later assessment were associated with more effective blinding, and these may be encouraged to be further evaluated and considered in designing future acupuncture trials.

Recommendations for Future Acupuncture Research.
The effect size was not a part of the extracted information from the reviewed studies. This poses an interesting idea for future investigation. Additional future research should include a sufficiently powered, prospective randomized trial comparing the EOB of different methods and sham devices by direct comparison, as well as investigating the influence of practitioner behavior, the patient's expectations and beliefs, and the location of treatment points on blinding effectiveness. A good treatment should have a greater treatment effect than placebo effect. It is important that we collect more data in this field, especially qualitative data (e.g., reasons for guessing in a particular way). It is possible that the reasons for correct guesses in individual trials may be more revealing than our numbers. Individual trialists should be willing to share their experiences with others, as individual trialists and patients must have greater insight and more stories to tell in specific conditions within the trial than meta-analysts and readers. This is particularly important, because any analysis of available numeric data on blinding is destined to be prone to some biases.

Systematic Review Limitations.
There are several potential weaknesses of this review that we recognize. Language bias may be a possibility given that we included only English language publications while acupuncture is a popular treatment modality in Asia and Europe. There is also a possibility of publication bias, which is an inherent problem in virtually all systematic reviews or meta-analyses. Moreover, it is possible that some investigators collected blinding data, saw the data, and decided not to report it in the paper, particularly when blinding was shown to be unsatisfactory. Also, for classification and decision making purposes, we used categorization/dichotomization (e.g., numerical BI cutoff of 0.2) and summary statistics (e.g., average BI). Alternative ways of analyzing the data could yield different results. Finally, in this systematic review, we used the conventional terms commonly used in the literature and equated "unblinded" with "correctly guessed," whereas there could be other reasons for correct guesses. With its limitations, to our knowledge, this is the first systematic review based on empirical blinding data.
In conclusion, based on the status of blinding, the most common scenario encountered was a more correct guess in the real acupuncture group and an opposite guess in the sham acupuncture group, and the overall subject blinding of the evaluated acupuncture studies was satisfactory. In addition, quantitatively calculated blinding data, ideally together with more qualitative data for individual trials, may offer meaningful means to further interpret the findings of RCTs and improve the practice in the direction of a higher validity.

Figure 2: Blinding index values with 95% confidence intervals. *Individual blinding index estimates and confidence interval raw data are provided in Table 3. Confidence intervals are unadjusted for multiple comparisons.