Misuse of Statistical Methods in 10 Leading Chinese Medical Journals in 1998 and 2008

Statistical methods are vital to biomedical research. Our aim was to find out whether progress has been made in the last decade in the use of statistical methods in Chinese medical research. We reviewed 10 leading Chinese medical journals published in 1998 and in 2008. Regarding statistical methods, the most common t-test error in both years was the use of multiple t-tests for multiple-group comparison, although it decreased significantly in 2008. In contingency tables, failure to adjust the significance level for multiple comparisons decreased significantly in 2008. In ANOVA, over a quarter of articles misused the method of multiple pair-wise comparison in both years, and no significant difference was seen between the two years. In the rank transformation nonparametric test, the error of using multiple pair-wise comparison for multiple-group comparison became less common. Errors were frequent in randomised controlled trials (56.3% in 1998; 67.9% in 2008), non-randomised clinical trials (57.3%; 58.6%), basic science studies (72.9%; 65.5%), case studies or case series studies (48.4%; 47.2%), and cross-sectional studies (57.1%; 44.2%). Progress has been made in the use of statistical methods in Chinese medical journals, but much is yet to be done.


INTRODUCTION
Statistics play a key role in biomedical research [1][2][3][4][5][6]; their correct use is thus essential to a high-quality study. The misuse or inaccurate use of statistical methods may point the research in the wrong direction and produce incorrect study results.
China produces a large number of biomedical articles. According to the database of the Institute for Scientific Information (ISI), there has been a significant increase in the quantity and quality of Chinese biomedical publications in the last two decades, especially in the last decade [7]. However, it is common to find inappropriate statistical methods in Chinese medical journals. He et al. reported in 2009 that many more statistical errors existed in Chinese medical journals than in international journals [8].
Our previous study compared the research design, statistical analyses, and presentation and interpretation of results of 10 leading Chinese medical journals published in 1998 and 2008 in Chinese [9]. The main results we obtained were the frequencies of different types of study design, defective proportions in design and statistical analyses, and the inappropriate presentation and interpretation of results. Further, we mentioned that the most frequently used statistical methods were still the simple tests, although more sophisticated statistical methods were already being applied in 2008. As for the study design, our focus was primarily on retrospective studies, with clinical trials receiving relatively little attention.
In this research, we again used the 10 leading Chinese medical journals published in 1998 and 2008 and, as an extension of that work, extracted new data on the misuse and inaccurate use of each statistical method. We listed and compared the most common errors of each method that appeared in medical articles in 1998 and 2008. We also compared the proportions of incorrect use of statistical methods in different study designs between the two years; in our previous study [9], we had compared the proportions of design defects in various study designs. All statistical procedures and methods in each article were reviewed, and trends in the misuse of statistical methods were reported. We summarised the progress that had been made during the past 10 years and discussed the current concerns about Chinese medical journals. We analysed the possible reasons for the main errors and suggested some improvements to the quality of Chinese medical journals.

Errors in the Different Statistical Methods
As described in our previous study, 492 and 570 articles used the t-test in 1998 and 2008; 319 and 523 used contingency tables; 202 and 446 used ANOVA; 67 and 187 used the rank transformation nonparametric test [9]. The specific errors that occurred in the four types of statistical methods of both years are listed in Table 1.
There were two main errors in the rank transformation nonparametric test. One was the use of multiple pair-wise comparison for multiple groups, which became significantly less common in 2008 (χ² = 4.43, P = 0.035, OR = 2.21, 95% CI: 1.04 to 4.47). The other, in which the wrong type of rank sum test was used for the study type, did not show a significant difference between the two years (χ² = 2.07, P = 0.150, OR = 3.89, 95% CI: 0.85 to 17.88).

Misuse of Statistical Methods in Different Study Designs
Despite the significant growth in the use of statistical methods, substantive errors still existed in different study designs. Table 2 shows the quantities and proportions of the statistical methods used and the errors that were found. Errors mainly occurred in clinical trials, basic science studies, and retrospective studies.
In the clinical trials, over half of the articles with statistical methods had mistakes in both years. No statistically significant change between the two years was seen for randomised controlled trials (χ² = 1.70, P = 0.192, OR = 0.61, 95% CI: 0.29 to 1.29) or nonrandomised clinical trials (χ² = 0.02, P = 0.878, OR = 0.95, 95% CI: 0.48 to 1.87). A large number of statistical errors existed in basic science studies, a design that was used frequently in both years. The proportions of errors were 72.9% (175/240) in 1998 and 65.5% (268/409) in 2008 (χ² = 3.81, P = 0.051, OR = 1.42, 95% CI: 1.00 to 2.01). The situation was equally worrisome in retrospective study, case-control study, and case study or case-series study. Although a downward trend in mistakes was seen in case-control studies (χ² = 7.05, P = 0.008, OR = 1.59, 95% CI: 1.13 to 2.24), there was no significant improvement in case studies or case-series studies (χ² = 0.04, P = 0.837, OR = 1.05, 95% CI: 0.68 to 1.62). It was gratifying to see a significant drop in the proportion of errors in cohort studies (χ² = 19.01, P < 0.001, OR = 5.46, 95% CI: 2.48 to 12.05), but no improvement was observed in cross-sectional studies (χ² = 1.80, P = 0.180, OR = 1.68, 95% CI: 0.79 to 3.60).
Note to Table 1. Incorrect use of n (%): for each statistical method, n is the number of articles using the method incorrectly and the percentage = n/the number of papers using the method × 100%; for each error under a given statistical method, n is the number of articles with that mistake and the percentage = n/the number of papers using the method × 100%.
Note to Table 2. All articles n (%): n is the number of articles of each type of study design and the percentage = n/N × 100%. Articles that used statistical methods n (%): n is the number of articles using statistical methods in each type of study design and the percentage = n/the number of articles of each type of study design. Incorrect use of statistical methods n (%): n is the number of articles using statistical methods incorrectly and the percentage = n/the number of articles using statistical methods in each type of study design.
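As an illustration of how the year-to-year comparisons in this section can be computed, the following Python sketch works through the basic science figures given above (175/240 articles with errors in 1998 and 268/409 in 2008). The text does not state which form of the Chi-square test or which confidence interval was used; the sketch assumes an uncorrected Pearson Chi-square test and a Wald interval on the log odds ratio. The function name `compare_error_rates` is hypothetical.

```python
from math import erfc, exp, log, sqrt


def compare_error_rates(err_a, total_a, err_b, total_b, z=1.96):
    """Compare two error proportions as a 2x2 table.

    Assumptions (not stated in the article): Pearson chi-square without
    continuity correction, and a Wald 95% CI on the log odds ratio.
    """
    a, b = err_a, total_a - err_a   # year A: with errors, without errors
    c, d = err_b, total_b - err_b   # year B: with errors, without errors
    n = a + b + c + d

    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = erfc(sqrt(chi2 / 2))

    # Odds of an error in year A relative to year B
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(odds_ratio) - z * se_log_or)
    ci_high = exp(log(odds_ratio) + z * se_log_or)
    return chi2, p_value, odds_ratio, ci_low, ci_high


chi2, p, odds_ratio, ci_low, ci_high = compare_error_rates(175, 240, 268, 409)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}, OR = {odds_ratio:.2f}, "
      f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Under these assumptions the printed values agree with those reported above for basic science studies (χ² = 3.81, P = 0.051, OR = 1.42, 95% CI: 1.00 to 2.01). An odds ratio above 1 here means that errors were more likely in 1998 than in 2008.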

Possible Reasons for the Occurrence of Errors
Among the errors, the biggest problem was the inappropriate choice of statistical methods. A possible reason is that not much attention was paid to the distributional characteristics of the variables and the nature of the data. Apparently, because they lacked basic statistical knowledge, researchers ignored the conditions under which a given method applies. When quantitative data did not meet the prerequisites for parametric tests, they applied the tests anyway. Many researchers mistakenly believed that the Chi-square test was a universal tool for dealing with contingency tables, and they used it without taking the characteristics of the data into consideration. Some multifactorial experimental studies were split into a series of single-factor studies, which severed the intrinsic links or interactions among factors and led to one-sided or even wrong conclusions. Park et al. stated that the selection of the correct statistical method depends on the data structure and underlying statistical assumptions [10]. However, some errors were very common among articles, and they were wrongly cited or used by others, resulting in a vicious circle. As Altman said, "once incorrect procedures become common, it can be hard to stop them from spreading through the medical literature like a genetic mutation" [11].

Correct Methods Should Be Used in These Situations
Regarding the t-test, the most frequent error was using multiple t-tests for multiple-group comparison, which increases the probability of making a Type I error. There are several methods for multiple comparison, such as the Bonferroni method, Scheffé method, Tukey method, Newman-Keuls method, and Duncan method [12]. Around 4.95% and 6.95% of the articles that used ANOVA in 1998 and 2008, respectively, employed one-factorial ANOVA to analyse data from multifactorial designs. One-factorial ANOVA is used when there is only one experimental factor; when two or more experimental factors are involved, multifactorial ANOVA should be used [13,14]. The t-test and standard ANOVA require independent data that have no correlation with each other. Repeated-measures data do not meet this requirement; instead, repeated-measures ANOVA or mixed-effects models should be used. Mixed-effects models are recommended, as they have greater flexibility to model time effects and can handle missing data more appropriately [15]. A common error in contingency tables in both years was the omission of a continuity correction or Fisher's exact test when one was needed. It is considered incorrect to use the Chi-square test directly in contingency table analysis if the total sample size is not more than 20, or if more than 20% of the expected frequencies are less than five; Fisher's exact test should be applied in both cases [16]. Nonparametric tests are often used in place of parametric tests when the assumptions of the parametric test have been grossly violated (e.g., if the distributions are too severely skewed). Nonparametric tests are also recommended for small sample sizes or data sets with many ties. The error proportions of using the Chi-square test for ranked data were 9.09% and 5.93% in the two years. Instead of a Chi-square test, a rank transformation nonparametric test should be used on ranked data.
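To illustrate why repeated t-tests across several groups inflate the Type I error rate, the following Python sketch computes the number of pair-wise comparisons among k groups, the resulting family-wise error rate, and the Bonferroni-adjusted significance level. It assumes, as a simplification, that the pair-wise tests are independent; in practice they share data, so the figure is only an approximation. The function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pair-wise tests, approximate family-wise error
    rate, Bonferroni-adjusted per-test alpha) for n_groups groups.

    Simplifying assumption: the pair-wise tests are treated as independent.
    """
    n_tests = comb(n_groups, 2)               # all pairs of groups
    fwer = 1 - (1 - alpha) ** n_tests         # P(at least one false positive)
    bonferroni_alpha = alpha / n_tests        # per-test level under Bonferroni
    return n_tests, fwer, bonferroni_alpha


for k in (3, 4, 5):
    n_tests, fwer, adj = familywise_error_rate(k)
    print(f"{k} groups: {n_tests} t-tests, family-wise error rate "
          f"about {fwer:.1%}, Bonferroni alpha = {adj:.4f}")
```

With three groups and a nominal level of 0.05, three unadjusted t-tests already give a family-wise error rate of roughly 14% under this approximation, which is why an overall test followed by an adjusted multiple-comparison procedure is preferred.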
For study designs, the more complex the study design was, the more mistakes in statistical methods were likely to appear.
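The contingency-table rule stated in this section (use Fisher's exact test when the total sample size is not more than 20, or when more than 20% of the expected frequencies are below five) can be checked mechanically before a test is chosen. The Python sketch below is one way to do so; the function names and the small example table are hypothetical, and the thresholds are taken from the rule as worded above rather than from any particular software package.

```python
def expected_counts(table):
    """Expected cell frequencies under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def suggest_test(table):
    """Apply the rule from the text: Fisher's exact test if n <= 20 or if
    more than 20% of expected frequencies are below 5; otherwise the
    Chi-square test."""
    n = sum(sum(row) for row in table)
    cells = [e for row in expected_counts(table) for e in row]
    share_small = sum(1 for e in cells if e < 5) / len(cells)
    if n <= 20 or share_small > 0.20:
        return "fisher_exact"
    return "chi_square"


# Basic science counts from the text: errors vs no errors, 1998 and 2008
print(suggest_test([[175, 65], [268, 141]]))
# Hypothetical small table: one of four expected counts falls below 5
print(suggest_test([[8, 2], [1, 14]]))
```

The check only selects between the two tests named in the text; it does not address ranked data, for which a rank transformation nonparametric test is the appropriate choice.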

Progress and Worries
In general, progress has been made in the statistical methods of Chinese medical journals in the last decade. The percentage of articles using statistical methods has increased, and the proportion of errors has significantly decreased in most of the statistical methods and study designs. This conclusion is consistent with the report by Wang et al. in 1998 that the proportion of papers in Chinese medical journals using appropriate statistical methods had increased in 1995 compared with 1985 [17]. This shows that Chinese medical researchers have made great efforts to employ statistical methods in their studies. However, we cannot be overoptimistic because the situation is very far from satisfactory. Although statistical errors also exist in the medical journals of western countries, the proportion is smaller. McGuigan reviewed all papers published in the British Journal of Psychiatry in 1993 and found that 40% of the papers contained statistical errors [18]. Welch and Gabbe reviewed 145 clinical articles published in the American Journal of Obstetrics and Gynecology in 1994 and identified 46 articles (31.7%) that were deemed to have applied statistics inappropriately [19]. Kurichi and Sonnad reported that only 27% of the studies in five selected American surgical journals in 2003 included incorrect selection or reporting of statistical methods [20]. Another study, conducted by Neville et al., assessed the frequency of statistical errors in the dermatological literature. The study revealed that only 14% of the articles with statistical analysis contained errors in the methods; 26.5%, in the presentation of the results; 2.6%, in both [21].
In China, the situation is quite depressing, as the error rate of statistical methods remains high. In ANOVA, the total error rate was approximately 60% in both 1998 and 2008. Many mistakes were made even in the most basic aspects. For instance, 31.10% of articles used the t-test for multiple-group comparison in 1998 and 22.63% in 2008. In clinical trials, over half of the articles had statistical errors. The proportion of errors was extremely high in basic science studies and retrospective studies, even though these designs were frequently used.
In addition, many sophisticated statistical methods, such as analysis of covariance, repeated-measures analysis, logistic regression, and survival analysis, were seldom used in Chinese medical journals, an observation that was also made by Wang and Zhang in their study [17]. This suggests that a large amount of data is not being efficiently analysed, so that much of the information is wasted. Considering the high incidence of errors in the simple statistical methods, it is not hard to imagine how bad the situation is with regard to sophisticated statistical methods. Moreover, since we studied only the 10 leading medical journals, the quality we observed is likely to be above the actual average of Chinese medical journals. It must be noted, though, that some Chinese research papers published in international medical journals, whose statistical methodology might be of better quality, were not included in this study. Thus, as our next step, we intend to conduct a survey of Chinese clinical studies that have been published in international journals.

RECOMMENDATIONS
The 10 medical journals we selected are representative of excellent Chinese medical journals. Nevertheless, there is still a wide gap between them and the international journals with respect to statistics. Some measures are needed to decrease the errors in the statistical methods and improve the quality of articles.
Firstly, clinicians and medical researchers should correct their attitude towards writing. Their purpose in publishing should be to make their results known to their colleagues and to raise the level of medical science for mankind; it should not be personal aggrandisement. Only correct medical outcomes can benefit people. Secondly, statistical education should be enhanced among clinicians and researchers; they should have a basic grasp of statistics and study design. An integrated and detailed protocol should be prepared beforehand. When performing and analysing RCTs, the CONSORT statement, which is now accepted internationally, is recommended as a guideline. Thirdly, statisticians should assume an important role in the research; in other words, a research group should include a statistician as a consultant. Finally, statistical reviewers should be included in the editorial boards of the journals. Some journals merely intend to make profits through page charges, publishing articles indiscriminately without taking quality into account. Measures must be implemented to prevent such practices.