In the last decade, a new statistical methodology, network meta-analysis, has been developed to address limitations in traditional pairwise meta-analysis. Network meta-analysis incorporates all available evidence into a general statistical framework for comparisons of all available treatments. A further development is the use of a Bayesian statistical approach, which provides a more flexible modelling framework to take into account heterogeneity in the evidence and complexity in the data structure. The aim of this paper is therefore to provide a nontechnical introduction to network meta-analysis for the dental research community and to raise awareness of it. An example is used to demonstrate how to conduct a network meta-analysis and how it differs from traditional meta-analysis. The statistical theory behind network meta-analysis is nevertheless complex, so we strongly encourage close collaboration between dental researchers and experienced statisticians when planning and conducting a network meta-analysis. The use of more sophisticated statistical approaches such as network meta-analysis will improve the efficiency of comparing the effectiveness of multiple treatments across a set of trials.
With the rise of the evidence-based medicine movement in the last two decades, systematic reviews and meta-analyses have been widely used to synthesise evidence on the beneficial and/or harmful effects of different treatments. Results from these reviews and meta-analyses provide important information for developing clinical guidelines and making health policy recommendations. For most clinical conditions, several interventions (drugs, medical devices, surgeries, or a combination of them) are usually available, but most systematic reviews of randomised controlled trials (RCTs) tend to limit their scope to evaluating two active treatments or comparing one treatment to a control. Even when a systematic review evaluates multiple treatments, traditional meta-analysis can only perform pairwise comparisons.
There are several limitations to this approach [
In the last decade, a new statistical methodology, namely, network meta-analysis, has been developed to address those limitations [
Although systematic reviews with network meta-analysis for evidence synthesis have been published in mainstream medical journals [
The basic rationale behind network meta-analysis is simple: suppose we have three treatments A, B, and C. Results from RCTs comparing A and B provide direct evidence on the difference in treatment effects between A and B. In contrast, results from RCTs comparing A and C and those comparing B and C provide indirect evidence on the difference between A and B. The three treatments A, B, and C therefore form a network for treatment effect comparisons (Figure
Diagram for the network of three treatments A, B, and C.
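The arithmetic behind such an indirect comparison can be sketched in a few lines of Python. The effect estimates below are hypothetical values for illustration only; the calculation follows the standard adjusted indirect comparison (Bucher) method, in which the variances of the two direct estimates add.

```python
import math

# Hypothetical direct estimates (mean difference, standard error),
# for illustration only -- not taken from any real trials.
d_AC, se_AC = 1.5, 0.4   # direct effect of A versus C
d_BC, se_BC = 0.9, 0.5   # direct effect of B versus C

# Indirect estimate of A versus B via the common comparator C:
d_AB_indirect = d_AC - d_BC
# The variance of a difference of independent estimates is the sum
# of their variances, so the indirect estimate is less precise.
se_AB_indirect = math.sqrt(se_AC**2 + se_BC**2)

print(round(d_AB_indirect, 2), round(se_AB_indirect, 2))
```

Note that the indirect standard error (about 0.64 here) is larger than either direct standard error, which is one reason indirect evidence is generally less precise than direct evidence.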
There are several assumptions for the network meta-analysis to yield meaningful results [
It has been suggested that results from the network meta-analysis may be less trustworthy than results from multiple pairwise comparisons, because indirect evidence is less reliable than direct evidence and more prone to biases [
Consistency between direct and indirect evidence is another assumption behind network meta-analysis. Suppose results from trials comparing A with B show that A is on average better than B, and trials comparing B with C show that B is on average better than C; indirect comparison will then show that A is better than C. If trials comparing A with C also show that A is on average better than C, the indirect and direct evidence are consistent. However, what if the direct evidence shows C is better than A? Does this contradiction mean that evidence from indirect comparisons is unreliable and should be disregarded?
If this argument were true, it would imply that the direct comparisons for A-B and B-C cannot be trusted either, and the same reasoning applies to all pairwise comparisons in the network. Therefore, when direct and indirect evidence are inconsistent, the issue is not whether indirect comparisons are less reliable but how the inconsistency may be explained. If the trials involved in the indirect comparisons are of better quality and have fewer biases, results from indirect evidence may be more reliable than those from direct evidence. Heterogeneity in study protocols (e.g., doses, follow-up time), patient populations (age, underlying medical conditions, clinical settings, etc.), and methods for assessing outcomes is a likely source of inconsistency, although random variation may also cause it. It is therefore imperative to check consistency in the evidence when undertaking a network meta-analysis, and we explain later in this paper how to conduct a simple test for it. Interpretation of a network meta-analysis should always take inconsistency into account, just as interpretation of a traditional meta-analysis should take heterogeneity into account [
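A simple test of consistency compares the direct and indirect estimates of the same contrast with a z-statistic; a large absolute z (e.g., above 1.96) suggests significant inconsistency. The sketch below uses hypothetical numbers for illustration only:

```python
import math

# Hypothetical direct and indirect estimates of the same A-B contrast
# (mean differences with standard errors), for illustration only.
d_direct, se_direct = 0.80, 0.35
d_indirect, se_indirect = 0.60, 0.64   # e.g., obtained via a common comparator

# Inconsistency factor: the difference between direct and indirect evidence.
w = d_direct - d_indirect
se_w = math.sqrt(se_direct**2 + se_indirect**2)
z = w / se_w   # compared against the standard normal distribution

print(round(z, 2))   # |z| < 1.96 here, so no significant inconsistency
```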
Different statistical approaches have been proposed in the literature for comparing multiple treatments [
Statistical models for network meta-analysis become more complex when some of the included studies have more than two treatment arms, because the differences in treatment effects within a multi-arm study are not independent. Currently, a Bayesian approach to network meta-analysis developed by researchers at the Universities of Bristol and Leicester is the most popular approach in the literature and is used for the analysis of our example data in the next section [
Bayesian network meta-analysis, also known as mixed treatments comparison, uses a Bayesian statistical framework for a synthesis of direct and indirect comparisons of different treatments [
In Bayesian analysis, posterior distributions are usually quite complex, so software packages use a simulation-based algorithm known as the Markov chain Monte Carlo (MCMC) method to derive approximate posterior probability distributions. MCMC starts with a set of initial values and then runs an iterative process to obtain the approximate posterior distribution. Samples are then drawn from these posterior distributions to estimate each parameter in the model. Because the Bayesian approach is simulation-based, it has several other advantages. For instance, it can estimate predicted treatment effects based on the observed data, and it can rank treatments, which is especially useful when the differences between treatments are small. The statistical software for the Bayesian approach is also more flexible than standard software in adapting to each unique situation where, for example, some studies have more than two treatment arms or studies have different designs, such as parallel-group and split-mouth designs.
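As a minimal illustration of the iterative process described above, the sketch below runs a random-walk Metropolis sampler, one of the simplest MCMC algorithms, for the posterior of a normal mean with known standard deviation and a flat prior. The data are simulated, and this is a toy example rather than the model used for an actual network meta-analysis.

```python
import math
import random
import statistics

random.seed(0)

# Toy example: posterior of a normal mean with known SD = 1 and a flat
# prior, sampled by random-walk Metropolis. Data are simulated here.
data = [random.gauss(2.0, 1.0) for _ in range(50)]

def log_lik(mu):
    # Log-likelihood of the data under N(mu, 1), up to a constant.
    return sum(-0.5 * (x - mu) ** 2 for x in data)

mu = 0.0                  # initial value of the chain
samples = []
for i in range(20000):
    proposal = mu + random.gauss(0.0, 0.3)      # random-walk proposal
    # Accept with probability min(1, likelihood ratio).
    if random.random() < min(1.0, math.exp(log_lik(proposal) - log_lik(mu))):
        mu = proposal
    if i >= 5000:                               # discard burn-in iterations
        samples.append(mu)

posterior_mean = statistics.mean(samples)
print(round(posterior_mean, 2))
```

With a flat prior, the posterior mean should converge to the sample mean of the data; real analyses run multiple chains from different initial values to check convergence.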
The downside to the Bayesian approach is that the statistical theory and estimation methods for Bayesian network meta-analysis are mathematically advanced, and the software for Bayesian analysis, such as WinBUGS, has a steep learning curve. The website of the department of community-based medicine at Bristol University (
In this section, we use an example to illustrate how to undertake a network meta-analysis. We recently conducted a systematic review of the effectiveness of guided tissue regeneration (GTR), enamel matrix derivatives (EMD), and their combination therapies in the treatment of periodontal infrabony lesions [
Summary of studies included in the network meta-analysis for CAL gain. SE: standard error; FO: flap operation; GTR-N: guided tissue regeneration with nonresorbable membranes; GTR-R: guided tissue regeneration with resorbable membranes.
Study | Mean | SE | Treatment | Study design |
---|---|---|---|---|
Sculean et al. 2001 [ | 1.70 | 0.40 | FO | parallel-group |
Sculean et al. 2001 [ | 3.10 | 0.40 | GTR-R | parallel-group |
Silvestri et al. 2000 [ | 1.20 | 0.33 | FO | parallel-group |
Silvestri et al. 2000 [ | 4.80 | 0.66 | GTR-N | parallel-group |
Zucchelli et al. 2002 [ | 2.60 | 0.15 | FO | parallel-group |
Zucchelli et al. 2002 [ | 4.90 | 0.29 | GTR-N | parallel-group |
Mayfield et al. 1998 [ | 1.30 | 0.40 | FO | parallel-group |
Mayfield et al. 1998 [ | 1.50 | 0.42 | GTR-R | parallel-group |
Tonetti et al. 1998 [ | 2.18 | 0.18 | FO | parallel-group |
Tonetti et al. 1998 [ | 3.04 | 0.20 | GTR-R | parallel-group |
Cortellini et al. 2001 [ | 2.60 | 0.24 | FO | parallel-group |
Cortellini et al. 2001 [ | 3.50 | 0.28 | GTR-R | parallel-group |
Cortellini et al. 1995 [ | 2.50 | 0.46 | FO | parallel-group |
Cortellini et al. 1995 [ | 4.70 | 0.53 | GTR-N | parallel-group |
Cortellini et al. 1996 [ | 2.30 | 0.23 | FO | parallel-group |
Cortellini et al. 1996 [ | 5.20 | 0.40 | GTR-N | parallel-group |
Cortellini et al. 1996 [ | 4.60 | 0.35 | GTR-R | parallel-group |
Paolantonio et al. 2008 [ | 1.50 | 0.25 | FO | parallel-group |
Paolantonio et al. 2008 [ | 3.10 | 0.34 | GTR-R | parallel-group |
Stavropoulos et al. 2003 [ | 1.50 | 0.58 | FO | parallel-group |
Stavropoulos et al. 2003 [ | 2.90 | 0.54 | GTR-R | parallel-group |
Blumenthal and Steinberg 1990 [ | 0.75 | 0.06 | FO | split-mouth |
Blumenthal and Steinberg 1990 [ | 1.17 | 0.03 | GTR-R | split-mouth |
Pritlove-Carson et al. 1995 [ | 1.73 | 0.36 | FO | split-mouth |
Pritlove-Carson et al. 1995 [ | 1.78 | 0.45 | GTR-N | split-mouth |
Ratka-Krüger et al. [ | 4.00 | 0.77 | FO | split-mouth |
Ratka-Krüger et al. [ | 4.18 | 0.64 | GTR-R | split-mouth |
Loos et al. 2002 [ | 1.29 | 0.31 | FO | split-mouth |
Loos et al. 2002 [ | 1.40 | 0.28 | GTR-R | split-mouth |
Cortellini et al. 1998 [ | 1.60 | 0.38 | FO | split-mouth |
Cortellini et al. 1998 [ | 3.00 | 0.35 | GTR-R | split-mouth |
Chung et al. 1990 [ | −0.71 | 0.29 | FO | split-mouth |
Chung et al. 1990 [ | 0.56 | 0.18 | GTR-R | split-mouth |
Mora et al. 1996 [ | 2.55 | 0.32 | FO | split-mouth |
Mora et al. 1996 [ | 3.85 | 0.28 | GTR-N | split-mouth |
Traditional pairwise comparisons use only the direct evidence, and they should be undertaken in order to evaluate the consistency between direct and indirect comparisons. We used the statistical software Stata (Version 12, StataCorp, College Station, TX, USA) for the analysis, and results are shown in Figure
Forest plot for the three pairwise meta-analyses: flap operation (FO) versus GTR-N, FO versus GTR-R, and GTR-N versus GTR-R.
Funnel plot for the comparison between GTR-R and flap operation. The red line is the fitted line from Egger's test, indicating a small-study bias, as studies with small sample sizes tended to show greater treatment benefit for GTR-R.
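Each pairwise fixed-effect meta-analysis is, at its core, an inverse-variance weighted average of the study-level treatment differences. The sketch below illustrates this pooling with hypothetical mean differences and standard errors, not the actual values from our review:

```python
import math

# Hypothetical study-level mean differences (mm) and standard errors
# for one pairwise comparison -- illustrative values only.
effects = [1.40, 0.86, 0.90, 1.60]
ses = [0.57, 0.27, 0.37, 0.42]

# Fixed-effect (inverse-variance) pooling: weight each study by 1/SE^2.
weights = [1 / se**2 for se in ses]
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))
ci_low, ci_high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

print(round(pooled, 2), round(ci_low, 2), round(ci_high, 2))
```

A random-effects version would add a between-study variance component to each weight to allow for heterogeneity.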
Bayesian network meta-analysis was then undertaken using the software WinBUGS (MRC Biostatistics Unit, Cambridge, England). The WinBUGS code used in our analysis was a modification of code available on the website of the department of community-based medicine at Bristol University, adapted to accommodate continuous outcomes and studies with a split-mouth design. Noninformative priors were used throughout the analyses. The Markov chain Monte Carlo method was run with a 50,000-iteration burn-in followed by a further 50,000 iterations in each of three chains with different initial values (i.e., 150,000 post-burn-in iterations in total) to obtain medians and 95% credible intervals (i.e., the 2.5th and 97.5th percentiles of the simulation results), which may be interpreted as the likely range of the estimated parameter after excluding extreme values. Results from the Bayesian network meta-analysis using the 17 studies were similar to those from the traditional pairwise comparisons: GTR-N and GTR-R achieved 1.88 mm (95% credible interval (CrI): 1.15 to 2.63) and 0.99 mm (95% CrI: 0.48 to 1.52) greater CAL gain than flap operation, respectively, and GTR-N achieved 0.88 mm (95% CrI: 0.09 to 1.78) greater CAL gain than GTR-R. Figure
Diagram for the network meta-analysis. The width of the lines is proportional to the number of studies included in each pairwise comparison. Estimates of the differences in treatment effects from traditional meta-analysis are in black, whilst those from the Bayesian network meta-analysis are in blue.
Figure
Note that, for any loop, results of testing inconsistency remain the same, irrespective of the reference group chosen from the loop.
Because the Markov chain Monte Carlo estimation used in Bayesian analysis is a simulation-based approach, we can calculate the rank of each treatment according to its performance in each simulation [
Treatment rankings. The bar chart shows the probability of each treatment being the best, the second best, or the third best in terms of CAL gain.
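The ranking probabilities are obtained by counting, across simulations, how often each treatment performs best. The sketch below illustrates the idea with draws from normal approximations to the posterior distributions (illustrative only, not the actual WinBUGS output); FO is the reference, so its effect is centred at zero:

```python
import random

random.seed(1)
n = 10_000

# Simulated posterior draws of CAL gain (mm) relative to flap operation.
# Means and SDs are rough normal approximations for illustration only.
draws = {
    "FO": [random.gauss(0.00, 0.10) for _ in range(n)],
    "GTR-R": [random.gauss(0.99, 0.27) for _ in range(n)],
    "GTR-N": [random.gauss(1.88, 0.38) for _ in range(n)],
}

# Count how often each treatment has the largest CAL gain per simulation.
best_counts = {t: 0 for t in draws}
for i in range(n):
    best = max(draws, key=lambda t: draws[t][i])
    best_counts[best] += 1

prob_best = {t: count / n for t, count in best_counts.items()}
print(prob_best)
```

The same counting can be repeated for second- and third-best positions to build the full bar chart of ranking probabilities.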
As shown in our example, the main difference between a network meta-analysis and multiple pairwise meta-analyses for comparing multiple treatments is that the former uses both direct and indirect evidence whilst the latter uses only the direct evidence. All meta-analyses require a comprehensive literature search, careful evaluation of the available studies, and attention to potential biases and heterogeneity. In this section, we discuss a few practical issues that may arise in conducting a network meta-analysis in dental research and provide some recommendations on how to deal with them.
Decisions on which studies should be included are not always straightforward when undertaking a network meta-analysis, because indirect evidence can be very broad [
When a trial has more than two treatment arms, the study-specific treatment effects are unlikely to be independent. For example, suppose a trial has three treatment arms, GTR-N, GTR-R, and flap operation, all carried out by experienced periodontists; the treatment effects of both GTR-N and GTR-R relative to flap operation are then likely to be greater than those in a trial where the treatments are carried out by a periodontist in training. Special care has to be taken to avoid using the same control group more than once when estimating the differences in treatment effects between the test groups and the control group [
In our example, there is no significant inconsistency between direct and indirect evidence, and, consequently, the discrepancy between the results from the traditional pairwise and Bayesian network meta-analyses is small. If the inconsistency test is significant, the discrepancy is likely to be large, as results from the Bayesian analysis are a combination of direct and indirect evidence. Substantial discrepancies usually indicate that the heterogeneity and similarity assumptions discussed in Section
The information most commonly missing when conducting a meta-analysis in dental research is the standard error of the mean treatment effect, which is usually used as the weight for combining the treatment effects reported by different studies when the outcome is a continuous variable such as CAL. For binary outcomes, such as success or survival, treatment effects and their confidence intervals are derived from the numbers of patients in different categories, for example, the numbers of patients with or without success in each treatment group. When the outcome is a continuous variable, we need the mean difference and its standard error for the meta-analysis. The most common scenario is that a study reports the mean treatment effect and its standard error for each group without reporting the difference in treatment effects and its confidence interval. Table
Results from a hypothetical study with two treatment groups. The outcome is the mean clinical attachment level (CAL), with standard deviations in brackets.
CAL | Baseline | Follow-up at 12 months | Change |
---|---|---|---|
Test | 10.5 (1.9) | 6.2 (1.5) | 4.3 (1.3) |
Control | 10.3 (1.8) | 8.4 (1.6) | 1.9 (1.1) |
The change in the outcome CAL for the test group is 4.3 mm with standard deviation 1.3 mm, and the change in the outcome CAL for the control group is 1.9 mm with standard deviation 1.1 mm. Suppose this study uses a parallel-group design and has 10 patients in each group; the mean difference in treatment effects between the test and control groups is
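Using the numbers in the table, the mean difference and its standard error can be computed as follows: for independent parallel groups, the variance of the difference is the sum of each group's squared SD divided by its sample size.

```python
import math

# Values from the hypothetical study above: CAL change (mm), SD, and
# group size for the test and control groups (parallel-group design).
mean_t, sd_t, n_t = 4.3, 1.3, 10
mean_c, sd_c, n_c = 1.9, 1.1, 10

md = mean_t - mean_c                                # mean difference
se_md = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)    # independent groups

print(round(md, 2), round(se_md, 3))
```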
In a study with a parallel-group design, the two treatment effects are assumed to be independent, but in a study with a split-mouth design (e.g., two teeth of the same patient are randomly assigned to the test and control groups), the treatment effects are likely to be correlated [
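When the two arms are paired within patients, as in a split-mouth design, the covariance between arms must be subtracted from the variance of the difference. The within-patient correlation r is rarely reported, so the value below is an assumed figure for illustration only, and a sensitivity analysis over a plausible range of r is advisable:

```python
import math

# Same hypothetical SDs as the parallel-group example, but now treated as
# a split-mouth study with n = 10 patients contributing paired sites.
sd_t, sd_c, n = 1.3, 1.1, 10
r = 0.5   # ASSUMED within-patient correlation (rarely reported)

# Variance of a paired difference: var(t) + var(c) - 2*cov(t, c).
se_md = math.sqrt((sd_t**2 + sd_c**2 - 2 * r * sd_t * sd_c) / n)

print(round(se_md, 3))
```

A positive correlation shrinks the standard error relative to the independent-groups formula, giving split-mouth studies correspondingly more weight in the meta-analysis.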
Imputing missing values can increase the number of included studies in a meta-analysis, which is especially useful for a network meta-analysis when the number of treatments being compared is large but the number of studies is relatively small.
Network meta-analysis extends pairwise comparisons of treatments to comparisons of all available treatments by incorporating both direct and indirect evidence. This methodology has become widely adopted by meta-analysts in medical research and has proven to be a very useful tool for evidence synthesis. The use of more sophisticated statistical approaches such as network meta-analysis can improve the efficiency of comparative effectiveness research and the quality of decision making. This tutorial aims to bring this methodology to the attention of dental researchers and to facilitate its adoption, while also highlighting several important issues in conducting and interpreting a network meta-analysis. Further information and technical details can be found in [
The authors declare that they have no conflicts of interest.
The first author was funded by the United Kingdom government’s Higher Education Funding Council for England (HEFCE).