To compare the use of pair-wise meta-analysis methods with multiple treatment comparison (MTC) methods in evidence-based health-care evaluation, where the aim is to estimate the effectiveness and cost-effectiveness of alternative health-care interventions based on the available evidence.
Pair-wise meta-analysis and more complex evidence syntheses, incorporating an MTC component, are applied to three examples: 1) clinical effectiveness of interventions for preventing strokes in people with atrial fibrillation; 2) clinical and cost-effectiveness of using drug-eluting stents in percutaneous coronary intervention in patients with coronary artery disease; and 3) clinical and cost-effectiveness of using neuraminidase inhibitors in the treatment of influenza. We compare the two synthesis approaches with respect to the assumptions made, empirical estimates produced, and conclusions drawn.
The difference between point estimates of effectiveness produced by the pair-wise and MTC approaches was generally unpredictable: in some instances the two agreed closely, whereas in others they differed considerably. In all three examples, the MTC approach allowed the inclusion of randomized controlled trial evidence ignored in the pair-wise meta-analysis approach. This generally increased the precision of the effectiveness estimates from the MTC model.
The MTC approach to synthesis allows the evidence base on clinical effectiveness to be treated as a coherent whole, allows more data to be included, and sometimes relaxes the assumptions made in the pair-wise approaches. However, MTC models are necessarily more complex than those developed for pair-wise meta-analysis and thus could be seen as less transparent. Therefore, it is important that model details and the assumptions made are carefully reported alongside the results.
A core component of evidence-based health-care evaluations is to estimate the effectiveness and cost-effectiveness of alternative health-care interventions based on the available evidence. Ideally, effectiveness data are obtained from well-conducted randomized controlled trials (RCTs). Where multiple relevant RCTs exist, appropriate evidence synthesis methods should be used. For comparisons between two specific interventions it is common practice to use pair-wise meta-analysis methods [
] to obtain a pooled estimate of effectiveness that may be used to inform the associated economic analyses. Nevertheless, there may be interest in comparing more than two competing health-care interventions to answer policy-relevant questions, or the interventions of interest may have been trialled against different and multiple comparators. In the former case, it is unlikely that RCTs exist that compare all the interventions of interest directly.
] have been proposed that allow the simultaneous estimation of the comparative effectiveness of multiple treatments using an evidence base of trials that individually do not compare all treatment options. Such methods are a logical extension to more established meta-analysis methods. Currently there is much deliberation regarding the use of MTC methods for health technology assessment (HTA) with the UK National Institute for Health and Clinical Excellence (NICE) (which provides guidance for England and Wales) now advising they can be used, but not as the base case in their Methods Guide [
Report of the Indirect Comparisons Working Group to the Pharmaceutical Benefits Advisory Committee: Assessing indirect comparisons. Pharmaceutical Benefits Advisory Committee, Canberra, Australia, 2009.
In light of this current controversy, the aim of this article is to assess the added value of MTC methods by comparing their use with standard pair-wise meta-analysis models when estimating pooled estimates of effectiveness and informing decision modelling. Three case studies are considered: 1) use of aspirin to prevent stroke in individuals with atrial fibrillation. This case study considers how expanding the evidence network affects the estimates of effectiveness obtained; 2) use of drug-eluting stents in percutaneous coronary intervention (PCI) in patients with coronary artery disease. This example considers the impact of using an MTC, allowing for estimates at multiple time points, on an economic decision model; and 3) use of neuraminidase inhibitors in the treatment of influenza. In this case study, information on censoring is incorporated into a simple evidence network to inform a decision model. Before these case studies, a brief overview of MTC methods is presented.
Overview of MTC Methods
MTCs extend the more established meta-analysis methods to allow the comparison of three or more interventions [
The trials to be synthesised form a connected network (examples of diagrammatic representations of trial networks are given in Fig. 1, Fig. 3, and Fig. 6). In each of these, no treatment is isolated from the rest: every treatment node is connected to at least one other node by a line indicating that a randomized comparison exists.
There is consistency across the evidence base. Consider a three-treatment network with treatments labelled A, B, and C. The method assumes that, had the existing two-arm trials comparing B versus C included a third arm, A, they would have produced estimates of A versus C and A versus B that are consistent with any A versus C and A versus B trials that actually exist (i.e., the underlying effects are assumed to be identical, or sampled from the same distribution, depending on whether fixed or random effects are assumed in the synthesis model). A further feature of MTC is that networks can be extended to include RCTs in which only one, or even none, of the treatments relevant to the decision question of interest are evaluated. Although such evidence may not initially seem relevant to the decision of interest (and may require a non-traditional search strategy to identify [
]), they can reduce uncertainty in the comparisons of interest (as well as providing an opportunity for assessing the consistency of the evidence). Therefore, when MTC methods are used, issues relating to the structure and scope of the network require careful consideration. Network scope will be discussed in the case studies that follow.
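The consistency assumption can be made concrete with a small numerical sketch. Assuming hypothetical pooled log relative risks for A versus B and C versus B against a common comparator B, an adjusted indirect estimate of A versus C is simply their difference, with variances adding (all numbers below are illustrative, not from any of the case studies):

```python
import math

def indirect_comparison(d_ab, se_ab, d_cb, se_cb):
    """Adjusted indirect comparison on the log scale.

    Given pooled log relative effects of A vs. B and C vs. B from
    separate pair-wise meta-analyses, consistency implies
    d_AC = d_AB - d_CB, with the variances of the two inputs adding.
    """
    d_ac = d_ab - d_cb
    se_ac = math.sqrt(se_ab**2 + se_cb**2)
    return d_ac, se_ac

# Hypothetical pooled log relative risks versus a common comparator B:
d_ac, se_ac = indirect_comparison(d_ab=-0.30, se_ab=0.10, d_cb=-0.10, se_cb=0.12)
lo, hi = d_ac - 1.96 * se_ac, d_ac + 1.96 * se_ac
print(f"indirect log RR, A vs C: {d_ac:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Note that the indirect estimate is always less precise than either of the direct estimates feeding into it, which is why additional indirect "routes" in a network can still reduce, but never eliminate, uncertainty.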
Many of the methods for implementing MTC have been developed using the Markov chain Monte Carlo (MCMC) WinBUGS software [
] and therefore are possible, but challenging, to fit using classical statistical methods. Within an MCMC framework it is fairly straightforward to extend the standard MTC model to incorporate more complex data structures (such as multiple time points and censored time to event data). As we will show, this potentially allows more data to be included in the synthesis while relaxing assumptions made in standard pair-wise meta-analysis.
In this section, both pair-wise meta-analysis and more complex evidence syntheses, incorporating an MTC component, are applied to three examples and compared with respect to the 1) assumptions the models make; 2) empirical estimates produced; and 3) conclusions drawn. Extended MTC methods, which account for outcomes reported at multiple time points [
], are presented in case studies 2 and 3. These two case studies also consider cost-effectiveness by inputting the effectiveness estimates obtained from the alternative synthesis approaches into decision models originally developed as part of treatment appraisals commissioned by the National Institute for Health and Clinical Excellence in the UK.
Where multiple estimates of effectiveness are required for the decision model (e.g., estimates for different treatments and/or time points) then it is important to maintain the correlation structure between these estimates when inputting them into the decision model. One approach is to evaluate the decision model within the same modelling framework as the synthesis by fitting all the analysis within a single coherent MCMC framework [
] ensuring that any correlations between parameters are automatically respected (used in case study 2). Alternatively, the parameter estimates from each MCMC simulation could be used to inform a decision model evaluated using Monte Carlo simulation (e.g., in Excel; Microsoft Corp., Redmond, WA), retaining their correlation structure throughout (used in case study 3).
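As a minimal sketch of this idea, joint posterior draws can be fed row by row through a decision model so that correlations between effect estimates are preserved, rather than sampling each parameter independently. The draws, the net-benefit link, and all costs below are hypothetical stand-ins, not the models used in the case studies:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint "posterior" draws for two correlated log hazard
# ratios (e.g., two active treatments vs. a common comparator); they
# are correlated because both are estimated from the same network.
n = 10_000
mean = [-0.6, -0.5]
cov = [[0.04, 0.03], [0.03, 0.04]]
draws = rng.multivariate_normal(mean, cov, size=n)

WTP = 20_000  # willingness to pay per QALY (illustrative)

def net_benefit(log_hr):
    """Toy decision model mapping an effect draw to net monetary
    benefit; the QALY link and extra cost are illustrative only."""
    qaly_gain = 0.05 * -log_hr
    extra_cost = 600.0
    return WTP * qaly_gain - extra_cost

# Feed each joint draw through the model together, so the correlation
# between the two treatment effects is respected in the output.
nb1, nb2 = net_benefit(draws[:, 0]), net_benefit(draws[:, 1])
p_first_better = (nb1 > nb2).mean()
print(f"P(treatment 1 has higher net benefit) = {p_first_better:.2f}")
```

Sampling the two effects independently (ignoring the off-diagonal covariance) would overstate the uncertainty in their difference and distort this probability.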
All of the MTC models are evaluated within an MCMC framework using the WinBUGS software [
] used a random effects MTC to estimate effectiveness. The network diagram from this analysis, displaying the interventions that have been considered in RCTs together with the number of times each intervention has been compared to another intervention, is presented in Figure 1C.
Using the evidence base associated with the network compiled by Cooper et al., we now explore the implications of using three different approaches to the estimation of effectiveness for the single comparison of aspirin versus placebo: 1) pair-wise random effects meta-analysis of aspirin versus placebo (M-A); 2) random effects MTC of all trials, including aspirin and/or placebo arms (MTC aspirin or placebo RCTs); and 3) random effects MTC of all trials of anticoagulant and antiplatelet therapies (MTC all RCTs).
Currently the standard approach in HTA for investigating the clinical effectiveness of aspirin compared to placebo (M-A model) would be to search and identify the (four) RCTs that directly address this question (Fig. 1A).
However, how would the estimate and its associated uncertainty change if a broader evidence base were used and MTC methods utilized? To investigate this we now consider extending the evidence base to the RCTs in atrial fibrillation that include arms randomizing to either of the two treatments of interest, that is, placebo or aspirin (MTC Aspirin or Placebo RCTs model). This extends the network to include 18 further randomized comparisons, derived from 12 further RCTs (Fig. 1B) (i.e., some trials included more than two arms and hence made multiple comparisons, although this information is not represented in Fig. 1B), and introduces a further four treatment nodes. Notice that some of these new comparisons form alternative indirect “routes” for comparing placebo to aspirin (i.e., indirect routes via warfarin, alternate-day aspirin, and low-dose warfarin and aspirin all now exist), thus providing further information about the comparison of interest and reducing uncertainty.
Finally, we extend the network to all RCTs of anticoagulant and antiplatelet therapies that were available when the original analysis was published, by including RCTs that did not consider either placebo or aspirin therapy (MTC all RCTs model). This adds in a further 10 comparisons from six RCTs and introduces a further two treatment nodes (Fig. 1C). This further increases the indirect “routes” that connect placebo to aspirin.
The results from these three different analyses, all implemented using the WinBUGS software, are displayed in Figure 2.
It can be observed that incorporating more data into the analysis through the MTC models greatly reduced the uncertainty, changing a non-statistically significant result obtained from the pair-wise meta-analysis (pooled relative risk [RR] 0.744; 95% credible interval [CrI] 0.406–1.576, between-study variance [τ2] 0.091; 95% CrI 0.000–2.311) into a statistically significant one (RR 0.648; 95% CrI 0.457–0.877 and τ2 0.027; 95% CrI 0.000–0.276, and RR 0.633; 95% CrI 0.441–0.885 and τ2 0.044; 95% CrI 0.000–0.298 for the MTC aspirin or placebo RCTs and MTC all RCTs models, respectively). It can also be observed that the uncertainty in the pooled relative risk is slightly greater for the MTC all RCTs model than for the MTC aspirin or placebo RCTs model, despite the inclusion of more information. This is due to the increase in the between-study variance (which measures the between-study heterogeneity within each treatment comparison), which in turn reduces the absolute weight given to each study in the synthesis.
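The role of the between-study variance in down-weighting each study can be seen in the mechanics of random-effects pooling. The sketch below uses the classical DerSimonian-Laird estimator as a frequentist stand-in for the Bayesian random-effects models fitted in the text; the trial data are illustrative, not the atrial fibrillation RCTs:

```python
import math

def dersimonian_laird(log_rr, se):
    """Random-effects pooling of log relative risks
    (DerSimonian-Laird moment estimator of tau^2)."""
    w = [1 / s**2 for s in se]
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    df = len(log_rr) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # between-study variance
    # Larger tau^2 inflates every study's variance, reducing the
    # absolute weight given to each study in the pooled estimate.
    w_star = [1 / (s**2 + tau2) for s in se]
    pooled = sum(wi * y for wi, y in zip(w_star, log_rr)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return pooled, se_pooled, tau2

# Illustrative data: log RRs and standard errors from four small trials.
pooled, se_p, tau2 = dersimonian_laird(
    log_rr=[-0.45, -0.10, -0.70, 0.05], se=[0.25, 0.30, 0.35, 0.28])
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se_p), math.exp(pooled + 1.96 * se_p))
print(f"pooled RR {rr:.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}, tau^2 {tau2:.3f}")
```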
This example clearly illustrates that use of MTC methods can influence estimates of effectiveness (both the point estimate and the uncertainty around it). The inclusion of more evidence will generally reduce the uncertainty of an estimate. Concern has been raised regarding whether such estimates are reliable (i.e., unbiased), and this depends on whether the assumptions of the model hold (as outlined in the previous section). An initial assessment of the goodness of fit of the model predictions to the observed data for each of the models using the posterior mean residual deviance (D̄) suggested that all models fitted the data well; that is, under the null hypothesis that the model provides an adequate fit to the data, D̄ is expected to be approximately equal to the number of unconstrained data points (pair-wise M-A D̄ = 7.32 compared to eight unconstrained data points; MTC (aspirin or placebo RCTs) 30.02 compared to 33 unconstrained data points; and MTC (all RCTs) 45.31 compared to 45 unconstrained data points).
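The residual deviance check can be sketched for a single binomial data point; the formula below is the standard binomial deviance contribution, and the counts are hypothetical:

```python
import math

def binomial_residual_deviance(r, n, r_hat):
    """Residual deviance contribution of one binomial data point:
    observed r events out of n versus the model's fitted value r_hat.
    Summed over arms (and averaged over posterior draws) this gives
    the D-bar statistic quoted in the text."""
    def term(obs, fit):
        return 0.0 if obs == 0 else obs * math.log(obs / fit)
    return 2.0 * (term(r, r_hat) + term(n - r, n - r_hat))

# A perfectly fitted point contributes 0; under an adequate model each
# data point is expected to contribute about 1 on average.
d_perfect = binomial_residual_deviance(12, 100, 12.0)
d_misfit = binomial_residual_deviance(12, 100, 9.0)
print(round(d_perfect, 3), round(d_misfit, 3))
```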
We have focused on the results of one comparison here. Of course, one of the advantages of MTC is that it can compare all treatments in a connected network simultaneously, and even obtain probabilities that each treatment is ‘best' (i.e., most cost-effective). Case studies 2 and 3 consider decision problems with more than two alternatives. Clearly situations such as the one considered above, in which the effectiveness parameter changes considerably with the synthesis model, could have important implications for any decision model, and subsequent policy decisions, that are sensitive to such parameters. We consider the impact of the results of the synthesis on the results of economic decision models in case studies 2 and 3.
Use of drug-eluting stents in PCI in patients with coronary artery disease
In 2007 a UK National Health Service HTA was published that assessed the effectiveness and cost-effectiveness of using drug-eluting stents in PCI in patients with coronary artery disease [
]. Pair-wise meta-analyses were carried out in the systematic review component of the report for different drug-eluting stent designs compared to bare-metal stents, for a range of outcomes (e.g., mortality, myocardial infarction events, revascularisation) and time points. However, the economic evaluation considered only bare-metal stents versus drug-eluting stents (regardless of type), and only the outcome of target lesion revascularisation at 1 year, assuming the estimated effect to be stable beyond the first year and all other outcomes to be equal. The model was evaluated deterministically (i.e., uncertainty was not taken into account).
] published an MTC of outcomes associated with drug-eluting stents (Cypher, Cordis, Bridgewater, NJ and Taxus, Boston Scientific, Natick, MA) and bare-metal stents. To incorporate data from the range of follow-up times reported by the different studies simultaneously, their analysis used a random walk model based on piece-wise constant hazards [
]. This model assumed estimates of effectiveness to be more similar at adjacent time points than at far-off time points. This hierarchical (random effects) model also allows heterogeneous variance between studies, so that the two types of drug-eluting stents (Cypher and Taxus) can be assumed to be more similar to each other than to bare-metal stents. Figure 3 shows the network diagram for this MTC analysis; note that the numbers on the diagram relate only to the number of trials reporting target lesion revascularisation at 1 year (i.e., the time point required for the economic decision model), as this number varies across time points.
Here we consider the following three evidence synthesis models to estimate the pooled relative risk or hazard ratio for target lesion revascularisation in a subgroup of individuals (i.e., those experiencing an elective procedure, with a narrow definition of target lesion revascularisation, as given by Hill et al. [
]) with Cypher or Taxus drug-eluting stents versus bare-metal stents: 1) pair-wise random effects meta-analysis of Taxus or Cypher versus bare-metal stents using data at 1 year (M-A); 2) random effects MTC of Taxus versus Cypher versus bare-metal stents using data at 1 year (MTC); and 3) hierarchical random effects MTC using data from multiple time points (1, 2, 3, and 4 years) of Taxus versus Cypher versus bare-metal stents (MTC multiple time points) [
] and were carried out in the WinBUGS software. Note that the M-A and MTC were carried out on the relative risk scale but the MTC multiple time points model was carried out on the hazard ratio scale; therefore, care should be taken when directly comparing and interpreting the results.
The pooled relative risk/hazard ratio for target lesion revascularisation in individuals with Cypher or Taxus drug-eluting stents versus bare-metal stents from the three different evidence syntheses are presented in Figure 4.
This shows the point estimates of Cypher and Taxus versus bare-metal stents to be very similar for the M-A and MTC models, but shows a slight increase for the MTC multiple time points model. The uncertainty is reduced in the MTC and MTC multiple time points models compared to the M-A model (depicted by the narrower confidence/credible intervals).
The M-A and MTC models fitted the data well (i.e., M-A Taxus vs. bare-metal D̄ = 19.47 compared to 18 unconstrained data points, M-A Cypher vs. bare-metal 32.80 compared to 34 unconstrained data points, and MTC 70.85 compared to 71 unconstrained data points), but the MTC multiple time points model fitted less well (i.e., D̄ = 181.3 compared to 162 unconstrained data points). This latter finding concurs with the original analysis by Stettler et al. [
]. To assess the effect, if any, of using the different evidence synthesis models on the overall cost-effectiveness result, the effectiveness estimates from each synthesis were input into the economic decision model. Unlike the 2007 HTA model, many of the parameters in the decision model were expressed as distributions to represent the uncertainty in their estimation. A list of the distributions assigned to each parameter is given in the Appendix found at: doi:10.1016/j.jval.2010.09.001.
Figure 5 shows the cost-effectiveness acceptability curves for the M-A, MTC, and MTC multiple time points models. The plot shows the probability that Cypher and Taxus are cost-effective compared to bare-metal stents for a range of values a decision-maker may be willing to pay per additional quality-adjusted life year (QALY). In this particular example, as the amount a decision-maker is willing to pay per additional QALY increases, the probability that drug-eluting stents are cost-effective compared to bare-metal stents also increases.
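An acceptability curve of this kind can be computed directly from probabilistic sensitivity analysis output. The sketch below uses hypothetical incremental cost and QALY draws, not the stent model's:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical probabilistic sensitivity analysis output: incremental
# cost and QALY draws for a new treatment versus its comparator
# (illustrative numbers only, not the HTA model's).
d_cost = rng.normal(400.0, 150.0, size=5_000)
d_qaly = rng.normal(0.03, 0.02, size=5_000)

def ceac(d_cost, d_qaly, wtp_grid):
    """For each willingness-to-pay threshold w, the probability of
    cost-effectiveness is P(w * dQALY - dCost > 0) across draws."""
    return [float((w * d_qaly - d_cost > 0).mean()) for w in wtp_grid]

wtp_grid = [0, 10_000, 20_000, 30_000, 50_000]
probs = ceac(d_cost, d_qaly, wtp_grid)
for w, p in zip(wtp_grid, probs):
    print(f"£{w:>6}/QALY: P(cost-effective) = {p:.2f}")
```

Because the incremental QALY draws here are mostly positive, the curve rises with willingness to pay, mirroring the pattern described for Figure 5.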
Use of neuraminidase inhibitors in the treatment of influenza
The third case study revisits a recent HTA conducted to inform UK National Institute for Health and Clinical Excellence guidance on the effectiveness and cost-effectiveness of using neuraminidase inhibitors (NIs) (antiviral drugs) for the treatment of influenza [
]. Both HTAs (the 2003 and 2009 appraisals) considered two NIs: zanamivir and oseltamivir. The evidence base considered in both appraisals was relatively straightforward. A number of RCTs existed comparing the effectiveness of zanamivir or oseltamivir versus placebo, but there were no head-to-head trials comparing the two NIs directly, nor were there any trials of the NIs versus any other active comparator. Therefore, the network structure was simple, allowing an indirect comparison to be made between zanamivir and oseltamivir (Fig. 6).
Here we consider the different evidence synthesis approaches to estimate the effectiveness parameters for use in the economic decision model (i.e., mean difference in time to the alleviation of symptoms and return to normal activities) adopted by the two HTAs:
pair-wise random effects meta-analysis of zanamivir or oseltamivir versus placebo assuming time to event curves follow an exponential distribution (M-A) [
For comparison, the M-A model was updated to incorporate the same data as used in the MTC model (i.e., to include the more recently published RCTs).
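The exponential assumption in the M-A model makes the mean duration recoverable from the reported median alone, because the whole survival curve is determined by one parameter. A minimal sketch, with a hypothetical median, is:

```python
import math

def exponential_mean_from_median(median_days):
    """Under an exponential time-to-event curve S(t) = exp(-t/mu),
    the median is mu * ln(2), so the mean mu is median / ln(2).
    This is how a mean duration can be recovered when censoring makes
    the sample mean undefined but the median is estimable."""
    return median_days / math.log(2)

# Illustrative: a trial arm reporting a median of 6.0 days to
# alleviation of symptoms implies a mean of median / ln(2) days.
print(round(exponential_mean_from_median(6.0), 2))
```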
Because a proportion of individuals still had symptoms at the end of the RCTs, the challenge for both appraisals was how to deal with this censoring of the data. This was particularly problematic because the outcomes of interest for the decision model (that is, mean time to symptoms alleviated and mean time to return to normal activities) are undefined where censoring is present (clinically, the focus had been on the difference in median times, which was estimable from all trials because censoring always affected less than 50% of patients). To estimate the mean durations (and associated standard errors) in each of the arms of the RCTs in the presence of the censoring, the 2003 appraisal assumed that the survival curves for times to alleviation of symptoms/return to normal activities followed an exponential distribution [
]. Separate pair-wise meta-analysis models were fitted for each of the three patient groups of interest—otherwise healthy adults, otherwise healthy children and at-risk individuals (i.e., individuals of any age with a concurrent disease severe enough to require regular medical follow-up or hospital care (e.g., chronic disorders such as chronic respiratory disease, cardiovascular disease, and pulmonary disorders) plus otherwise healthy elderly individuals aged 65 years and older). For comparison to the 2009 appraisal results, an estimate for the indirect comparison of zanamivir versus oseltamivir has been calculated classically using the methodology outlined by Bucher et al. [
]. This analysis relaxes the assumption that the survival curves are exponential in shape and instead fits a more flexible (i.e., two-parameter) Weibull distribution to the survival curves for both outcomes using the median data. In doing so, it takes into account further data that were available on the numbers still ill at the end of the reported follow-up, since these inform a second point on the time to alleviation of symptoms/return to normal activities survival curve. The analysis models both outcomes simultaneously so that information can be borrowed across outcomes for RCTs that do not report both outcomes. The three specific patient subgroups defined earlier, plus a mixed population (the latter was used to include patients in trials where it was not possible to obtain stratified results for the subgroups of interest), were considered distinctly in the economic evaluation. These were also simultaneously modelled assuming exchangeability across each of the treatment/subgroup combinations, which allows a borrowing of strength that increases the precision of the subgroup-specific estimates. Although this model automatically provides indirect comparison estimates for zanamivir versus oseltamivir, given the simple evidence structure such an estimate could be obtained classically using standard indirect comparison methodology [
], and thus this is not the main advantage of the sophisticated modelling. Rather, it is the other modelling complexities outlined above, facilitated within the network framework, that set this analysis apart from the frequentist approach.
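A deterministic sketch of the underlying idea: with two points on a Weibull survival curve (the median, and the proportion still ill at the end of follow-up) the shape and scale, and hence the mean, are identified. The paper's model is a full Bayesian fit with uncertainty; the version below is a point-estimate illustration with hypothetical numbers:

```python
import math

def weibull_from_median_and_tail(median, t_followup, p_still_ill):
    """Solve the Weibull curve S(t) = exp(-(t/scale)^shape) from two
    points: S(median) = 0.5 and S(t_followup) = p_still_ill (the
    proportion still symptomatic at end of follow-up)."""
    shape = (math.log(-math.log(p_still_ill) / math.log(2))
             / math.log(t_followup / median))
    scale = median / math.log(2) ** (1 / shape)
    mean = scale * math.gamma(1 + 1 / shape)   # Weibull mean
    return shape, scale, mean

# Illustrative: median 6 days, 10% still ill at day 21.
shape, scale, mean = weibull_from_median_and_tail(6.0, 21.0, 0.10)
print(f"shape {shape:.2f}, scale {scale:.2f}, mean {mean:.2f} days")
```

With only the median available, the exponential (one-parameter) curve is forced; the second data point is what allows the shape parameter, and hence a non-exponential mean, to be estimated.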
Figure 7 reports the estimates of treatment effect for each NI versus placebo and for the two NIs head to head for time to alleviation of symptoms and time to return to normal activities.
Across all the outcomes except one, the NIs are estimated to be associated with a larger treatment difference relative to placebo in the hierarchical MTC model than in the M-A model. The uncertainty is usually greater in the Bayesian analysis, as reflected in the wider (and often asymmetric) credible intervals. The uncertainty was also generally greater in the head-to-head comparisons from the Bayesian synthesis model, although the difference between the point estimates from the two models was less predictable. In summary, while unpredictable, the differences between the two analyses are quite considerable for a proportion of the estimates. Both the M-A and hierarchical MTC models fit the data well; that is, the posterior mean residual deviance, D̄, is approximately equal to the number of unconstrained data points. However, when the hierarchical MTC model was fitted assuming an exponential (as assumed in the M-A model) rather than a Weibull distribution, the fit of the model was poor (i.e., D̄ = 405.5 compared to 139 unconstrained data points).
Figure 8 presents the acceptability curves for an at-risk adult population resulting from a decision model using the hierarchical MTC model and the simple M-A model for effectiveness inputs. It can be seen that for a willingness to pay of £5000 per QALY gained or above, zanamivir would appear to be the most cost-effective intervention for this subgroup regardless of the approach used to estimate effectiveness, although the hierarchical MTC model increases the difference between the two NI treatments by approximately 20% (beyond £5000 per QALY gained). Clearly, these sorts of differences could impact on model conclusions in situations where acceptability curves for different treatments are closer together.
In this article we have attempted to compare three state-of-the-art Bayesian synthesis models with more standard (pair-wise) meta-analytic approaches for estimation of clinical effectiveness. Since the latter two case studies consider MTC models with further modelling refinements, these comparisons are not “pure” MTC versus pair-wise synthesis but complex evidence synthesis versus pair-wise meta-analysis.
Important findings that these case studies highlight include: 1) MTC methods allow the inclusion of evidence that is ignored in pair-wise modelling, and this inclusion of further evidence generally reduces the uncertainty in effectiveness parameters; 2) imposing hierarchical structures on data (e.g., to allow for multiple time points) can also decrease uncertainty through the inclusion of extra evidence; 3) relaxing strong assumptions made in the pair-wise modelling, as facilitated by the MCMC framework (e.g., in case study 3, to include a second time point when estimating a survival curve, the curve was assumed to follow a Weibull rather than an exponential shape), may appropriately increase uncertainty; and 4) both the MTC and hierarchical aspects of the synthesis modelling can change point estimates, and it is difficult to anticipate, before carrying out the evidence synthesis, by how much and in which direction they will change, and thus what the impact on the results of the cost-effectiveness models will be.
A common, but unworkably vague, phrase in guidance for decision models is that “all relevant evidence” should be used to inform (effectiveness) model parameters [
]. We believe this article highlights just how difficult it is to produce a workable definition of relevant evidence, but one would clearly need to address issues relating to trial networks (i.e., as shown in case study 1), and time points (i.e., case studies 2 and 3). Both of these issues relate to evidence that may influence the effectiveness parameters of interest, although it may not be immediately obvious that such evidence is “relevant.” Further, this article has only considered randomised evidence, although it is acknowledged that observational evidence, or even expert opinion, may sometimes be considered “relevant”; for example, in situations where there is no or limited trial evidence, or where the trial evidence may not relate to the patient populations being considered in the decision modelling. We are currently exploring further case studies where we compare the results of using trial data only with results obtained by using trial data augmented with observational data and expert opinion to further explore these issues.
It is important to note that the case studies cannot demonstrate one method is superior to the other with respect to bias and precision of parameter estimates since no gold standard approach exists and thus there is no way of knowing what the truth is. What we can say is that the more complex approaches consider the evidence as a coherent whole, include more data, and sometimes relax the assumptions made in the pair-wise approaches (although imposing hierarchical structures on the MTC network can also make stronger assumptions by assuming subgroups/regression parameters are exchangeable across treatments). Against this is the acknowledgement that the more complex models can be very nonintuitive to understand and time-consuming to undertake. It is possible to check the goodness-of-fit of the model predictions to the observed data, and it is important to do this for all synthesis models regardless of complexity (i.e., including standard pair-wise meta-analysis).
The authors thank Professor Adrian Bagust for providing the necessary information that allowed us to replicate the original decision model for case study 2, and Louise Longworth, PhD, and conference delegates for their interesting and useful discussion of a previous version of this work presented at Health Economists' Study Group meeting in Sheffield, UK, July 2009.
Source of financial support: This work was funded by a Medical Research Council (MRC) grant (MRC reference: G0800770). KRA is partly supported by the UK National Institute for Health Research (NIHR) as a Senior Investigator (NF-SI-0508-10061). NC, AS, KA, SB, and NW have run courses on MTC methodology commercially. NC, AS, KA, and NW have all done consultancy work on MTC methods.