Methodology| Volume 24, ISSUE 5, P699-706, May 2021


# Joint Longitudinal Models for Dealing With Missing at Random Data in Trial-Based Economic Evaluations

Open Archive. Published: February 18, 2021

## Highlights

• Standard methods to handle missing data in trial-based economic evaluations discard some of the observed responses. In this article, we illustrate how joint longitudinal models provide an alternative and potentially less biased approach for handling missing data with respect to current practice under a missing at random assumption.
• Methods that ignore some of the available information may be associated with biased results and mislead the decision-making process. Given the common problem of missing data, many study conclusions could be based on imprecise economic evidence.
• This is a potentially serious issue for those who use these evaluations in their decision making, thus possibly leading to incorrect policy decisions about the cost-effectiveness of new treatment options.

## Abstract

### Objectives

In trial-based economic evaluation, some individuals are typically associated with missing data at some time point, so that their corresponding aggregated outcomes (eg, quality-adjusted life-years) cannot be evaluated. Restricting the analysis to the complete cases is inefficient and can result in biased estimates, while imputation methods are often implemented under a missing at random (MAR) assumption. We propose the use of joint longitudinal models to extend standard approaches by taking into account the longitudinal structure to improve the estimation of the targeted quantities under MAR.

### Methods

We compare the results from methods that handle missingness at an aggregated (case deletion, baseline imputation, and joint aggregated models) and disaggregated (joint longitudinal models) level under MAR. The methods are compared using a simulation study and applied to data from 2 real case studies.

### Results

Simulations show that, depending on which data affect the missingness process, aggregated methods may lead to biased results, while joint longitudinal models lead to valid inferences under MAR. The analysis of the 2 case studies supports these results, as both parameter estimates and cost-effectiveness results vary based on the amount of data incorporated into the model.

### Conclusions

Our analyses suggest that methods implemented at the aggregated level are potentially biased under MAR as they ignore the information from the partially observed follow-up data. This limitation can be overcome by extending the analysis to a longitudinal framework using joint models, which can incorporate all the available evidence.

## Introduction

Trial-based cost-effectiveness analyses (CEAs) rely on patient-level data, which are often collected at baseline and some follow-up points through self-reported questionnaires (eg, the EuroQol (EQ)-5D) [EQ-5D-3L User guide: Basic information on how to use the EQ-5D-3L instrument]. These are combined with national value sets and unit prices to generate utility and cost measures at each time point, and then aggregated over the study period into patient-level effectiveness (eg, quality-adjusted life-years [QALYs]) and total cost outcomes, which are used in the economic analysis.
A typical problem is that some individuals may withdraw from the study before its completion or may be associated with missing values at some time (thus impairing the calculation of their aggregated measures), which forces the analyst to make untestable assumptions about the unobserved data. Rubin [Rubin D. Multiple Imputation for Nonresponse in Surveys] introduced 3 classes of mechanisms responsible for the missing values: missing completely at random (MCAR), where missingness is unrelated to the data; missing at random (MAR), where missingness is unrelated to unobserved data conditional on observed data; and missing not at random (MNAR), where missingness depends on unobserved data.
Different methods can be used to handle missing data, each relying on assumptions about the missingness mechanism. A popular approach is case deletion, such as complete case analysis (CCA) and available case analysis (ACA), which removes from the analysis all unobserved data [Noble S, Hollingworth W, Tilling K. Missing data in trial-based cost-effectiveness analysis: the current state of play; Gabrio A, Mason A, Baio G. Handling missing data in within-trial cost-effectiveness analysis: a review with future recommendations; Leurent B, Gomes M, Carpenter J. Missing data in trial-based cost-effectiveness analysis: an incomplete journey] and is typically valid only under MCAR [Little R, Rubin D. Statistical Analysis With Missing Data].
However, when baseline covariates are incorporated, case deletion methods can lead to valid inferences under less restrictive assumptions than MCAR (referred to as covariate dependent MCAR [Little R. Modeling the drop-out mechanism in repeated-measures studies]). Baseline imputation methods, such as mean imputation (MEAN), replace the missing baseline data with a single imputed value (eg, the mean) [White I, Thompson S. Adjusting for partially missing baseline measurements in randomized trials].
MEAN is more efficient than CCA or ACA and can lead to valid inferences under MAR [Sullivan T, White I, Salter A, Ryan P, Lee K. Should multiple imputation be the method of choice for handling missing data in randomized trials?].
However, in trial-based CEAs, a drawback is that aggregated outcomes may also be unavailable when baseline values are observed.
Estimation methods based on a joint model for baseline and outcome data can appropriately assess the impact of missing data uncertainty [Little R. Regression with missing x’s: a review].
Although different methods may be used to fit joint models, here we focus on two: multiple imputation (MI) and full Bayesian (FB) methods. MI uses an imputation model to produce multiple values for each missing observation, fits the model of interest to each imputed data set, and combines the parameter estimates into a pooled estimate [Rubin D. Multiple Imputation for Nonresponse in Surveys]. Provided that the imputation model is correctly specified, MI gives consistent and asymptotically efficient parameter estimates under MAR [Carpenter G, Kenward M. Multiple Imputation and Its Applications]. Multiple imputation by chained equations (MICE) is the most popular MI approach among practitioners [Van Buuren S. Flexible Imputation of Missing Data].
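The pooling step of MI can be illustrated with a minimal sketch of Rubin's rules, assuming each imputed data set has already produced a point estimate and its squared standard error. The function name `pool_rubin` and the numbers are hypothetical and are not taken from the article.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M >= 2 estimates with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates of a mean QALY difference from M = 3 imputed data sets
pooled_estimate, total_variance = pool_rubin(
    [0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004]
)
```

The total variance combines the uncertainty within each imputed data set with the extra uncertainty caused by the missing values, which is why analysing a single imputed data set would understate the standard error.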
In contrast to MI, FB methods estimate the parameters of interest simultaneously with the imputation of the missing values. FB methods are typically implemented using iterative algorithms such as Markov Chain Monte Carlo (MCMC) methods [Brooks S, Gelman A, Jones G, Meng X. Handbook of Markov Chain Monte Carlo]. When weakly informative prior distributions are specified, inferences from FB are based on the observed data and are valid under MAR [Daniels M, Hogan J. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis].

### Standard Approach in Trial-Based CEAs

According to recent reviews, standard practice in trial-based CEAs handles missingness at the level of the aggregated outcomes and baseline variables [Noble S, Hollingworth W, Tilling K. Missing data in trial-based cost-effectiveness analysis: the current state of play; Gabrio A, Mason A, Baio G. Handling missing data in within-trial cost-effectiveness analysis: a review with future recommendations; Leurent B, Gomes M, Carpenter J. Missing data in trial-based cost-effectiveness analysis: an incomplete journey]. Indeed, estimates of interest are obtained by directly modeling the aggregated outcomes rather than the utility and cost data at each time. This requires the analyst to process the data collected on individual i at time j in treatment t, to derive the aggregated measures over the study duration. Figure 1 shows a typical data set of trial-based CEA, formed by the sets of utility (uj) and cost (cj) variables collected at baseline j = 0 and some follow-ups j = 1,…,J. The graph represents the standard procedure for processing the data and identifying the variables used in the analysis.
At the top left of the diagram, the intended data set (ID) is the ideal scenario where all data are collected for all the individuals in the trial. In real trials, however, some individuals drop out or are associated with some unobserved utility and cost values at some time, which leads to the collected data set (CD). Aggregated quantities for the effectiveness (e) and cost (c) outcomes are then computed (eg, using the area under the curve method for the QALYs) [Drummond M, Sculpher M, Claxton K, Stoddart G, Torrance G. Methods for the Economic Evaluation of Health Care Programmes].
These quantities can be calculated only for those subjects with fully observed utility and cost data. Next, the focus of the analysis is moved from the CD to the aggregated data set (AD), where all disaggregated follow-up data are discarded and only the baseline and aggregated variables are retained. Finally, in many cases, analysts remove all individuals with missing values and restrict the analysis to the aggregated complete data set (ACD), formed by the ncc < n individuals with fully observed data.
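To make the move from the CD to the AD concrete, the sketch below computes QALYs with the area under the curve (trapezoidal) rule and returns nothing when any utility is missing. The function name, the use of `None` for a missing value, and the utility values are illustrative assumptions.

```python
def qalys_auc(utilities, times_in_years):
    """Area under the utility curve (trapezoidal rule); None if any utility is missing."""
    if len(utilities) != len(times_in_years):
        raise ValueError("utilities and times must have the same length")
    if any(u is None for u in utilities):
        return None  # the aggregated outcome cannot be computed
    total = 0.0
    for j in range(1, len(utilities)):
        width = times_in_years[j] - times_in_years[j - 1]
        total += 0.5 * (utilities[j] + utilities[j - 1]) * width
    return total


times = [0.0, 0.5, 1.0]  # baseline, 6 months, 12 months
complete = qalys_auc([0.8, 0.7, 0.9], times)
partial = qalys_auc([0.8, 0.7, None], times)  # observed 6-month value is lost in the AD
```

For the second individual the observed baseline and 6-month utilities cannot contribute to an aggregated analysis of the outcome, which is the information loss that the longitudinal approach is meant to avoid.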

### Joint Longitudinal Models to Deal With Missingness

While methodological work has been carried out to handle missingness at an aggregated level, particularly with the use of joint models [Mason A, Gomes M, Grieve R, Carpenter J. A Bayesian framework for health economic evaluation in studies with missing data; Gabrio A, Mason A, Baio G. A full Bayesian model to handle structural ones and missingness in economic evaluations from individual-level data. Stat Med. 38(8):1399-1420; Gomes M, Camarena J, Marra G. Copula selection models for non-Gaussian outcomes that are missing not at random], the consideration of the missingness problem from a longitudinal perspective has been less extensively addressed. Only recently, Gabrio et al (2020) [Gabrio A, Daniels M, Baio G. A Bayesian parametric approach to handle missing longitudinal outcome data in trial-based health economic evaluations] proposed a Bayesian longitudinal model to assess the impact of alternative missingness assumptions. A general limitation of any aggregated method is that it ignores the longitudinal nature of the data and discards all follow-up values for partially observed individuals. Conversely, methods that handle missingness at each time point account for the longitudinal structure, incorporate all available evidence, and potentially make the missingness assumptions (eg, MAR) more reasonable.
In this study, we illustrate the impact that alternative missing data approaches have on trial-based CEAs. Focus is given to how joint longitudinal models can extend aggregated models for dealing with missingness under MAR. We compare the results obtained from alternative methods using a simulation study and 2 real case studies. The rest of the article is structured as follows. The “Methods” section briefly outlines the different missing data approaches considered. The “Implementation” section summarizes the assumptions and the implementation details for each approach. The design and results from a simulation study are presented in the “Simulation Study” section, while the “Application” section shows the statistical and economic results from the 2 case studies. Finally, the “Discussion” section concludes the article.

## Methods

This section briefly reviews some of the most popular missing data methods in trial-based CEAs, which include: case deletion (CCA and ACA), mean baseline imputation (MEAN), and joint aggregated (MI and FB) methods. Joint longitudinal methods (L-MI and L-FB) are then introduced and discussed. Throughout the article, to ease notation, we describe the methods under the simplified scenario where only 2 treatment groups are compared and the only covariates are the baseline utilities and costs. However, we note that the methods can be easily extended to deal with multiple treatments and baseline covariates, such as prognostic factors or demographic variables.

### Case Deletion and Baseline Imputation Methods

The standard approach fits the model to the ACD or AD based on the aggregated effectiveness and costs (ei,ci), which are computed after processing the utility and cost data collected in the study (uij,cij). Linear regression models are used to adjust for baseline values, often under simplifying assumptions such as normality and independence between the outcomes:
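$$
\begin{aligned}
e_i &= \alpha_0 + \alpha_1 t_i + \alpha_2 u_{i0} + \varepsilon_{ie}, & \varepsilon_{ie} &\sim \text{Normal}(0, \sigma_e) \\
c_i &= \beta_0 + \beta_1 t_i + \beta_2 c_{i0} + \varepsilon_{ic}, & \varepsilon_{ic} &\sim \text{Normal}(0, \sigma_c)
\end{aligned}
\tag{1}
$$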

The models include: the intercepts (α0, β0); the regression coefficients associated with the treatment indicator ti (α1, β1) and the baseline utility ui0 and baseline cost ci0 variables (α2, β2); and the standard deviations (σe, σc). Once the models are fitted, treatment-specific mean estimates (μet, μct) can be obtained by plugging the regression coefficient estimates into Equation 1 and replacing the baseline variables with their empirical means. When confronted with missing values, parameter estimates can be obtained using different approaches. For example, CCA and ACA fit the model to the complete cases and generate the estimates using the mean of the baseline variables computed from either the complete or all observed cases, respectively. MEAN fits the model to the entire data set using mean-imputed baseline variables and predicts the missing values in the response based on Equation 1, either ignoring (with point predictions) or accounting (by adding some random term) for the uncertainty around the missing values. In all our analyses based on MEAN, we include a random term to take into account missing data uncertainty.
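The plug-in step, and the way CCA and ACA differ in it, can be sketched as follows. The coefficient values, the toy records, and the coding of the treatment indicator (0 for control, 1 for intervention) are assumptions made for illustration only.

```python
from statistics import mean


def arm_means(alpha0, alpha1, alpha2, baseline_mean):
    """Return (control, intervention) mean effectiveness from Equation 1,
    evaluated at a given mean of the baseline utilities."""
    control = alpha0 + alpha2 * baseline_mean
    intervention = alpha0 + alpha1 + alpha2 * baseline_mean
    return control, intervention


# Hypothetical records: (baseline utility, QALYs or None when not computable)
records = [(0.60, 0.65), (0.80, 0.82), (0.70, None), (0.90, None)]

# CCA: baseline mean from the complete cases only
u0_complete = mean(u0 for u0, e in records if e is not None)
# ACA: baseline mean from every case with an observed baseline value
u0_observed = mean(u0 for u0, e in records)

# Hypothetical fitted coefficients (both methods fit the model to the complete cases)
coef = dict(alpha0=0.10, alpha1=0.05, alpha2=0.80)
cca = arm_means(baseline_mean=u0_complete, **coef)
aca = arm_means(baseline_mean=u0_observed, **coef)
```

In this sketch the difference between arms equals alpha1 under both methods; what changes is the level of the arm-specific means, because the two methods average the baseline values over different sets of individuals.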

### Joint Aggregated Models

Joint aggregated models simultaneously handle missingness in the aggregated and baseline variables through a joint (often normal) distribution. It is sometimes more convenient to factorize this joint distribution into the product of a marginal and a conditional distribution [Nixon R, Thompson S. Methods for incorporating covariate adjustment, subgroup analysis and between centre differences into cost-effectiveness evaluations].
This factorization allows the models to be re-expressed as:
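$$
\begin{aligned}
u_{i0} &= \mu_{u0} + \varepsilon_{iu0}, & \varepsilon_{iu0} &\sim \text{Normal}(0, \sigma_{u0}) \\
e_i \mid u_{i0} &= \alpha_0 + \alpha_1 t_i + \alpha_2 u_{i0} + \varepsilon_{ie}, & \varepsilon_{ie} &\sim \text{Normal}(0, \sigma_e) \\
c_{i0} &= \mu_{c0} + \varepsilon_{ic0}, & \varepsilon_{ic0} &\sim \text{Normal}(0, \sigma_{c0}) \\
c_i \mid c_{i0} &= \beta_0 + \beta_1 t_i + \beta_2 c_{i0} + \varepsilon_{ic}, & \varepsilon_{ic} &\sim \text{Normal}(0, \sigma_c)
\end{aligned}
\tag{2}
$$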

where the baseline variables have their own means and standard deviations (μu0, σu0, μc0, σc0) in addition to the regression model for the aggregated outcomes. Different approaches, such as MI and FB [Little R. Regression with missing x’s: a review], can be used to fit the model shown in Equation 2, and the mean effectiveness and cost estimates can then be retrieved as in the “Case Deletion and Baseline Imputation Methods” section.

### Joint Longitudinal Models

Joint longitudinal models extend aggregate models to account for the longitudinal nature of the data and use all the available utility and costs in the CD. Assumptions about the dependence can be made to simplify the implementation. For example, under a first-order Markov dependence assumption we have:
$Equation 3.$
(3)

where all modeled variables are associated with standard deviations (σuj, σcj), and treatment-specific regression coefficients (αjt, βjt) capturing the response association between time j and j-1. The model captures the time dependence between variables while keeping the overall number of variables small, which makes it relatively easy to implement. Although alternative specifications are possible (eg, via multivariate distributions), we believe that the proposed specification provides a good balance between model complexity and flexibility. When the models in Equation 3 are fitted using MI, the joint distributions are approximated by first imputing the variables and then fitting the model to the imputed data (L-MI). If an FB approach is used, the models can be directly fitted to the partially observed data (L-FB). Regardless of whether L-MI or L-FB is used, the mean effectiveness and costs in each treatment can be retrieved in 2 steps. First, the estimates of the marginal means of the utilities and costs for each time and treatment (μujt, μcjt) are derived from Equation 3. Second, the formulae used for calculating the QALYs and total costs can be applied to the marginal response means to obtain the aggregated mean estimates.
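The 2-step retrieval can be sketched for the utilities in one treatment group, assuming that each follow-up utility is modeled as a linear function of the previous one with an intercept and a slope. The intercepts, the function names, and all numbers below are hypothetical and are not part of the article's specification.

```python
def marginal_means(mu0, intercepts, slopes):
    """Step 1: propagate the baseline mean through linear first-order dependence.

    If E[u_j | u_(j-1)] = a_j + b_j * u_(j-1), then the marginal mean satisfies
    mu_j = a_j + b_j * mu_(j-1).
    """
    means = [mu0]
    for a, b in zip(intercepts, slopes):
        means.append(a + b * means[-1])
    return means


def auc(values, times_in_years):
    """Step 2: apply the area under the curve formula to the marginal means."""
    return sum(
        0.5 * (values[j] + values[j - 1]) * (times_in_years[j] - times_in_years[j - 1])
        for j in range(1, len(values))
    )


# Hypothetical posterior means of the parameters for one treatment group
mu_u = marginal_means(0.70, intercepts=[0.15, 0.10], slopes=[0.80, 0.90])
mean_qalys = auc(mu_u, [0.0, 0.5, 1.0])
```

Because the area under the curve formula is linear in the utilities, applying it to the marginal means gives the mean QALYs directly. In a Bayesian analysis the same calculation would be repeated for every posterior draw of the parameters.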

## Implementation

### Model Specification

Table 1 summarizes the key assumptions of each missingness approach described in the “Methods” section.
Table 1. List of the different methods for handling missing data in trial-based CEAs that are compared.

| Method | Modeled variables | Data set | Imputation |
| --- | --- | --- | --- |
| CCA | ei \| ti,ui0; ci \| ti,ci0 | ACD | none |
| ACA | ei \| ti,ui0; ci \| ti,ci0 | ACD | none |
| MEAN | ei \| ti,u∗i0; ci \| ti,c∗i0 | AD | mean for (ui0,ci0); regression for (ei,ci) |
| FB/MI | ui0,ei \| ti; ci0,ci \| ti | AD | FB/MI for (ui0,ei); FB/MI for (ci0,ci) |
| L-FB/L-MI | ui0,...,uiJ \| ti; ci0,...,ciJ \| ti | CD | FB/MI for (ui0,…,uiJ); FB/MI for (ci0,…,ciJ) |

A total of 5 approaches are considered: complete (CCA) and available (ACA) case analysis, mean baseline imputation (MEAN), joint aggregated (FB/MI), and longitudinal (L-FB/L-MI) models. The methods are fitted to different data sets: aggregated complete (ACD), aggregated (AD), and collected (CD). The methods impute different variables: none, quality-adjusted life-years and total costs (ei,ci), baseline utilities and costs (ui0,ci0), and all utilities and costs (ui0,…,uiJ and ci0,…,ciJ). The terms u∗i0 and c∗i0 denote the mean-imputed baseline variables.
With the exception of L-FB/L-MI, which are fitted to the CD, all other methods are either fitted to the ACD (CCA and ACA) or the AD (MEAN and MI/FB). In the following sections, we compare these methods using a simulation study (the “Simulation Study” section) and real data from 2 randomized trials (the “Application” section). In the simulation study, to ensure the comparability of the results, we fit all models using a Bayesian approach and assess the performance of the methods in terms of bias and efficiency. In the application to the case studies, we additionally fit the joint models using MI, and summarize the cost-effectiveness conclusions.
For all Bayesian models, estimates are derived based on the observed data under MAR using weakly informative prior distributions. Specifically, we choose normal distributions centered at 0 with a standard deviation of 1000 for all the regression coefficients and uniform distributions between 0 and 100 for standard deviations. A prior sensitivity analysis was conducted by varying the hyperprior values over the set {1000; 10000; 100000} for the standard deviation of the regression coefficients and for the upper bound of the standard deviations. Results from all analyses were robust to these variations. MI models are fitted using MICE, setting the number of imputations to M = 20, and generating bootstrap samples to replicate the sampling distribution of the estimates. For all models fitted under a Bayesian approach (CCA, ACA, MEAN, FB, and L-FB), 95% credible intervals are calculated from the 2.5 and 97.5 percentiles of the posterior distributions of the parameters. For all models fitted with MI (MI and L-MI), confidence intervals are evaluated using the percentile-t bootstrap method [Efron B. Nonparametric standard errors and confidence intervals].
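As a small illustration of how a 95% credible interval is read off posterior samples, the sketch below takes the 2.5 and 97.5 percentiles of a list of draws. The simulated normal draws merely stand in for MCMC output, and the interpolation rule used for the percentiles is one of several common conventions.

```python
import random


def percentile(sorted_values, p):
    """Linear-interpolation percentile, p in [0, 100], for an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])


def credible_interval(samples, level=95):
    """Equal-tailed interval from the empirical percentiles of the samples."""
    ordered = sorted(samples)
    tail = (100 - level) / 2
    return percentile(ordered, tail), percentile(ordered, 100 - tail)


rng = random.Random(1)
# Stand-in for 20 000 posterior draws of a mean QALY difference (hypothetical values)
draws = [rng.gauss(0.05, 0.02) for _ in range(20000)]
lower, upper = credible_interval(draws)
```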

### Software

All Bayesian models are implemented using JAGS [JAGS: Just Another Gibbs Sampler], a program dedicated to Bayesian analysis using MCMC simulation. We interface JAGS with the statistical software R, using the package R2jags [Package R2jags]. We ran 2 chains for a total sample of 20 000 iterations for posterior inference. MI models are fitted using MICE through the R package mice. Based on current recommendations from the literature [Carpenter G, Kenward M. Multiple Imputation and Its Applications], we used M = 20 imputed data sets and for each of these we generated B = 1000 bootstrap samples, for a total of 20 000 samples. For all models, the convergence of the algorithms was assessed using different types of diagnostic measures, such as the potential scale reduction factor [Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models].
The JAGS code used to fit all Bayesian models to the MenSS data is provided in the online supplemental material 1.

## Simulation Study

In this section we present the design and results of a simulation study whose aim is to assess the performance of the missingness approaches described in the “Implementation” section, under different assumptions about the data generating and missingness process. We consider a simple randomized trial setting with 2 treatments being compared and only 1 type of outcome (the utilities), which is assumed to be normally distributed and collected at 3 time points evenly spaced over 1 year (at baseline, 6 months, and 12 months).

### Design of the Simulation

We design the simulation study using different specifications for the data-generating process and missingness mechanism to compare the performance of the methods over a range of scenarios. Due to space limitations, detailed information on the motivation and type of the data-generating process and missingness mechanism in the simulations is provided in the Appendices in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018. Each scenario is defined by varying 2 types of parameters: the sample size (n) of each simulated data set and the values of the parameters indexing the missingness mechanism. The latter are varied over a discrete grid to obtain different configurations in terms of the proportions of missing values and type of missing data mechanism. We consider a total of 45 scenarios based on the combination of:
• Three values of sample size: 100, 500, and 1000.
• Three values of average proportions of missing data: a “low” rate of 0.15, a “medium” rate of 0.30, and a “high” rate of 0.5.
• Five missingness mechanisms: MCAR, and 2 alternative definitions of MAR and MNAR.
The mechanisms can be interpreted as follows. MCAR: dropout probabilities are totally random at any time; MAR1 and MAR2: dropout at baseline is totally random, while at follow-ups it is more likely among individuals with higher values either at j = 0 (MAR1) or at j = 1 (MAR2); MNAR1 and MNAR2: dropout at j = 1,2 is more likely among individuals with higher values at the same time, for either all time points (MNAR1) or excluding baseline (MNAR2). We acknowledge that alternative scenarios could be investigated in a more comprehensive simulation. However, since an extensive simulation is not the main objective of this article, we decided to investigate a limited number of scenarios that could replicate the framework of the case studies analyzed in the “Application” section.
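The distinction between the mechanisms can be sketched with a toy simulation in which only the final utility can be missing and the dropout probability follows a logistic model. This is a simplification of the design described above: the logistic form, the parameter values, the standardised scale, and the function names are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


def dropout_prob(mechanism, u0, u1, u2, base=-1.0, slope=1.0):
    """Hypothetical logistic probability that the final utility u2 is missing."""
    # The variable driving dropout defines the mechanism:
    # nothing (MCAR), baseline (MAR1), previous time (MAR2), current value (MNAR).
    driver = {"MCAR": 0.0, "MAR1": u0, "MAR2": u1, "MNAR": u2}[mechanism]
    return 1.0 / (1.0 + math.exp(-(base + slope * driver)))


def simulate(mechanism, n=20000, seed=0):
    """Return (mean of u2 in everyone, mean of u2 among those still observed)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)             # standardised scale, not real utilities
        u1 = 0.7 * u0 + rng.gauss(0.0, 0.7)  # depends on the previous time point
        u2 = 0.7 * u1 + rng.gauss(0.0, 0.7)
        full.append(u2)
        if rng.random() >= dropout_prob(mechanism, u0, u1, u2):
            observed.append(u2)
    return mean(full), mean(observed)


results = {m: simulate(m) for m in ("MCAR", "MAR1", "MAR2", "MNAR")}
```

In this toy setting the unadjusted mean among the observed cases matches the full mean only under MCAR. Under MAR1 and MAR2 the gap can in principle be removed by conditioning on the baseline or on the previous follow-up value, respectively, whereas under MNAR no observed variable fully explains the dropout.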
For each scenario, we fit the models in Table 1 using a Bayesian approach and repeat the process S = 500 times. In each simulation, we derive the values of the mean QALYs differentials (Δ) between the treatment groups, which are taken to be the mean evaluated over the posterior samples of the estimates. As part of the assessment of the models, we look at the bias, empirical standard errors, and root mean square errors of the estimates of interest under each scenario.
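The three performance measures can be computed from the S simulated estimates as in the sketch below. The four estimates and the true value are made-up numbers used only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def performance(estimates, truth):
    """Bias, empirical standard error, and root mean square error of the estimates."""
    bias = mean(estimates) - truth
    emp_se = stdev(estimates)  # sample standard deviation across simulations
    rmse = sqrt(mean((d - truth) ** 2 for d in estimates))
    return bias, emp_se, rmse


# Hypothetical estimates of the mean QALY difference from 4 simulated data sets
bias, emp_se, rmse = performance([0.04, 0.06, 0.05, 0.07], truth=0.05)
```

The root mean square error combines the other two measures, so a method can score poorly on it either because it is biased or because its estimates vary widely across simulations.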

### Results of the Simulation

Results of the simulation in terms of bias for each method and scenario are summarized in Figure 2.
As expected, under MCAR (black lines and dots), all methods show, on average, unbiased estimates. Under MAR1 (blue lines and dots), where follow-up missingness depends on baseline data, CCA is the only method that shows biased estimates, while all other methods show comparable good performances. Under MAR2 (green lines and dots), where follow-up missingness at a given time depends on the observed data at the previous time, all methods that ignore the longitudinal nature of the data (CCA, ACA, MEAN, and FB) show, on average, biased estimates, with CCA having the worst performance. L-FB is the only method showing unbiased estimates under both MAR1 and MAR2. Finally, under MNAR mechanisms, where missingness depends on the unobserved values at the same time either including (MNAR1, red lines and dots) or excluding (MNAR2, magenta lines and dots) baseline data, all methods show, on average, biased estimates. The results under each type of mechanism are comparable with few variations across the scenarios. The length of the intervals increases as the sample size decreases, while the magnitude of the bias increases as the proportion of missingness increases. Similar conclusions for all methods are obtained in terms of empirical and root mean square errors, which are shown in Appendix 1 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018.

## Application

### The MenSS Trial

The Men’s Safer Sex (MenSS) trial is a pilot study of a new digital intervention aimed at increasing condom use and reducing the incidence of sexually transmitted infections (STI) in young men [Bailey J, Webster R, Hunter R, et al. The men’s safer sex project: intervention development and feasibility randomised controlled trial of an interactive digital intervention to increase condom use in men].
Individuals (n = 159) enrolled in the study are men aged 16 years or older with recent unprotected sex or suspected acute STI. Participants were randomized to receive the intervention plus clinic care (reference intervention, n2 = 84), or clinic care only (control, n1 = 75). Sexual health–related resource use was collected via self-reported questionnaires at 3, 6, and 12 months. Utility scores to calculate QALYs were collected at baseline and at the same times as costs using the EQ-5D instrument. With few exceptions, the non-completers in the control are associated with lower mean utilities and higher costs compared with the completers, while no clear pattern between the 2 is observed in the intervention. The summary statistics for the utilities and costs for the completers and those with partially observed data (non-completers), the missingness patterns and histograms of the empirical distributions of the QALYs, and total costs in the MenSS study are provided in Appendix 2 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018.

#### Results

Figure 3 displays the mean and 95% confidence/credible intervals for the mean QALYs and total costs for each method, both in the control (red dots and lines) and intervention (blue dots and lines) group.
Under CCA, the mean QALYs difference between groups is close to zero. However, when the information from the observed values among the non-completers is incorporated into the model, using ACA, MEAN, MI, or FB, the mean estimates in the intervention become systematically higher than those in the control. Similar conclusions are obtained from L-MI and L-FB, even though mean QALYs are shifted downward in both groups. Mean total costs are similar across all methods fitted to the aggregated variables, while estimates from the joint longitudinal models are slightly higher for the control. For each method, summary statistics associated with different CEA quantities are provided in Appendix 2 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018.
We summarize the economic results from the trial by looking at the probability that the new intervention is cost-effective with respect to the control for different values of the acceptance threshold. Figure 4 shows the cost-effectiveness acceptability curves (CEAC) [Van Hout B, Al M, Gordon G, Rutten F, Kuntz K. Costs, effects, and c/e-ratios alongside a clinical trial], which are computed based on the posterior/bootstrapped samples of the mean QALYs and total costs for each method.
Solid lines denote the methods fitted under a Bayesian framework (black for CCA, blue for ACA, green for MEAN, red for FB, and magenta for L-FB), whereas dashed lines denote the methods fitted using multiple imputations (red for MI and magenta for L-MI). With the exception of CCA, all aggregated methods indicate a high probability of cost-effectiveness for the intervention for most values of the acceptance threshold, while L-MI and L-FB indicate milder conclusions.
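A CEAC can be computed from posterior or bootstrapped samples as the share of samples in which the incremental net monetary benefit is positive at each threshold. The sketch below assumes paired samples of the incremental mean QALYs and costs. The sample values and the threshold grid are hypothetical.

```python
def ceac(delta_e, delta_c, thresholds):
    """Probability of cost-effectiveness at each acceptance threshold k.

    For every sample, the intervention is cost-effective when the incremental
    net benefit k * delta_e - delta_c is positive.
    """
    n = len(delta_e)
    return [
        sum(k * de - dc > 0 for de, dc in zip(delta_e, delta_c)) / n
        for k in thresholds
    ]


# Hypothetical paired samples of incremental QALYs and incremental costs
delta_e = [0.02, 0.01, -0.01, 0.03]
delta_c = [100.0, 300.0, 50.0, 200.0]
curve = ceac(delta_e, delta_c, thresholds=[0, 10000, 20000, 40000])
```

With real analyses the lists would hold thousands of samples per method, and plotting `curve` against the thresholds for each method gives curves of the kind shown in Figure 4.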

### The PBS Trial

The Positive Behaviour Support (PBS) study is a multicenter randomized trial involving intellectual disability services for people with mild to severe intellectual disability [Hassiotis A, Poppe M, Strydom A, et al. Clinical outcomes of staff training in positive behaviour support to reduce challenging behaviour in adults with intellectual disability: cluster randomised controlled trial].
The new intervention (PBS) is designed to foster prosocial actions and enhance the person’s integration within the local community. Participants (n = 244) were enrolled and randomly allocated to staff teams trained to deliver PBS in addition to treatment as usual (reference intervention, n2 = 108), or treatment as usual alone (control, n1 = 136). Measures for quality of life (EQ-5D) and health-related cost (health records) were collected at baseline, 6, and 12 months. No systematic differences are observed between the completers and non-completers for the utilities, but non-completers are generally associated with higher values for the costs. The summary statistics for the utilities and costs for the completers and those with partially observed data (non-completers), the missingness patterns and histograms of the empirical distributions of the QALYs, and total costs in the PBS study are provided in Appendix 3 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018.

#### Results

Due to space limitations, we report the posterior results for the mean estimates from each model and the associated CEACs in Appendix 3 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.11.018. Overall, estimates for the mean QALYs and total costs do not vary largely across the methods, and they indicate that the new intervention is both more effective and more expensive than the control. This is also reflected in the CEACs, which show a similar pattern across all threshold values and methods. The largest difference is observed for the results of CCA, which indicate an increased chance of cost-effectiveness with respect to the other methods.

## Discussion

The objective of this article was to assess the impact on trial-based CEA results of alternative missingness methods. Focus was given to the difference between joint aggregated and longitudinal models. For comparison, we also included other approaches that are routinely used by practitioners. The results from the 2 case studies can be related to those from the simulation. In the MenSS trial, joint longitudinal models lead to considerably different cost-effectiveness conclusions compared to aggregated models, so that decision makers may be skeptical about the validity of the results from methods that do not make full use of all available evidence. In the PBS study, differences in the results between aggregated and longitudinal models are relatively small and the overall cost-effectiveness conclusions do not change substantially. As shown in the simulation study, the potential benefits in terms of bias reduction from using a longitudinal model increase with the amount of observed values discarded by the alternative aggregated model. Our recommendation is that, unless missingness proportions are negligible, analysts should try to avoid the implementation of methods that discard some of the data and should instead prefer the use of methods that can make full use of the observed data collected. In this article we only considered joint longitudinal models as an example of the preferred methods, but other approaches could also be considered (eg, mixed-effects models).
Both MI and FB methods can be used to fit joint longitudinal models. While MI is often computationally faster, the analyst needs to guard against potential incompatibility issues between imputation and analysis models that may bias the inferences (Van Buuren, 2013). In addition, different ways for combining imputations and bootstrapping exist but no consensus has been reached for which approach to use (Brand et al, 2019; Schomaker and Heumann, 2018).
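When MI is used, the estimates obtained from the imputed data sets are commonly pooled with Rubin's rules (Rubin, 1987). The following is a minimal sketch of that pooling step only; the function name `rubin_pool` and the numbers are hypothetical, and it does not address how pooling should be combined with bootstrapping.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool M point estimates and their within-imputation variances (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()            # pooled point estimate
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical mean QALY estimates from M = 5 imputed data sets.
pooled_mean, total_var = rubin_pool(
    estimates=[0.60, 0.62, 0.58, 0.61, 0.59],
    variances=[0.0004] * 5,
)
print(pooled_mean, total_var)
```

The total variance adds the between-imputation component, inflated by (1 + 1/M), to the average within-imputation variance, so that uncertainty due to the missing values is carried through to the pooled estimate.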
Key strengths of this study include proposal of joint longitudinal models for handling CEA data under MAR as a way to improve the aggregated methods used in current practice and comparison of the methods through a simulation and 2 case studies. Key limitations include simplification of the modeling framework, which did not consider additional complexities, such as correlation between utilities and costs (O’Hagan and Stevens, 2001), skewness (Basu and Manca, 2012), structural values (Gabrio et al, 2019; Ng et al, 2016), and MNAR assumptions (Daniels and Hogan, 2008; Gabrio et al, 2019; Mason et al, 2012; Leurent et al, 2018), and evaluation of the performance of the methods using a selected number of scenarios, which may not be representative of all possible situations.
In conclusion, models that ignore the longitudinal nature of the data may lead to biased results and mislead the decision-making process. Conversely, joint longitudinal models properly recognize the longitudinal structure, use all the available evidence, and can improve the estimates of the quantities of interest under MAR.
Author Contributions: Concept and design: Gabrio, Hunter, Mason, Baio
Acquisition of data: Hunter, Baio
Analysis and interpretation of data: Gabrio, Mason, Baio
Drafting of the manuscript: Gabrio
Critical revision of the paper for important intellectual content: Gabrio, Mason, Baio
Statistical analysis: Gabrio
Supervision: Hunter, Mason, Baio
Conflict of Interest Disclosures: The authors reported no conflicts of interest.
Funding/Support: This work was partially supported by a research grant sponsored by the Foundation Blanceflor Boncompagni Ludovisi, nee Bildt, and an unrestricted research grant sponsored by Mapi Group. The MenSS trial was supported by a Health Technology Assessment grant from the National Institute for Health Research. Ref. 10/131/01 (http://www.nets.nihr.ac.uk/projects/hta/1013101). The Positive Behavior Support study (https://www.journalslibrary.nihr.ac.uk/programmes/hta/1010413#/) was funded by the Health Technology Assessment program of the National Institute for Health Research (Reference 10/104/13).
Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Acknowledgment: The authors are grateful to 2 anonymous reviewers for careful reading and thoughtful comments that have greatly improved an earlier version of the article. We would like to acknowledge the hard work of all the people involved in the MenSS and Positive Behavior Support trials and to thank them for providing us with access to their data.

## Supplemental Material

• Supplementary Material
• Appendices 1-3

## References

• EQ-5D-3L
User guide: Basic information on how to use the EQ-5D-3L instrument.
• Rubin D.
Multiple Imputation for Nonresponse in Surveys.
John Wiley & Sons, New York, NY; 1987
• Noble S.
• Hollingworth W.
• Tilling K.
Missing data in trial-based cost-effectiveness analysis: the current state of play.
Health Econ. 2012; 21: 187-200
• Gabrio A.
• Mason A.
• Baio G.
Handling missing data in within-trial cost-effectiveness analysis: a review with future recommendations.
Pharmacoecon Open. 2017; 1: 79-97
• Leurent B.
• Gomes M.
• Carpenter J.
Missing data in trial-based cost-effectiveness analysis: an incomplete journey.
Health Econ. 2018; 27: 1024-1040
• Little R.
• Rubin D.
Statistical Analysis With Missing Data.
John Wiley & Sons, Hoboken, NJ; 2002
• Little R.
Modeling the drop-out mechanism in repeated-measures studies.
J Am Stat Assoc. 1995; 90: 1112-1121
• White I.
• Thompson S.
Adjusting for partially missing baseline measurements in randomized trials.
Stat Med. 2005; 24: 993-1007
• Sullivan T.
• White I.
• Salter A.
• Ryan P.
• Lee K.
Should multiple imputation be the method of choice for handling missing data in randomized trials?.
Stat Methods Med Res. 2018; 27: 2610-2626
• Little R.
Regression with missing x’s: a review.
J Am Stat Assoc. 1992; 87: 1227-1237
• Carpenter J.
• Kenward M.
Multiple Imputation and Its Applications.
John Wiley & Sons, Chichester, UK; 2012
• Van Buuren S.
Flexible Imputation of Missing Data.
Chapman and Hall/CRC, Boca Raton, FL; 2013
• Brooks S.
• Gelman A.
• Jones G.
• Meng X.
Handbook of Markov Chain Monte Carlo.
CRC Press, Boca Raton, FL; 2011
• Daniels M.
• Hogan J.
Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis.
Chapman and Hall, New York, NY; 2008
• Drummond M.
• Sculpher M.
• Claxton K.
• Stoddart G.
• Torrance G.
Methods for the Economic Evaluation of Health Care Programmes.
3rd ed. Oxford University Press, Oxford, UK; 2005
• Mason A.
• Gomes M.
• Grieve R.
• Carpenter J.
A Bayesian framework for health economic evaluation in studies with missing data.
Health Econ. 2018; 27: 1670-1683
• Gabrio A.
• Mason A.
• Baio G.
A full Bayesian model to handle structural ones and missingness in economic evaluations from individual-level data.
Stat Med. 2019; 38: 1399-1420

• Gomes M.
• Camarena J.
• Marra G.
Copula selection models for non-Gaussian outcomes that are missing not at random.
Stat Med. 2019; 38: 480-496
• Gabrio A.
• Daniels M.
• Baio G.
A Bayesian parametric approach to handle missing longitudinal outcome data in trial-based health economic evaluations.
J R Stat Soc Ser A Stat Soc. 2020; 183: 607-629
• Nixon R.
• Thompson S.
Methods for incorporating covariate adjustment, subgroup analysis and between centre differences into cost-effectiveness evaluations.
Health Econ. 2005; 14: 1217-1229
• Efron B.
Nonparametric standard errors and confidence intervals.
Can J Stat. 1981; 9: 139-172
• JAGS: Just Another Gibbs Sampler.
• Package R2jags.
• Package mice.
• Gelman A.
• Hill J.
Data Analysis Using Regression and Multilevel/Hierarchical Models.
Cambridge University Press, New York, NY; 2007
• Bailey J.
• Webster R.
• Hunter R.
• et al.
The men’s safer sex project: intervention development and feasibility randomised controlled trial of an interactive digital intervention to increase condom use in men.
Health Technol Assess Rep. 2016; 20: 1-115
• Van Hout B.
• Al M.
• Gordon G.
• Rutten F.
• Kuntz K.
Costs, effects, and c/e-ratios alongside a clinical trial.
Health Econ. 1994; 3: 309-319
• Hassiotis A.
• Poppe M.
• Strydom A.
• et al.
Clinical outcomes of staff training in positive behaviour support to reduce challenging behaviour in adults with intellectual disability: cluster randomised controlled trial.
Br J Psychiatry. 2018; 212: 161-168
• Brand J.
• van Buuren S.
• le Cessie S.
• van den Hout W.
Combining multiple imputation and bootstrap in the analysis of cost-effectiveness trial data.
Stat Med. 2019; 38: 210-220
• Schomaker M.
• Heumann C.
Bootstrap inference when using multiple imputation.
Stat Med. 2018; 37: 2252-2266
• O’Hagan A.
• Stevens J.
A framework for cost-effectiveness analysis from clinical trial data.
Health Econ. 2001; 10: 303-315
• Basu A.
• Manca A.
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Med Decis Making. 2012; 32: 56-69
• Ng E.
• Diaz-Ordaz K.
• Grieve R.
• et al.
Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: an approach to model choice.
Stat Methods Med Res. 2016; 25: 2036-2052
• Mason A.
• Richardson S.
• Plewis I.
• Best N.
Strategy for modelling nonrandom missing data mechanisms in observational studies using Bayesian methods.
J Off Stat. 2012; 28: 279-302
• Leurent B.
• Gomes M.
• Faria R.
• et al.
Sensitivity analysis for not-at-random missing data in trial-based cost-effectiveness analysis: a tutorial.
Pharmacoeconomics. 2018; 36: 1-13