Methodology|Articles in Press

# Comparing Modeling Approaches for Discrete Event Simulations With Competing Risks Based on Censored Individual Patient Data: A Simulation Study and Illustration in Colorectal Cancer

Open AccessPublished:September 07, 2021

## Highlights

• Competing events or risks can be implemented in discrete event simulations (DESs) according to different approaches. Alternative modeling approaches for implementing competing events based on individual patient data have been defined and compared for uncensored data, showing differences in accuracy and resulting cost-effectiveness point estimates.
• This study shows that for “censored” data, the censoring affects the feasibility and accuracy of modeling approaches for implementing competing events in DESs, with both the type and level of censoring having a profound impact on accuracy and cost-effectiveness point estimates. What modeling approach is preferable depends on the data characteristics and whether the incidence of events or the time to events is considered more important.
• Those developing or appraising DES models should be aware of the alternative modeling approaches that are available and that these might yield different accuracy and health economic outcomes. Model developers should perform and report on cross-validation efforts to assure individual patient data are appropriately represented.

## Abstract

### Objectives

This study aimed to provide detailed guidance on modeling approaches for implementing competing events in discrete event simulations based on censored individual patient data (IPD).

### Methods

The event-specific distributions (ESDs) approach sampled times from event-specific time-to-event distributions and simulated the first event to occur. The unimodal distribution and regression approach sampled a time from a combined unimodal time-to-event distribution, representing all events, and used a (multinomial) logistic regression model to select the event to be simulated. A simulation study assessed performance in terms of relative absolute event incidence difference and relative entropy of time-to-event distributions for different types and levels of right censoring, numbers of events, distribution overlap, and sample sizes. Differences in cost-effectiveness estimates were illustrated in a colorectal cancer case study.

### Results

Increased levels of censoring negatively affected the modeling approaches’ performance. A lower number of competing events and higher overlap of distributions improved performance. When IPD were censored at random times, ESD performed best. When censoring occurred owing to a maximum follow-up time for 2 events, ESD performed better for a low level of censoring (ie, 10%). For 3 or 4 competing events, ESD better represented the probabilities of events, whereas unimodal distribution and regression better represented the time to events. Differences in cost-effectiveness estimates, both compared with no censoring and between approaches, increased with increasing censoring levels.

### Conclusions

Modelers should be aware of the different modeling approaches available and that selection between approaches may be informed by data characteristics. Performing and reporting extensive validation efforts remains essential to ensure IPD are appropriately represented.

## Introduction

Decision analytic modelers increasingly use alternatives to the commonly used cohort-level state-transition modeling (STM) technique to reflect the complex dynamics of clinical pathways.
• Annemans L.
• Redekop K.
• Payne K.
Current methodological issues in the economic assessment of personalized medicine.
• Caro J.J.
• Möller J.
• Getsios D.
Discrete event simulation: the preferred technique for health economic evaluations?.
• Miller J.D.
• Foley K.A.
• Russell M.W.
Current challenges in health economic modeling of cancer therapies: a research inquiry.
Modeling techniques, such as discrete-time agent-based modeling and continuous-time discrete event simulation (DES), are able to incorporate patient-level characteristics and clinical histories, multiple timescales, competing resources, and interactions among different actors, such as physicians and patients. Nevertheless, these techniques are more demanding than cohort-level STM, mainly in terms of computational complexity and required analytical skills to implement them.
• Caro J.J.
• Briggs A.H.
• Siebert U.
• Kuntz K.M.
ISPOR-SMDM Modeling Good Research Practices Task Force. Modeling good research practices—overview: a report of the ISPOR-SMDM modeling good research practices task Force-1.
,
• Degeling K.
• Franken M.D.
• May A.M.
• et al.
Matching the model with the evidence: comparing discrete event simulation and state-transition modeling for time-to-event predictions in a cost-effectiveness analysis of treatment in metastatic colorectal cancer patients.
Therefore, evidence-based methodological guidance is essential to inform the design choices that need to be made and reported on when applying these techniques to support the development of high-quality models.
DES is a useful alternative to STM and can be used to model personalized treatment processes because of its ability to reflect dynamic pathways based on patient-level histories and characteristics.
• Karnon J.
• Haji Ali Afzali H.
When to use discrete event simulation (DES) for the economic evaluation of health technologies? A review and critique of the costs and benefits of DES.
,
• Standfield L.
• Comans T.
• Scuffham P.
Markov modeling and discrete event simulation in health care: a systematic comparison.
Although aggregated data can be used to develop DES models, individual patient data (IPD) are the preferred source of evidence to account for stochastic uncertainty and patient heterogeneity. An important design choice in developing DES models based on IPD is how competing events are implemented.
• Caro J.J.
• Möller J.
Decision-analytic models: current methodological challenges.
Competing events are those that prevent other events of interest from occurring or change the probability of their occurrence.
• Pintilie M.
Competing Risks: A Practical Perspective.
Four strategies have been identified to implement competing events in DES models: (1) sampling times for all competing events and selecting the event that is the first to occur, (2) sampling the event to occur first and sampling the corresponding time-to-event second, (3) sampling the time-to-event first and sampling the corresponding event second, and (4) using discretized cyclic probabilities to resemble discrete-time STM.
• Barton P.
• Jobanputra P.
• Wilson J.
• Bryan S.
• Burls A.
The use of modelling to evaluate new drugs for patients with a chronic condition: the case of antibodies against tumour necrosis factor in rheumatoid arthritis.
Modeling approaches corresponding to these strategies have been proposed and compared when informed by “uncensored” IPD generated according to mixtures of event-specific distributions (ESDs).
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
A general recommendation was made to sample the time to event based on a multimodal time-to-event distribution and then to determine the corresponding event second. An approach that first selects the event to occur and then the corresponding time to event also showed good performance with an easier implementation. Unfortunately, this guidance cannot be generalized to DES models informed by “censored” IPD. For “uncensored” IPD, it is known which competing event occurred for each patient, allowing the data to be analyzed conditional on which competing event occurred. For “censored” IPD, there are patients for whom it is unknown which event would eventually occur and extrapolation is required. Methods for extrapolation differ between modeling approaches, and hence, performance of the approaches may differ between uncensored and censored data. It is important to appropriately account for censoring in analyses involving competing events, because neglecting to do so may affect model outcomes.
• Donoghoe M.W.
• Gebski V.
The importance of censoring in competing risks analysis of the subdistribution hazard.
Hence, there is a need for guidance on how to implement competing events in DES models informed by censored IPD.
This study aims to describe, illustrate, and compare modeling approaches for implementing competing events in DES models informed by censored IPD. Modeling approaches are compared in a simulation study, and potential differences in terms of health economic outcomes are analyzed for a case study in colorectal cancer.

## Methods

Censoring can be classified as informative or noninformative and according to whether individuals are interval, left, or right censored.
• Leung K.M.
• Elashoff R.M.
• Afifi A.A.
Censoring issues in survival analysis.
Here, focus is on noninformative, right-censored data, because this type of censoring is most common in a clinical and health economic context. Two modeling approaches for implementing competing events in DES models informed by censored IPD were defined: (1) using event-specific time-to-event distributions (ESD) and (2) using a unimodal time-to-event distribution and regression (UDR) model. The approaches are defined using statistical notation in Supplemental Material 1 found at https://doi.org/10.1016/j.jval.2021.07.016. The code for implementation in R
R Core Team
R: a language and environment for statistical computing.
is provided online: https://personex.nl/research/competing-risks/. Modeling approaches using event-specific probabilities and distributions (ESPD) or a multimodal distribution and regression model (MDR), which have been recommended for uncensored IPD,
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
are not presented here because their implementation was considered too cumbersome for censored IPD (see Discussion for a more detailed discussion).

### Event-Specific Time-to-Event Distributions

The ESD modeling approach uses event-specific time-to-event distributions to sample a time-to-event for each competing event and subsequently selects the first event to occur to be simulated. See Box 1 for a pseudoalgorithm.
Pseudoalgorithm for the ESD approach to model the time to event T for k mutually exclusive competing events.
Data Analysis:
• For each competing event $j,j=1,…,k$, fit a time-to-event distribution $Dj$:
• For each individual $i,i=1,…,n$, define observed censoring indicator $cij$ that indicates whether individual i experienced event j $(cij=1)$ or not $(cij=0)$.
• Parameterize cause-specific likelihood function $Lj(t1,…,tn|ϕj)$ according to a specific distribution type.
• Estimate parameters $ϕj$ that define distribution $Dj$ by maximum likelihood.
Simulation:
• Obtain time to events for each competing event:
• Draw a time $tj$ for each competing event $j,j=1,…,k$ by performing a random draw from the corresponding distribution $Dj$.
• Select the competing event to occur:
• Select event j with the lowest time to event (ie, the first event to occur).
• Simulate the selected event j at the corresponding time $tj$.
Note. ESD indicates event-specific distribution.
The ESD are estimated according to a cause-specific hazards model, which assumes that the risk set only includes individuals who are event free at the respective point in time.
• Austin P.C.
• Lee D.S.
• Fine J.P.
Introduction to the analysis of survival data in the presence of competing risks.
When fitting the ESDs, both censored individuals and individuals who experienced a competing event are considered to be censored. This is a strong assumption that was believed to negatively affect the performance of the ESD approach for uncensored IPD,
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
given that it is unlikely that censoring that arises from an occurring competing event would be noninformative.
In a simulation according to the ESD approach, a time for each competing event needs to be sampled from each corresponding time-to-event distribution. Subsequently, the event that is the first to occur, that is, the event corresponding to the lowest sampled time to event, is selected and will be simulated.

### Unimodal Joint Time-to-Event Distribution and Regression Model

The UDR modeling approach samples the time at which an event will occur using a single time-to-event distribution that represents the minimum of all competing time to events, that is, the time-to-event for the event that is observed. A regression model is subsequently used to predict to which event the selected time corresponds. See Box 2 for a pseudoalgorithm.
Pseudoalgorithm for the UDR approach to model the time to event T for k mutually exclusive competing events.
Data Analysis:
• For all competing events combined, fit a single unimodal time-to-event distribution D:
• For each individual $i,i=1,…,n$, define observed censoring indicator $ci$ so that individual i experienced any event $(ci=1)$ or is censored $(ci=0)$.
• Parameterize likelihood function $L(t1,…,tn|ϕ)$ according to a specific unimodal distribution type.
• Estimate parameters ϕ that define distribution D by maximum likelihood.
• Fit a (multinomial) logistic regression model to predict the competing event to occur:
• Fit (multinomial) logistic regression model $r(t)$ that predicts probabilities $P(Cj=1|T=t)$ of each competing event $j,j=1,…,k$ to occur (dependent variable) based on time t (independent variable).
Simulation:
• Obtain a time to event for an event to occur:
• Draw a time t for the event to occur by performing a random draw from unimodal distribution D.
• Select the competing event to occur:
• Obtain probabilities $P(Cj=1|T=t)$ for each competing event $j,j=1,…,k$ to occur based on time to event t, using (multinomial) logistic regression model $r(t)$.
• Draw a random number from a Uniform distribution $U[0,1]$.
• Select event j to occur using event probabilities $P(Cj=1|T=t)$ and the random number.
• Simulate selected event j at time t.
Note. UDR indicates unimodal distribution and regression.
Here, we assume that the single time-to-event distribution is unimodal. In theory, this distribution could also be multimodal, such as a mixture or flexible parametric distribution, but estimation of such distributions may prove challenging for censored IPD (see Discussion). Because the time-to-event distribution represents all competing events, only 1 hazard function needs to be modeled. In doing so, all competing events are treated as 1 event, and consequently, an individual is either censored or experienced any type of event that is not censored.
The event corresponding to a time to event is modeled using a (multinomial) logistic regression model. A logistic regression model can be used to model binary data, that is, 2 competing events, whereas a multinomial logistic regression model is required to model more than 2 competing events. The time to event is included as a predictor variable in this regression, which may involve the use of transformations or splines to accurately model the relation between the time to event and type of event (response variable). Transformations of the time to event were not considered in the current simulation or case study, because this process is hard to automate (see Simulation Study).
A simulation according to the UDR approach is performed by first sampling a time from the time-to-event distribution and subsequently an event using the (multimodal) logistic regression model conditional on the sampled time.

### Simulation Study to Compare the Performance of the Approaches

To compare the modeling approaches’ performance for different data and disease pathway characteristics, different hypothetical datasets were simulated by first sampling which event would occur based on event-specific probabilities and then conditionally on the sampled event, sampling a time from the corresponding event-specific time-to-event distribution (see Supplemental Material 2 found at https://doi.org/10.1016/j.jval.2021.07.016 for the ESPD parameters). The conceptual model structure used for this simulation study is presented in Appendix Figure 1 (see Supplemental Material 3 found at https://doi.org/10.1016/j.jval.2021.07.016). As illustrated in Figure 1, simulations were performed for scenarios including different numbers of competing events $nevent$ ($nevent=2,3,4$), level of overlap of the corresponding competing time-to-event distributions $poverlap$ ($poverlap≈10%,50%,90%$, ie low, medium, or high; see Supplemental Material 4 found at https://doi.org/10.1016/j.jval.2021.07.016), sample sizes $nsample$ ($nsample=50,150,500$), and levels of censoring $pcensoring$ ($pcensoring≈0%,10%,30%,60%$). Two types of noninformative right censoring were considered: random censoring and follow-up censoring. In the “random censoring” approach, individuals were censored at a random point before their event would have happened. In the follow-up “censoring” approach, individuals were censored if their time to event exceeded a certain threshold, representing the scenario in which there is a maximum follow-up time per individual. For the follow-up censoring approach, $pcensoring≈60%$ could not be applied, because that would censor all observations of some events. Similarly, $pcensoring≈30%$ could not be applied for $poverlap≈90%$ because of convergence issues.
For each unique combination of the censoring approach, $nevent$, $poverlap$, $nsample$, and $pcensoring$, a total of 10 000 simulation runs were performed. In each simulation run, an uncensored sample $suncensored$ of corresponding sample size $nsample$ was sampled based on ESPD parameters. This sample $suncensored$ was right censored to censoring level $pcensoring$ according to the censoring approach to obtain censored sample $scensored$. Next, both modeling approaches were applied to analyze sample $scensored$ and, subsequently, to simulate event incidences and time to events for 100 000 new individuals to obtain a simulation sample $ssimulation$ for each modeling approach. Finally, the modeling approaches’ performance was assessed by comparing event incidences and time-to-event distributions in these newly simulated samples $ssimulation$ with those in the corresponding uncensored sample $suncensored$.
Bias in terms of relative absolute incidence difference (RAID) was used as performance measure for the event incidence:
$Equation 1.$
(1)

The bias with regard to simulated time-to-event distributions was based on the relative entropy, that is, the Kullback–Leibler divergence (KLD), which is a measure of the difference between probability distribution $f(t)$ and $g(t)$, for which lower values indicate a better performance
• Cover T.M.
• Thomas J.A.
Elements of Information Theory.
:
$Equation 2.$
(2)

Both RAID and KLD were calculated separately for each competing event. To summarize the RAID and KLD across competing events per scenario, event-specific performance outcomes were weighted according to theoretical event incidences (see Supplemental Material 2 found at https://doi.org/10.1016/j.jval.2021.07.016).
The simulation study was performed in R version 3.4.1.
R Core Team
R: a language and environment for statistical computing.
Time-to-event data were simulated and analyzed using Weibull distributions to rule out potential bias because of mismatching distributions. Weibull distributions were selected because these distributions are commonly used in survival analysis and showed to accurately represent the distribution of the time-to-event data in the case study (see Case Study). The nnet package was used to estimate (multinomial) logistic regression models
• Venables W.N.
• Ripley B.D.
Modern Applied Statistics With S.
and the flexmix package to calculate KLD.
• Grün B.
• Leisch F.
Fitting finite mixtures of generalized linear regressions in R.
• Grün B.
• Leisch F.
FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters.
• Leisch F.
FlexMix a general framework for finite mixture models and latent class regression in R.

### Case Study to Illustrate the Potential Impact on Health Economic Outcomes

A case study based on data from the randomized phase III CAIRO3 study of the Dutch Colorectal Cancer Group (NCT00442637) was performed to illustrate the potential impact of the different modeling approaches on health economic outcomes in a real-world scenario. The CAIRO3 study randomized 558 metastatic colorectal cancer patients with stable disease or better after 6 cycles of capecitabine, oxaliplatin, and bevacizumab induction therapy to either capecitabine and bevacizumab maintenance treatment (intervention) or observation (control) until progression of disease.
• Simkens L.H.J.
• van Tinteren H.
• May A.
• et al.
Maintenance treatment with capecitabine and bevacizumab in metastatic colorectal cancer (CAIRO3): a phase 3 randomised controlled trial of the Dutch Colorectal Cancer Group.
For both the maintenance and observation strategy, capecitabine, oxaliplatin, and bevacizumab treatment was to be reintroduced upon progression and continued until second progression, which was the primary endpoint of the study. For the case study, we adapted a DES that was previously developed for comparison with a discrete-time cohort STM in a health economic evaluation of the CAIRO3,
• Degeling K.
• Franken M.D.
• May A.M.
• et al.
Matching the model with the evidence: comparing discrete event simulation and state-transition modeling for time-to-event predictions in a cost-effectiveness analysis of treatment in metastatic colorectal cancer patients.
to allow for simulations according to the 2 modeling approaches. The DES model was structured according to the treatment stages used in the CAIRO3 study: postinduction, reintroduction, salvage, and death (see Appendix Fig. 1 in Supplemental Material 3 found at https://doi.org/10.1016/j.jval.2021.07.016).
In addition to an analysis based on the complete CAIRO3 patient cohort, clinically relevant subgroup analyses were performed to illustrate potential sample size impact on modeling outcomes. A total of 8 subgroups with sample sizes ranging from 50 to 410 were defined according to patient characteristics that were found relevant in the evaluation of the CAIRO3 study,
• Simkens L.H.J.
• van Tinteren H.
• May A.
• et al.
Maintenance treatment with capecitabine and bevacizumab in metastatic colorectal cancer (CAIRO3): a phase 3 randomised controlled trial of the Dutch Colorectal Cancer Group.
that is, treatment response (stable disease vs complete or partial response) and stage of disease (synchronous vs metachronous). See Supplemental Material 5 found at https://doi.org/10.1016/j.jval.2021.07.016 for an overview of these subgroups and their event incidences. Different levels of censoring $pcensoring$ ($pcensoring≈0%,10%,30%,60%$) were applied for each subgroup analysis to assess the impact of this data characteristic on the modeling approaches’ performance and health economic outcomes. Censoring was performed according to the 2 censoring approaches as for the simulation study, with $pcensoring≈60%$ not considered for follow-up censoring (discussed earlier). For each subgroup, censoring approach, and $pcensoring$ combination, a probabilistic analysis was performed based on 5000 runs of 10 000 patients per treatment strategy in each run. Uncertainty in time-to-event parameters was reflected using a nonparametric bootstrap approach,
• Degeling K.
• IJzerman M.J.
• Koopman M.
• Koffijberg H.
Accounting for parameter uncertainty in the definition of parametric distributions used to describe individual patient variation in health economic models.
and uncertainty in other parameters was reflected using standard parametric distributions according standard practice, for example, beta distributions for health utility values (see Degeling et al
• Degeling K.
• Franken M.D.
• May A.M.
• et al.
Matching the model with the evidence: comparing discrete event simulation and state-transition modeling for time-to-event predictions in a cost-effectiveness analysis of treatment in metastatic colorectal cancer patients.
for details).

## Results

The results of the simulation study are summarized in Figure 2, which visualizes trends in the bias of the modeling approaches according to the data characteristics. Results for selected scenarios are presented in Table 1 and Figure 3, whereas results for all scenarios of the simulation study are presented in Supplemental Material 6 found at https://doi.org/10.1016/j.jval.2021.07.016. For the case study, cost-effectiveness outcomes are presented for selected subgroup analyses in Figure 4 and for all subgroup analyses in Supplemental Materials 7 and 8 found at https://doi.org/10.1016/j.jval.2021.07.016. In summarizing these results, the following section refers to censoring levels of 10%, 30%, and 60% as low, moderate, and high, respectively.
Table 1Mean weighted bias (95% CI) of the modeling approaches for selected scenarios in the simulation study (lower is better).
Number of eventsDistribution overlap (%)Censored proportion (%)Sample sizeRelative absolute incidence difference (%)Relative entropy (Kullback-Leibler divergence)
Random censoringCensoring owing to maximum follow-upRandom censoringCensoring owing to maximum follow-up
ESDUDRESDUDRESDUDRESDUDR
21010506.3 (0.3, 16.3)17.8 (5.7, 28.3)10.3 (5.6, 17.4)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
15.8 (12.0, 21.2)0.231 (0.126, 0.371)0.257 (0.126, 0.424)0.226 (0.118, 0.367)0.190 (0.085, 0.336)
210105005.0 (1.9, 8.3)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
17.3 (13.7, 20.7)9.7 (8.0, 11.9)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
15.5 (13.7, 17.5)0.208 (0.173, 0.245)0.237 (0.196, 0.282)0.206 (0.171, 0.244)0.173 (0.137, 0.213)
21030507.9 (0.3, 22.1)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
28.9 (11.7, 46.9)25.1 (9.4, 53.5)14.7 (2.1, 25.3)0.245 (0.135, 0.389)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.419 (0.268, 0.613)0.404 (0.143, 1.266)0.201 (0.054, 0.454)
210305002.5 (0.1, 7.0)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
28.4 (22.9, 33.8)26.6 (19.1, 37.1)14.1 (9.7, 17.7)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.214 (0.180, 0.252)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.395 (0.348, 0.446)0.245 (0.181, 0.416)0.140 (0.081, 0.211)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
210605017.1 (0.7, 44.9)31.1 (6.6, 58.5)--0.315 (0.169, 0.539)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.553 (0.378, 0.763)--
2106050012.2 (1.7, 23.0)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
30.4 (22.9, 38.4)--0.247 (0.208, 0.288)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.515 (0.463, 0.568)--
29010503.9 (0.2, 10.0)5.8 (0.3, 14.9)5.3 (0.2, 14.5)4.3 (0.2, 12.3)0.054 (0.020, 0.120)0.058 (0.022, 0.126)0.050 (0.015, 0.121)0.047 (0.015, 0.113)
290105001.2 (0.0, 3.4)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
4.6 (1.4, 7.8)3.0 (0.2, 6.6)1.4 (0.1, 4.0)0.012 (0.005, 0.023)0.016 (0.008, 0.027)0.013 (0.005, 0.026)0.011 (0.004, 0.021)
29030507.5 (0.3, 20.8)9.8 (0.4, 25.8)11.6 (0.5, 30.9)10.4 (0.4, 29.4)0.067 (0.022, 0.151)0.074 (0.025, 0.160)0.074 (0.015, 0.214)0.065 (0.017, 0.173)
290305002.4 (0.1, 6.7)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
8.0 (2.6, 13.3)6.1 (0.3, 14.0)7.9 (2.2, 14.1)0.016 (0.008, 0.028)0.024 (0.013, 0.039)0.017 (0.005, 0.038)0.014 (0.006, 0.031)
290605014.5 (0.5, 40.0)14.6 (0.6, 38.2)--0.127 (0.034, 0.318)0.118 (0.036, 0.262)--
290605004.8 (0.2, 13.3)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
12.3 (4.8, 20.0)--0.042 (0.022, 0.067)0.050 (0.028, 0.077)--
410105013.3 (7.9, 19.9)31.9 (13.2, 51.4)15.4 (10.8, 18.3)18.4 (14.5, 24.5)0.654 (0.508, 0.830)0.587 (0.415, 0.795)0.651 (0.498, 0.864)0.645 (0.499, 0.823)
4101050011.0 (9.4, 12.7)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
27.9 (21.7, 34.4)17.0 (11.4, 19.4)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
19.2 (17.1, 20.7)0.622 (0.579, 0.667)0.550 (0.500, 0.602)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.667 (0.585, 0.773)0.601 (0.552, 0.660)
410305016.8 (5.9, 30.6)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
43.3 (22.6, 73.2)--0.646 (0.501, 0.830)0.682 (0.525, 0.875)--
4103050011.2 (7.0, 16.4)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
36.6 (29.5, 44.7)--0.605 (0.559, 0.652)0.649 (0.600, 0.699)--
410605027.2 (7.4, 52.8)48.4 (25.3, 84.3)--0.670 (0.518, 0.868)0.742 (0.583, 0.938)--
4106050017.6 (8.1, 27.1)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
40.2 (32.8, 48.8)--0.609 (0.563, 0.656)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.715 (0.658, 0.773)--
49010507.6 (2.1, 14.9)16.5 (7.3, 27.5)10.9 (4.0, 18.4)11.4 (3.7, 22.5)0.156 (0.062, 0.322)0.205 (0.090, 0.415)0.176 (0.065, 0.394)0.139 (0.054, 0.286)
490105003.3 (1.4, 5.6)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
13.3 (10.4, 16.5)9.7 (6.9, 12.7)7.8 (4.7, 10.9)0.089 (0.057, 0.124)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.127 (0.094, 0.165)0.100 (0.064, 0.145)0.067 (0.041, 0.100)
490305013.2 (3.7, 25.6)25.5 (11.0, 44.6)--0.179 (0.074, 0.358)0.245 (0.115, 0.455)--
490305004.4 (1.3, 8.5)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
20.4 (15.7, 25.4)--0.095 (0.062, 0.132)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.161 (0.122, 0.206)--
490605024.2 (6.6, 47.6)34.4 (15.0, 62.4)--0.269 (0.113, 0.544)0.296 (0.144, 0.502)--
490605008.2 (2.4, 16.6)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
28.5 (21.6, 36.3)--0.130 (0.090, 0.175)
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.
0.201 (0.153, 0.252)--
CI indicates confidence interval; ESD, event-specific distribution; UDR, unimodal distribution and regression.
Results for the modeling approach that performed best with regard to that respective outcome (ie, lower bias) if the corresponding mean bias was outside the CI of the other approach.

### Bias of the ESD and UDR Approaches in the Simulation Study

A higher number of competing events resulted in worsened (ie, higher) RAID of 8.4%, 13.9%, and 17.1% on average over all other scenario variables for 2, 3, and 4 events, respectively. Higher overlap between time-to-event distributions resulted in improved (ie, lower) RAID of 18.6%, 12.7%, and 7.5% on average for 10%, 50%, and 90% overlap, respectively. Although the impact was lower than the other scenario variables, higher sample sizes resulted in improved RAID of 14.5%, 12.6%, and 11.7% on average for a sample size of 50, 150, and 500 individuals, respectively. Higher levels of censoring resulted in worsened RAID of 7.0%, 11.5%, 16.9%, and 21.2% on average for levels of 0%, 10%, 30%, and 60% censoring, respectively.
On average, the ESD modeling approach (9.8%) performed better in terms of RAID than the UDR approach (16.1%). Nevertheless, this was mainly because of a substantial difference in performance under random censoring (ESD 8.6%, UDR 18.8%), whereas under follow-up censoring the difference was limited (ESD 11.6%, UDR 11.9%). In the scenarios including random censoring, the ESD approach clearly performed better, with significantly better performance in many experiments. Under follow-up censoring, the ESD approach performed better for low censoring and the UDR approach for moderate censoring, unless when overlap was high.
Trends in performance for KLD were in line with those for RAID. A higher number of competing events resulted in worsened (ie, higher) KLD of 0.117, 0.300, and 0.371 on average over all other scenario variables for 2, 3, and 4 events, respectively. Higher overlap between time-to-event distributions resulted in improved (ie, lower) KLD of 0.443, 0.251, and 0.078 on average for 10%, 50%, and 90% overlap, respectively. Although the impact was lower than the other scenario variables, higher sample sizes resulted in improved KLD of 0.288, 0.247, and 0.237 on average for a sample size of 50, 150, and 500 individuals, respectively. Higher levels of censoring resulted in worsened KLD of 0.189, 0.255, 0.279, and 0.360 on average for levels of 0%, 10%, 30%, and 60% censoring, respectively.
On average, the UDR modeling approach (0.238) performed better in terms of KLD than the ESD approach (0.277). Nevertheless, this was mainly because of a substantial difference in performance under follow-up censoring (UDR 0.179, ESD 0.274), whereas under random censoring the difference was limited (UDR 0.277, ESD 0.278). Under follow-up censoring, the UDR approach overall performed better, with significantly better performance in one-third of the experiments. In the scenarios including random censoring, the UDR approach performed better than the ESD approach when there was no censoring. The ESD approach performed better in scenarios including moderate or high levels of random censoring. For low levels of random censoring, the ESD approach overall performed better for scenarios including 2 competing events, whereas for 3 or 4 events no overall best-performing approach could be identified.

### Impact of Health Economic Outcomes in the Case Study

The case study demonstrated that censoring had an impact on cost-effectiveness point estimates (Fig. 4 and Supplemental Material 7 and 8 found at https://doi.org/10.1016/j.jval.2021.07.016). For example, for the full cohort (n = 558), the net monetary benefit at a willingness to pay of €20 000 per quality-adjusted life-year gained changed from −€22 665 and −€22 130 without censoring to −€22 246 and −€27 078 at 60% random censoring, for ESD and UDR, respectively. This impact overall was larger for clinical subgroups with smaller sample sizes. No consistent direction in the bias introduced by increased censoring could be identified. High levels of censoring consistently resulted in increased uncertainty surrounding cost-effectiveness point estimates. For most clinical subgroups, the impact of censoring was comparable between modeling approaches.

## Discussion

It is widely known that censoring needs to be accounted for in health economic analyses, and our results substantiate this for DES models including competing events. Previous research on modeling approaches for implementing competing events in DES models argued a limited generalizability of recommendations for scenarios “without censoring” to those “with censoring” because of anticipated differences in performance following different implementations of the modeling approaches.
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
Our study indeed demonstrates that increased levels of censoring affected the performance of modeling approaches in the simulation study and cost-effectiveness outcomes in the case study. Additionally, moderate (30%) or higher levels of censoring inflated uncertainty in cost-effectiveness estimates and, hence, the presence of censoring may affect decision uncertainty and the value of collecting further information.
Which modeling approach is preferable based on the simulation study results depends on the type of noninformative right censoring and whether accuracy in event incidences or time-to-event distributions is considered more important. Guidance for selecting a modeling approach for uncensored IPD based on Degeling et al
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
and for censored IPD based on the current study is presented in Figure 5. It is important to realize that this summary cannot possibly capture all nuances, such as the interaction between the overlap of competing time-to-event distributions and level of censoring, but it may serve as a general guide for identifying an appropriate modeling approach. As can be seen from the provided R code, both approaches are straightforward to implement, although the ESD may be considered slightly more practical because it only involves time-to-event modeling, whereas the UDR approach also involves (multinomial) logistic regression modeling conditional on the time to event. Importantly, modelers should be aware of the different approaches that are available and perform extensive internal validation to inform which approach will be used or, at least, verify that the chosen approach extrapolates the event incidences and time-to-events appropriately.
These findings are partly in line with the Professional Society for Health Economics and Outcomes Research and Society for Medical Decision Making modeling good research practices guidelines
• Karnon J.
• Stahl J.
• Brennan A.
• et al.
Modeling using discrete event simulation: a report of the ISPOR-SMDM modeling good research practices task Force-4.
to first sample the time to event from a joint time-to-event distribution and then sample the corresponding event, which corresponds to the UDR approach. Nevertheless, it is important to realize this recommendation did not discuss the impact of using censored IPD. Furthermore, the performance of this strategy heavily depends on the absence of multimodality in the combined time-to-event distribution, which is why mixture distributions were used in the study providing guidance for uncensored IPD.
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
Because it uses event-specific time-to-event distributions, the ESD approach is better able to represent distributions with low overlap.
The results of the simulation study demonstrate that decision analytic modelers should be aware of the alternative modeling approaches available and that these might result in different levels of performance and health economic outcomes. Extensive (reporting of) validation efforts is essential to assess whether IPD are appropriately represented. Nevertheless, we acknowledge that validation in the presence of censoring can be challenging, because common measures to assess performance of the modeling approaches, such as the RAID and KLD, cannot be used for censored data. Nevertheless, these measures could be used in this study, because the simulated event incidences and time-to-event distributions could be compared with the “uncensored truth” (Fig. 1), which will not be available for studies using censored data in practice. Although less straightforward to interpret and challenging to combine into 1 performance measure for modeling approaches as a whole, alternative measures are available to assess discrimination and calibration of single survival models, while accounting for censoring.
• Rahman M.S.
• Ambler G.
• Choodari-Oskooei B.
• Omar R.Z.
Review and evaluation of performance measures for survival prediction models in external validation settings.
An example is Demler et al’s Greenwood-Nam-D’Agostino statistic,
• Demler O.V.
• Paynter N.P.
• Cook N.R.
Tests of calibration and goodness-of-fit in the survival setting.
which is a modification of Hosmer-Lemeshow’s statistic by Nam and D’Agostino.
• D’Agostino R.B.
• Nam B.-H.
Evaluation of the performance of survival analysis models: discrimination and calibration measures.
Another point of consideration is whether multivariable models are being developed to reflect patient heterogeneity, so that patient characteristics or treatment histories influence the occurrence and timing of events. Multivariable models are most easily implemented for the UDR approach, especially for scenarios including more than 2 competing events, because the variable selection procedures, for example, only need to be performed for 1 survival model and 1 (multinomial) logistic regression model. Nevertheless, further research is needed to assess potential differences between approaches when used to reflect such heterogeneity.
In this article, we did not include an ESPD modeling approach based on the strategy of sampling an event first and the corresponding time-to-event second, nor did we include a MDR modeling approach based on the same strategy as UDR. Although both approaches were included in the study considering uncensored IPD
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
and we have successfully implemented them for censored IPD, we found their implementation unproportionally cumbersome for censored IPD without clear benefits in terms of performance compared with ESD and UDR. The ESPD approach requires event-specific probabilities and time-to-event distributions to be estimated, for which there is no support in standard statistical software packages, requiring modelers to define and optimize custom likelihood functions themselves, preferably for different parametric distribution types. Although mixture distributions can theoretically be used to implement the multimodal time-to-event distribution in an MDR approach,
• Bordes L.
• Chauveau D.
Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data.
,
• Davenport J.W.
• Bezdek J.C.
• Hathaway R.J.
Parameter estimation for finite mixture distributions.
we believe this approach will unlikely be convenient in practice. It required the ESPD approach to be applied to generate start values to increase the probability of convergence in the maximum likelihood estimation. We found convergence and performance were poor compared with the other modeling approaches. Another approach that could be explored in further research is to use flexible parametric survival distributions, such as survival splines,
• Royston P.
• Parmar M.K.
Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects.
to model multimodal time-to-event distributions.
This study has certain limitations. First, we did not consider transformations of time in the (multinomial) logistic regression model for the UDR approach, which may have negatively affected its performance with regard to RAID because a linear relation does not necessarily best describe the relation between the time to event and event probabilities. Second, only noninformative and right-censored data were considered. Hence, results and recommendations cannot be generalized to scenarios involving informative, interval-, or left-censored data. Third, generalizability is also limited by the deliberate decision to use Weibull distributions in the simulation study. Although this allowed for an unbiased comparison of the modeling approaches, underlying distributions are unknown in practice and performance of the approaches may vary depending on the flexibility of selected distributions to describe the data. Finally, we generated datasets by first sampling an event based on event-specific probabilities and conditionally sampling times from event-specific time-to-event distributions, which may have benefited one of the approaches. Alternatively, datasets can be generated according mechanisms that are consistent with the ESD or UDR approach. Further research is warranted into all these aspects.

## Conclusions

Censoring has an impact on the performance of modeling approaches to implement competing events in DES models and, thereby, affects cost-effectiveness point estimates. When IPD are censored at random times, the ESD modeling approach performed best in the simulation study. When censoring occurs because of a maximum follow-up time for 2 competing events, the ESD approach performed best for low levels of censoring (ie, 10%) and the UDR approach for moderate levels of censoring (ie, 30%). For scenarios including 3 or 4 competing events and follow-up censoring, the UDR approach represented the time-to-event distributions more accurately, whereas the ESD approach performed better in terms of the event incidences. Nevertheless, substantial differences in performance between the modeling approaches for different scenarios highlighted the need for extensive validation efforts by modelers to assure IPD are appropriately represented.

## Article and Author Information

Author contributions:
Concept and design: Degeling, Koffijberg
Acquisition of data: Degeling, Koopman
Analysis and interpretation of data: Degeling, IJzerman, Koopman, Clements, Koffijberg
Drafting of the manuscript: Degeling, Groothuis-Oudshoorn, Franken, Koffijberg
Critical revision of the paper for important intellectual content: Degeling, IJzerman, Groothuis-Oudshoorn, Franken, Koopman, Clements, Koffijberg
Statistical analysis: Degeling, Groothuis-Oudshoorn, Clements, Koffijberg
Provision of study materials or patients: Franken
Obtaining funding: IJzerman
Supervision: IJzerman
Conflict of Interest Disclosures: Dr IJzerman reported receiving grants, nonfinancial support, and an advisory board honorarium paid to his institution from Illumina. Dr Koopman reported receiving payments to her institution from Bayer, Bristol Myers Squibb, Merck, Roche, Servier, and Pierre Fabre outside the submitted work. Dr IJzerman is an editor for Value in Health and had no role in the peer review process. No other disclosures were reported.
Funding/Support: The authors received no financial support for this research.

• Appendix

## References

• Annemans L.
• Redekop K.
• Payne K.
Current methodological issues in the economic assessment of personalized medicine.
Value Health. 2013; 16: S20-S26
• Caro J.J.
• Möller J.
• Getsios D.
Discrete event simulation: the preferred technique for health economic evaluations?.
Value Health. 2010; 13: 1056-1060
• Miller J.D.
• Foley K.A.
• Russell M.W.
Current challenges in health economic modeling of cancer therapies: a research inquiry.
Am Health Drug Benefits. 2014; 7: 153-162
• Caro J.J.
• Briggs A.H.
• Siebert U.
• Kuntz K.M.
ISPOR-SMDM Modeling Good Research Practices Task Force. Modeling good research practices—overview: a report of the ISPOR-SMDM modeling good research practices task Force-1.
Value Health. 2012; 15: 796-803
• Degeling K.
• Franken M.D.
• May A.M.
• et al.
Matching the model with the evidence: comparing discrete event simulation and state-transition modeling for time-to-event predictions in a cost-effectiveness analysis of treatment in metastatic colorectal cancer patients.
Cancer Epidemiol. 2018; 57: 60-67
• Karnon J.
• Haji Ali Afzali H.
When to use discrete event simulation (DES) for the economic evaluation of health technologies? A review and critique of the costs and benefits of DES.
Pharmacoeconomics. 2014; 32: 547-558
• Standfield L.
• Comans T.
• Scuffham P.
Markov modeling and discrete event simulation in health care: a systematic comparison.
Int J Technol Assess Health Care. 2014; 30: 165-172
• Caro J.J.
• Möller J.
Decision-analytic models: current methodological challenges.
Pharmacoeconomics. 2014; 32: 943-950
• Pintilie M.
Competing Risks: A Practical Perspective.
Wiley, Hoboken, NJ2006
• Barton P.
• Jobanputra P.
• Wilson J.
• Bryan S.
• Burls A.
The use of modelling to evaluate new drugs for patients with a chronic condition: the case of antibodies against tumour necrosis factor in rheumatoid arthritis.
Health Technol Assess. 2004; 8 (iii-91)
• Degeling K.
• Koffijberg H.
• Franken M.D.
• Koopman M.
• IJzerman M.J.
Comparing strategies for modeling competing risks in discrete-event simulations: a simulation study and illustration in colorectal cancer.
Med Decis Making. 2019; 39: 57-73
• Donoghoe M.W.
• Gebski V.
The importance of censoring in competing risks analysis of the subdistribution hazard.
BMC Med Res Methodol. 2017; 17: 52
• Leung K.M.
• Elashoff R.M.
• Afifi A.A.
Censoring issues in survival analysis.
Annu Rev Public Health. 1997; 18: 83-104
• R Core Team
R: a language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria2019
• Austin P.C.
• Lee D.S.
• Fine J.P.
Introduction to the analysis of survival data in the presence of competing risks.
Circulation. 2016; 133: 601-609
• Cover T.M.
• Thomas J.A.
Elements of Information Theory.
2nd ed. John Wiley, Hoboken, NJ2006
• Venables W.N.
• Ripley B.D.
Modern Applied Statistics With S.
4th ed. Springer, New York, NY2002
• Grün B.
• Leisch F.
Fitting finite mixtures of generalized linear regressions in R.
Comput Stat Data Anal. 2007; 51: 5247-5252
• Grün B.
• Leisch F.
FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters.
J Stat Softw. 2008; 28: 35
• Leisch F.
FlexMix a general framework for finite mixture models and latent class regression in R.
J Stat Softw. 2004; 11: 18
• Simkens L.H.J.
• van Tinteren H.
• May A.
• et al.
Maintenance treatment with capecitabine and bevacizumab in metastatic colorectal cancer (CAIRO3): a phase 3 randomised controlled trial of the Dutch Colorectal Cancer Group.
Lancet. 2015; 385: 1843-1852
• Degeling K.
• IJzerman M.J.
• Koopman M.
• Koffijberg H.
Accounting for parameter uncertainty in the definition of parametric distributions used to describe individual patient variation in health economic models.
BMC Med Res Methodol. 2017; 17: 170
• Karnon J.
• Stahl J.
• Brennan A.
• et al.
Modeling using discrete event simulation: a report of the ISPOR-SMDM modeling good research practices task Force-4.
Value Health. 2012; 15: 821-827
• Rahman M.S.
• Ambler G.
• Choodari-Oskooei B.
• Omar R.Z.
Review and evaluation of performance measures for survival prediction models in external validation settings.
BMC Med Res Methodol. 2017; 17: 60
• Demler O.V.
• Paynter N.P.
• Cook N.R.
Tests of calibration and goodness-of-fit in the survival setting.
Stat Med. 2015; 34: 1659-1680
• D’Agostino R.B.
• Nam B.-H.
Evaluation of the performance of survival analysis models: discrimination and calibration measures.
Handb Stat. 2003; 23: 1-25
• Bordes L.
• Chauveau D.
Stochastic EM algorithms for parametric and semiparametric mixture models for right-censored lifetime data.
Comput Stat. 2016; 31: 1513-1538
• Davenport J.W.
• Bezdek J.C.
• Hathaway R.J.
Parameter estimation for finite mixture distributions.
Comput Math Appl. 1988; 15: 819-828
• Royston P.
• Parmar M.K.
Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects.
Stat Med. 2002; 21: 2175-2197