Comparative-Effectiveness Research/HTA | Volume 22, Issue 7, P772-776, July 01, 2019

# Impact of Nonrandomized Dropout on Treatment Switching Adjustment in the Relapsing–Remitting Multiple Sclerosis CLARITY Trial and the CLARITY Extension Study

Open Archive. Published: May 16, 2019

## Highlights

• In our previous article we applied rank-preserving structural failure time model and iterative parameter estimation analyses to adjust for treatment switching between the Cladribine Tablets Treating Multiple Sclerosis Orally (CLARITY) trial and CLARITY extension study.
• This paper applies an alternative combination of statistical methods for adjusting for treatment switching, which may be useful for situations in which trial characteristics hamper the application of standard adjustment techniques. Results were consistent with those produced by the rank-preserving structural failure time model and iterative parameter estimation methods.
• There was no statistical evidence of a reduction in the cladribine treatment effect during the extension period.

## Abstract

### Objectives

Statistical methods to adjust for treatment switching are commonly applied to randomized controlled trials (RCTs) in oncology. Nevertheless, RCTs with extension studies incorporating nonrandomized dropout require consideration of alternative adjustment methods. The current study used a recognized method and a novel method to adjust for treatment switching in relapsing–remitting multiple sclerosis (MS).

### Methods

The Cladribine Tablets Treating Multiple Sclerosis Orally (CLARITY) RCT evaluated the efficacy of cladribine versus placebo over 96 weeks. Many (but not all) CLARITY participants enrolled in the 96-week CLARITY extension study; placebo-treated patients from CLARITY received cladribine (PP→LL), and cladribine-treated patients were re-randomized to placebo (LL→PP) or continued cladribine (LL→LL). End points were time to first qualifying relapse (FQR) and time to 3-month and 6-month confirmed disability progression (3mCDP, 6mCDP). We aimed to estimate the effectiveness of the LL→PP treatment strategy compared with a counterfactual (unobserved) PP→PP strategy. We applied the commonly used rank-preserving structural failure time model (RPSFTM) and a novel approach that combined propensity score matching (PSM) with inverse probability of censoring weights (IPCW).

### Results

The RPSFTM resulted in LL→PP versus PP→PP hazard ratios (HRs) of 0.48 (95% confidence interval [CI] 0.36-0.62) for FQR, 0.62 (95% CI 0.46-0.84) for 3mCDP, and 0.62 (95% CI 0.44-0.88) for 6mCDP. The PSM+IPCW resulted in HRs of 0.47 (95% CI 0.38-0.63) for FQR, 0.61 (95% CI 0.43-0.86) for 3mCDP, and 0.63 (95% CI 0.40-0.87) for 6mCDP.

### Conclusions

The PSM+IPCW HRs were consistent with those from the RPSFTM, suggesting that the results were not substantially biased by informative dropout, assuming that all relevant confounders were controlled for. There was no statistical evidence of a reduction in the cladribine treatment effect during the extension period.

## Introduction

Statistical methods to adjust for treatment switching are often applied to data from oncology trials (Latimer et al., 2015; Latimer, Bell, et al., 2016; Latimer, Henshall, et al., 2016),
but are rarely applied to clinical trials in other disease areas. In oncology trials, a switch typically occurs when patients in the control group are permitted to switch onto the intervention treatment after disease progression (Latimer et al., 2015; Latimer, Bell, et al., 2016; Latimer, Henshall, et al., 2016)
while a proportion of nonswitching patients remain on the control treatment (Latimer et al., 2015; Latimer, Bell, et al., 2016).
For this reason, a majority of studies have discussed treatment switching adjustment methods with this context in mind (Latimer and Abrams, 2014).
The National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) technical support document (TSD)–16 suggests 4 methods that can be used to adjust time-to-event efficacy estimates for treatment switching: the rank-preserving structural failure time model (RPSFTM), the iterative parameter estimation (IPE) algorithm, inverse probability of censoring weights (IPCW), and a two-stage accelerated failure time adjustment method (Latimer and Abrams, 2014).
The RPSFTM is most commonly applied to randomized controlled trials (RCTs) and requires that the assumptions of randomization and a common treatment effect are met (Latimer and Abrams, 2014; Robins and Tsiatis, 1991).
Unfortunately, these assumptions do not strictly hold in more complex designs in which informative dropout occurs or when the average treatment effect received by patients who switch is different from the effect seen in patients originally randomized to the treatment group.
Beyond the methods recommended by Decision Support Unit TSD-16 (Latimer and Abrams, 2014),
it is possible to apply alternative methods, specific to the circumstances of the data, to adjust for treatment switching and other sources of confounding. Unlike the RPSFTM, propensity score matching (PSM) coupled with IPCW, together referred to as PSM+IPCW, can be used to account for potential informative dropout. The PSM+IPCW method does not rely on a common treatment effect assumption, but it does require that all relevant confounders be included in the matching process (Caliendo and Kopeinig, 2015; Rosenbaum and Rubin, 1983).
The current study presents an application of the RPSFTM and PSM+IPCW to an RCT (Giovannoni et al., 2010)
and subsequent extension study (Comi et al., 2018)
assessing the efficacy of cladribine tablets (MAVENCLAD®, Merck KGaA), which are used to treat relapsing–remitting multiple sclerosis (MS). Relapsing–remitting MS is a chronic autoimmune neurodegenerative disease that progresses over a long period (Weinshenker, 1996).
Guidelines for the management of this condition advise early treatment with the use of disease-modifying therapies (DMTs), which can favorably alter the course of the disease (Goodin et al., 2002).
Many approved DMTs are administered parenterally or through self-injection, whereas cladribine tablets represent one of a number of orally administered DMT options. Given the possibility that informative dropout occurred between the original RCT and the extension study, because not all patients who entered the original RCT went on to the extension study, the assumption of randomization is potentially violated. Consequently, the aim of the current study was to supplement the RPSFTM analysis with an alternative analysis (PSM+IPCW) that could account for the potential that the dropout is informative and to test the sensitivity of the results of the RPSFTM method.

## Methods

### Trial Design

The Cladribine Tablets Treating Multiple Sclerosis Orally (CLARITY) trial (Giovannoni et al., 2010)
and subsequent extension study (Comi et al., 2018)
provide an example of a DMT trial with a complex follow-up study design. In the CLARITY trial patients were randomized 1:1:1 to receive low-dose (3.5 mg/kg) cladribine tablets (n = 433), high-dose (5.25 mg/kg) cladribine tablets (n = 437), or placebo (n = 437). Because the purpose of DMTs is to delay or prevent disability progression and relapses in MS patients, assessed outcomes were time to 3-month disability progression (3mCDP), 6-month disability progression (6mCDP), and first qualifying relapse (FQR). See the study of Giovannoni and colleagues
(2010)
for a full description of the CLARITY study methodology, including outcomes and patient inclusion and exclusion criteria.
After the completion of the initial 96 weeks of CLARITY, there was a gap period (median of 40.3 weeks) before the start of the CLARITY extension study (Comi et al., 2018).
The CLARITY extension study consisted of 806 patients from the CLARITY study. Figure 1 provides an overview of the assignment of CLARITY patients to the CLARITY extension. Of the 437 patients randomized to receive placebo in CLARITY, 171 patients did not enroll in CLARITY extension, and 22 entered the extension for follow-up but did not receive treatment. The remaining 244 patients in the CLARITY placebo group (PP) who enrolled in the extension received low-dose cladribine (LL) in the extension study period (this cohort is labeled PP→LL). Of the 433 patients who received low-dose cladribine in CLARITY, 132 did not enroll in CLARITY extension, and 284 of the CLARITY low-dose group patients who enrolled in the extension study were re-randomized to receive either low-dose (LL→LL; n = 186 patients) or placebo (LL→PP; n = 98 patients) in the extension period. The remaining 17 patients entered the extension for follow-up but were not randomized to receive treatment.
Simple intention-to-treat (ITT) analyses of the extension study revealed similar outcomes for the PP→LL and LL→PP groups: 3mCDP (hazard ratio [HR] = 0.86 [95% confidence interval {CI} 0.40-1.83]), 6mCDP (HR = 0.88 [95% CI 0.40-1.93]), and FQR (HR = 0.53 [95% CI 0.26-1.10]). These findings suggest some degree of carryover of initial treatment with cladribine but do not speak to the long-term treatment effect of cladribine compared with placebo. Ideally, time to event outcomes for LL→PP could be compared with outcomes for PP→PP. In the absence of a placebo comparator that continued across both 96-week periods, it was necessary to create a counterfactual PP→PP arm by applying complex statistical methods to adjust for the treatment switching from placebo to low-dose cladribine (PP→LL). The creation of the counterfactual arm allowed the estimation of what would have happened if patients had remained on placebo during the extension study.

### Application of RPSFTM

RPSFTM was applied to the combined CLARITY plus CLARITY extension study datasets to estimate HRs for a treatment arm of low-dose cladribine in CLARITY followed by placebo in the extension study (LL→PP) compared with placebo in CLARITY and the extension study (PP→PP). The gap period time between the CLARITY and extension study was included as part of the time to event and was assumed to be part of the placebo (off-treatment) period in both arms. In the context of CLARITY and the extension study, the randomization assumption may not have been satisfied because not all patients who were included in the LL and PP groups of CLARITY chose to take part in the extension study. Consequently, for the results of the RPSFTM to be valid, it must be assumed that patients who did not experience an event of interest during CLARITY and who subsequently entered the extension study were representative of all patients who had not experienced an event of interest during CLARITY. In addition, the common treatment effect assumption is a strong assumption to make for several reasons. First, although the efficacy of low-dose treatment is expected to be sustained over a 4-year period for LL→PP patients, the treatment effect may not be perfectly maintained during the placebo phase of the trial. Second, patients receiving the treatment later may have less capacity to benefit from treatment. Finally, if dropout between CLARITY and the extension study was informative, patients who received cladribine in the extension study phase may have had a different capacity to benefit compared with the patients who were initially randomized to cladribine in CLARITY (Robins and Tsiatis, 1991).
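For orientation, the structural model behind the RPSFTM (Robins and Tsiatis, 1991) is usually written in the form below. The symbols $T_i^{\mathrm{off}}$, $T_i^{\mathrm{on}}$, and $\psi$ are illustrative notation chosen here and are not taken from this article:

$$U_i = T_i^{\mathrm{off}} + e^{\psi}\, T_i^{\mathrm{on}}$$

Here $U_i$ is the event time that patient $i$ would have had if never treated, $T_i^{\mathrm{off}}$ and $T_i^{\mathrm{on}}$ are the observed times spent off and on treatment, and $e^{\psi}$ is the acceleration factor, with $\psi < 0$ corresponding to treatment extending the time to event. Read against this form, the common treatment effect assumption is the requirement that a single $\psi$ applies whenever treatment is received, and the randomization assumption is the requirement that $U_i$ is balanced between the randomized arms, which is what allows $\psi$ to be estimated.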

### Application of PSM+IPCW

Owing to the limitations associated with the RPSFTM, the sensitivity of the estimated treatment effects was examined using an alternative adjustment method that did not require the same assumptions as RPSFTM. To relax the common treatment effect assumption, PSM was used to create samples of placebo and low-dose cladribine patients at CLARITY baseline. These samples closely resembled the placebo group patients who went on to receive low-dose cladribine in the extension, in terms of those patients' characteristics measured at baseline of the extension study. Using these matched samples, an acceleration factor was estimated using a Weibull accelerated failure time model specific to this group of patients to represent the treatment effect of low-dose cladribine observed in CLARITY for this group. This acceleration factor was then used to shrink observed event times for those placebo group patients who received low-dose cladribine in the extension study and had not experienced an event of interest during CLARITY. This resulted in the creation of a placebo counterfactual arm for the entire CLARITY plus CLARITY extension period. More specifically, counterfactual survival times for the PP→LL group were obtained using Equation 1:
$$U_i = T_{A_i} + T_{B_i}\,\mu_B \quad (1)$$

where $T_{A_i}$ represents time until first event while in CLARITY and the gap period, $T_{B_i}$ represents time until first event while in the extension study, and $\mu_B$ is the estimated acceleration factor. We estimated HRs for the LL→PP versus PP→PP comparison using Cox proportional hazard models applied to the counterfactual dataset.
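A minimal sketch of how Equation 1 could be applied to one patient is given below. The function name and the numerical inputs are hypothetical and chosen only for illustration; the acceleration factor of 0.6 in particular is an invented value, not an estimate from this study.

```python
def counterfactual_time(t_a, t_b, mu_b):
    """Apply Equation 1: time in CLARITY plus gap, plus rescaled extension time.

    t_a  : event-free time in CLARITY and the gap period (off treatment)
    t_b  : time to first event (or censoring) within the extension study
    mu_b : acceleration factor; a value below 1 shrinks the extension time
    """
    if t_a < 0 or t_b < 0:
        raise ValueError("times must be non-negative")
    if mu_b <= 0:
        raise ValueError("acceleration factor must be positive")
    return t_a + t_b * mu_b


# Hypothetical PP->LL patient: 96 weeks in CLARITY plus a 40.3-week gap,
# then an event 50 weeks into the extension. mu_b = 0.6 is an assumed value.
u = counterfactual_time(96 + 40.3, 50.0, 0.6)
print(f"counterfactual event time: {u:.1f} weeks")
```

Only the time accrued while on cladribine in the extension is rescaled; the CLARITY and gap time is left unchanged because the patient was off treatment during that period.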
Although this analysis does not rely on the common treatment effect assumption, it does rely on the conditional independence assumption (CIA) and common support assumption (Caliendo and Kopeinig, 2015; Rosenbaum and Rubin, 1983).
The CIA states that, conditional on a set of variables, the outcome of interest is independent of treatment (Caliendo and Kopeinig, 2015; Rosenbaum and Rubin, 1983).
According to the CIA, if all prognostic characteristics are controlled for by matching patients who received low-dose cladribine in CLARITY with patients who received low-dose cladribine in the extension study, there will be no difference in the observed event times between the two groups. Hence, the assumption is essentially equivalent to the no unmeasured confounders assumption required by the IPCW. In most situations this assumption is untestable because we do not usually have comparison groups that both receive the same treatment. Nevertheless, in this case, we are able to test the validity of this assumption by comparing outcomes in patients who received low-dose cladribine in the extension study to outcomes in patients who received low-dose cladribine in CLARITY; an HR close to 1 would indicate the CIA holds. For the CLARITY placebo-matched group, this assumption cannot be directly tested because the extension counterfactual for placebo is unobserved. Nevertheless, the balance of the characteristics using standardized differences and likelihood ratio tests from logistic regressions on this matched sample can be assessed, and these would provide an indication of how well the matching performed.
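The balance check by standardized differences mentioned above can be sketched as follows for a single continuous covariate. This is an unweighted simplification with made-up ages. The function name is hypothetical, and the pooled-standard-deviation formula is the common textbook form rather than one confirmed by the article.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(x_group1, x_group2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_group1) + variance(x_group2)) / 2)
    if pooled_sd == 0:
        return 0.0  # both groups are constant, so nothing to standardize
    return (mean(x_group1) - mean(x_group2)) / pooled_sd


# Hypothetical ages (years) in an extension sample and its matched sample.
age_extension = [34, 41, 38, 45, 29]
age_matched = [35, 40, 39, 44, 31]
d = standardized_difference(age_extension, age_matched)
print(f"standardized difference for age: {d:.3f}")
```

An absolute standardized difference below about 0.1 is a commonly cited rule of thumb for adequate balance; that threshold is not a criterion stated in this article.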
The common support assumption requires that for each patient who was treated with low-dose cladribine in the extension study and did not experience an event of interest in CLARITY, there is a patient in the CLARITY comparison group with similar characteristics. Hence, it is not possible to perform matching if all patients in the low-dose cladribine group in the extension study are fundamentally different, in terms of a characteristic that could affect prognosis, from all patients at CLARITY baseline. This assumption may be challenging to satisfy given that patients who entered CLARITY were required to have had at least 1 relapse within the previous 12 months, whereas by definition, the patients in our CLARITY extension subsample had not previously experienced an event for at least 96 weeks. This difference could impact a patient’s capacity to benefit from treatment. Nevertheless, if the HR of low dose in extension versus low dose in CLARITY is close to 1, we can be reasonably confident that the capacity to benefit is not greatly affected by this difference.
To ensure that the estimated acceleration factor closely reflected the treatment effect received by extension group patients, a range of different matching algorithms was applied. To identify the preferred application of matching for each end point, the validity of the CIA and the covariate balance of the matched samples were assessed. The number of unmatchable patients in the extension group sample was also examined, as were the maximum weights applied to patients, to assess whether the matched sample relied on a few repeated observations. More details on these assessments are provided in the supplementary materials.
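As an illustration of the diagnostics described above, the sketch below performs 1:1 nearest-neighbor matching with replacement on the propensity score, then reports unmatched patients and how often each comparison patient is reused. The matching algorithms actually used are detailed in the supplementary materials; the caliper of 0.05, the function name, and the propensity scores here are all assumptions made for this example.

```python
from collections import Counter


def nearest_neighbor_match(target_ps, pool_ps, caliper=0.05):
    """Match each target patient to the closest pool patient within a caliper.

    Matching is with replacement, so one pool patient may serve several
    targets. Returns (matches, unmatched, reuse_counts).
    """
    if not pool_ps:
        raise ValueError("pool must contain at least one patient")
    matches, unmatched = {}, []
    for i, score in enumerate(target_ps):
        j = min(range(len(pool_ps)), key=lambda k: abs(pool_ps[k] - score))
        if abs(pool_ps[j] - score) <= caliper:
            matches[i] = j
        else:
            unmatched.append(i)  # no comparison patient is close enough
    reuse_counts = Counter(matches.values())
    return matches, unmatched, reuse_counts


# Hypothetical propensity scores: extension patients vs CLARITY baseline pool.
extension_ps = [0.30, 0.32, 0.90]
clarity_ps = [0.10, 0.31, 0.50]
matches, unmatched, reuse = nearest_neighbor_match(extension_ps, clarity_ps)
print("matches:", matches)
print("unmatched extension patients:", unmatched)
print("maximum reuse of one comparison patient:", max(reuse.values()))
```

A large number of unmatched patients points to a common support problem, and a high maximum reuse count means the matched sample leans on a few repeated observations.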
The proportion of patients who dropped out between the CLARITY and extension studies is a threat to meeting the randomization assumption. Although this assumption is intrinsic to the RPSFTM, the PSM adjustment method does not rely on it to obtain the adjustment factor used to estimate counterfactual survival times in PP→LL patients. Nevertheless, estimates of the HR using the PSM adjustment method may still be subject to bias from informative dropout. To mitigate this risk, inverse probability of censoring weights (IPCW) were used to adjust for potentially informative dropout. The IPCW method gives greater weight to patients who enrolled in the extension study and whose characteristics resemble those of patients who did not enroll, thus removing bias from informative dropout (Robins and Finkelstein, 2000).
The IPCW method relies on a “no unmeasured confounders” assumption, which requires that all relevant prognostic characteristics are included in the model (Robins and Finkelstein, 2000).
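The weighting idea can be sketched in a deliberately simplified, single-time-point form: each enrolled patient is weighted by the overall probability of enrolling divided by that patient's own predicted probability of enrolling given covariates. The study's models were time dependent, so this is not the authors' implementation. The function name and all probabilities below are hypothetical.

```python
def stabilized_weights(p_enroll_given_x, p_enroll_marginal):
    """Stabilized inverse probability weights for patients who enrolled.

    p_enroll_given_x  : predicted enrollment probability for each enrolled
                        patient, conditional on covariates
    p_enroll_marginal : overall enrollment probability (the numerator)
    """
    if not 0 < p_enroll_marginal <= 1:
        raise ValueError("marginal probability must be in (0, 1]")
    weights = []
    for p in p_enroll_given_x:
        if not 0 < p <= 1:
            raise ValueError("conditional probabilities must be in (0, 1]")
        weights.append(p_enroll_marginal / p)
    return weights


# Hypothetical predictions: the third patient resembles those who tended
# not to enroll, so that patient receives the largest weight.
weights = stabilized_weights([0.8, 0.6, 0.3], p_enroll_marginal=0.6)
print([round(w, 2) for w in weights])
print("maximum stabilized weight:", round(max(weights), 2))
```

Inspecting the maximum weight, as in the last line, shows whether any single patient dominates the weighted analysis.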
The covariates included in the IPCW and PSM models were age, sex, region, time since first attack, prior use of any DMTs, expanded disability status scale (EDSS), T1 Gd-enhancing lesion volume, T1 hypointense lesion volume, T2 lesion volume, binary indicators of the number of T1 Gd-enhancing lesions, and the number of T1 hypointense lesions (1 if 10 or more, and 0 if otherwise). All analyses were performed in Stata 13 (StataCorp LP. Stata Statistical Software, Version 13.0, Texas).

## Results

The time-to-event results for 3mCDP, 6mCDP, and FQR are presented in Table 1. HRs for the RPSFTM, PSM-adjusted, and PSM+IPCW-adjusted approaches are presented in addition to ITT (LL→PP vs PP→LL) and CLARITY ITT HRs. The ITT (LL→PP vs PP→LL) represents a comparison of the observed trial data without adjustment for treatment switching. All treatment switching adjustment analyses produced numerically lower HRs than the ITT (LL→PP vs PP→LL) analysis.
Table 1. ITT and treatment switching adjusted HRs

| Method | HR point estimate | Lower 95% CI | Upper 95% CI |
| --- | --- | --- | --- |
| **Time to 3-month progression** | | | |
| ITT (LLPP vs PPLL) | 0.67 | 0.52 | 0.87 |
| CLARITY ITT (LL vs PP) | 0.60 | 0.41 | 0.87 |
| RPSFTM treatment group no re-censoring (LLPP vs PPPP) | 0.62 | 0.46 | 0.84 |
| PSM (LLPP vs PPPP) | 0.60 | 0.44 | 0.83 |
| PSM + IPCW (LLPP vs PPPP) | 0.61 | 0.43 | 0.86 |
| **Time to 6-month progression** | | | |
| ITT (LLPP vs PPLL) | 0.67 | 0.50 | 0.90 |
| CLARITY ITT (LL vs PP) | 0.58 | 0.40 | 0.83 |
| RPSFTM treatment group no re-censoring (LLPP vs PPPP) | 0.62 | 0.44 | 0.88 |
| PSM (LLPP vs PPPP) | 0.62 | 0.40 | 0.84 |
| PSM + IPCW (LLPP vs PPPP) | 0.63 | 0.40 | 0.87 |
| **Time to first qualifying relapse** | | | |
| ITT (LLPP vs PPLL) | 0.53 | 0.43 | 0.67 |
| CLARITY ITT (LL vs PP) | 0.44 | 0.34 | 0.58 |
| RPSFTM treatment group no re-censoring (LLPP vs PPPP) | 0.48 | 0.36 | 0.62 |
| PSM (LLPP vs PPPP) | 0.48 | 0.37 | 0.63 |
| PSM + IPCW (LLPP vs PPPP) | 0.47 | 0.38 | 0.63 |

CI indicates confidence interval; CLARITY, Cladribine Tablets Treating Multiple Sclerosis Orally (trial); HR, hazard ratio; IPCW, inverse probability of censoring weights; ITT, intention to treat; LL, low-dose cladribine in CLARITY; LLPP, low-dose cladribine in CLARITY followed by placebo in CLARITY extension; NN, nearest-neighbor; PP, placebo in CLARITY; PPLL, placebo in CLARITY followed by low-dose cladribine in CLARITY extension; PPPP, placebo in CLARITY and CLARITY extension (counterfactual arm); PS, propensity score; RPSFTM, rank preserving structural failure time model.
For 3mCDP, the RPSFTM results indicate an LL→PP versus PP→PP HR of 0.62 (95% CI 0.46-0.84) over the entire CLARITY plus CLARITY extension period. For 6mCDP, the RPSFTM estimated an LL→PP versus PP→PP HR of 0.62 (95% CI 0.44-0.88), and for FQR, an LL→PP versus PP→PP HR of 0.48 (95% CI 0.36-0.62).
The results of the preferred PSM and PSM+IPCW analyses are presented with bootstrapped confidence intervals. For 3mCDP, the preferred PSM adjustment method resulted in an LL→PP versus PP→PP HR of 0.60 (95% CI 0.44-0.83), for 6mCDP, an LL→PP versus PP→PP HR of 0.62 (95% CI 0.40-0.84), and for FQR, an LL→PP versus PP→PP HR of 0.48 (95% CI 0.37-0.63). Each of these point estimates is similar in magnitude to the RPSFTM estimates.
To adjust for potential informative dropout owing to nonenrollment in the extension study after CLARITY, IPCW was applied to the preferred PSM analyses. The results indicate that the PSM+IPCW HR point estimates were very similar to the HRs for the PSM adjustment and RPSFTM for each end point.

## Discussion

Adjusting for treatment switching involves substantial methodological uncertainty because the different adjustment methods can be applied in different ways. Decision makers may be reluctant to use results of standalone adjustment analyses owing to concern that the adjustment analyses reported by manufacturers might be those that produce the most favorable results for the experimental treatment (Latimer, 2015).
In this study, alternative adjustment methods have been used to validate the results and assumptions of a commonly used treatment switching adjustment method (ie, RPSFTM).
The application of treatment switching adjustment methods in the case of CLARITY and CLARITY extension is complicated because the switching mechanism differs from the standard case, and there was dropout between the end of CLARITY and enrollment in CLARITY extension. To address risks to the common treatment effect and randomization assumptions associated with the RPSFTM, alternative PSM and PSM+IPCW adjustment methods, which rely on different assumptions, were used. The PSM and PSM+IPCW applications that performed most successfully in terms of matching algorithm performance produced results that were very similar to those from the RPSFTM analyses for each of the three outcomes of interest. The estimated HRs did not differ by more than 0.02 for any outcome, and further confidence in the results was provided by the similarity of the adjusted CLARITY plus CLARITY extension HRs with the CLARITY ITT HRs.
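The statement that the adjusted HRs differ by no more than 0.02 can be checked directly against the point estimates reported in Table 1. The short sketch below does so; the variable and function names are ours.

```python
# HR point estimates copied from Table 1 (LL->PP vs PP->PP comparisons only).
adjusted_hrs = {
    "3mCDP": {"RPSFTM": 0.62, "PSM": 0.60, "PSM+IPCW": 0.61},
    "6mCDP": {"RPSFTM": 0.62, "PSM": 0.62, "PSM+IPCW": 0.63},
    "FQR": {"RPSFTM": 0.48, "PSM": 0.48, "PSM+IPCW": 0.47},
}


def hr_spread(estimates):
    """Largest minus smallest point estimate, rounded to 2 decimals
    to avoid floating-point noise in the subtraction."""
    values = list(estimates.values())
    return round(max(values) - min(values), 2)


spreads = {endpoint: hr_spread(est) for endpoint, est in adjusted_hrs.items()}
for endpoint, spread in spreads.items():
    print(f"{endpoint}: spread across methods = {spread:.2f}")
```

The largest spread is for 3mCDP (0.62 versus 0.60); for 6mCDP and FQR the three methods agree to within 0.01.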
The PSM-adjusted analyses primarily relied on the CIA. The HRs derived by comparing events in the “LL” period in the “PP→LL” group to events experienced in the matched CLARITY LL sample were used to test the validity of the CIA for the matched samples. For 3-month progression and 6-month progression, some of the matching techniques that were tested resulted in CLARITY LL–matched samples that could closely replicate the time-to-event outcomes in the “PP→LL” group. Nevertheless, it should be noted that the performance in this measure was not consistent across all matching techniques. This lack of consistency suggests that the CIA may not have been fully met. One difference that could not be controlled for between CLARITY baseline and CLARITY extension baseline arises from the fact that only patients with one or more relapses within the previous 12 months entered CLARITY. By definition, patients in the PP→LL sample of interest had not experienced an event for at least 96 weeks at extension baseline, and hence this could represent an important difference between the samples.
Although the limitations in the matching attempts indicate that there may be some bias in the results of the PSM analysis, it is important to note that for the two disability progression outcomes, the matching did appear to perform well. For the third outcome, time to first relapse, we would expect that the bias would likely cause an underestimation of the treatment benefit of low-dose cladribine compared with placebo; a comparison of the effectiveness of cladribine in the PP→LL arm with the LL arm from CLARITY indicates that the treatment effect may have actually been higher in switchers than in the group initially randomized to cladribine, even when these groups are matched on observable characteristics at extension and baseline, respectively. Nevertheless, this finding should be interpreted with caution because matching may be imperfect.
Although the PSM analysis accounts for potential differences in the treatment effect in those who enrolled into the extension study compared with those who were originally randomized into CLARITY, informative censoring from the dropout of patients who did not enroll in the extension study could remain an issue when estimating the LLPP versus PPPP treatment effect over the entire CLARITY plus extension time period. Therefore, the IPCW was applied to the counterfactual datasets of our preferred matched analyses to adjust for informative censoring. For each end point, the HRs from our preferred IPCW analyses differed from the analyses that were not adjusted for informative censoring by only 0.01. This indicates that censoring at the end of CLARITY and the start of the extension study did not cause substantial bias in the analyses, assuming that all important confounders were controlled for in the model. The IPCW relies on an untestable “no unmeasured confounders” assumption. In an attempt to include all relevant confounders, the models incorporated data on all characteristics assumed to affect prognosis. The maximum stabilized weights derived from the models were relatively low, and hence it is unlikely that the analysis would have been biased because of the application of a large weighting to a time point for a patient that is nonrepresentative of the sample in general. Another potential source of bias arises from the fact that time-dependent data on prognostic characteristics were not collected during the gap period. For the patients who had not experienced an event before the end of CLARITY, the last observation was carried forward until the next observation at the start of the extension. For those who experienced an event during the gap, the last observation was carried forward until an event occurred. 
This could explain the relatively low stabilized weights, because at a number of time points during the gap period the variables retained values that may in reality have changed. This missing information is a limitation of the analysis, but given the data available, our analyses represent the best attempt to address the potential for informative censoring owing to nonrandomized dropout.
In conclusion, this study applied a novel PSM+IPCW method to adjust for treatment switching in the context of a trial combined with an extension study, with less than full enrollment and in which control group patients switch treatments in the extension. The study has shown how to systematically assess different PSM approaches and how to identify which applications appear to be most appropriate. The end results are similar to those from the RPSFTM, which is helpful for decision makers who may be unsure about the validity of a particular method or assumptions. The similarity of the adjusted HRs indicates that the RPSFTM results are not substantially affected by bias from informative dropout, assuming that all relevant patient characteristics were incorporated into the IPCW model and that the results were not affected by data collection limitations during the gap period. In addition, comparisons of the adjusted CLARITY plus CLARITY extension HRs with CLARITY HRs indicate that there is no statistical evidence for waning of the 3.5 mg/kg dose cladribine treatment effect during the extension period.

## Acknowledgment

This study was funded by EMD Serono, Inc (a business of Merck KGaA, Darmstadt, Germany).


## References

• Latimer N.R., Abrams K.R., Amonkar M.M., Stapelkamp C., Swann R.S. Adjusting for the confounding effects of treatment switching—The BREAK-3 Trial: dabrafenib versus dacarbazine. Oncologist. 2015; 20: 798-805
• Latimer N.R., Bell H., Abrams K.R., Amonkar M.M., Casey M. Adjusting for treatment switching in the METRIC study shows further improved overall survival with trametinib compared with chemotherapy. Cancer Med. 2016; 5: 806-815
• Latimer N.R., Henshall C., Siebert U., Bell H. Treatment switching: statistical and decision-making challenges and approaches. Int J Technol Assess Health Care. 2016; 32: 160-166
• Latimer N.R., Abrams K.R. NICE DSU Technical Support Document 16: Adjusting Survival Time Estimates in the Presence of Treatment Switching. Decision Support Unit, Sheffield; 2014
• Robins J.M., Tsiatis A.A. Correcting for noncompliance in randomized trials using rank preserving structural failure time models. Comm Stats Theory Method. 1991; 20: 2609-2631
• Caliendo M., Kopeinig S. Some practical guidance for the implementation of propensity score matching. J Econ Surv. 2015; 22: 31-72
• Rosenbaum P.R., Rubin D.B. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983; 70: 41-55
• Giovannoni G., Comi G., Cook S., et al. A placebo-controlled trial of oral cladribine for relapsing multiple sclerosis. N Engl J Med. 2010; 362: 416-426
• Comi G., Cook S., Rammohan K., et al. Long-term effects of cladribine tablets on MRI activity outcomes in patients with relapsing-remitting multiple sclerosis: the CLARITY Extension study. Ther Adv Neurol Disord. 2018; 11: 1756285617753365
• Weinshenker B.G. Epidemiology of multiple sclerosis. Neurol Clin. 1996; 14: 291-308
• Goodin D.S., Frohman E.M., Garmany Jr., G.P., et al. Disease modifying therapies in multiple sclerosis: report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology and the MS Council for Clinical Practice Guidelines. Neurology. 2002; 58: 169-178
• Robins J.M., Finkelstein D.M. Correcting for noncompliance and dependent censoring in an AIDS Clinical Trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000; 56: 779-788
• Latimer N.R. Treatment switching in oncology trials and the acceptability of adjustment methods. Expert Rev Pharmacoecon Outcomes Res. 2015; 15: 561-564