P42 Comparison of Estimation Methods for Single-Arm Trials in Rare Diseases with Historical Control Groups


      Randomized controlled trials are the gold standard for estimating treatment effects. However, in rare diseases with high unmet need, single-arm trials are used when randomization of patients to placebo or standard of care is infeasible or unethical. We evaluated various methods to control for confounding in estimating treatment effects in a small single-arm trial with a historical comparator group.


      We used simulation to evaluate techniques for estimating the “true” treatment effect on overall survival (OS) and objective response rate (ORR) in a specific target population under an external comparator design. We varied effect size, sample size, confounders, and the correlation between confounders. We assessed two broad categories of methods: i) those requiring specification of a treatment allocation model [propensity score (PS)-based inverse probability of treatment weighting (IPTW), standardized mortality/morbidity ratio (SMR) weighting, stabilized IPTW (SIPTW), overlap weighting (OW), stratification, and matching], and ii) those adding outcome information (g-computation). Precision and accuracy were evaluated using a combination of 95% confidence interval (CI) coverage, power, bias, and mean squared error (MSE).
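As an illustration of how the PS-based weighting schemes named above differ, the following minimal sketch computes each set of weights from an already-estimated propensity score. It uses the standard textbook definitions of these weights; the function name `ps_weights` and its arguments are hypothetical, and the abstract does not state how the PS model itself was fitted.

```python
import numpy as np

def ps_weights(ps, treated):
    """Return weight vectors for four common propensity score weighting schemes.

    ps      : estimated propensity scores, P(treated = 1 | covariates), strictly in (0, 1)
    treated : 1 for single-arm trial subjects, 0 for historical comparators
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=int)
    p_treat = treated.mean()  # marginal probability of being in the trial arm

    # IPTW: each subject is weighted by the inverse probability of the arm received.
    iptw = np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    # Stabilized IPTW: the IPTW weights rescaled by the marginal arm probabilities.
    siptw = np.where(treated == 1, p_treat / ps, (1.0 - p_treat) / (1.0 - ps))
    # SMR weights: treated subjects keep weight 1; comparators are weighted by the PS odds.
    smr = np.where(treated == 1, 1.0, ps / (1.0 - ps))
    # Overlap weights: each subject is weighted by the probability of the opposite arm.
    ow = np.where(treated == 1, 1.0 - ps, ps)

    return {"IPTW": iptw, "SIPTW": siptw, "SMR": smr, "OW": ow}
```

Under these definitions, SMR weighting reweights the historical comparators to resemble the trial subjects, whereas IPTW reweights both groups toward the combined population.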


      G-computation yielded the most accurate and precise estimator of OS (95% CI coverage: 93.5%, power: 69.3%, bias: -0.001, MSE: 0.055) in a small-sample scenario of 30 treated subjects and 120 comparator subjects. Similar results were observed for ORR. In comparison, SMR and IPTW, respectively, gave the following results for OS: 95% CI coverage of 72.8% and 65.6%; power of 75.9% and 62.6%; bias of -0.026 and 0.072; and MSE of 0.114 and 0.167.
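The four performance measures reported above can be computed from simulation replicates roughly as follows. This is a sketch under stated assumptions: the function name `performance` is hypothetical, the effect is assumed to be on a scale where the null value is 0 (for example a log hazard ratio or a risk difference), and power is taken as the proportion of replicates whose 95% CI excludes the null. The abstract does not specify these details.

```python
import numpy as np

def performance(estimates, ci_lower, ci_upper, true_effect, null_value=0.0):
    """Summarize simulation replicates of one estimator.

    estimates, ci_lower, ci_upper : one entry per simulated dataset
    true_effect                   : the effect used to generate the data
    null_value                    : effect value under no treatment effect (assumed 0)
    """
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(ci_lower, dtype=float)
    hi = np.asarray(ci_upper, dtype=float)

    bias = est.mean() - true_effect                         # average error
    mse = np.mean((est - true_effect) ** 2)                 # mean squared error
    coverage = np.mean((lo <= true_effect) & (true_effect <= hi))  # CI contains truth
    power = np.mean((lo > null_value) | (hi < null_value))  # CI excludes the null

    return {"bias": bias, "mse": mse, "coverage": coverage, "power": power}
```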


      In our simulated example, the g-computation estimator performed best at controlling confounding in a small single-arm trial with an external comparator group. PS-based methods (e.g., SMR and IPTW) may be suitable as an initial step in creating the comparator arm while researchers are blinded to the outcome; g-computation can subsequently be used to estimate treatment efficacy.
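For readers unfamiliar with g-computation, the sketch below shows the general idea for a binary outcome such as ORR: fit an outcome model that includes treatment and confounders, predict each subject's outcome under both treatment and comparator, and average the predictions over the target population. It is not the authors' implementation. The function names are hypothetical, the outcome model is assumed to be a main-effects logistic regression fitted by Newton-Raphson, and the target population is assumed to be the trial subjects. A survival outcome such as OS would need a different outcome model.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)               # score vector
        hess = X.T @ (X * w[:, None])      # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def g_computation_risk_difference(treated, covariates, outcome):
    """Estimate the response-rate difference, standardized to the treated subjects.

    treated    : 1 for trial subjects, 0 for historical comparators
    covariates : confounders, shape (n,) or (n, k)
    outcome    : 1 for responders, 0 for non-responders
    """
    treated = np.asarray(treated, dtype=float)
    n = len(treated)
    covariates = np.asarray(covariates, dtype=float).reshape(n, -1)
    outcome = np.asarray(outcome, dtype=float)

    # Outcome model: intercept + treatment + confounders (no interactions assumed).
    design = np.column_stack([np.ones(n), treated, covariates])
    beta = fit_logistic(design, outcome)

    # Predict every subject's response probability under each treatment condition.
    design_1 = np.column_stack([np.ones(n), np.ones(n), covariates])
    design_0 = np.column_stack([np.ones(n), np.zeros(n), covariates])
    p1 = 1.0 / (1.0 + np.exp(-(design_1 @ beta)))
    p0 = 1.0 / (1.0 + np.exp(-(design_0 @ beta)))

    # Average over the assumed target population (the trial subjects).
    target = treated == 1
    risk_treated = p1[target].mean()
    risk_comparator = p0[target].mean()
    return {
        "risk_treated": risk_treated,
        "risk_comparator": risk_comparator,
        "risk_difference": risk_treated - risk_comparator,
    }
```

Confidence intervals for such an estimator are commonly obtained by bootstrapping the whole procedure; the abstract does not say which approach was used here.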