Machine learning (ML) approaches can extract clinically relevant information from electronic health records (EHRs) for research purposes such as comparative effectiveness analyses. This study assessed the effects of misclassification error in ML-extracted clinical variables on downstream statistical analyses.
We selected a cohort of 2,948 patients with advanced non-small cell lung cancer (NSCLC) treated with one of two common second-line monotherapies from the nationwide Flatiron Health EHR-derived de-identified database. Focusing on smoking and PD-L1 status information extracted from free-text EHR notes, we analyzed the performance of an ML approach against manual abstraction (reference standard). We fit a Cox proportional hazards model to estimate overall survival (OS) hazard ratios (HRs) between treatments in cohorts reweighted by propensity scores based on a set of confounders (gender, histology, age at advanced diagnosis, first-line treatment class, stage, smoking status, and PD-L1 status). We performed sensitivity analyses by corrupting abstracted labels at varying error rates.
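The two core operations described above, inverse-probability-of-treatment weighting from propensity scores and corrupting abstracted labels at a chosen error rate, can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: function names are illustrative, the treatment is assumed binary, and propensity scores are taken as given (in the study they were estimated from the listed confounders).

```python
import random


def corrupt_labels(labels, error_rate, categories, seed=0):
    """Randomly reassign a fraction `error_rate` of labels to a
    different category, mimicking ML misclassification error.
    (Illustrative sketch; the study's corruption scheme may differ.)"""
    rng = random.Random(seed)
    corrupted = list(labels)
    for i, label in enumerate(corrupted):
        if rng.random() < error_rate:
            corrupted[i] = rng.choice([c for c in categories if c != label])
    return corrupted


def iptw_weights(propensity, treated):
    """Inverse-probability-of-treatment weights: 1/e(x) for treated
    patients, 1/(1 - e(x)) for controls, where e(x) is the estimated
    propensity score."""
    return [1.0 / p if t else 1.0 / (1.0 - p)
            for p, t in zip(propensity, treated)]
```

The weighted cohort would then be passed to a Cox proportional hazards fit (e.g., a survival-analysis library that accepts per-patient case weights) to obtain the OS hazard ratio, and the corruption step would be repeated across a grid of error rates for the sensitivity analysis.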
Using manually abstracted PD-L1 and smoking status to estimate propensity scores, the HR (95% CI) of treatment A vs. B was 0.797 (0.686, 0.911). Using ML-extracted PD-L1 and smoking status, the HR increased slightly to 0.839 (0.721, 0.968). Using ML-extracted PD-L1 and manually abstracted smoking status, the HR was 0.848 (0.725, 0.971); using ML-extracted smoking status and manually abstracted PD-L1, the HR was 0.790 (0.692, 0.896). In a sensitivity analysis, errors introduced into smoking status did not affect HR estimates, whereas errors in PD-L1 status did.
The impact of using ML-extracted rather than manually abstracted variables is potentially greater for strong confounding variables (i.e., PD-L1 as opposed to smoking status). This argues for validating ML-extracted variables through downstream analyses, as their impact on analytical results cannot be inferred from standard ML performance metrics alone.
© 2021 Published by Elsevier Inc.