ML2 Supervised Machine Learning Predicts Mortality in COVID-19 Patients Using Electronic Health Records


      This study applies supervised machine learning (ML) to predict mortality in COVID-19 patients and to identify the features that contribute most to this prediction.


      Patients were selected from a large US electronic health records database (Cerner Real-World Data) that contains over 87 million patients. We investigated the first inpatient visit for patients with a COVID-19 diagnosis and lab results identified in the database, and with a length of stay of at least 24 hours, non-missing gender, and age between 18 and 90 years. Patient characteristics, hospital characteristics, Charlson Index, quick sequential organ failure assessment (qSOFA), treatments (e.g., mechanical ventilation), and lab values (e.g., minimum white blood cell count) were included in this analysis. Several ML algorithms were compared using 10-fold cross-validation. The best-performing algorithm was then tuned and evaluated on a test dataset. Feature importance was extracted from the final model using permutation importance.
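      The workflow described above (compare candidate algorithms by 10-fold cross-validation, refit the best one, then rank features by permutation importance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic (generated with scikit-learn, not drawn from Cerner Real-World Data), scikit-learn's GradientBoostingClassifier stands in for XGBoost, every hyperparameter is an arbitrary placeholder, and the hyperparameter tuning step is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic, imbalanced binary data stands in for the patient cohort.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)

# Hold out a test set; cross-validation uses the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate algorithms. GradientBoostingClassifier is a stand-in for XGBoost.
candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
}

# 10-fold cross-validation: record mean accuracy and its standard deviation.
cv_results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit the algorithm with the highest mean cross-validation accuracy.
best_name = max(cv_results, key=lambda k: cv_results[k][0])
final_model = candidates[best_name].fit(X_train, y_train)

# Permutation importance: drop in test accuracy when one feature is shuffled.
perm = permutation_importance(
    final_model, X_test, y_test, n_repeats=5, random_state=0, scoring="accuracy"
)
ranking = np.argsort(perm.importances_mean)[::-1]  # most important first
print("best model:", best_name)
print("feature indices ranked by importance:", ranking.tolist())
```

      Permutation importance is model-agnostic, so the same ranking step applies whichever algorithm wins the comparison.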


      A total of 55,045 patients were included in this study. Four ML algorithms were compared (mean cross-validation accuracy ± cross-validation standard deviation): logistic regression (78.3% ± 0.4%); random forests (87.4% ± 0.5%); extreme gradient boosting (XGBoost) (88.1% ± 0.5%); and support vector machines (83.1% ± 0.4%). XGBoost was selected for the final model, which had a prediction accuracy of 88.3% after hyperparameter tuning. The final model had an F1 score of 0.57, an area under the receiver operating characteristic curve (AUC ROC) of 0.90, a precision of 0.65, and a recall of 0.50. The five most important features in this prediction were mechanical ventilation, age, minimum white blood cell count, qSOFA, and maximum temperature.
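      As a consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded values quoted above; the variable names are illustrative only.

```python
# Precision and recall as reported for the final model (rounded values).
precision = 0.65
recall = 0.50

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # prints 0.57
```

      The recomputed value agrees with the reported F1 score of 0.57.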


      Supervised ML performed well in predicting mortality in COVID-19 patients and identified the features most important to that prediction. Similar ML algorithms may identify higher-risk COVID-19 patients earlier in their hospital stay for additional monitoring and treatment consideration.