Objectives
This study applies supervised machine learning (ML) to predict mortality in COVID-19 patients and to identify the features most important to that prediction.
Methods
Patients were selected from Cerner Real-World Data, a large US electronic health records database containing records for over 87 million patients. We analyzed each patient's first inpatient visit with a COVID-19 diagnosis and lab results recorded in the database, restricted to visits with a length of stay of at least 24 hours, non-missing gender, and patient age between 18 and 90 years. Patient characteristics, hospital characteristics, Charlson Comorbidity Index, quick sequential organ failure assessment (qSOFA) score, treatments (e.g., mechanical ventilation), and lab values (e.g., minimum white blood cell count) were included as model features. Several ML algorithms were compared using 10-fold cross-validation. The best-performing algorithm was tuned and then evaluated on a test dataset. Feature importance was extracted from the final model using permutation importance.
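The model-comparison and feature-importance steps described above can be sketched without any ML library. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses plain (unstratified) k-fold splitting, scores by accuracy, and substitutes a hypothetical one-feature threshold model (`fit_stump`) and synthetic data for XGBoost and the Cerner records. In practice, scikit-learn's `cross_val_score` and `sklearn.inspection.permutation_importance` cover the same ground.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[Sequence[Row]], List[int]]
Fitter = Callable[[Sequence[Row], Sequence[int]], Predictor]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices once, then deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X: Sequence[Row], y: Sequence[int], fit: Fitter,
                       k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of held-out accuracy over k folds."""
    scores = []
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict([X[i] for i in fold])
        scores.append(accuracy([y[i] for i in fold], y_pred))
    return statistics.mean(scores), statistics.stdev(scores)


def permutation_importance(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(statistics.mean(drops))
    return importances


def fit_stump(X: Sequence[Row], y: Sequence[int]) -> Predictor:
    """Hypothetical stand-in model: threshold feature 0 halfway between the class means.

    Assumes class 1 has the larger mean of feature 0.
    """
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda rows: [int(row[0] > cut) for row in rows]


# Synthetic data (hypothetical): feature 0 determines the label, feature 1 is noise.
rng = random.Random(42)
X = [[float(i % 2), rng.random()] for i in range(200)]
y = [int(row[0]) for row in X]

mean_acc, sd_acc = cross_val_accuracy(X, y, fit_stump, k=10)
print(f"10-fold accuracy: {mean_acc:.1%} ± {sd_acc:.1%}")  # 100.0% ± 0.0%
print(permutation_importance(fit_stump(X, y), X, y))
```

Because each shuffled column is scored with the same fitted model, a feature the model ignores gets an importance of exactly zero, while shuffling a feature the model depends on lowers accuracy sharply. Permutation importance is usually computed on held-out data; the sketch reuses the training rows only for brevity.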
Results
The study included 55,045 patients. The ML algorithms compared, with mean cross-validation accuracy ± standard deviation across folds, were: logistic regression (78.3% ± 0.4%); random forests (87.4% ± 0.5%); extreme gradient boosting (XGBoost) (88.1% ± 0.5%); and support vector machines (83.1% ± 0.4%). XGBoost was selected for the final model and, after hyperparameter tuning, achieved a prediction accuracy of 88.3%. The final model had an F1 score of 0.57, an area under the receiver operating characteristic curve (AUC ROC) of 0.90, a precision of 0.65, and a recall of 0.50. The five most important features for this prediction were mechanical ventilation, age, minimum white blood cell count, qSOFA score, and maximum temperature.
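The reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean, 2PR / (P + R). The sketch below checks that arithmetic and, using hypothetical confusion-matrix counts (illustrative only, not the study's data), shows how accuracy can stay high while recall is only 0.50 in an imbalanced dataset. The `Confusion` class and `f1_from` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())


# Consistency check using the precision and recall reported above.
print(round(f1_from(0.65, 0.50), 2))  # 0.57

# Hypothetical counts: 130 positives among 1,000 patients.
c = Confusion(tp=65, fp=35, fn=65, tn=835)
print(c.accuracy(), c.precision(), c.recall(), round(c.f1(), 2))
```

With these made-up counts, accuracy is 0.90 even though half of the positive cases are missed, which is why precision, recall, and F1 are reported alongside accuracy.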
Conclusions
Supervised ML performed well in predicting mortality in COVID-19 patients while identifying the features most important to the prediction. Similar ML algorithms may help identify higher-risk COVID-19 patients earlier in their hospital stay for additional monitoring and treatment consideration.
Copyright
© 2021 Published by Elsevier Inc.
User license
Elsevier user license