ML3 LASSO (Least Absolute Shrinkage and Selection Operator) and XGBoost (eXtreme Gradient Boosting) Models for Predicting Depression-Related Work Impairment in US Working Adults


      Work productivity loss among adults with depression is associated with multiple patient characteristics. The current study predicted total work impairment resulting from absenteeism and presenteeism using regularized linear regression and a decision-tree-based ensemble algorithm.


      Data on employed US adults (18-64 years old) from the 2019 National Health and Wellness Survey were analyzed. The analysis sample included respondents who self-reported a diagnosis of depression or having experienced depression in the past 12 months. Work productivity loss was derived from the Work Productivity and Activity Impairment questionnaire. Group LASSO with Nesterov’s method and XGBoost regression were used separately to predict work impairment and to extract feature importance from each model. Given the count-like nature of productivity loss, a Poisson distribution was specified in both LASSO and XGBoost. Variable selection was based on the Akaike Information Criterion (AIC) model fit statistic for LASSO and on gain-based feature importance for XGBoost. Forty variables covering respondent demographics, health behavior (e.g., smoking and alcohol use), depression-related variables, comorbidities, and doctor visits were entered into both models. Data were split into training, validation, and testing datasets. Hyperparameters were tuned on the validation data. Root mean squared errors (RMSE) on the testing data were compared to assess model performance.
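
The evaluation design described above (a three-way split, with RMSE computed on held-out data) can be illustrated with a minimal sketch. It is not the study's code: the function names `split_indices` and `rmse`, the 60/20/20 split proportions, and the random seed are assumptions chosen only for illustration, since the abstract does not report them.

```python
import math
import random


def split_indices(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test parts.

    The 60/20/20 proportions and the seed are illustrative assumptions;
    the abstract does not state the split actually used.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In this setup, each model would be fit on the training rows, its hyperparameters chosen by the validation rows, and `rmse` applied once to the test rows, so the reported error comes from data not used in fitting or tuning.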


      Among 11,478 working adults with depression, XGBoost made more accurate predictions than LASSO (RMSE=26.6 and 27.6, respectively). Overestimation of impairment was slightly greater in the LASSO model than in the XGBoost model (mean impairment=33% and 30%, respectively). The LASSO model selected more demographic and health behavior variables than XGBoost, which ranked comorbidity variables (arthritis, sleep conditions, migraine, liver or renal diseases) as the most important features in predicting productivity loss.


      In a broadly representative US population of working adults with depression, the XGBoost model predicted productivity loss better than the LASSO model.