Health Services Management Department, Guizhou Medical University, Guiyang, China; Center of Medicine Economics and Management Research, Guizhou Medical University, Guiyang, China
• In the past 3 decades, EQ-5D value sets have been modeled using the 20-parameter main-effects model.
• Using out-of-sample cross-validation, we found that the cross-attribute level effects model consistently outperformed the additive model in modeling health state values for the original 5-level version of EQ-5D (EQ-5D-5L) and 2 types of modified EQ-5D-5L.
• This study used a small composite time trade-off design, which works well with the cross-attribute level effects model and should be considered as a way to lower the costs of future EQ-5D-5L valuation studies.
Abstract
Objectives
The cross-attribute level effects (CALE) model has demonstrated better predictive accuracy for out-of-sample health states than the conventional additive main-effects model in cross-validation analyses of the 5-level version of EQ-5D (EQ-5D-5L) composite time trade-off (cTTO) datasets. In this study, we aimed to further test the performance of the CALE model using a different design and modified EQ-5D-5L states.
Methods
A total of 29 EQ-5D-5L self-care bolt-off states, 30 EQ-5D-5L states, and 31 EQ-5D-5L vision bolt-on states were selected from the same orthogonal array. A total of 600 university students were interviewed face-to-face to value a subset of these health states using the cTTO method. For each type of health state, we fitted both the conventional main-effects model and the CALE model. Predictive accuracy was assessed in a series of cross-validation analyses using the leave-one-state-out method.
Results
Overall, the CALE model outperformed the conventional model for each of the 3 types of health states in predicting the cTTO values of out-of-sample health states. The gain in prediction accuracy from using the CALE model increased with the number of dimensions in the health states; for example, the mean absolute error (MAE) decreased by about 24%, 67%, and 77% for the EQ-5D-5L self-care bolt-off, EQ-5D-5L, and EQ-5D-5L vision bolt-on states, respectively, when using CALE models.
Conclusion
Our study supported the strengths of the CALE model for modelling the utility values of both original and modified EQ-5D-5L health states. Investigators with limited resources may consider using the CALE model to lower the costs for their valuation studies for EQ-5D-5L or similar health state descriptive systems.
Introduction
The use of EQ-5D in economic evaluation requires a value set, namely, utility values for all health states defined by the instrument. Such a value set is derived using a 2-step approach: first, a subset of the health states is directly valued by members of the general public, and second, the observed values are modeled to predict values for all health states.
In the case of the 5-level version of EQ-5D (EQ-5D-5L), the default main-effects model consists of 20 parameters: 4 dummy variables per dimension, representing the 4 levels of problems beyond level 1 (the reference level).
The CALE model contains the same main effects as the conventional model, but these are specified with fewer parameters. For example, 8 parameters can be used to specify the 20 main effects for modeling EQ-5D-5L values: 5 dimension parameters, each for the level-5 problems of a different dimension, and 3 level parameters for the 3 intermediate levels, that is, “slight” (level-2), “moderate” (level-3), and “severe” (level-4) problems. Under the assumption that the ratios of the effects of the levels within a dimension are constant across dimensions, the CALE model specifies each of the 20 main effects in the conventional model as either a dimension parameter or the product of a dimension parameter and a level parameter. In an analysis of the EQ-5D-5L valuation data collected from China, The Netherlands, Spain, and Singapore, the 8-parameter CALE model outperformed the conventional 20-parameter main-effects model in predicting the values of out-of-sample EQ-5D-5L health states.
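The way 8 CALE parameters span the 20 main effects can be illustrated with a short sketch. The parameter values below are hypothetical placeholders chosen for illustration, not estimates from any study, and the function name `expand_cale` is likewise an assumption.

```python
# Minimal sketch: expand 5 dimension parameters and 3 level parameters
# into the 20 main effects of the conventional EQ-5D-5L model.
# All numeric values are hypothetical, for illustration only.
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]


def expand_cale(dim_params, level_params):
    """Return the implied main effects as {(dimension, level): effect}."""
    effects = {}
    for dim in DIMENSIONS:
        # Intermediate levels: dimension parameter times level parameter.
        for level in (2, 3, 4):
            effects[(dim, level)] = dim_params[dim] * level_params[level]
        # Level 5: the dimension parameter on its own.
        effects[(dim, 5)] = dim_params[dim]
    return effects


dim_params = {"MO": 0.30, "SC": 0.25, "UA": 0.35, "PD": 0.40, "AD": 0.45}
level_params = {2: 0.10, 3: 0.25, 4: 0.75}
main_effects = expand_cale(dim_params, level_params)
print(len(main_effects), "main effects from",
      len(dim_params) + len(level_params), "parameters")
```

By construction, the ratio of any intermediate-level effect to the level-5 effect is identical in every dimension, which is the constant-ratio assumption described above.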
Although the CALE model is novel and promising, the favorable results observed in the previous study could be specific to the valuation study design, particularly the choice of the health states for valuation. All 4 EQ-5D-5L valuation studies used the same protocol to value the same set of 86 health states using the composite time trade-off (cTTO) method. It is possible that the CALE model would not outperform the conventional model in out-of-sample predictions had a different set of EQ-5D-5L health states been valued and used for cross-validation analysis. Moreover, the superior out-of-sample prediction could be specific to the (number of) dimensions of the EQ-5D health states. Last but not least, the CALE model may allow an EQ-5D value set to be estimated with a smaller amount of data; nevertheless, this potential was not evaluated in the previous study. The currently recommended sample size of at least 1000 individuals for an EQ-5D-5L valuation study was estimated for the case of using the conventional 20-parameter model.
Extant EQ-5D-5L valuation studies typically used 5 to 10 interviewers. The large general population sample and the manpower needed for collecting EQ-5D valuation data could deter countries with limited resources from developing their own value sets. Therefore, a smaller valuation study design that allows a value set to be estimated with a lower budget would be desirable.
In this study, we aimed to further test the performance of the CALE model relative to the conventional main-effects model. In particular, we compared the 2 models in modeling the cTTO values of both original and modified EQ-5D-5L health states. We tested 2 types of modified EQ-5D health states, one comprising 6 dimensions, that is, the EQ-5D-5L plus a vision dimension (hereafter referred to as “EQ + VI”), and the other comprising 4 dimensions, that is, the EQ-5D-5L without the self-care dimension (hereafter referred to as “EQ − SC”). Vision has been confirmed to be a useful bolt-on dimension.
The dimensions for bolt-on and bolt-off were chosen primarily to assess the potential of the CALE model for health state descriptive systems that are simpler or more complex than the original EQ-5D system.
Methods
Experimental Design
We designed a 2-arm valuation study. In the first arm, participants valued 10 EQ − SC health states followed by 10 or 11 EQ-5D-5L states; in the second arm, participants valued 10 or 11 EQ-5D-5L states followed by 11 EQ + VI health states. EQ − SC health states were developed by simply removing the self-care descriptor. The descriptor of the vision bolt-on in the EQ + VI states followed the wording of the mobility, self-care, and usual activities descriptors, that is, “no/slight/moderate/severe problems” seeing for the first 4 levels and “unable to” for the fifth level.
We used an existing orthogonal array of 6 × 5 to select 29 EQ − SC, 30 EQ-5D, and 31 EQ + VI states for valuation in this study.
This orthogonal array has 6 columns, each representing a different dimension; therefore, the array could be used to select EQ + VI states. To derive the EQ − SC states, we took out the second and last columns of the orthogonal array. Similarly, to derive the EQ-5D states, we took out the last column of the orthogonal array. The orthogonal array remains orthogonal when one or more columns are taken out. Given that the columns in an orthogonal design are open to code, we chose one variant of the design which contained the state 55555 and the most plausible states.
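The claim that an orthogonal array stays orthogonal after columns are removed can be checked with a small sketch. The 25-row, 6-column, 5-level array built here is a textbook construction over the integers modulo 5 and is used purely for illustration; it is an assumption and not the array used in this study. Levels are coded 0 to 4; adding 1 would give EQ-5D-style levels 1 to 5.

```python
from collections import Counter
from itertools import combinations, product


def build_array(q=5):
    """Rows (a, b) -> columns a, b, a+b, a+2b, ..., a+(q-1)b (mod q)."""
    rows = []
    for a, b in product(range(q), repeat=2):
        rows.append([a, b] + [(a + k * b) % q for k in range(1, q)])
    return rows


def is_orthogonal(rows, q=5):
    """True if every pair of columns shows all q*q level pairs equally often."""
    n_cols = len(rows[0])
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != q * q or len(set(counts.values())) != 1:
            return False
    return True


def drop_columns(rows, to_drop):
    """Return a copy of the array without the columns indexed in to_drop."""
    return [[v for i, v in enumerate(row) if i not in to_drop] for row in rows]


full = build_array()                    # 6 columns (EQ + VI analogue)
five_dim = drop_columns(full, {5})      # drop the last column (EQ-5D analogue)
four_dim = drop_columns(full, {1, 5})   # drop second and last (EQ - SC analogue)
print(is_orthogonal(full), is_orthogonal(five_dim), is_orthogonal(four_dim))
```

Pairwise balance is a property of each pair of columns, so removing other columns cannot disturb it; the check above simply confirms this for the illustrative array.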
We blocked the EQ − SC, EQ-5D, and EQ + VI states each into 3 blocks using Alg-algorithm in RStudio, with the numbers of states per block being 10 for EQ − SC, 10 or 11 for EQ-5D, and 11 for EQ + VI. We then combined the 3 EQ − SC blocks with the 3 EQ-5D blocks to create blocks 1, 2, and 3 for valuation in the first arm. Similarly, we created blocks 4, 5, and 6 by combining the EQ-5D and EQ + VI blocks for valuation in the second arm (see all health state blocks in Appendix 1 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.012). Each block has 20 to 22 health states. The first arm of 300 respondents valued 10 EQ − SC states plus 10 or 11 EQ-5D states, and the second arm of another 300 respondents valued 10 or 11 EQ-5D states plus 11 EQ + VI states. The respondents were reminded of the added dimension by interviewers.
Participants
We used a student sample drawn from Guizhou Medical University, China. First- and second-year students from the Schools of Public Health, Health and Medicine Management, Stomatology, and Medical Humanities were recruited through advertisements. Students were eligible if they (1) gave informed consent and (2) had not previously participated in a cTTO valuation interview. Following the sample size considerations put forward by Oppe et al, we targeted a sample size of 600 participants, with 300 for each study arm.
The Interviews
All consenting participants were interviewed in a designated interview room on campus to value the health states. The interviews were conducted using the EQ-PVT tool, a Microsoft PowerPoint-based version of EQ-VT v1.1. Each participant was paid 100 RMB (equivalent to 14 euros) upon completion. The Institutional Ethic Review Board of the Guizhou Medical University did not consider an ethics application necessary given that this study was a non-interventional, interview-based study.
All the interviews consisted of 3 sections. First, the study background was introduced. Second, the participants provided personal background information and reported their health states using the EQ-5D or EQ − SC descriptive system. Third, participants were randomized to value 1 block of the EQ − SC and EQ-5D states (study arm 1) or 1 block of the EQ-5D and EQ + VI states (study arm 2). Before starting, each participant was familiarized with the cTTO valuation tasks by valuing 3 hypothetical health states (ie, being in a wheelchair, a health state worse than being in a wheelchair, and a health state better than being in a wheelchair) and 3 EQ − SC or EQ-5D practice states. The cTTO valuation task has been described in detail elsewhere.
Briefly, health states perceived by respondents as better than dead are valued with a 10-year time frame, whereas health states considered worse than dead are valued with a 20-year time frame (ie, 10 years in full health followed by 10 years in the health state for valuation). For each health state, an iterative elicitation procedure is used to identify the point of indifference between life A (ie, x years of full health followed by death) and life B (ie, 10 years in the health state for valuation followed by death, or 10 years in full health followed by 10 years in the health state for valuation and then death). The value of the health state is given by x/10 for better-than-dead states or (x − 10)/10 for worse-than-dead states.
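The two valuation formulas above can be written as a small helper. This is a minimal sketch; the function name `ctto_value` and the range check on x (0 to 10 years) are assumptions made for illustration.

```python
def ctto_value(x_years, better_than_dead):
    """Convert the indifference point x (years in full health in life A)
    into a cTTO value.

    Better-than-dead states: x / 10, giving values from 0 to 1.
    Worse-than-dead states: (x - 10) / 10, giving values from -1 to 0.
    """
    if not 0 <= x_years <= 10:
        raise ValueError("x_years is assumed to lie between 0 and 10")
    if better_than_dead:
        return x_years / 10
    return (x_years - 10) / 10


print(ctto_value(7.5, better_than_dead=True))    # 0.75
print(ctto_value(5.0, better_than_dead=False))   # -0.5
print(ctto_value(0.0, better_than_dead=False))   # -1.0
```

The last call shows why observed values cannot fall below −1: the worse-than-dead task has no indifference point lower than x = 0.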
All interviews were conducted by 8 trained interviewers, including 3 graduate students and 5 third-year undergraduate students. All the interviewers had conducted cTTO valuation tasks using EQ-PVT in a previous study and were trained again before data collection in this study. Interviews were conducted in accordance with current EuroQol guidelines for quality control (QC) in EQ-VT studies. For every 10 consecutive interviews completed by an interviewer, a QC report was generated by Z.Y. to indicate possible protocol violations and provide individualized suggestions for improvement.
Data Analysis
By arm and type of health state, we fitted both the conventional and CALE models to the cTTO data: the 16-, 20-, and 24-parameter conventional models for EQ − SC, EQ-5D, and EQ + VI and the 7-, 8-, and 9-parameter CALE models for EQ − SC, EQ-5D, and EQ + VI, respectively. Below we show the formulas of the random-effects conventional additive model (formula 1) and the CALE model (formula 2) for EQ + VI health states as a reference. To account for the left-censoring of cTTO values at −1, all models were estimated using a tobit estimator with censoring at −1. For each model specification, we applied a fixed intercept model for all participants and a random intercept model, with random intercepts at the level of individual study participants, to account for the panel nature of the data.
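$$y_{ij} = \beta_0 + \sum_{d \in D} \sum_{l=2}^{5} \beta_{d,l}\, x_{d,l,ij} + u_i + \varepsilon_{ij} \tag{1}$$
$$y_{ij} = \beta_0 + \sum_{d \in D} \beta_d \left( L_2\, x_{d,2,ij} + L_3\, x_{d,3,ij} + L_4\, x_{d,4,ij} + x_{d,5,ij} \right) + u_i + \varepsilon_{ij} \tag{2}$$
where $y_{ij}$ is the cTTO-based disutility in the $j$th valuation by respondent $i$; $D = \{\mathrm{MO}, \mathrm{SC}, \mathrm{UA}, \mathrm{PD}, \mathrm{AD}, \mathrm{VI}\}$; $x_{d,l,ij}$ is a dummy variable equal to 1 if the health state is at level $l$ on dimension $d$ and 0 otherwise; $u_i \sim \text{iid } N(0, \sigma_u^2)$ is an individual-level random intercept; and $\varepsilon_{ij} \sim \text{iid } N(0, \sigma_\varepsilon^2)$ is an error term.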
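The tobit treatment of values censored at −1 can be sketched as a log-likelihood function. This is a simplified illustration on the cTTO value scale with a single error variance and no random intercept; it is not the estimation code used in the study, and the function names are assumptions.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def tobit_loglik(values, means, sigma, lower=-1.0):
    """Log-likelihood of observed values given model-predicted means.

    Observations at the censoring bound contribute the probability that the
    latent value is at or below the bound; all other observations contribute
    the usual normal density.
    """
    total = 0.0
    for y, mu in zip(values, means):
        if y <= lower:
            total += math.log(normal_cdf((lower - mu) / sigma))
        else:
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total


# Hypothetical observations and predicted means, for illustration only.
print(tobit_loglik([0.5, -1.0], [0.4, -0.8], sigma=0.3))
```

Treating a −1 response as "−1 or lower" rather than as an exact value keeps the spike at the bound from pulling the predicted means for poor health states upward.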
The interpretation of the additive model is described in detail in published EQ-5D valuation studies. In the CALE model, each dimension has 1 dimension parameter. For example, for EQ-5D health states, 5 dimension parameters (βMO, βSC, βUA, βPD, and βAD) were used to represent the disutility of having problems at level 5 on each dimension, and a set of scalars L2-L4 (ie, level parameters) was estimated to identify where the cutoffs of the intermediate levels were located for all dimensions. iid is independent and identically distributed, ε is the error term, i is the respondent, and j accounts for the panel structure of the data set. To calculate the decrement of a dimension at a certain level, one needs to multiply the level parameter by the dimension parameter. For example, the disutility of level 2 on mobility was βMO × L2 and the disutility of level 5 on mobility was βMO.
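As a worked illustration of this multiplication rule, the sketch below computes the model-implied disutility of an EQ-5D-5L state from the CALE coefficients reported in Table 1 for the first-arm EQ-5D data (EQ-5D_1). Adding the intercept to every state is an assumption made here for illustration, and the function name `cale_disutility` is hypothetical.

```python
# CALE coefficients for EQ-5D_1 as reported in Table 1.
INTERCEPT = 0.008
DIM_ORDER = ["MO", "SC", "UA", "PD", "AD"]
DIM_WEIGHTS = {"MO": 0.299, "SC": 0.299, "UA": 0.358, "PD": 0.358, "AD": 0.444}
# Level 1 carries no decrement; level 5 is the dimension parameter itself.
LEVEL_WEIGHTS = {1: 0.0, 2: 0.135, 3: 0.242, 4: 0.748, 5: 1.0}


def cale_disutility(state, include_intercept=True):
    """Model-implied disutility of a 5-digit EQ-5D-5L state such as '21111'."""
    if len(state) != len(DIM_ORDER) or any(ch not in "12345" for ch in state):
        raise ValueError("state must be 5 digits, each between 1 and 5")
    # Assumption: the intercept is added to every state's disutility.
    total = INTERCEPT if include_intercept else 0.0
    for dim, ch in zip(DIM_ORDER, state):
        total += DIM_WEIGHTS[dim] * LEVEL_WEIGHTS[int(ch)]
    return total


for s in ("21111", "11115", "55555"):
    print(s, round(cale_disutility(s), 3))
```

For state 21111, the only decrement is βMO × L2 = 0.299 × 0.135, which matches the mobility example given above.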
Next, we used a cross-validation procedure similar to the one used in a previous study of CALE models to compare the performance of the CALE and conventional models in predicting out-of-sample health state values. This procedure sequentially left out part of the data for out-of-sample validation and used the remainder for model estimation. We performed 2 cross-validation analyses by leaving out (1) 1 state and (2) 1 block of health states sequentially for all states and blocks. To compare prediction accuracy, we calculated the mean absolute error (MAE), root mean squared error (RMSE), Pearson product-moment correlation (Pearson R), intraclass correlation coefficient, and Lin’s concordance correlation coefficient using the observed and model-predicted values for the left-out states. The observed mean was also calculated using the tobit estimator. All analyses were conducted in RStudio and Stata. Because the cross-validation process produces multiple models, we report the coefficients of the full models (estimated using all data) for reference.
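The leave-one-state-out procedure and some of the accuracy measures can be sketched as follows. This is a simplified outline operating on state-level mean values with caller-supplied `fit` and `predict` functions; the study itself estimated tobit models on respondent-level data, so the names, the toy data, and the placeholder model here are all assumptions for illustration.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def _moments(obs, pred):
    """Means, variances, and covariance using 1/n weighting."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    vo = sum((o - mo) ** 2 for o in obs) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return mo, mp, vo, vp, cov


def pearson_r(obs, pred):
    """Pearson product-moment correlation."""
    _, _, vo, vp, cov = _moments(obs, pred)
    return cov / math.sqrt(vo * vp)


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient."""
    mo, mp, vo, vp, cov = _moments(obs, pred)
    return 2 * cov / (vo + vp + (mo - mp) ** 2)


def leave_one_state_out(state_means, fit, predict):
    """Hold out each state in turn, refit, and predict the held-out state."""
    obs, pred = [], []
    for held_out, value in state_means.items():
        train = {s: v for s, v in state_means.items() if s != held_out}
        model = fit(train)
        obs.append(value)
        pred.append(predict(model, held_out))
    return obs, pred


# Hypothetical state means and a placeholder "model" (the training mean).
toy = {"11112": 0.9, "21111": 0.8, "55555": -0.5}
obs, pred = leave_one_state_out(
    toy,
    fit=lambda train: sum(train.values()) / len(train),
    predict=lambda model, state: model,
)
print(round(mae(obs, pred), 3), round(rmse(obs, pred), 3))
```

In practice, `fit` would estimate the conventional or CALE model on the retained data, and the same metrics would be computed once per model so the two can be compared on identical held-out states.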
Results
In total, 601 students participated in the study and finished the valuation tasks. On average, the participants spent 36.8 minutes (SD: 10.8) on all the cTTO tasks, including 6 practice tasks, and considered 6.8 life durations (SD: 2.0) before reaching the point of indifference in the tasks. Figure 1 shows the histograms of the cTTO values by arm and type of health state. In general, the value distributions are similar between arms and types of health states. In all histograms, there is a “spike” at −1, suggesting that some values might have been lower than −1 had the cTTO procedure been designed to allow such values.
Figure 1. cTTO value distributions by arms and types of health states.
All coefficients were significant at the 0.05 level for the CALE models (Table 1). For the conventional additive models (Table 2), some coefficients were nonsignificant; all of these occurred at level 2, except for the coefficient of level-3 vision, which was also the only nonmonotonic coefficient. Table 1 shows that the 2 arms of students had slightly different preferences; for example, the rank order of the dimension weights differed between the 2 arms. Nevertheless, the 2 sets of coefficients within each arm agreed in terms of the rank order of the dimension weights and the similarity of the level weights.
Table 1. Model coefficients of CALE models.
| Parameters | First arm: EQ − SC (n = 2999), Coef (SE) | First arm: EQ-5D_1 (n = 3200), Coef (SE) | Second arm: EQ-5D_2 (n = 3209), Coef (SE) | Second arm: EQ + VI (n = 3298), Coef (SE) |
| --- | --- | --- | --- | --- |
| Intercept | 0.053 (0.027) | 0.008 (0.023) | 0.052 (0.023) | 0.018 (0.022) |
| VI | | | | 0.388 (0.018) |
| MO | 0.341 (0.017) | 0.299 (0.016) | 0.293 (0.016) | 0.266 (0.017) |
| SC | | 0.299 (0.016) | 0.256 (0.016) | 0.214 (0.017) |
| UA | 0.386 (0.017) | 0.358 (0.017) | 0.316 (0.016) | 0.253 (0.017) |
| PD | 0.404 (0.017) | 0.358 (0.017) | 0.397 (0.016) | 0.330 (0.017) |
| AD | 0.476 (0.017) | 0.444 (0.017) | 0.416 (0.016) | 0.379 (0.017) |
| L2 | 0.113 (0.025) | 0.135 (0.024) | 0.079 (0.025) | 0.146 (0.022) |
| L3 | 0.263 (0.021) | 0.242 (0.020) | 0.265 (0.020) | 0.260 (0.020) |
| L4 | 0.757 (0.021) | 0.748 (0.021) | 0.736 (0.021) | 0.705 (0.021) |
| Log likelihood | −1431.92 | −1463.1 | −1421.7 | −1489.9 |
Note. All coefficients were significant at 0.05 level.
AD indicates anxiety/depression; CALE, cross-attribute level effects; Coef, coefficient; EQ − SC, EQ-5D-5L without the self-care dimension; EQ + VI, EQ-5D-5L plus a vision dimension; EQ-5D-5L, 5-level version of EQ-5D; L2, level 2; L3, level 3; L4, level 4; MO, mobility; PD, pain/discomfort; SC, self-care; UA, usual activities.
Table 3 shows the cross-validation results of the leave-one-state-out analysis. Overall, the CALE model outperformed the conventional model for all 3 types of health states across the 2 arms. The relative decrease in MAE/RMSE ranged between 24% and 115% when using the CALE model (with a random intercept) compared with the conventional model (with a random intercept). Compared with a fixed intercept, using a random intercept marginally improved the performance of the CALE model in all 4 data sets but led to mixed results for the conventional model (Table 3). In either model, as the number of dimensions increased, the MAE/RMSE increased and the intraclass correlation coefficient/concordance correlation coefficient/Pearson R decreased. Similar results were observed in the leave-one-block-out cross-validation analysis (Table 4).
Table 3. Results of the leave-one-state-out cross-validation analysis.
CALE indicates cross-attribute level effects; CCC, concordance correlation coefficient; EQ − SC, EQ-5D-5L without the self-care dimension; EQ + VI, EQ-5D-5L plus a vision dimension; EQ-5D-5L, 5-level version of EQ-5D; MAE, mean absolute errors; RMSE, root mean squared errors.
∗ Relative reduction was based on the results of random intercept models.
Table 4. Results of the leave-one-block-out cross-validation analysis.
CALE indicates cross-attribute level effects; CCC, concordance correlation coefficient; EQ − SC, EQ-5D-5L without the self-care dimension; EQ + VI, EQ-5D-5L plus a vision dimension; EQ-5D-5L, 5-level version of EQ-5D; MAE, mean absolute errors; NA, not applicable; RMSE, root mean squared errors.
∗ Relative reduction was based on the results of random intercept models for EQ − SC and EQ-5D models, but was based on fixed intercept models for EQ + VI.
† Indicates the best performance.
‡ The model could not converge for the leave-one-out by block for the EQ + VI states.
Discussion
This study provided new evidence for the superior out-of-sample predictions of the CALE model over the conventional main-effects model for modeling health state values, including EQ-5D-5L and 2 types of modified EQ-5D-5L states. All previous comparisons of the 2 models were secondary analyses of EQ-5D-5L valuation data collected using the same EQ-VT protocol. Therefore, it was possible that the results of those comparisons were specific to the study design. For example, the 86 health states included in the EQ-VT protocol may coincidentally favor the assumption of the CALE model. In contrast, this study is a dedicated investigation into the relative merits of the CALE model over the conventional model. We collected and modeled TTO data from a different set of EQ-5D-5L health states that is sufficient for estimating a value set. We also collected and modeled TTO data for 2 types of modified EQ-5D health states. Hence, our study suggests that the advantage of the CALE model in out-of-sample predictions is generalizable to other valuation study designs and even modified EQ-5D health states. This, together with findings from previous studies, indicates that the CALE model should be considered together with the conventional main-effects model in future EQ-5D-5L valuation studies. The conventional model is a useful tool for examining data quality and identifying possible interactions between dimensions and levels. The CALE model, if found to have better out-of-sample predictions, could be used to produce the value set given that its predictions should be closer to the true values than those of the currently used model.
This study highlighted the importance of cross-validation for model evaluation in EQ-5D value set studies. In most EQ-5D value set studies, model selection is based on in-sample model fit, that is, agreement between observed and predicted values for the health states whose values are used to estimate the model. Such an approach runs the risk of selecting an overfitted model that may provide inferior predictions for health states that are not included in the value set studies. Predictions for health states not included in the valuation studies are important because only a very small portion of the EQ-5D health states are valued in value set studies. In the case of EQ-5D-5L, the recommended EQ-VT protocol included only 86 of 3125 health states. Cross-validation favors the model that provides the best predictions for out-of-sample health states and thereby reduces the risk of selecting an overfitted model. Indeed, the conventional model outperformed the CALE model in in-sample model fit (data not shown) but underperformed in out-of-sample predictions in our study, suggesting overfitting. Therefore, we advocate the use of cross-validation to guide the model selection process in future EQ-5D value set studies, irrespective of the use of the CALE model.
In addition to providing a new model specification for standard EQ-5D-5L valuation studies, the use of the CALE model could significantly lower the costs of value set studies. Taking EQ-5D-5L as an example, the CALE model needs less than half of the data required by the conventional 20-parameter model to estimate a value set with comparable prediction accuracy. This is important for countries where data collection is difficult or resources are limited. The current EQ-VT protocol requires face-to-face interviews of 1000 individuals who constitute a nationally representative general population sample. This sample size is based on the statistical power needed for running the 20-parameter model.
The data collection is typically completed by 5 to 10 interviewers who are intensively trained before and closely monitored by a specialized data QC team during the data collection period that may last 3 to 6 months.
Given that most countries worldwide have not established their EQ-5D value sets, a lighter, lower-cost EQ-5D-5L valuation protocol would be very attractive. This cost-saving feature of the CALE model could also benefit the development of new preference-based instruments.
Our study demonstrated that the CALE model works well for EQ-5D health states expanded to cover vision function, suggesting a possible application in estimating EQ-5D bolt-on value sets. If country-specific bolt-on value sets are considered appropriate, using the current EQ-VT protocol may not be economically viable or sustainable even for resource-rich countries, given the increasing number of bolt-ons that may be potentially useful.
Other preference-based instruments, such as the SF-6D, may also take advantage of the CALE model to lower their research costs. Theoretically, the more dimensions an instrument has and the more similar the level descriptors are, the more efficient the CALE model can be. Future research is warranted to assess this potential of the CALE model.
This study has several limitations. First, a student sample was used in this experimental study because of budget constraints. It has been shown that the preferences of students are more homogeneous than those of the general public. In addition, the student sample clearly had very different preferences over EQ-5D health states compared with the general public; for example, mobility was valued the lowest and second lowest in the first and second arms, respectively, in this study, but was the most important dimension in the Chinese value set. In addition to the preference difference, the time spent on the interview (36.8 minutes) was shorter in this study than in national valuation studies (for example, 58.3 minutes for the United States), which may be because university students tend to be more intellectual and engaged in the cTTO task and because our interviewers were experienced in conducting cTTO interviews. Second, the current study assumes that a main-effects model is sufficient for modeling health state values. We did not test interaction effects, mainly because our study design (ie, selection of health states) was optimized for main-effects models. In the recent Slovenian valuation study of the 3-level version of EQ-5D, a variant of the CALE model with an exponential parameter was used to represent the marginal utility decrease theory. According to this theory, respondents may display diminishing sensitivity to health problems when these are combined, so that the perceived disutility of problems on 2 separate dimensions at the same time may be smaller than the sum of the disutility of each problem in isolation. Third, only vision was used to expand the EQ-5D health states in this study, and the vision dimension was described differently from the way it was described in other vision bolt-on studies.
We encourage future studies to use well-established bolt-on descriptors. Our finding about the CALE model may not be generalizable to valuation of other health dimensions such as those studied as bolt-ons to the EQ-5D.
To address these limitations, future studies should consider using a general public sample and a different bolt-on item. Finally, our study used a smaller design than the 86-state design of the EQ-VT studies. Given that the conventional 20-parameter model has been shown to overfit, a smaller design favors the CALE model. Hence, it would be useful to compare the performance of the 2 models using a larger design in the future.
Conclusion
The CALE model proved to be an attractive model specification for estimating EQ-5D-5L value sets. We strongly recommend that investigators of future EQ-5D-5L value set studies consider it together with the conventional main-effects model. Investigators with limited resources may consider using the CALE model to lower the costs of their valuation studies for EQ-5D-5L or similar health state descriptive systems.
Article and Author Information
Author Contributions: Concept and design: Yang, Rand, Busschbach, Luo
Acquisition of data: Yang
Analysis and interpretation of data: Yang, Rand, Busschbach, Luo
Drafting of the manuscript: Yang, Rand, Busschbach, Luo
Critical revision of the paper for important intellectual content: Yang, Rand, Luo, Busschbach
Statistical analysis: Yang, Rand
Obtaining funding: Yang
Administrative, technical, or logistic support: Yang
Conflict of Interest Disclosures: Drs Yang and Luo reported receiving grants from the EuroQol Research Foundation during the conduct of the study and outside the submitted work. Dr Rand is the current Chair of the EuroQol Scientific Executive Committee. Dr Busschbach reports grants from the EuroQol Research Foundation outside the submitted work. Drs Yang, Rand, Busschbach, and Luo are EuroQol members. Dr Luo is an editor for Value in Health and had no role in the peer-review process of this article. No other disclosures were reported.
Funding/Support: This work was supported by the grant 20170640 from the EuroQol Research Foundation, the Netherlands.
Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Oppe M, Hout B. The ‘power’ of eliciting EQ-5D-5L values: the experimental design of the EQ-VT; 2017, EuroQol Working Paper Series 17003. https://euroqol.org/publications/working-papers/. Accessed December 1, 2022.
Use of generic and condition-specific measures of health-related quality of life in NICE decision-making: a systematic review, statistical modelling and survey.