Willingness to Pay for Health-Related Quality of Life Gains in Relation to Disease Severity and the Age of Patients

Objectives: Decision-making frameworks that draw on economic evaluations increasingly use equity weights to facilitate a more equitable and fair allocation of healthcare resources. These weights can be attached to health gains or reflected in the monetary threshold against which the incremental cost-effectiveness ratios of (new) health technologies are evaluated. Currently applied weights are based on different definitions of disease severity and do not account for age-related preferences in society. However, age has been shown to be an important equity-relevant characteristic. This study examines the willingness to pay (WTP) for health-related quality of life (QOL) gains in relation to the disease severity and age of patients, and the outcome of the disease.
Methods: We obtained WTP estimates by applying contingent-valuation tasks in a representative sample of the public in The Netherlands (n = 2023). We applied random-effects generalized least squares regression models to estimate the effect of patients' disease severity and age, size of QOL gains, disease outcome (full recovery/death 1 year after falling ill), and respondent characteristics on the WTP.
Results: Respondents' WTP was higher for more severely ill and younger patients and for larger-sized QOL gains, but lower for patients who died. However, the relations were nonlinear and context dependent. Respondents who were younger, were male, had a higher household income, and had a higher QOL stated a higher WTP for QOL gains.
Conclusions: Our results suggest that, if the aim is to align resource-allocation decisions in healthcare with societal preferences, currently applied equity weights do not suffice.


Introduction
An important objective of publicly financed healthcare systems is to maximize population health given a certain budget constraint. 1 To meet this objective, economic evaluations can be used to inform decision makers about whether reimbursing a (new) health technology can be considered good value for money. In economic evaluations, health gains are often expressed in terms of quality-adjusted life-years (QALYs), comprising gains in both health-related quality of life (QOL) and life expectancy (LE). 2,3 The incremental cost-effectiveness ratio (ICER) of a technology is evaluated against a monetary threshold that represents the maximum societal willingness to pay (WTP) for a QALY or the opportunity costs of spending within the healthcare sector. [4][5][6] Traditionally, a "QALY is a QALY is a QALY" in economic evaluations, 7 meaning that all health gains are valued equally. However, equity weights can be attached to health gains or reflected in the monetary threshold to facilitate a more equitable and fair allocation of healthcare resources. 1, [8][9][10][11][12] In the former case, the equity-adjusted ICER of a technology is evaluated against a fixed monetary threshold and in the latter case, the (unadjusted) ICER of a technology is evaluated against a flexible, equity-adjusted monetary threshold. 1,10 These weights can be based on a range of equity considerations that, for example, are related to characteristics of the patients, disease, or technology. 1, [12][13][14][15][16][17][18][19][20] To facilitate consistent and accountable decision making, it has been advocated to explicitly and transparently integrate such considerations into the decision-making framework. [21][22][23][24] Although many countries (eg, France, Germany, Sweden, and Australia) do this in an ad hoc, implicit manner, [25][26][27] Norway, The Netherlands, and England do this in an explicit manner by applying equity weights. 
[28][29][30][31] Text Box 1 includes a brief overview of how the weights are applied in these countries.
Societal preferences for equity weighting based on disease severity (defined broadly here to include absolute shortfall, proportional shortfall, and end-of-life considerations associated with terminal illnesses as described in Text Box 1) are increasingly studied, also in relation to patients' age. The available evidence suggests that the public considers age to be an important equity-relevant characteristic (often reflected by giving a higher weight to health gains in younger patients), possibly even more important than disease severity. 1,5,11,[39][40][41][42] Nevertheless, the weights applied in Norway, The Netherlands, and (classifying terminally ill patients as severely ill 43 ) England are all based on disease severity. These weights do not directly account for patients' age nor aim to weight age in resource-allocation decisions, even though they may be inextricably related to patients' age. 11 For example, the weights based on absolute shortfall in Norway may implicitly prioritize younger over older patients, and, conversely, the weights based on proportional shortfall in The Netherlands and end-of-life considerations in England may implicitly prioritize older over younger patients. 11,[44][45][46]
The aim of this study was to examine the willingness to pay (WTP) for health gains in relation to the disease severity and age of patients and to examine whether the WTP was different between health gains in patients who fully recovered and patients who died (1 year after falling ill). Based on the available evidence on societal preferences in this context, we hypothesized that WTP would be higher for more severely ill and younger patients, and for patients who can be considered terminally ill. We further hypothesized that the elicited WTP would be sensitive to scale and to household income, indicating the theoretical validity of the elicited WTP. 47 We elicited the WTP for health gains in terms of an increase in monthly basic health-insurance premium in a representative sample of the general public in The Netherlands, as this relates directly to the payment vehicle used for collectively funding healthcare in this country. Given the aim of our study, we focus on the relative rather than absolute WTP for health gains. The results of this study may inform decisions on the relative size of severity- and/or age-dependent equity weights and on the range and shape of monetary thresholds used to evaluate the ICERs of health technologies. The results are considered to be of particular interest to Norway, The Netherlands, and England given their current use of equity weights, but also to other countries that (intend to) integrate equity and efficiency considerations into their formal decision-making framework.

Sample and Data Collection
We designed a contingent-valuation (CV) study that was administered online by a professional research agency (Dynata). Respondents were quota sampled to be representative of the general public in The Netherlands in terms of age (18-75 years), sex, and education level and to obtain a broad range in household income. Before conducting the main study in August 2019, we conducted a pilot study in a small sample (n = 100) to test the range of the payment scale and clarity of the tasks. The results of this study did not lead to modifications, and hence we merged the pilot and main data before conducting the analyses (total sample n = 2023).
Before respondents completed the questionnaire, we explained that healthcare resources are scarce and decision makers use information on societal preferences to allocate the available resources in an optimal manner for society. We asked respondents to complete the CV tasks from a socially-inclusive-personal (SIP) perspective. 48 This implied that they had to take into consideration the possibility that they themselves, their family, friends, and/or acquaintances could be part of the hypothetical patient group as well as unknown others. As the SIP perspective represents a combination of the personal and social perspectives, 48 applying it may be seen as yielding relevant WTP estimates for health gains in the context of a collectively funded healthcare system like that of The Netherlands. 48,49 Upon completion of the questionnaire, respondents received a fee of 50 eurocents that they could save in a personal account or donate to charity.

Questionnaire
The questionnaire consisted of 4 parts. In part 1, we introduced respondents to the following concepts using text and graphs: (1) QOL, operationalized on a visual analogue scale (VAS) ranging from 0 "dead" to 100 "full health"; (2) disease severity, operationalized as disease-related QOL loss (in points from 100 on the VAS) in patients who fully recovered and as a combination of QOL and LE (in years) loss in patients who died 1 year after falling ill; and (3) treatment-related QOL gain (in points on the VAS). We familiarized respondents with the concepts and tasks by asking them to assess their own QOL "today" on the VAS and complete a practice task from a personal perspective. 48 After completing this task, we asked respondents to assess its level of clarity on a 7-point Likert scale ranging from 1 "very unclear" to 7 "very clear" and, to increase their awareness of the associated opportunity costs, to indicate what they would likely economize on to cover the stated WTP.
TEXT BOX 1. Application of equity weights in Norway, The Netherlands, and England.
In Norway and The Netherlands, the (unadjusted) incremental cost-effectiveness ratio (ICER) of a health technology is evaluated against a monetary threshold that is weighted based on the disease severity of the targeted patient population. 11,28,31 In Norway, a flexible threshold in the range of NOK 275 000 to 825 000 (~€27 500 to €82 500) per quality-adjusted life-year (QALY) is (informally) applied, with a maximum weight of 3 attached for evaluating the ICER of a health technology that targets patients with the highest level of disease severity (ie, an absolute shortfall of ≥20 QALYs). [31][32][33] Absolute shortfall is calculated as the disease-related loss of remaining QALYs without the new health technology, compared to the remaining QALY expectation in absence of the disease. 31,34,35 In The Netherlands, a flexible threshold in the range of €20 000 to €80 000 per QALY gained is applied, with a maximum weight of 4 attached for evaluating the ICER of a health technology that targets patients with the highest level of disease severity (ie, a proportional shortfall of 0.71-1.00). 11,28,29 Proportional shortfall is calculated as the proportion of absolute shortfall, relative to the remaining QALY expectation in absence of the disease and measured on a scale from 0 "no QALY loss" to 1 "complete QALY loss". 8,28 Health technologies that target patients with the lowest level of disease severity (ie, a proportional shortfall of <0.10) are generally not recommended for reimbursement. 28,36,37 In England, a weight in the range of 1.7 to 2.5 can be attached to QALYs that are gained by prolonging the lives of terminally ill patients (normally with a remaining life expectancy of ≤24 months) by at least 3 months. 30 The resulting (equity-adjusted) ICER of a health technology is then evaluated against the common threshold range of £20 000 to £30 000 (~€23 000 to €34 500). 30 Note that the weights are currently applied in decisions on the reimbursement of new health technologies, not in decisions on the displacement of other technologies that may follow as a result (eg, in England 30,38 ).
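To make the two shortfall definitions in Text Box 1 concrete, the following is a minimal sketch. The function names and the example QALY values are hypothetical and chosen only for illustration; they are not taken from the Norwegian or Dutch guidelines.

```python
def absolute_shortfall(qalys_without_disease: float, qalys_with_disease: float) -> float:
    """Disease-related loss of remaining QALYs (without the new technology),
    compared to the remaining QALY expectation in absence of the disease."""
    return qalys_without_disease - qalys_with_disease


def proportional_shortfall(qalys_without_disease: float, qalys_with_disease: float) -> float:
    """Absolute shortfall as a proportion of the remaining QALY expectation
    in absence of the disease (0 = no QALY loss, 1 = complete QALY loss)."""
    if qalys_without_disease <= 0:
        raise ValueError("remaining QALY expectation must be positive")
    return absolute_shortfall(qalys_without_disease, qalys_with_disease) / qalys_without_disease


# Hypothetical patient group: 20 remaining QALYs expected without the disease,
# 5 remaining QALYs expected with the disease and without the new technology.
example_absolute = absolute_shortfall(20.0, 5.0)
example_proportional = proportional_shortfall(20.0, 5.0)
```

In this hypothetical example the same patient group has an absolute shortfall of 15 QALYs and a proportional shortfall of 0.75, which illustrates why the two definitions can rank younger and older patient groups differently.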
In part 2, respondents completed 2 tasks from a SIP perspective for which they were randomly assigned to 2 out of 20 scenarios. Each scenario started with the introduction of a group of 10 000 patients aged 10, 20, 40, or 70. We explained that the patients would have lived in full health (a score of 100 on the VAS) until the age of 80 had they not fallen ill. Due to the disease, their QOL decreased from 100 to either 90, 70, 50, 30, or 10 on the VAS for the duration of 1 year. After this year, they would fully recover (ie, a score of 100 on the VAS). The disease would not affect their LE. We explained that a treatment was available that would increase patients' QOL with 10 points on the VAS during the year of illness and that the treatment type and costs were the same for all patients. The treatment could be made available to patients by increasing the monthly basic health insurance premium for the duration of 1 year. This increase would apply to all adult inhabitants of The Netherlands. We elicited respondents' WTP for the treatment-related QOL gains by applying the 2-step procedure described in Text Box 2. After respondents completed the second task, we asked them to indicate how certain they were of actually paying the stated WTP in case the increase became effective immediately, on a 7-point Likert scale ranging from 1 "very uncertain" to 7 "very certain".
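The 20 scenarios in part 2 follow from crossing the 4 patient ages with the 5 QOL levels during illness. The sketch below illustrates that design; the variable names, the fixed seed, and the use of sampling without replacement are assumptions made for illustration, not a description of the survey software.

```python
import itertools
import random

PATIENT_AGES = (10, 20, 40, 70)            # age at onset of the disease
QOL_DURING_ILLNESS = (90, 70, 50, 30, 10)  # VAS score during the year of illness


def build_scenarios():
    """Cross ages with QOL levels; severity is the QOL loss from 100 on the VAS."""
    return [
        {"age": age, "qol_during_illness": qol, "severity": 100 - qol}
        for age, qol in itertools.product(PATIENT_AGES, QOL_DURING_ILLNESS)
    ]


def assign_scenarios(scenarios, n_tasks=2, rng=None):
    """Randomly assign a respondent to n_tasks distinct scenarios (assumed: no repeats)."""
    rng = rng or random.Random()
    return rng.sample(scenarios, n_tasks)


scenarios = build_scenarios()
assigned = assign_scenarios(scenarios, rng=random.Random(2019))
```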
In part 3, respondents completed 1 additional task from a SIP perspective for which they were randomly assigned to 1 out of 20 scenarios that were evenly distributed across 5 modules. In modules 1 to 4, the level of disease severity was 50, treatment-related QOL gain was 20 points (modules 1 and 3) or 50 points (modules 2 and 4), and patients either fully recovered (modules 1 and 2) or died (modules 3 and 4) 1 year after falling ill. We used the data from modules 1 and 2, and from the scenarios in part 2 in which the level of disease severity was also 50 to examine whether respondents' WTP was sensitive to scale. 47 We used the data from modules 1 to 4 to examine whether respondents' WTP for similar-sized QOL gains was different between gains in patients who fully recovered and patients who died. In module 5, we focused on a different question that is reported elsewhere. Appendix A includes a task example and Appendix B an overview of the scenario characteristics (see Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.01.012).
In part 4, we asked respondents about their sociodemographic characteristics.

Statistical Analyses and Hypotheses
Before conducting the analyses, we excluded protest zero valuations (see Text Box 2), outliers, and speeders. We identified outliers based on the distribution of stated WTPs (z-score ≥1.64). We classified respondents who completed the 3 tasks in less than 90 seconds as speeders, based on a timed test of completing the tasks by 3 researchers not involved in this study.
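The outlier and speeder rules can be sketched as follows. This is an illustration only: the data are made up, and the use of the sample standard deviation over all stated WTPs is an assumption, since the text does not specify how the z-scores were computed.

```python
from statistics import mean, stdev


def flag_outliers(wtp_values, z_cutoff=1.64):
    """Flag stated WTPs whose z-score is at or above the cutoff.
    Assumption: z-scores use the sample mean and sample SD of all stated WTPs."""
    mu = mean(wtp_values)
    sigma = stdev(wtp_values)
    return [(value - mu) / sigma >= z_cutoff for value in wtp_values]


def is_speeder(completion_seconds, min_seconds=90):
    """Flag respondents who completed the 3 tasks in less than min_seconds."""
    return completion_seconds < min_seconds


# Hypothetical stated WTPs (in euros per month), not study data.
hypothetical_wtp = [1, 2, 3, 4, 5, 100]
outlier_flags = flag_outliers(hypothetical_wtp)
```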
We calculated the raw mean (SD; 95% CI) WTP for QOL gains in all scenarios and the difference in raw mean (SE; 95% CI) WTP for similar-sized QOL gains in patients who fully recovered and patients who died 1 year after falling ill. We applied 2-tailed Welch's t-tests (Bonferroni corrected) to examine whether the latter was statistically significantly different from 0. Furthermore, we applied 7 random-effects generalized least squares models to estimate the effect of scenario and respondent characteristics on the WTP. Models 1 and 2 were based on the data obtained in part 2 of the questionnaire and included the scenario characteristics disease severity and age of patients, and their interaction. Models 3 to 7 were based on the data obtained in parts 2 and 3 of the questionnaire. Model 3 included the scenario characteristics disease severity, age of patients, size of QOL gains, and disease outcome (full recovery/death, 1 year after falling ill). Model 4 also included the interaction between the disease severity and age of patients. Models 5 to 7 consecutively included the interactions between the disease severity and age of patients, size of QOL gains and age of patients, and disease outcome and age of patients as well as the respondent characteristics age, age², sex, children (yes/no), education level, household income (adjusted for household size using an elasticity scale of 0.5 to account for economies of scale 52 ), and QOL.
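As an illustration of the household-income adjustment, the sketch below divides household income by household size raised to the power 0.5. This reading of "an elasticity scale of 0.5" (the square-root equivalence scale) is an assumption, and the function name and example values are hypothetical.

```python
def equivalised_income(household_income: float, household_size: int, elasticity: float = 0.5) -> float:
    """Adjust household income for household size.
    Assumption: adjusted income = income / size ** elasticity, so that larger
    households need less than proportionally more income (economies of scale)."""
    if household_size < 1:
        raise ValueError("household size must be at least 1")
    return household_income / household_size ** elasticity


# Hypothetical household: net monthly income of 3000 euros shared by 4 people.
example_adjusted_income = equivalised_income(3000.0, 4)
```

Under this assumption, a 4-person household with a net monthly income of 3000 euros is treated as equivalent to a single-person household with 1500 euros.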
TEXT BOX 2.
Willingness to pay was elicited by applying a 2-step contingent-valuation procedure, consisting of a payment scale and a bounded open-ended question. 50 The payment scale ranged from a €0 to €24 increase in monthly basic health-insurance premium with unevenly distributed intervals between the value points (ie, €0, €0.50, €1, €1.50, €2, €2.50, €3, €4, €5, €6, €7, €8, €10, €12, €14, €16, €18, €20, €22, €24, and "more"). Note that monthly payment of health-insurance premiums is mandatory for adults (18+) in The Netherlands. By approximation, the number of adults was 13.7 million and the monthly basic health-insurance premium was €115.00 per person in 2019. 51 In step 1, we asked respondents to inspect the payment scale from left to right and indicate the increase in monthly premium they were certainly willing to pay for the duration of 1 year. We then asked them to again inspect the payment scale from left to right and indicate the increase in monthly premium they would certainly not be willing to pay for the duration of 1 year. In step 2, we asked respondents to indicate the maximum increase in monthly premium they would be willing to pay within the range obtained in step 1. In both steps, we asked respondents to take their net monthly household income into account as a proxy for their ability to pay. We asked respondents who stated a WTP of €0 in step 1 to explain their main reason for having this preference by completing an open-text field or checking 1 of 6 randomized answer options. Three answer options related to true zero valuations (ie, "I cannot afford to pay more than €0", "Treating these patients is not worth more than €0 to me", and "I believe the treatment is worth more than €0, but I would rather spend my money on something else"), and 3 answer options related to protest zero valuations (ie, "I am against an increase in monthly basic health-insurance premium", "Patients should pay for the treatment themselves", and "The value of health and healthcare cannot be expressed in monetary terms"). The open-text field answers were qualitatively assessed by the first 2 authors and subsequently classified as either a true or protest zero valuation.
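The 2-step procedure can be sketched as a payment scale plus a bounds check on the open-ended answer. The names below are hypothetical, the "more" option is left out for simplicity, and treating both bounds as inclusive is an assumption that the text does not settle.

```python
# Value points of the payment scale in euros per month (the "more" option is omitted here).
PAYMENT_SCALE = [0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24]


def is_valid_response(certainly_pay: float, certainly_not_pay: float, max_wtp: float) -> bool:
    """Check a respondent's answers to the 2-step procedure.

    Step 1: both bounds must be value points on the payment scale, with the
    amount certainly paid not exceeding the amount certainly not paid.
    Step 2: the open-ended maximum WTP must lie within the step 1 range
    (assumption: bounds are inclusive).
    """
    on_scale = certainly_pay in PAYMENT_SCALE and certainly_not_pay in PAYMENT_SCALE
    ordered = certainly_pay <= certainly_not_pay
    within_range = certainly_pay <= max_wtp <= certainly_not_pay
    return on_scale and ordered and within_range
```

For example, a respondent who would certainly pay 5 euros and certainly not pay 10 euros could state any maximum between those amounts, such as 7.50 euros, even though 7.50 is not itself a value point on the scale.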
We assumed that respondents might have used their first WTP stated from a SIP perspective as a reference (anchor) in the subsequent tasks. After testing this assumption, we decided to account for a time effect in all models (labeled "CV task"). 53 A downward adjustment of a previously stated WTP could also indicate a violation of the monotonicity principle that a larger-sized QOL gain should, ceteris paribus, result in a higher WTP. 54 Therefore, we performed sensitivity analyses to examine the robustness of our results by repeating the analyses excluding respondents who, ceteris paribus, stated a lower WTP for larger QOL gains. We also examined the robustness of our results by alternately repeating the analyses excluding respondents who reported a low clarity score (ie, a score of 1 to 3) for the practice task, reported a low certainty score (ie, a score of 1 to 3) for actually paying the stated WTP, and completed the 3 tasks in less than 39 (instead of the predetermined 90) seconds based on the distribution of completion times (z-score ≤ -1.64). Furthermore, we examined the effect of respondents' proximity to the age of patients and of respondents' stated WTP in the practice task on the WTP.
Before conducting the analyses, we hypothesized that respondents' WTP would be higher for QOL gains in more severely ill patients (ie, patients with a higher level of disease severity and patients who died 1 year after falling ill) and for QOL gains in younger patients. Moreover, we hypothesized that respondents' WTP would be sensitive to scale and to household income in the sense that the WTP would be higher for larger-sized QOL gains and for respondents with a relatively higher household income. Evidence in support of the latter hypothesis would indicate the theoretical validity of the elicited WTP. 47 We conducted the analyses using Stata 16.1 (Stata Corp LP, College Station, TX).

Results
Table 1 presents the descriptive statistics of the sample (n = 1317) that remained after excluding protest zero valuations (n = 73), outliers (n = 31), and speeders (n = 602). Of the speeders, 50 also gave protest zero valuations, and 12 also stated an outlying WTP. The statistics indicate that the sample was representative of the general public in The Netherlands in terms of sex and education level, but somewhat older.
The remaining respondents assessed the mean (SD) clarity of the practice task at 5.9 (1.1) and certainty of actually paying the stated WTP at 5.4 (1.3) on the 7-point Likert scale. A total of 50 (3.8%) respondents reported a low clarity score, 98 (8.3%) a low certainty score, and 37 (2.8%) stated, ceteris paribus, a lower WTP for a larger-sized QOL gain. Table 2 presents the raw mean (SD; 95% CI) WTP for QOL gains of 10 points in patients who fully recovered 1 year after falling ill. On average, the WTP was €8.0 per month for the duration of 1 year. The results indicate that respondents' WTP was generally higher for QOL gains in more severely ill patients (average WTP: €7.3 to €8.4) and younger patients (average WTP: €7.8 to €8.4); however, the relations were nonlinear. We observed a relatively low average WTP of €8.0 for QOL gains in patients with a severity level of 50 and a relatively high average WTP of €8.4 for QOL gains in patients with a severity level of 70. We also observed a relatively low average WTP of €7.9 for QOL gains in patients aged 20. Because the SDs were relatively large and the 95% CIs largely overlapped, these results suggest strong preference heterogeneity and only partially support the hypotheses that respondents' WTP is higher for more severely ill and younger patients. Table 3 presents the raw mean (SD; 95% CI) WTP for QOL gains of 20 and 50 points in patients with a severity level of 50 who fully recovered and who died 1 year after falling ill (for comparison presented with the WTP for QOL gains of 10 points in patients with severity level 50 who fully recovered, copied from Table 2). These results indicate that respondents' WTP was generally higher for larger-sized QOL gains and "hump shaped" across ages, with a peak at age 10, 20, or 40 depending on the size of the gain and whether patients fully recovered or died. 
Respondents' WTP was higher for similar-sized QOL gains in patients who fully recovered than in patients who died, except for gains of 20 points in patients aged 20 and 40. Most differences were not statistically significant. The exception was the WTP for QOL gains of 50 points in patients aged 20, which was on average €4.5 lower (SE 1.8; 95% CI -8.0, -1.0) for patients who died than for those who fully recovered (Bonferroni corrected, α/4). Although these results indicate that respondents' WTP is higher for younger patients and larger-sized QOL gains, they do not support the hypothesis that respondents' WTP is higher for patients who die.
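As a rough consistency check on the reported difference of 4.5 (SE 1.8), the sketch below recomputes a 95% interval and a 2-sided P value. It uses a normal approximation instead of the Welch t distribution applied in the study and works from the rounded values in the text, so it is only indicative.

```python
from statistics import NormalDist

# Rounded values as reported in the text (difference: died minus fully recovered).
mean_difference = -4.5
standard_error = 1.8

standard_normal = NormalDist()

# 95% confidence interval under a normal approximation (an assumption; the
# study used Welch's t-tests, for which the critical value is slightly larger).
z_95 = standard_normal.inv_cdf(0.975)
ci_lower = mean_difference - z_95 * standard_error
ci_upper = mean_difference + z_95 * standard_error

# 2-sided P value compared with the Bonferroni-corrected level alpha/4.
z_statistic = abs(mean_difference) / standard_error
p_two_sided = 2 * (1 - standard_normal.cdf(z_statistic))
alpha_bonferroni = 0.05 / 4
significant_after_correction = p_two_sided < alpha_bonferroni
```

With these rounded inputs the interval rounds to the reported bounds, and the approximate P value falls only just below α/4, so the comparison is sensitive to rounding and to the choice of reference distribution.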
Tables 4 and 5 present the regression results. Note that models 1 and 2 are based on the data obtained in part 2, and models 3 to 7 are based on the data obtained in parts 2 and 3 of the questionnaire. The results indicate that, compared to severity level 10, a higher severity level was, ceteris paribus, associated with a higher WTP, though it was relatively low for patients with severity level 90
Table footnotes: QOL indicates health-related quality of life; WTP, willingness to pay (in € per month for the duration of 1 year). *Protest zero valuations, outliers (raw WTP ≥€35.00), and speeders are excluded from this table. In the scenarios, the groups consist of 10 000 patients who fully recover 1 year after falling ill. The treatment-related QOL gain is 10 points, measured on a visual analogue scale ranging from 0 "death" to 100 "full health". † Age at onset of the disease (in years). ‡ Severity is operationalized in terms of disease-related QOL loss and measured in points from 100 on the visual analogue scale.
These results support the hypotheses that respondents' WTP is higher for more severely ill patients and larger-sized QOL gains. However, they only partially support the hypothesis that respondents' WTP is higher for younger patients because this was dependent on patients' level of disease severity in some scenarios.
The results presented in Table 5 provide further insight into the interactions between patients' disease severity, the size of QOL gains, and disease outcome and age of patients (see Fig. 1), and into the effect of respondent characteristics on WTP. The results indicate that respondents' WTP for QOL gains of different sizes in patients with different disease outcomes was dependent on patients' age. Compared to patients aged 10, respondents' WTP was lower for patients aged 70 when the QOL gain was 10 points (model 6: b -0.90) and higher for patients aged 20 when the QOL gain was 20 points (model 6: b 1.35). Compared to patients aged
Table footnotes: VAS, visual analogue scale (ranging from 0 "dead" to 100 "full health"); WTP, willingness to pay (in € per month for the duration of 1 year). *P < .1; **P < .05; ***P < .01; ****P < .001. † Note that models 1 and 2 are based on the data obtained in part 2 and models 3 and 4 are based on the data obtained in parts 2 and 3 of the questionnaire. In models 1 and 2, we identified speeders as respondents who completed the 2 tasks in less than 60 seconds (based on timed test of completing the tasks by 3 independent researchers) in the main analysis and in less than 25 seconds (based on the distribution of completion times; z-score ≤ -1.64) in the sensitivity analysis. ‡ Note that the Severity and Age coefficients cannot be directly compared between the models. In models 1 and 3, these coefficients represent main effects, and in models 2 and 4, these coefficients represent conditional effects. § Severity is operationalized in terms of disease-related QOL loss and measured in points from 100 on the VAS.
With regard to respondent characteristics, a higher age was associated with a lower WTP (models 4 to 6: b -0.34 to -0.35). 
Being male (models 5 to 7: b 1.64), having children (models 5 to 7: b 0.99 to 1.02), having a higher (adjusted) household income (models 5 to 7: b 1.44 to 1.45), and a higher QOL (models 5 to 7: b 0.02) were also associated with a higher WTP. The results confirm the theoretical validity of the elicited WTP. The sensitivity analyses indicated that respondents' stated WTP in the practice task had a marginal effect on the stated WTP in the subsequent tasks (models 1 to 7: b 0.01) and that our results were robust.

Discussion
Our aim was to examine the WTP for QOL gains in relation to the disease severity and age of patients in a representative sample of the general public in The Netherlands. Furthermore, our aim was to examine whether the WTP was different between QOL gains in patients who fully recovered and patients who died 1 year after falling ill, and whether the WTP was sensitive to scale. Our main findings are that the WTP is generally higher for QOL gains in patients with a higher level of disease severity and younger age, and for larger-sized gains, but is lower for gains in patients who die 1 year after falling ill. However, the relations were nonlinear and context dependent. For example, the WTP was higher for QOL gains in patients aged 10 than for gains in patients aged 20 and 40 when patients had a severity level of 10, but the WTP was higher for patients aged 20 and 40 from severity levels 50 and 70 onward. The WTP for QOL gains in patients aged 70 was consistently lower than for gains in younger patients. This may be explained by the fact that these patients already had their "fair share of life" at onset of the disease or that less than "full health" is more accepted at an older age. 44,[55][56][57] We would like to make four remarks in relation to our findings. First, we applied a SIP perspective for eliciting respondents' WTP; therefore, our findings can be driven by self-regarding as well as other-regarding preferences of respondents. Although we investigated the potential influence of observable self-regarding preferences (eg, associated with having children and respondents' proximity to the age of patients), we acknowledge that unobservable self-regarding preferences (eg, associated with the probability of respondents' own need for treatment) may have impacted our results. Second, our findings need to be considered in relation to the applied design. 
The WTP for QOL gains may differ when elicited on full QOL and age scales, in combination with LE gains, or from a social perspective that excludes respondents from the hypothetical patient group. 48 Third, we observed considerable preference heterogeneity, which is consistent with the findings of other studies that examined societal preferences in this context. 1,10,12,47 Accounting for (some of) this heterogeneity in resource-allocation decisions may be possible and worth pursuing, especially when aiming to align the outcomes of such decisions with societal preferences. However, our results and those of other related studies indicate that societal preferences are complex, and, consequently, there will likely always be groups in society who do (not) agree with decisions made (based on average values). Finally, the (differences in) stated WTPs could be considered modest. However, they need to be considered in relation to the respondent instruction that the increase in monthly basic health-insurance premium would apply to all adult inhabitants of The Netherlands for the duration of 1 year. Hence, on an aggregated level the (differences in) WTP per QALY can be considered substantial. The treatment generates 1000 QALYs (ie, 10 000 patients × 0.1 QALY), and hence, on average, WTP is ~1.3 million euros per QALY (calculated as 8 euros × 12 months × 13.7 million premium payers/1000 QALYs). Although this value is much higher than the monetary thresholds currently applied in The Netherlands (see Text Box 1) and likely influenced by the scenario characteristics (eg, the number of patients and certainty of QOL gains), it should be noted that such high values are not uncommon in preference-elicitation studies 58 and, considering the high ICERs of some health technologies that are currently reimbursed in The Netherlands, 59,60 also not in decision-making practice.
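The aggregation from a monthly premium increase to a WTP per QALY can be reproduced with the figures given in the text. The variable names below are ours, and the calculation is only the back-of-the-envelope step described above, using the approximate average WTP of 8 euros per month.

```python
# Figures as stated in the text.
average_wtp_per_month_eur = 8       # approximate average stated WTP per premium payer
months = 12                         # the premium increase applies for 1 year
premium_payers = 13_700_000         # approximate number of adults in The Netherlands (2019)
patients = 10_000
qaly_gain_per_patient = 0.1         # 10-point QOL gain on the 0-100 VAS for 1 year

total_qalys = patients * qaly_gain_per_patient
total_wtp_eur = average_wtp_per_month_eur * months * premium_payers
wtp_per_qaly_eur = total_wtp_eur / total_qalys
```

Under these inputs the aggregated WTP is about 1.3 billion euros for 1000 QALYs, that is, roughly 1.3 million euros per QALY.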
Our findings are consistent with those of other studies that find societal support for attaching a higher weight to health gains in more severely ill and younger patients 11,39,[61][62][63] and to larger-sized health gains, 47 and with those of other studies that find no support for attaching a higher weight to gains in terminally ill patients. [64][65][66][67] The latter is consistent with studies that find that the public may attach a lower weight to health gains in patients with an undesirable "end point after treatment." [58][59][60]68 Although there may be a moral case for attaching a higher weight to health gains in terminally ill patients, 69 our findings, like those of previous studies, 64-67 suggest that empirical support for applying a higher weight to these gains may be limited. This is recognized by the National Institute for Health and Care Excellence (NICE) in England, which recently proposed to replace its end-of-life criterion (see Text Box 1) by considerations that relate more broadly to the disease severity of patients in order to better align its decision-making framework with societal preferences. 69,70 Because these societal preferences are usually not elicited in monetary terms, we are limited in our ability to directly compare our results to those of others. We can, however, compare our results to those of Bobinac et al 5 as they applied the CV approach from a social perspective in scenarios similar to ours. Both our studies found a higher WTP for health gains in younger patients and larger-sized gains. In contrast to our findings, Bobinac et al 5 found a lower WTP for health gains in patients with a higher level of disease severity and a higher WTP for gains in terminally ill patients. This may be explained by the different way in which they operationalized disease severity and the health gain in the specific scenarios, ie, in terms of proportional shortfall and the prevention of immediate death. 
5 The main strengths of this study lie in the use of a realistic payment vehicle, pilot-tested payment scale, and 2-step CV procedure. Although we could have applied other methods (eg, a discrete choice experiment) to elicit respondents' WTP, the CV method enabled us to approach respondents' common decision context and examine their explicit WTP (instead of, for example, deriving their WTP from the trade-off between scenario characteristics). Other strengths lie in the randomization of scenarios, exclusion of speeders, restriction of the disease duration to 1 year, standardization of patients' risk of falling ill and dying within a certain timeframe (ie, 100%, implying no uncertainty), the size of the patient group and QOL gains, and the costs of treatment, as this reduced the possible influence of an order effect, satisficing behavior, 71 cognitive biases associated with risk assessment 72 and of other considerations (eg, related to health maximization and the budget impact of reimbursing the new treatment) on our results. We appreciate that the latter strength comes with the limitation that our results cannot be generalized to other scenarios, for example, in which the number of patients is uncertain, patients are at risk of falling ill or of dying within a particular timeframe (ie, introducing uncertainty), or in which patients' lives are (also) prolonged. Another strength that comes with a limitation is the exclusion of protest zero valuations. Although including these valuations would have confounded the estimated WTP, it should be noted that the classification into protest and true zero valuations is not always straightforward and inevitably has some impact on results. 73 Some other limitations need to be discussed as well. A first limitation concerns the possible influence of payment-scale characteristics on the WTP. 
We facilitated a more exact mapping of respondents' WTP on the payment scale by applying a scale with a reasonable range and uneven intervals between the value points. 74,75 However, we cannot rule out the possibility that the scale influenced respondents' WTP, particularly in the case of unstable or not (yet) well-formed preferences. 74 We accounted for this by controlling for a time effect and discussing the results in relative rather than absolute terms. A second limitation concerns the hypothetical context in which we elicited respondents' WTP. 76 Although the outlined context of a collectively funded healthcare system is realistic, we cannot rule out the possibility that the hypothetical nature of the scenarios increased the risk of an upward "hypothetical" bias, in which case the stated WTP could be an overestimation of respondents' true WTP. 77 However, we also cannot rule out the possibility that the realistic nature of the scenarios increased the risk of a downward "strategic" bias, in which case free-rider behavior of respondents may have offered a counterbalance. 78 A third limitation concerns the inclusion of QOL gains that (in some scenarios) fully restored patients' QOL to 100 points, as this means we cannot distinguish the effect of the size of QOL gains from the effect of patients' health being fully restored on respondents' WTP. 79 A final limitation concerns the low R² values of the regression models. We would like to note that our aim was not to predict WTP, and hence to explain as much data variance as possible. Rather, our aim was to assess whether WTP was influenced by scenario characteristics (ie, associated with the disease severity and age of patients, the size of QOL gains, and outcome of the disease) and the models successfully aided in meeting that aim. 
Figure 1. Graphical presentation of interaction terms (mean additional effect; 95% confidence interval) presented in Table 5.
Further research is warranted to obtain insight into other factors that may influence WTP for QOL gains. This fell outside the scope of the current study.
Our results are consistent with those of other studies suggesting that equity weights based on end-of-life considerations may not be consistent with societal preferences, [64][65][66]80 at least not if the weight is attached to QOL gains in terminally ill patients as in the current study. Furthermore, our results are consistent with those of other studies suggesting that weights based on disease severity are consistent with societal preferences. However, our results suggest that the weights may decrease marginally with increasing disease severity, have a fairly narrow range across severity levels (possibly narrower than the threshold range of €20 000 to €80 000 currently applied in The Netherlands), and are dependent on patients' age. Further research is necessary to examine the robustness of these results in relation to the prevalence of a disease and the related budget impact of a new technology.
Because there is much variation between the results and designs of studies that examine the strength of societal preferences, 68 there is still considerable uncertainty about the "exact" weight. For example, a recent study estimated equity weights based on patients' disease severity in the range of 2.5 to 2.8 by using the person-trade-off approach. 11 Given the very limited evidence on the WTP for health gains in representative samples of the general public, further research is necessary to inform decisions about the appropriate size and range of equity weights and, in relation to this, the range and shape of the monetary thresholds against which the ICERs of new health technologies are evaluated. This may, for example, concern research into the most appropriate design for eliciting the WTP for health gains from a social or SIP perspective.

Conclusions
Our results indicate that the WTP is higher for QOL gains in more severely ill and younger patients and for larger-sized QOL gains. It is lower for QOL gains in patients who die. However, the relations are nonlinear and context dependent. These results suggest that, if the aim is to align resource-allocation decisions in healthcare with societal preferences, currently applied equity weights do not suffice.

Supplemental Material
Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.jval.2021.01.012.