
The ICEpop Capability Measure for Adults Instrument for Capabilities: Development of a Tariff for the Dutch General Population

      Open Access. Published: September 07, 2021. DOI: https://doi.org/10.1016/j.jval.2021.07.011

      Abstract

      Objectives

      The ICEpop Capability Measure for Adults (ICECAP-A) assesses 5 capabilities (stability, attachment, autonomy, achievement, and enjoyment) that are important to one’s quality of life and might be an important addition to generic health questionnaires currently used in economic evaluations. This study aimed to develop a Dutch tariff for the Dutch translation of the ICECAP-A.

      Methods

      The methods used are similar to those used in the development of the UK tariff. A profile case best–worst scaling task was presented to 1002 participants from the general Dutch population. A scale-adjusted latent class analysis was performed to test for preferences of ICECAP-A capabilities and scale heterogeneity.

      Results

      A 3-preference class 2-scale class model with worst choice as scale predictor was considered optimal and was used to calculate the resulting tariff. Results indicated that the capabilities stability, attachment, and enjoyment were considered more important aspects of quality of life than autonomy and achievement. Additionally, improving capabilities from low to moderate levels had a larger effect on quality of life than improving capabilities that were already at a higher level.

      Conclusions

      The ICECAP-A tariffs found in this study could be used in economic evaluations of healthcare interventions in The Netherlands.


      Introduction

      Efficient allocation of resources is becoming increasingly important in decision making in healthcare and health policy. Cost-utility analysis is a central tool for judging the efficiency of interventions and can support decisions on healthcare funding. Quality-adjusted life-years (QALYs) are generally the central outcome measure in cost-utility analyses. To assess quality of life, generic utility measures are often used, such as the EQ-5D (EuroQol Group. EuroQol--a new facility for the measurement of health-related quality of life) or the Short-Form 6 Dimensions (Brazier J., Usherwood T., Harper R., Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey). Nevertheless, the use of generic health questionnaires in economic evaluations has been criticized, mainly because these instruments do not capture all relevant domains of quality of life (Byford S., Sefton T. Economic evaluation of complex health and social care interventions; Carr-Hill R.A. Assumptions of the QALY procedure; Coast J. Is economic evaluation in touch with society’s health values?).
      Indeed, Pietersma, Van den Akker-Van Marle, and De Vries (Pietersma S., van den Akker-Van Marle M.E., De Vries M. Generic quality of life utility measures in health-care research: conceptual issues highlighted for the most commonly used utility measures) analyzed several generic utility measures and found that they capture only a limited selection of quality-of-life domains and focus almost exclusively on people’s current functional abilities, with little emphasis on coping capabilities and resources. Consequently, current economic evaluations might underestimate relevant benefits of interventions outside the area of physical health.
      Accordingly, a different, broader approach that is not limited to health-related quality of life might be more appropriate for determining treatment outcomes, especially for patients with a psychiatric disorder (Mitchell P.M., Al-Janabi H., Byford S., et al. Assessing the validity of the ICECAP-A capability measure for adults with depression) or chronic illness. One such approach is based on capabilities (Sen A. Inequality Reexamined; Sen A. Capability and well-being).
      Capabilities indicate the extent to which people are able to do what they wish to do. The ICEpop Capability Measure for Adults (ICECAP-A) (Al-Janabi H., Flynn T.N., Coast J. Development of a self-report measure of capability wellbeing for adults: the ICECAP-A) is an instrument that measures well-being in terms of capabilities and may be an appropriate addition to the established EQ-5D. The instrument is receiving increasing international recognition (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff) and may be used in economic evaluations of treatments aimed at improving not only physical health but also well-being in general. Regarding its construct validity, existing research suggests that the ICECAP-A correlates positively with concepts such as feelings of happiness and freedom (Al-Janabi H., Peters T.J., Brazier J., et al. An investigation of the construct validity of the ICECAP-A capability measure) and that it captures information beyond health-related quality of life (Afentou N., Kinghorn P. A systematic review of the feasibility and psychometric properties of the ICEpop CAPability measure for adults and its use so far in economic evaluation; Engel L., Mortimer D., Bryan S., Lear S.A., Whitehurst D.G.T. An investigation of the overlap between the ICECAP-A and five preference-based health-related quality of life instruments; Keeley T., Coast J., Nicholls E., Foster N.E., Jowett S., Al-Janabi H. An analysis of the complementarity of ICECAP-A and EQ-5D-3 L in an adult population of patients with knee pain).
      Economic evaluations that have already been conducted with the ICECAP-A suggest that using capabilities might lead to different decisions on resource allocation (Afentou N., Kinghorn P. A systematic review of the feasibility and psychometric properties of the ICEpop CAPability measure for adults and its use so far in economic evaluation).
      To use the ICECAP-A in economic evaluations, tariffs are needed that translate patients’ answers on the ICECAP-A into a capability value between “0” and “1,” where “0” represents “not at all able to do what one wishes” and “1” represents “fully able to do what one wishes.” These anchors differ from those of utility values, where “0” represents “health as bad as death” and “1” represents “perfect health.” An ICECAP-A tariff for a given population indicates how important that population considers the various capabilities to be, so tariffs might differ between populations, cultures, and countries. A tariff already exists for the general population of the United Kingdom (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff), but to use the ICECAP-A reliably in other countries, tariffs for those countries need to be developed. In The Netherlands, using the ICECAP in economic evaluations is recommended when benefits regarding well-being are expected, but no Dutch tariff is available. This study aimed to develop an ICECAP-A tariff for the Dutch general population.

      Methods

       Design, Participants, and Procedure

      The methods used to establish the Dutch tariff of the ICECAP-A are similar to those used by Flynn et al to develop the UK tariff (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff). Participants were approached by a market research agency (Kantar Group). A sample of 1002 participants was recruited that was representative of the Dutch general population in terms of age, gender, region, and income. Because the questionnaires were completed online, with less opportunity for guidance during the assessment than in the interviews in the study by Flynn et al (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff), and because the UK research group recommended a larger sample size, a sample size of 1000 was assumed to be adequate for establishing the Dutch tariff. Additionally, Yang, Johnson, Kilambi, and Mohamed (Yang J., Johnson F.R., Kilambi V., Mohamed A.F. Sample size and utility-difference precision in discrete-choice experiments: a meta-simulation approach) showed that, in terms of estimator properties, a sample size of 1000 provides sufficient power in discrete choice experiments with designs similar to that of the current study (type 2 best–worst scaling, conditional logit latent class model). Participants were first informed about the study and could continue to the online questionnaire only if they consented to participate. They were paid a small sum of money to complete the questionnaire. Only fully completed assessments were saved; no information on the amount or content of partially completed questionnaires was stored. The information the researchers received from the market research agency was anonymous and could not be traced back to individuals. An independent medical ethics committee evaluated the study and confirmed that it did not fall under the Medical Research Act, waiving the need for ethical approval (Medisch Ethische Toetsingscommissie Leiden-The Hague-Delft, file number N19.119).

       Measurements

       Best–worst scaling task

      The ICECAP-A describes 1024 possible states (4 levels for each of the 5 capabilities, ie, 4^5). Using the orthogonal main-effects plan (OMEP) design created by Flynn et al (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff), 16 profiles, each containing 1 possible ICECAP-A state, were presented to participants. Half of the participants were presented with the 16 profiles from the OMEP design and the other half with its 16 foldover profiles (ie, capabilities presented at level 4, 3, 2, or 1 in the original OMEP design were presented at level 1, 2, 3, or 4, respectively, in the foldover). The OMEP design and its foldover can be found in Appendix Table 1 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011. For each of the 16 profiles, participants had to indicate which capability they valued as best and which as worst. This is known as a (profile case) best–worst scaling task (Potoglou D., Burge P., Flynn T.N., et al. Best–worst scaling vs. discrete choice experiments: an empirical comparison using social care data). An example of a profile can be seen in Figure 1. A pilot questionnaire was completed in an in-person interview by a convenience sample of 10 people of different ages and educational levels to confirm that participants would understand the task.
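      The design arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming states are written as 5-level tuples in questionnaire order (stability, attachment, autonomy, achievement, enjoyment); the example profile is hypothetical, because the actual OMEP profiles are listed only in Appendix Table 1. It enumerates the 4^5 states and applies the foldover, which maps each level l to 5 − l.

```python
from itertools import product

# Questionnaire order of the 5 ICECAP-A capabilities.
CAPABILITIES = ("stability", "attachment", "autonomy", "achievement", "enjoyment")
LEVELS = (1, 2, 3, 4)


def all_states():
    """Every ICECAP-A state as a tuple with one level (1-4) per capability."""
    return list(product(LEVELS, repeat=len(CAPABILITIES)))


def foldover(profile):
    """Foldover of a profile: level 4 -> 1, 3 -> 2, 2 -> 3, 1 -> 4 (i.e., 5 - level)."""
    return tuple(5 - level for level in profile)


if __name__ == "__main__":
    print(len(all_states()))           # 1024 = 4 ** 5
    print(foldover((4, 3, 2, 1, 1)))   # hypothetical profile -> (1, 2, 3, 4, 4)
```

      Applying the foldover twice returns the original profile, so the two questionnaire versions are exact mirror images of each other.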
      Figure 1. Example of a completed best–worst profile. Note. Sixteen such profiles were completed in Dutch by participants. The number in square brackets [#] indicates the level of the corresponding statement, ranging from [1], the lowest level, to [4], the highest level. In the example, the participant evaluated statement 3 “completely independent” to be the best (ie, adds the most to a valuable life) and statement 5 “cannot have any enjoyment and pleasure” to be the worst (ie, obstructs having a valuable life the most).
      In the final questionnaire, participants first completed questions on demographics and their health, followed by the ICECAP-A (Flynn T.N., Louviere J.J., Peters T.J., Coast J. Using discrete choice experiments to understand preferences for quality of life). Details on these questionnaires can be found in Appendix Table 2 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011. In this version of the ICECAP-A, the level of each capability was shown after every statement. Participants rated how difficult they found completing the ICECAP-A on a 4-point scale (ranging from 1 “very easy” to 4 “very difficult”). Then, based on experiences from the pilot, the best–worst scaling task was explained using an example of 1 completed profile. The explanation and the best–worst scaling task can be found in Appendix Table 3 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011.

       Statistical Analyses

       Best–worst pairs table

      First, a table was constructed with all possible best–worst pairs. In other words, a count was made of how often, for example, stability at level 1 was chosen as best while attachment at level 1 was chosen as worst, which represents 1 of the 320 possible best–worst pairs (20 capability levels that can be chosen as best × 16 levels of the other 4 capabilities that can be chosen as worst). The margins of the table provided an initial understanding of the perceived importance of the 20 capability levels to quality of life. Moreover, the table allowed inspection of the frequencies of unlikely choices (eg, attributes presented at level 4 chosen as worst or attributes presented at level 1 chosen as best), providing insight into the quality of the data.
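      A minimal sketch of how such a pairs table and its margins could be built, assuming each completed profile is recorded as a (best, worst) pair of (capability, level) cells. This data layout, the function names, and the toy choices are hypothetical illustrations, not the study's raw data format. The sketch also confirms the count of 320 possible pairs.

```python
from collections import Counter
from itertools import product

CAPABILITIES = ("stability", "attachment", "autonomy", "achievement", "enjoyment")
# The 20 capability levels, e.g. ("stability", 1).
CELLS = [(cap, level) for cap in CAPABILITIES for level in range(1, 5)]


def possible_pairs():
    """All (best, worst) pairs that can occur: each profile shows one level per
    capability, so best and worst always come from different capabilities."""
    return [(b, w) for b, w in product(CELLS, CELLS) if b[0] != w[0]]


def pairs_table(choices):
    """Count how often each (best, worst) pair was chosen.

    `choices` is an iterable of (best_cell, worst_cell) tuples, one per completed profile.
    """
    table = Counter()
    for best, worst in choices:
        if best[0] == worst[0]:
            raise ValueError(f"best and worst from the same capability: {best}, {worst}")
        table[(best, worst)] += 1
    return table


def margins(table):
    """Row margins (best frequencies) and column margins (worst frequencies)."""
    best_counts, worst_counts = Counter(), Counter()
    for (best, worst), n in table.items():
        best_counts[best] += n
        worst_counts[worst] += n
    return best_counts, worst_counts


if __name__ == "__main__":
    print(len(possible_pairs()))  # 320 = 20 levels x 16 levels of other capabilities
    toy = [(("attachment", 4), ("enjoyment", 1)),
           (("attachment", 4), ("stability", 1)),
           (("stability", 3), ("enjoyment", 1))]
    best, worst = margins(pairs_table(toy))
    print(best[("attachment", 4)], worst[("enjoyment", 1)])  # 2 2
```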

       Best-minus-worst scores

      Second, best-minus-worst scores for participants showed individual preferences for capability levels and were used to estimate choice consistency. Within the OMEP design (and its foldover), each capability level was presented 4 times. The best-minus-worst score for 1 capability level, then, equaled the number of times that a participant picked that capability level as best minus the number of times it was picked as worst. This resulted in 20 best-minus-worst scores ranging from −4 (0 times picked as best and 4 times picked as worst) to +4 (4 times picked as best and 0 times picked as worst). Next, for each individual, each score was divided by 4 and squared, these squares were summed per capability, and the sum over all capabilities gave the empirical scale parameter (ESP), an indication of the consistency with which a participant made choices. An ESP (ranging from 0 to 8) of approximately 4 was considered normal for a participant who understood the task and made consistent choices (Flynn T.N., Huynh E., Peters T.J., et al. Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff). Participants with a suspicious answering pattern on the best–worst scaling task, identified by an ESP differing more than 2 standard deviations (SDs) from the average ESP, were excluded from the analyses for the tariff development. Table 1 depicts a set of best-minus-worst scores of a participant to illustrate the calculations.
      Table 1. Best-minus-worst scores for 1 of the participants.
      Capability    Level  Best-minus-worst score  Normalized (×1/4) and squared  Sum of squares
      Stability     1      −3                      0.56                           1.38
                    2       0                      0
                    3       2                      0.25
                    4       3                      0.56
      Attachment    1      −2                      0.25                           0.88
                    2      −1                      0.06
                    3       0                      0
                    4       3                      0.56
      Autonomy      1      −1                      0.06                           0.38
                    2      −2                      0.25
                    3       1                      0.06
                    4       0                      0
      Achievement   1      −1                      0.06                           0.19
                    2       0                      0
                    3       1                      0.06
                    4       1                      0.06
      Enjoyment     1      −3                      0.56                           1.44
                    2      −2                      0.25
                    3       1                      0.06
                    4       3                      0.56
      ESP                                                                         4.25
      ESP indicates empirical scale parameter.
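      The ESP calculation can be reproduced from the scores in Table 1 with a short Python sketch. The function names are illustrative. The exclusion helper assumes a sample standard deviation and a strict "more than 2 SDs" cut-off, because the text does not say which SD estimator was used.

```python
from statistics import mean, stdev

# Best-minus-worst scores of the participant in Table 1 (levels 1-4 per capability).
TABLE1_SCORES = {
    "stability": [-3, 0, 2, 3],
    "attachment": [-2, -1, 0, 3],
    "autonomy": [-1, -2, 1, 0],
    "achievement": [-1, 0, 1, 1],
    "enjoyment": [-3, -2, 1, 3],
}


def capability_sum_of_squares(scores, times_shown=4):
    """Normalize each best-minus-worst score by the number of presentations and square it."""
    return sum((s / times_shown) ** 2 for s in scores)


def empirical_scale_parameter(scores_by_capability, times_shown=4):
    """ESP: sum of the per-capability sums of squares (range 0 to 8 in this design)."""
    return sum(capability_sum_of_squares(s, times_shown) for s in scores_by_capability.values())


def flag_inconsistent(esps, k=2.0):
    """Indices of participants whose ESP lies more than k SDs from the mean ESP.

    Uses the sample SD (statistics.stdev); the choice of SD estimator is an assumption.
    """
    m, sd = mean(esps), stdev(esps)
    return [i for i, e in enumerate(esps) if abs(e - m) > k * sd]


if __name__ == "__main__":
    print(capability_sum_of_squares(TABLE1_SCORES["stability"]))  # 1.375 (1.38 in Table 1)
    print(empirical_scale_parameter(TABLE1_SCORES))               # 4.25
```

      Because every normalized score is a multiple of 1/4, the sums are exact in floating point, so the sketch returns 4.25 exactly for this participant.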

       Scale-adjusted latent class analysis

      Latent Gold 5.1 software (Statistical Innovations, Arlington, MA) was used for scale-adjusted latent class (SALC) analysis. These analyses can distinguish individuals with different preferences (ie, preference heterogeneity) by adding preference classes, and individuals with similar preferences but different choice consistency (ie, scale heterogeneity) by adding scale classes (Magidson J, Vermunt JK. Removing the scale factor confound in multinomial logit choice models to obtain better estimates of preference. In: Sawtooth Software Conference Proceedings; 2007:139-154).

      Although SALC models are not the only option for modeling both preference and scale heterogeneity, they are widely used and unique in estimating separate classes with differing preferences (Groothuis-Oudshoorn C.G.M., Flynn T.N., Yoo H.I., Magidson J., Oppe M. Key issues and potential solutions for understanding healthcare preference heterogeneity free from patient-level scale confounds). As new preference classes are added to the model, the software uses the data to predict the probability that an individual belongs to a certain class. Each class has its own parameters (comparable with regression coefficients) for each of the 20 capability levels of the ICECAP-A, where parameters further away from 0 signify greater importance (ie, the level is more often chosen as best or worst than other capability levels). Effects coding was used, with level 4 of enjoyment as the reference level. Adding more classes to a model will often improve the fit, but a balance between fit and interpretability is warranted, and there are no clear guidelines for choosing 1 model over another. Therefore, we followed a pragmatic approach: on the one hand minimizing the Bayesian information criterion (BIC), and on the other hand looking for a solution with clearly separable classes. Apart from adding preference classes, it is possible to add scale classes to target scale heterogeneity separately (Vass C.M., Wright S., Burton M., Payne K. Scale heterogeneity in healthcare discrete choice experiments: a primer).
      For people in the same preference class but in a different scale class, parameters of capability levels showed a similar pattern but were scaled: the scaling factor was smaller than 1 if they were less consistent, or larger than 1 if they were more consistent, in making best–worst choices. Additionally, to account for possible heteroscedasticity between best and worst choices (ie, to allow a different scale factor), a dummy variable indicating a worst choice was added as a scale predictor to the estimated models. Finally, multiple starting seeds were used when estimating the SALC model to verify the stability of the solution.
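      To make the role of the scale factors concrete, the sketch below uses a sequential best–worst logit. This is an assumed specification chosen for illustration, not necessarily the exact model Latent Gold estimated, and it assumes the best choice is made before the worst choice. The scale-class factor multiplies all utilities, and an extra factor (0.68 for worst choices in this study) multiplies the utilities entering the worst choice. Smaller scale factors flatten the choice probabilities, which is how less consistent responders are represented. The profile used in the usage example is hypothetical.

```python
import math


def bw_probability(utilities, best, worst, scale=1.0, worst_scale=1.0):
    """Probability of picking `best` and then `worst` from one profile.

    Assumed sequential best-worst logit:
      P(best = i)         = exp(lam * u_i) / sum_k exp(lam * u_k)
      P(worst = j | best) = exp(-lam * w * u_j) / sum_{k != best} exp(-lam * w * u_k)
    where lam is the scale-class factor and w the extra scale for worst choices.
    """
    if best == worst:
        return 0.0
    lam = scale
    p_best = math.exp(lam * utilities[best]) / sum(math.exp(lam * u) for u in utilities)
    lam_w = lam * worst_scale
    rest = [k for k in range(len(utilities)) if k != best]
    p_worst = math.exp(-lam_w * utilities[worst]) / sum(
        math.exp(-lam_w * utilities[k]) for k in rest)
    return p_best * p_worst


if __name__ == "__main__":
    # Hypothetical profile [4, 3, 2, 1, 1] valued with preference class 1,
    # scale class 1 coefficients from Table 4 (stability 4, attachment 3,
    # autonomy 2, achievement 1, enjoyment 1).
    u = [4.22, 2.71, -0.90, -4.41, -5.25]
    for lam in (1.0, 0.2925):
        p = bw_probability(u, best=0, worst=4, scale=lam, worst_scale=0.68)
        print(f"scale {lam}: P(best=stability 4, worst=enjoyment 1) = {p:.3f}")
```

      Summed over all possible (best, worst) pairs in a profile, these probabilities equal 1 for any scale values, so the scale factors change only how sharply the model separates likely from unlikely choices.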
      In the final model, the relative attribute importance within each class gave an indication of the preferences of participants in that class. Attribute importance was calculated for the 5 attributes in all classes by dividing the parameter range of 1 ICECAP-A attribute (ie, the difference between the level 4 and level 1 parameters of that attribute) by the sum of the 5 attribute parameter ranges.
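      The attribute importance calculation is a ratio of parameter ranges. A minimal Python sketch, using the preference class 1 (scale class 1) coefficients from Table 4; the function name is illustrative. With these inputs it reproduces the class 1 importances reported in the Results (0.23, 0.20, 0.21, 0.17, and 0.20).

```python
def attribute_importance(coefs):
    """Relative importance of each capability.

    `coefs` maps capability -> [level 1, level 2, level 3, level 4] parameters.
    Importance = (level 4 - level 1 parameter) / sum of these ranges over capabilities.
    """
    ranges = {cap: levels[3] - levels[0] for cap, levels in coefs.items()}
    total = sum(ranges.values())
    return {cap: r / total for cap, r in ranges.items()}


# Preference class 1, scale class 1 coefficients (Table 4).
CLASS1_SCALE1 = {
    "stability": [-5.84, -1.02, 3.17, 4.22],
    "attachment": [-5.11, -0.83, 2.71, 3.80],
    "autonomy": [-5.30, -0.90, 2.69, 3.88],
    "achievement": [-4.41, -0.90, 1.76, 2.90],
    "enjoyment": [-5.25, -1.64, 2.57, 3.48],
}

if __name__ == "__main__":
    for cap, share in attribute_importance(CLASS1_SCALE1).items():
        print(f"{cap:12s} {share:.2f}")
```

      Because a scale class multiplies all parameters of a preference class by the same factor, this ratio does not depend on the scale class (apart from rounding of the reported coefficients), which is why importance is reported per preference class.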

       ICECAP-A tariff

      After identifying the preferred model, the parameters of each combination of preference class and scale class were weighted by the size of that class (ie, the probability that a participant falls into it) by multiplying the raw parameters by the class probability. Finally, adding the weighted parameters for every capability level across classes resulted in 20 parameters that, when linearly transformed so that the capability value ranges from 0 (ie, level [1] for all capabilities) to 1 (ie, level [4] for all capabilities), constituted the final tariff.
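      The weighting and rescaling steps can be sketched in Python using the rounded coefficients and class probabilities from Table 4. How the linear transformation distributes the offset across capabilities is not stated; the sketch assumes an equal split over the 5 capabilities, which is one choice that reproduces the published tariff values to within about 0.001 given the two-decimal rounding of the coefficients. Names are illustrative.

```python
# Class-by-scale-class probabilities from Table 4, in column order:
# (class 1, sClass 1), (class 1, sClass 2), (class 2, sClass 1),
# (class 2, sClass 2), (class 3, sClass 1), (class 3, sClass 2).
WEIGHTS = [0.2337, 0.1686, 0.1761, 0.1270, 0.1712, 0.1234]

# Raw coefficients per capability level, one value per column above (Table 4).
COEFS = {
    ("stability", 1): [-5.84, -1.71, -0.68, -0.20, -3.86, -1.13],
    ("stability", 2): [-1.02, -0.30, 0.52, 0.15, -0.63, -0.19],
    ("stability", 3): [3.17, 0.93, 1.37, 0.40, 1.97, 0.58],
    ("stability", 4): [4.22, 1.23, 1.34, 0.39, 2.09, 0.61],
    ("attachment", 1): [-5.11, -1.50, -0.71, -0.21, -4.45, -1.30],
    ("attachment", 2): [-0.83, -0.24, 0.76, 0.22, 0.43, 0.13],
    ("attachment", 3): [2.71, 0.79, 1.56, 0.46, 3.49, 1.02],
    ("attachment", 4): [3.80, 1.11, 1.59, 0.47, 4.17, 1.22],
    ("autonomy", 1): [-5.30, -1.55, -1.21, -0.35, -3.07, -0.90],
    ("autonomy", 2): [-0.90, -0.26, 0.33, 0.10, -0.78, -0.23],
    ("autonomy", 3): [2.69, 0.79, 0.97, 0.29, 0.87, 0.26],
    ("autonomy", 4): [3.88, 1.14, 0.69, 0.20, 0.86, 0.25],
    ("achievement", 1): [-4.41, -1.29, -1.80, -0.53, -2.56, -0.75],
    ("achievement", 2): [-0.90, -0.26, -1.69, -0.49, -0.94, -0.28],
    ("achievement", 3): [1.76, 0.52, -1.54, -0.45, 0.08, 0.02],
    ("achievement", 4): [2.90, 0.85, -1.63, -0.48, 0.02, 0.00],
    ("enjoyment", 1): [-5.25, -1.53, -1.10, -0.32, -4.14, -1.21],
    ("enjoyment", 2): [-1.64, -0.48, 0.09, 0.03, 0.07, 0.02],
    ("enjoyment", 3): [2.57, 0.75, 0.59, 0.17, 2.87, 0.84],
    ("enjoyment", 4): [3.48, 1.02, 0.54, 0.16, 3.53, 1.03],
}


def build_tariff(coefs, weights):
    """Weight each class's parameters by class probability, sum across classes,
    then rescale linearly so that state 11111 scores 0 and state 44444 scores 1."""
    weighted = {key: sum(c * w for c, w in zip(vals, weights)) for key, vals in coefs.items()}
    capabilities = sorted({cap for cap, _ in weighted})
    low = sum(weighted[(cap, 1)] for cap in capabilities)   # value of 11111
    high = sum(weighted[(cap, 4)] for cap in capabilities)  # value of 44444
    shift = low / len(capabilities)  # assumption: offset split equally over capabilities
    return {key: (v - shift) / (high - low) for key, v in weighted.items()}


if __name__ == "__main__":
    tariff = build_tariff(COEFS, WEIGHTS)
    for key in sorted(tariff):
        print(key, f"{tariff[key]:.4f}")
```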

      Results

       Participants

      In total, 1002 participants completed the online questionnaire. The distribution of the ESP can be found in Appendix Table 4 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011. The ESP differed more than 2 SDs from the mean (4.04 [SD = 1.18]) for 69 participants (40 below and 29 above the mean). Visual inspection confirmed that these participants had suspicious answering patterns (eg, always choosing stability as best and enjoyment as worst, regardless of the level at which they were presented), suggesting that they did not understand the task or did not take it seriously. These participants were excluded, leaving 933 participants for analyses. Excluding these participants did not affect the representativeness of the sample (see Appendix Table 5 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011) or the balance of randomization between versions 1 and 2 of the best–worst scaling task (50.1% vs 49.9%), and had a small effect on the quality of the data (see Appendix Table 6 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011). The questionnaire took on average 14.2 minutes (SD = 28.9; range 3.8-618.4) to complete; 1 participant whose completion time was 5692 minutes was not included in this calculation. There were no missing data. Table 2 presents participant characteristics. The sample was highly representative of the general Dutch population in terms of age, gender, region, and income (see Appendix Table 5 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011). Most participants found the ICECAP-A very easy or easy to complete (93.9%).
      Table 2. Frequencies (%) and means (standard deviations) of participant characteristics.
      Variable              Category             Sample mean (N = 933)
      Age                                        48.9 (17.1)
      Gender                Female (%)           479 (51.3)
                            Male (%)             453 (48.6)
                            Other (%)            1 (0.1)
      ICECAP-A              Capability valueᵃ    0.88 (0.13)
      ICECAP-A difficulty   Very easy (%)        469 (50.3)
                            Easy (%)             407 (43.6)
                            Hard (%)             55 (5.9)
                            Very hard (%)        2 (0.2)
      EQ-5D-5L              Index scoresᵃ        0.86 (0.20)
      ESP                                        4.07 (0.95)
      Note. Values represent mean values with standard deviations in parentheses unless indicated otherwise.
      ESP indicates empirical scale parameter.
      ᵃ Values reflect scores based on the Dutch population tariff.

       Best–Worst Pairs Table

      Table 3 presents the number of times each of the 320 best–worst pairs was chosen across all participants. The last columns indicate how often a capability at a certain level was chosen as best, whereas the last rows indicate how often a capability at a certain level was chosen as worst. For example, the capability attachment presented at level 4 (“I can have a lot of love, friendship and support”) was chosen 1772 times as best (11.9% of best choices) and 229 times as worst (1.5% of worst choices) across all profiles that participants completed. The table suggests that high levels of stability, attachment, and, to a lesser extent, autonomy and enjoyment were often chosen as best, whereas high levels of achievement were infrequently chosen as best (9.7%, 11.9%, 7.2%, and 8.7%, respectively, vs 3.5%). For worst choices, preferences appeared less explicit, with low levels of stability, attachment, autonomy, achievement, and enjoyment all frequently chosen as worst (10.5%, 9.8%, 9.3%, 8.3%, and 10.1%, respectively).
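      Table 3. Best–worst pairs frequencies.
      Best \ Worst      S1    S2    S3    S4   At1   At2   At3   At4   Au1   Au2   Au3   Au4   Ac1   Ac2   Ac3   Ac4    E1    E2    E3    E4  Total  % best
      Stability 1        x     x     x     x    15    14     9    12    14    11     9     8    16    15    13    14    22     8    10     8    198    1.33
      Stability 2        x     x     x     x    64    19    18    25    56    22    26    18    33    28    37    29    55    27    27    13    497    3.33
      Stability 3        x     x     x     x   207    47    19    17   134    73    34    56   171   114    43    55   193    78    24    30   1295    8.67
      Stability 4        x     x     x     x   179    61    27    15   159   151    46    33   134    66   104    82   184   151    25    26   1443    9.67
      Attachment 1       5     5     9    10     x     x     x     x     9     8    13    12    28    22    23    17    14     7    10     5    197    1.32
      Attachment 2      80    29    18     7     x     x     x     x    39    28    22    30    55    60    42    42   107    24    14    14    611    4.09
      Attachment 3     165    76    18    33     x     x     x     x   252    98    51    48   191   158    81    77   149    78    18    21   1514   10.14
      Attachment 4     233   143    30    24     x     x     x     x   169   121    34    47   170   174   113    91   236   143    24    20   1772   11.87
      Autonomy 1         7     9    10     9    13     5     7    10     x     x     x     x     9    12    13    10     9     6    11     5    145    0.97
      Autonomy 2        72    13    12    17    49    24    15    16     x     x     x     x    35    24    27    25    97    22    18    19    485    3.25
      Autonomy 3       199    87    19    39   148    40    18    15     x     x     x     x    83    60    61    36   121   101    23    27   1077    7.21
      Autonomy 4       141    56    25    21   101   119     9    13     x     x     x     x    79   109    54    47   133   111    31    27   1076    7.21
      Achievement 1      9     6     9     3    17    11    18    12     6     5     8    11     x     x     x     x     4     6     5     4    134    0.90
      Achievement 2     60    14     7    19    31    18    10    19    18    14    10    15     x     x     x     x    44    10    11     8    308    2.06
      Achievement 3     87    45     7    17   132    38    20    18    49    18    19    16     x     x     x     x   105    32    12     8    623    4.17
      Achievement 4     65    60    11    14    64    54    12    14    59    51    14    17     x     x     x     x    40    29    10    12    526    3.52
      Enjoyment 1       14     5     5     6     8     5     8     6     7    10     5     4    11     4     5     7     x     x     x     x    110    0.74
      Enjoyment 2       56    23    15    18    59    14     7    10    50    22    22    23    34    16    20    33     x     x     x     x    422    2.83
      Enjoyment 3      194    60    23    16   219    58     5    14   211    86    24    37    91    70    41    43     x     x     x     x   1192    7.98
      Enjoyment 4      180   134    29    33   162    60    29    13   162    82    36    42    91   133    63    54     x     x     x     x   1303    8.73
      Total           1567   765   247   286  1468   587   231   229  1394   800   373   417  1231  1065   740   662  1513   833   273   247  29856  100.00
      % worst        10.50  5.12  1.65  1.92  9.83  3.93  1.55  1.53  9.34  5.36  2.50  2.79  8.25  7.13  4.96  4.43 10.14  5.58  1.83  1.65 100.00
      Note. Based on N = 933. Rows give the capability level chosen as best and columns the capability level chosen as worst; x marks pairs that cannot occur because best and worst come from the same capability. Row margins (Total, % best) indicate best choice frequencies and column margins (Total, % worst) indicate worst choice frequencies. S = stability, At = attachment, Au = autonomy, Ac = achievement, E = enjoyment; the digit gives the level.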

       SALC Estimates

      A 3-preference class 2-scale class model with worst choice as a scale predictor was considered optimal (df = 871; BIC = 68 992; R2 = 0.25). A 3-preference class model was chosen because the third class added a substantial group with interpretable differences compared with a 2-preference class model (df = 894; BIC = 71 166; R2 = 0.19). Adding a fourth class resulted in 1 relatively small group that did not discriminate clearly from the existing preference classes (df = 854; BIC = 69 224; R2 = 0.25). Two scale classes were used because they improved the fit of the model considerably; adding a third scale class reduced both the fit and the interpretability of the model. All attribute parameters for participants in the second scale class were estimated to be 0.29 times those of participants in the first scale class, with most participants (58.1%) predicted to be in the first scale class. Finally, adding worst choice as a scale predictor increased the fit of the model and seemed relevant to control for the questionnaire design (where participants could pick the best and worst choice in whatever order they preferred). Indeed, the scaling factor for worst choices compared with best choices was 0.68 (P<.001). This suggests that participants switched the order of making best and worst choices throughout the best–worst scaling task, strengthening the choice to correct for questionnaire design by adding worst choice as a predictor in the model. Relatedly, a strong linear relation between the number of best choices and the inverse of worst choices across the 20 capability levels was found (r = 0.97, R-squared = 0.95), indicating that best and worst data were proportional and can likely be pooled for analyses. A summary of the results on all estimated models can be found in Appendix Table 7 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011.
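      The reported scale-class factor and scale-class share can be checked against Table 4. A minimal Python sketch, assuming a least-squares regression through the origin of the scale class 2 coefficients on the scale class 1 coefficients of preference class 1 is an adequate summary of "0.29 times"; this check is an illustration, not the estimation method used by Latent Gold. With the rounded coefficients in Table 4 the slope comes out close to the reported 0.29.

```python
def slope_through_origin(x, y):
    """Least-squares slope b of y ~ b * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Preference class 1 coefficients from Table 4, levels 1-4 for stability,
# attachment, autonomy, achievement, and enjoyment.
SCALE_CLASS_1 = [-5.84, -1.02, 3.17, 4.22, -5.11, -0.83, 2.71, 3.80, -5.30, -0.90,
                 2.69, 3.88, -4.41, -0.90, 1.76, 2.90, -5.25, -1.64, 2.57, 3.48]
SCALE_CLASS_2 = [-1.71, -0.30, 0.93, 1.23, -1.50, -0.24, 0.79, 1.11, -1.55, -0.26,
                 0.79, 1.14, -1.29, -0.26, 0.52, 0.85, -1.53, -0.48, 0.75, 1.02]

# Class-by-scale-class probabilities (Table 4); every other entry is scale class 1.
WEIGHTS = [0.2337, 0.1686, 0.1761, 0.1270, 0.1712, 0.1234]

if __name__ == "__main__":
    print(f"scale factor ~ {slope_through_origin(SCALE_CLASS_1, SCALE_CLASS_2):.3f}")
    print(f"share in scale class 1 = {sum(WEIGHTS[0::2]):.3f}")
```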
      A table with attribute importance, based on the parameters from Table 4, can be found in Appendix Table 8 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011. Participants in preference class 1, containing 40.2% of the sample, showed little variation in attribute importance, with stability, attachment, autonomy, achievement, and enjoyment accounting for 0.23, 0.20, 0.21, 0.17, and 0.20 of the total importance, respectively. Participants in class 2, containing 30.3% of the sample, were characterized by a very low preference for achievement (0.02) and high preferences for the other 4 capabilities. Class 3 contained 29.5% of the participants and was distinguished by a high preference for attachment (0.30) and enjoyment (0.27), while indicating low importance of autonomy (0.14) and especially achievement (0.09). For the total sample, the attribute importance for stability, attachment, autonomy, achievement, and enjoyment weighted by class size was 0.22, 0.24, 0.19, 0.13, and 0.22, respectively.
      Table 4. Final model parameters and Dutch general population ICECAP-A tariffs.
                              Class 1      Class 1      Class 2      Class 2      Class 3      Class 3  Final Dutch
                             sClass 1     sClass 2     sClass 1     sClass 2     sClass 1     sClass 2       tariff
      Class probability        0.2337       0.1686       0.1761       0.1270       0.1712       0.1234
      Stability [1]       −5.84 (.33)  −1.71 (.04)  −0.68 (.14)  −0.20 (.20)  −3.86 (.13)  −1.13 (.08)      −0.0073
      Stability [2]       −1.02 (.12)  −0.30 (.03)   0.52 (.04)   0.15 (.12)  −0.63 (.10)  −0.19 (.04)       0.1061
      Stability [3]        3.17 (.18)   0.93 (.03)   1.37 (.07)   0.40 (.13)   1.97 (.10)   0.58 (.05)       0.2007
      Stability [4]        4.22 (.20)   1.23 (.03)   1.34 (.09)   0.39 (.14)   2.09 (.10)   0.61 (.05)       0.2163
      Attachment [1]      −5.11 (.31)  −1.50 (.04)  −0.71 (.13)  −0.21 (.20)  −4.45 (.13)  −1.30 (.09)      −0.0035
      Attachment [2]      −0.83 (.11)  −0.24 (.03)   0.76 (.04)   0.22 (.11)   0.43 (.10)   0.13 (.03)       0.1223
      Attachment [3]       2.71 (.18)   0.79 (.03)   1.56 (.07)   0.46 (.13)   3.49 (.11)   1.02 (.06)       0.2118
      Attachment [4]       3.80 (.19)   1.11 (.03)   1.59 (.09)   0.47 (.15)   4.17 (.12)   1.22 (.08)       0.2344
      Autonomy [1]        −5.30 (.32)  −1.55 (.04)  −1.21 (.13)  −0.35 (.18)  −3.07 (.12)  −0.90 (.08)       0.0027
      Autonomy [2]        −0.90 (.11)  −0.26 (.03)   0.33 (.04)   0.10 (.11)  −0.78 (.10)  −0.23 (.03)       0.1043
      Autonomy [3]         2.69 (.16)   0.79 (.03)   0.97 (.06)   0.29 (.12)   0.87 (.11)   0.26 (.04)       0.1784
      Autonomy [4]         3.88 (.19)   1.14 (.04)   0.69 (.09)   0.20 (.17)   0.86 (.13)   0.25 (.05)       0.1920
      Achievement [1]     −4.41 (.29)  −1.29 (.04)  −1.80 (.11)  −0.53 (.17)  −2.56 (.13)  −0.75 (.06)       0.0143
      Achievement [2]     −0.90 (.12)  −0.26 (.04)  −1.69 (.04)  −0.49 (.12)  −0.94 (.14)  −0.28 (.04)       0.0813
      Achievement [3]      1.76 (.14)   0.52 (.04)  −1.54 (.05)  −0.45 (.12)   0.08 (.13)   0.02 (.03)       0.1308
      Achievement [4]      2.90 (.19)   0.85 (.04)  −1.63 (.07)  −0.48 (.14)   0.02 (.13)   0.00 (.04)       0.1451
      Enjoyment [1]       −5.25 (.31)  −1.53 (.04)  −1.10 (.13)  −0.32 (.19)  −4.14 (.12)  −1.21 (.08)      −0.0063
      Enjoyment [2]       −1.64 (.14)  −0.48 (.03)   0.09 (.05)   0.03 (.12)   0.07 (.12)   0.02 (.04)       0.1001
      Enjoyment [3]        2.57 (.17)   0.75 (.04)   0.59 (.07)   0.17 (.12)   2.87 (.13)   0.84 (.06)       0.1932
      Enjoyment [4]ᵃ       3.48 (.19)   1.02 (.04)   0.54 (.08)   0.16 (.14)   3.53 (.13)   1.03 (.07)       0.2122
      Note. Cells show coefficient (SE). Scale factor sClass 2 compared with sClass 1 = 0.2925.
      sClass indicates scale class.
      ᵃ Used as reference level.

       ICECAP-A Tariff for the General Dutch Population

      Table 4 shows the coefficients for the different preference classes and scale classes, together with the tariff. The capability value of an ICECAP-A state is obtained by adding the tariff values for the corresponding capability levels. For example, a change in ICECAP-A state from [12211] to [44323] would result in a change in capability value of 0.6763: from 0.2273 (−0.0073 + 0.1223 + 0.1043 + 0.0143 − 0.0063) to 0.9036 (0.2163 + 0.2344 + 0.1784 + 0.0813 + 0.1932). The capability value was scaled to range from 0 [11111] to 1 [44444].
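      Scoring a state is a table lookup followed by a sum. A minimal Python sketch, assuming states are written as 5-digit strings in questionnaire order (stability, attachment, autonomy, achievement, enjoyment); `capability_value` is an illustrative name, not part of any published scoring package.

```python
# Dutch ICECAP-A tariff (Table 4, "Final Dutch tariff" column), one row per
# capability in questionnaire order, levels 1-4 from left to right.
DUTCH_TARIFF = [
    [-0.0073, 0.1061, 0.2007, 0.2163],  # stability
    [-0.0035, 0.1223, 0.2118, 0.2344],  # attachment
    [0.0027, 0.1043, 0.1784, 0.1920],   # autonomy
    [0.0143, 0.0813, 0.1308, 0.1451],   # achievement
    [-0.0063, 0.1001, 0.1932, 0.2122],  # enjoyment
]


def capability_value(state: str) -> float:
    """Capability value for a 5-digit ICECAP-A state such as "12211"."""
    if len(state) != 5 or any(ch not in "1234" for ch in state):
        raise ValueError(f"expected 5 digits in 1-4, got {state!r}")
    return sum(DUTCH_TARIFF[i][int(ch) - 1] for i, ch in enumerate(state))


if __name__ == "__main__":
    for s in ("12211", "44323"):
        print(s, f"{capability_value(s):.4f}")
```

      Because the published tariff values are rounded to 4 decimals, [11111] scores −0.0001 rather than exactly 0 with this lookup, while [44444] sums to 1.0000.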
      In the chosen model, attachment at level 4 was valued as most desirable to one’s quality of life (parameter = 2.28; tariff = 0.2344) and stability at level 1 as least desirable (parameter = −2.60; tariff = −0.0073). The largest increase in capability value, 0.1258, is obtained when going from attachment level 1 (“cannot have any love, friendship, and support”) to level 2 (“can have a little love, friendship, and support”). The average difference between consecutive capability levels was 0.0667. The largest relative importance was ascribed to attachment, accounting for 23.8% of the possible improvement, whereas achievement received the lowest preference, accounting for 13.1% of the possible improvement. In general, the capabilities stability, attachment, and enjoyment seem to be somewhat more important to quality of life than autonomy and achievement. In addition, improvements within a capability from a low level to a higher level (eg, going from level 1 to 2) yielded larger increases in capability value than improving attributes that were already moderate to high (eg, going from level 3 to 4).
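      The level-to-level gains and each capability's share of the possible improvement follow directly from the tariff column of Table 4. A minimal Python sketch with illustrative function names; it computes the one-level gains, identifies the largest one, averages all 15 gains, and expresses each capability's level 1 to level 4 range as a share of the total range.

```python
from statistics import mean

# Dutch ICECAP-A tariff (Table 4), levels 1-4 per capability.
TARIFF = {
    "stability": [-0.0073, 0.1061, 0.2007, 0.2163],
    "attachment": [-0.0035, 0.1223, 0.2118, 0.2344],
    "autonomy": [0.0027, 0.1043, 0.1784, 0.1920],
    "achievement": [0.0143, 0.0813, 0.1308, 0.1451],
    "enjoyment": [-0.0063, 0.1001, 0.1932, 0.2122],
}


def level_steps(tariff):
    """Gain in capability value for each one-level improvement (1->2, 2->3, 3->4)."""
    return {cap: [lv[k + 1] - lv[k] for k in range(3)] for cap, lv in tariff.items()}


def improvement_shares(tariff):
    """Share of the total possible improvement (level 1 -> 4) per capability."""
    ranges = {cap: lv[3] - lv[0] for cap, lv in tariff.items()}
    total = sum(ranges.values())
    return {cap: r / total for cap, r in ranges.items()}


if __name__ == "__main__":
    steps = level_steps(TARIFF)
    cap, k, gain = max(((c, k, g) for c, gs in steps.items() for k, g in enumerate(gs)),
                       key=lambda t: t[2])
    print(f"largest step: {cap} level {k + 1} -> {k + 2}, gain {gain:.4f}")
    print(f"mean step: {mean(g for gs in steps.values() for g in gs):.4f}")
    for c, share in improvement_shares(TARIFF).items():
        print(f"{c:12s} {share:.1%}")
```

      The sketch also makes the diminishing-returns pattern visible: for every capability, the gain from level 1 to 2 exceeds the gain from level 3 to 4.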
      Explorative analyses were conducted after developing the tariff to investigate what aspects of quality of life are important for different people. Details on these explorative analyses can be found in Appendix Table 9 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2021.07.011.

      Discussion

      This study aimed to develop a tariff for the ICECAP-A based on a large representative sample from the general Dutch population (N = 933). The tariff shows that all 5 capabilities described in the ICECAP-A contribute to quality of life. The capabilities stability, attachment, and enjoyment were somewhat more important than autonomy, and achievement contributed the least to quality of life. Moving up 1 level within a capability does not have a constant effect on the tariff: according to the current sample, improving capabilities from low to moderate levels is more valuable than improving them from moderate to high levels. Consequently, prioritizing help for people with low capabilities might result in larger well-being gains for society as a whole. This relates to the concept of “sufficient capability,” an approach that aims to maximize the number of people above a level of sufficient capability (Goranitis et al, 2017; Mitchell et al, 2015).
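The Python sketch below compares two hypothetical interventions under a maximizing criterion and a sufficient-capability criterion, to show how the two can diverge. The three-person population, the capability values, and the sufficiency threshold of 0.5 are all invented for illustration and are not derived from the Dutch tariff. The function names are likewise hypothetical.

```python
from typing import Sequence


def total_capability(values: Sequence[float]) -> float:
    """Sum of capability values across people (a maximizing criterion)."""
    return sum(values)


def n_sufficient(values: Sequence[float], threshold: float) -> int:
    """Number of people whose capability value is at or above the threshold."""
    return sum(v >= threshold for v in values)


# Hypothetical three-person population and two candidate interventions.
baseline = [0.30, 0.65, 0.80]
option_a = [0.30, 0.85, 0.95]  # larger total gain, all to better-off people
option_b = [0.60, 0.65, 0.80]  # smaller total gain, lifts the worst-off person
THRESHOLD = 0.5  # hypothetical sufficiency threshold

for name, outcome in (("A", option_a), ("B", option_b)):
    gain = total_capability(outcome) - total_capability(baseline)
    print(name, round(gain, 2), n_sufficient(outcome, THRESHOLD))
```

Under these assumed numbers, option A yields the larger total gain (0.35 vs 0.30), whereas option B brings more people to the threshold (3 vs 2). The two criteria can therefore rank the same interventions differently.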
      Most study findings are similar to those reported for the UK tariff (Flynn et al, 2015). Dutch and UK populations would be expected to have comparable preferences. Nevertheless, the Dutch sample seems to value high levels of enjoyment more and high levels of achievement less than the UK sample. This difference in preferences was also apparent in the European Values Study (Atlas of European Values), in which 95.6% of Dutch respondents indicated that leisure time is important in their lives, compared with 91.9% of their UK counterparts. More strikingly, 81.0% of UK respondents indicated that the feeling of achieving something is an important aspect of a job, compared with only 62.4% of Dutch respondents. Consequently, interventions that increase the ability to enjoy life might have a slightly greater impact on quality of life in The Netherlands than in the United Kingdom. Capturing such between-country differences in tariffs is important because they might ultimately influence funding decisions (Kiadaliri et al, 2015).

       Strengths and Limitations

      The SALC model used to find clusters of participants with similar answer patterns is a flexible model that captures both preference and scale heterogeneity while remaining parsimonious. The BIC was used to determine the final model. Nevertheless, this measure tends to overstate the number of preference classes (Groothuis-Oudshoorn et al, 2018), so the choice of final model was also based on interpretability and face validity, which inevitably introduces subjective judgment. Another choice was to use case 2 (profile) best–worst scaling to establish participant preferences on the ICECAP-A. Although best–worst scaling tasks might be more statistically efficient than discrete choice experiments, estimates of preferences seem to be similar across methods (Whitty and Gonçalves, 2018), and evidence on the burden for participants is mixed (Flynn et al, 2007; Himmler et al, 2021; Mühlbacher et al, 2016). A strength of the study was the recruitment of a large sample to develop the tariff.
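The BIC trades goodness of fit against model size as BIC = −2 ln L + k ln N, with lower values preferred. The Python sketch below applies that formula to choose among candidate models. The log-likelihoods and parameter counts are hypothetical. Taking N as the number of respondents (933) rather than the number of choice tasks is an assumption, because software differs on this convention.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 ln L + k ln N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(fits: Dict[str, Tuple[float, int]], n_obs: int) -> str:
    """Return the name of the candidate model with the lowest BIC."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Hypothetical candidate fits: name -> (log-likelihood, number of free parameters).
fits = {
    "model_1": (-1000.0, 10),
    "model_2": (-940.0, 25),
    "model_3": (-930.0, 40),
}
N_RESPONDENTS = 933  # assumption: N = respondents, not choice tasks

for name, (ll, k) in fits.items():
    print(name, round(bic(ll, k, N_RESPONDENTS), 1))
print("selected:", select_by_bic(fits, N_RESPONDENTS))
```

With these made-up fits, `model_2` is selected even though `model_3` has the higher log-likelihood. The criterion only ranks candidates: interpretability and face validity still have to be judged separately.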
      Several limitations were also present. First, people with lower education were somewhat underrepresented because the assessment was online and education was not included in the sampling quotas. Additionally, the 75- to 99-year age group was slightly underrepresented, possibly because of the difficulty of finding participants in this age group with access to the internet. These differences between the sample and the Dutch population might have influenced the tariff slightly. Second, a pilot was conducted to identify problems and to assess the difficulty of the best–worst task, which led to substantial improvements in the explanations in the questionnaire. Nevertheless, the final questionnaire was completed online without guidance, making it impossible to check how participants interpreted the questions. At least 69 participants did not understand the task or did not take it seriously and were excluded from the analyses, but it is realistic to assume that more participants struggled with the questionnaire. Indeed, the margins of the best–worst pairs table reveal that, in the remaining sample, 12% of worst choices were a capability presented at level 4 and 5% of best choices were a capability presented at level 1. This is unexpected, considering that all profiles presented to participants had balanced capability levels, with some capabilities presented at a high level and others at a low level. Nevertheless, because the analyses could account for scale heterogeneity and the sample was large, with the majority seeming to understand the task, the current results are still expected to accurately reflect the quality-of-life preferences of the Dutch general population.
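The margins check described above can be sketched as follows. For each best–worst task, the sketch looks up the level at which the chosen best and worst capabilities were presented. The record layout (`BestWorstChoice`, with a 5-level profile and best/worst indices) and the example records are hypothetical. A high share of worst choices at level 4, or of best choices at level 1, flags possible misunderstanding but does not prove it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BestWorstChoice:
    """One best-worst task: the profile shown and the two choices made."""
    profile: Tuple[int, int, int, int, int]  # level (1-4) of each capability shown
    best: int   # index (0-4) of the capability chosen as best
    worst: int  # index (0-4) of the capability chosen as worst


def unexpected_choice_shares(choices: List[BestWorstChoice]) -> Tuple[float, float]:
    """Share of worst choices made at level 4 and of best choices made at level 1."""
    if not choices:
        raise ValueError("no choices supplied")
    worst_at_4 = sum(c.profile[c.worst] == 4 for c in choices)
    best_at_1 = sum(c.profile[c.best] == 1 for c in choices)
    return worst_at_4 / len(choices), best_at_1 / len(choices)


# Hypothetical records (capability order: stability, attachment, autonomy,
# achievement, enjoyment).
choices = [
    BestWorstChoice((4, 1, 3, 2, 1), best=0, worst=1),  # expected pattern
    BestWorstChoice((4, 1, 3, 2, 1), best=1, worst=0),  # best at 1 and worst at 4
    BestWorstChoice((2, 4, 1, 3, 2), best=1, worst=2),  # expected pattern
    BestWorstChoice((2, 4, 1, 3, 2), best=3, worst=1),  # worst at 4
]
print(unexpected_choice_shares(choices))
```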

       Use in Economic Analyses

      To compare (economic) benefits across interventions, it is necessary to consider both effectiveness (ie, quality of life) and life extension (ie, quantity of life). Conceptually, it is difficult to interpret the capability value derived from ICECAP-A tariffs in the context of health economics and cost-utility analyses, and in comparison with QALYs (Coast, Smith, and Lorgelly, 2008; Cookson, 2005). The capability value is not a QALY because its lowest value is anchored not to “death” but to “no capability.” Nevertheless, death is accounted for in the sense that death is associated with no capability, even though the reverse is not necessarily true (eg, unconsciousness or another state in which a person is alive but has no capability) (Coast et al, 2008). Consequently, capability values have a meaningful anchor (ie, no capability) and can be adjusted for time by estimating gains in years lived with full capability (Flynn et al, 2015). Therefore, they can be used in economic evaluations in a similar way to QALYs. Although applied similarly, the ICECAP-A measures a concept that is related to, but distinct from, that of generic health questionnaires (Afentou and Kinghorn, 2020). This suggests that the ICECAP-A is not a substitute for, but rather a complement to, generic health questionnaires (Engel et al, 2017; Keeley et al, 2016), as is also advocated by the National Institute for Health and Care Excellence social care guidelines (National Institute for Health and Care Excellence, 2016). Accordingly, the instrument seems especially suitable and valuable in contexts outside the traditional healthcare model, such as general well-being, social care, mental health (Mitchell et al, 2017; Goranitis et al, 2016), public health, and chronic illness. Indeed, the Dutch guidelines for conducting economic evaluations in healthcare recommend the use of the ICECAP when considering interventions aimed at improving general well-being.
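By analogy with QALYs, a time-adjusted capability outcome can be sketched as the sum of each capability value multiplied by the time spent at that value. The Python sketch below makes two simplifying assumptions: capability values are constant within each period, and no discounting is applied. It reuses the two example values from the tariff section ([12211] = 0.2274 and [44323] = 0.9036). The function name `years_full_capability` is hypothetical.

```python
from typing import Iterable, Tuple


def years_full_capability(path: Iterable[Tuple[float, float]]) -> float:
    """Sum capability value x duration over consecutive periods.

    `path` holds (capability_value, years) pairs; the value is assumed constant
    within each period and no discounting is applied.
    """
    total = 0.0
    for value, years in path:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"capability value out of [0, 1]: {value}")
        if years < 0:
            raise ValueError(f"negative duration: {years}")
        total += value * years
    return total


# Example values from the tariff section: [12211] = 0.2274, [44323] = 0.9036.
without_intervention = [(0.2274, 1.0)]
with_intervention = [(0.9036, 1.0)]
gain = years_full_capability(with_intervention) - years_full_capability(without_intervention)
print(f"gain: {gain:.4f} years of full capability")
```

Under these assumptions, spending 1 year at [44323] instead of [12211] corresponds to a gain of 0.6762 years of full capability.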

      Conclusion

      This study developed a tariff for the ICECAP-A based on a large sample from the Dutch general population. This makes the ICECAP-A ready for use in economic evaluations in The Netherlands. The instrument is expected to be a valuable complement to generic health questionnaires, especially when evaluating interventions outside the traditional healthcare model.

      Article and Author Information

      Author Contributions: Concept and design: Rohrbach, Dingemans, Essers, Van Furth, Van den Akker-Van Marle
      Acquisition of data: Rohrbach, Van den Akker-Van Marle
      Analysis and interpretation of data: Rohrbach, Dingemans, Groothuis-Oudshoorn, Van Til, Van den Akker-Van Marle
      Drafting of the manuscript: Rohrbach, Dingemans, Groothuis-Oudshoorn, Essers, Van Furth
      Critical revision of the paper for important intellectual content: Rohrbach, Dingemans, Groothuis-Oudshoorn, Van Til, Essers, Van Furth, Van den Akker-Van Marle
      Statistical analysis: Rohrbach, Groothuis-Oudshoorn, Van Til
      Obtaining funding: Rohrbach, Dingemans, Van den Akker-Van Marle
      Administrative, technical, or logistic support: Dingemans
      Supervision: Dingemans, Van Furth, Van den Akker-Van Marle
      Conflict of Interest Disclosures: Dr Rohrbach reported receiving grants from ZonMw and Stichting Zorg & Zekerheid during the conduct of the study. Dr Dingemans reported receiving grants from Stichting Zorg & Zekerheid during the conduct of the study. No other disclosures were reported.
      Funding/Support: This work was supported by grant ST.2019-24 from Stichting Zorg & Zekerheid and by grant 636310001 from ZonMw.
      Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

      Acknowledgment

      Advice on the design and best–worst scaling task was provided by Elisabeth Huynh of the Australian National University, Department of Health Services Research and Policy.

      Supplemental Material

      References

        • EuroQol Group
        EuroQol--a new facility for the measurement of health-related quality of life.
        Health Policy. 1990; 16: 199-208
        • Brazier J.
        • Usherwood T.
        • Harper R.
        • Thomas K.
        Deriving a preference-based single index from the UK SF-36 Health Survey.
        J Clin Epidemiol. 1998; 51: 1115-1128
        • Byford S.
        • Sefton T.
        Economic evaluation of complex health and social care interventions.
        Natl Inst Econ Rev. 2003; 186: 98-108
        • Carr-Hill R.A.
        Assumptions of the QALY procedure.
        Soc Sci Med. 1989; 29: 469-477
        • Coast J.
        Is economic evaluation in touch with society’s health values?.
        BMJ. 2004; 329: 1233-1236
        • Pietersma S.
        • van den Akker-Van Marle M.E.
        • De Vries M.
        Generic quality of life utility measures in health-care research: conceptual issues highlighted for the most commonly used utility measures.
        Int J Wellbeing. 2013; 3: 173-181
        • Mitchell P.M.
        • Al-Janabi H.
        • Byford S.
        • et al.
        Assessing the validity of the ICECAP-A capability measure for adults with depression.
        BMC Psychiatry. 2017; 17: 46
        • Sen A.
        Inequality Reexamined.
        Clarendon Press, Oxford, UK; 1992
        • Sen A.
        Capability and well-being.
        in: Nussbaum M., Sen A., eds. The Quality of Life. Clarendon Press, Oxford, UK; 1993
        • Al-Janabi H.
        • Flynn T.N.
        • Coast J.
        Development of a self-report measure of capability wellbeing for adults: the ICECAP-A.
        Qual Life Res. 2012; 21: 167-176
        • Flynn T.N.
        • Huynh E.
        • Peters T.J.
        • et al.
        Scoring the ICECAP-A capability instrument: estimation of a UK general population tariff.
        Health Econ. 2015; 24: 258-269
        • Al-Janabi H.
        • Peters T.J.
        • Brazier J.
        • et al.
        An investigation of the construct validity of the ICECAP-A capability measure.
        Qual Life Res. 2013; 22: 1831-1840
        • Afentou N.
        • Kinghorn P.
        A systematic review of the feasibility and psychometric properties of the ICEpop CAPability measure for adults and its use so far in economic evaluation.
        Value Health. 2020; 23: 515-526
        • Engel L.
        • Mortimer D.
        • Bryan S.
        • Lear S.A.
        • Whitehurst D.G.T.
        An investigation of the overlap between the ICECAP-A and five preference-based health-related quality of life instruments.
        Pharmacoeconomics. 2017; 35: 741-753
        • Keeley T.
        • Coast J.
        • Nicholls E.
        • Foster N.E.
        • Jowett S.
        • Al-Janabi H.
        An analysis of the complementarity of ICECAP-A and EQ-5D-3 L in an adult population of patients with knee pain.
        Health Qual Life Outcomes. 2016; 14: 36
        • Zorginstituut Nederland
        Richtlijn voor het uitvoeren van economische evaluaties in de gezondheidszorg [Guideline for conducting economic evaluations in healthcare].
        Zorginstituut Nederland, Diemen; 2015
        • Yang J.
        • Johnson F.R.
        • Kilambi V.
        • Mohamed A.F.
        Sample size and utility-difference precision in discrete-choice experiments: a meta-simulation approach.
        J Choice Modell. 2015; 16: 50-57
        • Potoglou D.
        • Burge P.
        • Flynn T.N.
        • et al.
        Best–worst scaling vs. discrete choice experiments: an empirical comparison using social care data.
        Soc Sci Med. 2011; 72: 1717-1727
        • Flynn T.N.
        • Louviere J.J.
        • Peters T.J.
        • Coast J.
        Using discrete choice experiments to understand preferences for quality of life.
        Soc Sci Med. 2010; 70: 1957-1965
        • Magidson J.
        • Vermunt J.K.
        Removing the scale factor confound in multinomial logit choice models to obtain better estimates of preference.
        in: Sawtooth Software Conference Proceedings. 2007: 139-154
        • Groothuis-Oudshoorn C.G.M.
        • Flynn T.N.
        • Yoo H.I.
        • Magidson J.
        • Oppe M.
        Key issues and potential solutions for understanding healthcare preference heterogeneity free from patient-level scale confounds.
        Patient. 2018; 11: 463-466
        • Vass C.M.
        • Wright S.
        • Burton M.
        • Payne K.
        Scale heterogeneity in healthcare discrete choice experiments: a primer.
        Patient. 2018; 11: 167-173
        • Goranitis I.
        • Coast J.
        • Day E.
        • Copello A.
        • Freemantle N.
        • Frew E.
        Maximizing health or sufficient capability in economic evaluation? A methodological experiment of treatment for drug addiction.
        Med Decis Mak. 2017; 37: 498-511
        • Mitchell P.M.
        • Roberts T.E.
        • Barton P.M.
        • Coast J.
        Assessing sufficient capability: a new approach to economic evaluation.
        Soc Sci Med. 2015; 139: 71-79
        • European Values Study
        Atlas of European Values.
        • Kiadaliri A.A.
        • Eliasson B.
        • Gerdtham U.G.
        Does the choice of EQ-5D tariff matter? A comparison of the Swedish EQ-5D-3L index score with UK, US, Germany and Denmark among type 2 diabetes patients.
        Health Qual Life Outcomes. 2015; 13: 145
        • Whitty J.A.
        • Gonçalves A.S.O.
        A systematic review comparing the acceptability, validity and concordance of discrete choice experiments and best–worst scaling for eliciting preferences in healthcare.
        Patient. 2018; 11: 301-317
        • Flynn T.N.
        • Louviere J.J.
        • Peters T.J.
        • Coast J.
        Best–worst scaling: what it can do for health care research and how to do it.
        J Health Econ. 2007; 26: 171-189
        • Himmler S.
        • Soekhai V.
        • van Exel J.
        • Brouwer W.
        What works better for preference elicitation among older people? Cognitive burden of discrete choice experiment and case 2 best-worst scaling in an online setting.
        J Choice Modell. 2021; 38: 100265
        • Mühlbacher A.C.
        • Kaczynski A.
        • Zweifel P.
        • Johnson F.R.
        Experimental measurement of preferences in health and healthcare using best-worst scaling: an overview.
        Health Econ Rev. 2016; 6: 2
        • Coast J.
        • Smith R.D.
        • Lorgelly P.
        Welfarism, extra-welfarism and capability: the spread of ideas in health economics.
        Soc Sci Med. 2008; 67: 1190-1198
        • Cookson R.
        QALYs and the capability approach.
        Health Econ. 2005; 14: 817-829
        • Coast J.
        • Flynn T.N.
        • Natarajan L.
        • et al.
        Valuing the ICECAP capability index for older people.
        Soc Sci Med. 2008; 67: 874-882
        • National Institute for Health and Care Excellence
        The Social Care Guidance Manual.
        National Institute for Health and Care Excellence (NICE), London, UK; 2016
        • Goranitis I.
        • Coast J.
        • Day E.
        • et al.
        Measuring health and broader well-being benefits in the context of opiate dependence: the psychometric performance of the ICECAP-A and the EQ-5D-5L.
        Value Health. 2016; 19: 820-828