Centre for Heart Lung Innovation, St. Paul’s Hospital, and Division of Respiratory Medicine, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
1 Dr Quittner was an employee of Behavioral Health Systems Research at the time the study was completed, but is currently an employee of The Cystic Fibrosis and Pulmonary Center at Joe DiMaggio Children’s Hospital, Hollywood, FL, USA.
• Cost-effectiveness analysis (CEA) may guide reimbursement decisions for cystic fibrosis (CF) treatments; nevertheless, standard measures of utility used in CEA are insensitive to changes in lung function and health-related quality of life in CF.
• We used blinded Cystic Fibrosis Questionnaire-Revised (CFQ-R) data from 4 clinical trials to develop the CFQ-R-8 dimensions (CFQ-R-8D), a new, CF disease–specific, preference-based utility measure with 8 dimensions (physical functioning, vitality, emotion, role functioning, breathing difficulty, cough, abdominal pain, and body image).
• The CFQ-R-8D enables disease-specific utilities for use in CEA to be generated from the CFQ-R, a measure that is widely used in CF trials.
Abstract
Objectives
Cystic fibrosis (CF) limits survival and negatively affects health-related quality of life (HRQOL). Cost-effectiveness analysis (CEA) may be used to make reimbursement decisions for new CF treatments; nevertheless, generic utility measures used in CEA, such as EQ-5D, are insensitive to meaningful changes in lung function and HRQOL in CF. Here we develop a new, CF disease–specific, preference-based utility measure based on the adolescent/adult version of the Cystic Fibrosis Questionnaire-Revised (CFQ-R), a widely used, CF-specific, patient-reported measure of HRQOL.
Methods
Blinded CFQ-R data from 4 clinical trials (NCT02347657, NCT02392234, NCT01807923, and NCT01807949) were used to identify discriminating items for a classification system using psychometric (eg, factor and Rasch) analyses. Thirty-two health states were selected for a time trade-off (TTO) exercise with a representative sample of the UK general population. TTO utilities were used to estimate a preference-based scoring algorithm by regression analysis (tobit models with robust standard errors clustered on participants with censoring at −1).
Results
A classification system with 8 dimensions (CFQ-R-8 dimensions; physical functioning, vitality, emotion, role functioning, breathing difficulty, cough, abdominal pain, and body image) was generated. TTO was completed by 400 participants (mean age, 47.3 years; 49.8% female). Among the regression models evaluated, the tobit heteroscedastic–ordered model was preferred, with a predicted utility range from 0.236 to 1, no logical inconsistencies, and a mean absolute error of 0.032.
Conclusion
The CFQ-R-8 dimensions (CFQ-R-8D) provides the first disease-specific, preference-based scoring algorithm for CF, enabling estimation of disease-specific utilities for CEA from the well-validated and widely used CFQ-R.
Cystic fibrosis (CF) affects > 80 000 people worldwide and is the most common life-threatening autosomal recessive disorder in populations of Northern European ancestry, with an overall incidence of 1 in 3500 in European countries.
Many clinical and demographic characteristics have been associated with HRQOL in CF, including occurrence of pulmonary exacerbations, disease severity, sex, and socioeconomic status.
The CFQ-R has 12 dimensions: physical functioning, emotional functioning, social functioning/school functioning, body image, eating problems, treatment burden, respiratory symptoms, digestive symptoms, vitality, health perceptions, weight, and role functioning.
Three versions of the CFQ-R have been developed: 1 for parents/caregivers to proxy report for children aged 6 to 13 years, 1 that can be interviewer administered for children aged 6 to 11 years or self-completed for children aged 12 or 13 years, and an adolescent/adult version for those aged ≥ 14 years.
As the version with the broadest target population, this study focuses on the adolescent/adult version.
Cost-effectiveness analyses, a framework used by some health technology appraisal agencies to evaluate novel healthcare interventions, require measures of HRQOL in the form of health state utilities to generate quality-adjusted life-years, which combine the value of HRQOL with the length of life into a single index number. Health state utilities are typically generated from preference-based measures that use preference-elicitation techniques such as time trade-off (TTO), standard gamble, or discrete choice experiment to assign a value, anchored at 0 for dead and 1 for perfect health, to health states described by the underlying classification system.
The EQ-5D is a generic preference-based measure of HRQOL comprising 5 dimensions (mobility, self-care ability, ability to undertake usual activities, pain and discomfort, and anxiety and depression).
The EQ-5D-3L lacks sensitivity to meaningful differences in lung function and HRQOL among people with CF: individuals with mild and severe lung function impairment self-reported mean utilities of 0.923 and 0.870, respectively, and utilities estimated by mapping CFQ-R data to the EQ-5D-3L showed limited ability to discriminate between groups classified based on lung function in a disease largely characterized by respiratory symptoms.
Given this observed lack of sensitivity of EQ-5D in CF, an alternative approach to estimating utilities is required. Utilities generated from disease-specific measures that are sensitive to change, such as the CFQ-R, have the potential to effectively capture disease-relevant concepts. Nevertheless, given that the CFQ-R is not preference based, it cannot be used directly to generate utilities. Here we derive the first preference-based scoring algorithm to generate utilities from CFQ-R data.
Methods
The study was conducted in 5 stages using methods previously described for estimating a preference-based measure from the Short Form-36 (SF-36) health survey.
The 5 stages were (1) assessing the dimensional structure of the CFQ-R using factor and Rasch analyses, (2) identifying suitable items for the health state classification system using classical psychometric analyses, (3) using clinical and participant input to assess the face validity of the CFQ-R items and dimensions selected in stage 2, (4) valuation of the health states by members of the general public, and (5) developing the scoring algorithm for the classification system using regression modeling. The first 3 stages used existing clinical trial data (described below), whereas the latter 2 stages used primary data collected for this study. Rasch analysis was conducted using RUMM2030 (RUMMlab Pty Ltd, Perth, Australia).
The CFQ-R (adult and adolescent version) includes 50 items assessing 12 dimensions scored on 4-point Likert scales, including frequency (always to never), intensity (a great deal to not at all), difficulty (a lot of difficulty to no difficulty), and true-false (very true to very false). Dimensions are scored as the average across all items and rescaled on a 0 to 100 scale where higher scores indicate better HRQOL.
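The dimension scoring just described can be sketched in a few lines. This is a minimal illustration, not the official CFQ-R scoring program: the helper name `score_dimension` is hypothetical, it assumes every response has already been coded 1 (worst) to 4 (best) with any reverse-worded items recoded beforehand, and it does not reproduce the manual's rules for how many items must be answered.

```python
from statistics import mean
from typing import Optional, Sequence


def score_dimension(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Score one CFQ-R-style dimension on a 0-100 scale (hypothetical helper).

    Assumes every response is already coded 1 (worst) to 4 (best), with any
    reverse-worded items recoded beforehand; None marks a missing item.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None  # nothing answered, so no dimension score
    if any(r not in (1, 2, 3, 4) for r in answered):
        raise ValueError("responses must be coded 1-4")
    # Average the answered items, then map the 1-4 range linearly onto 0-100.
    return (mean(answered) - 1) / 3 * 100


# Usage: three answered items (4, 4, 3) and one missing item.
example_score = score_dimension([4, 4, 3, None])
```

The linear map sends an all-worst mean of 1 to 0 and an all-best mean of 4 to 100, so higher scores indicate better HRQOL, as stated above.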
Clinical Trial Data
To facilitate selection of dimensions and items, we used data from 4 randomized, double-blind, placebo-controlled clinical trials designed to evaluate novel CF medications that included the adult and adolescent version of the CFQ-R (EVOLVE [NCT02347657], EXPAND [NCT02392234], TRAFFIC [NCT01807923], and TRANSPORT [NCT01807949]); full descriptions of trial designs have been published.
In brief, the trials enrolled participants aged ≥ 12 years with CF homozygous for the F508del-CFTR mutation or heterozygous for F508del-CFTR and a residual function mutation, who were randomized to active treatment versus placebo. The primary outcome was percent predicted forced expiratory volume in 1 second (ppFEV1), a measure of lung function. Only participants who were administered the CFQ-R adult and adolescent version (ie, those aged ≥ 14 years) were included in this analysis. Three trials had a 24-week intervention period and were used for the main analysis; the EXPAND trial, a crossover trial with 2 intervention periods of 8 weeks, was used to replicate the main item-selection analysis. All analyses were conducted by analysts blinded to treatment assignment to ensure item selection was driven by item performance independent of treatment effect. Data included clinical outcomes and other patient-reported measures, such as ppFEV1, number of pulmonary exacerbations, and the patient-reported Cystic Fibrosis Respiratory Symptom Diary (CFRSD).
Factor and Rasch analyses were used to identify potential health dimensions and their associated items. Factor analysis can be used to identify dimensional structures, whereas Rasch models allow unidimensional estimates of item location and person ability to be made. Results from factor analysis were assessed based on eigenvalues > 1 (including review of scree plots), the contribution of items to each factor (range 0-1, with higher values indicating greater contribution) and whether items contributed to > 1 factor, and measurement error based on uniqueness, where a value > 0.6 indicated that an item may reflect other information not captured in the dimension (see Supplemental Methods found at https://doi.org/10.1016/j.jval.2022.12.002 for further details). Rasch analysis was undertaken for the items identified in each factor to assess whether all items fit, based on assessment of the residuals to identify potential divergence and assessment of local dependency (ie, where > 1 item measured the same construct in the factor) (see Supplemental Methods found at https://doi.org/10.1016/j.jval.2022.12.002 for further details). Items excluded at this stage were those that did not contribute to the identified factors or that showed evidence of local dependency or divergence in the Rasch analysis. Optional items and items relating to general health were also excluded.
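The eigenvalue > 1 retention rule (Kaiser criterion) and the uniqueness check can be illustrated with NumPy. This is a simplified sketch, not the authors' analysis: it extracts unrotated principal components from an item correlation matrix, whereas the extraction and rotation methods actually used are given only in the Supplemental Methods. The function name `kaiser_factors` is hypothetical; the 0.6 uniqueness cutoff is taken from the text.

```python
import numpy as np


def kaiser_factors(corr: np.ndarray, uniqueness_cutoff: float = 0.6):
    """Return retained factor count, item uniqueness, and high-uniqueness items.

    Illustrative simplification: unrotated principal-component extraction
    from an item correlation matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_factors = int(np.sum(eigvals > 1.0))       # Kaiser criterion
    # Loadings: eigenvectors scaled by sqrt(eigenvalue) for retained factors.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    communality = np.sum(loadings**2, axis=1)    # variance explained per item
    uniqueness = 1.0 - communality               # variance left unexplained
    flagged = [i for i, u in enumerate(uniqueness) if u > uniqueness_cutoff]
    return n_factors, uniqueness, flagged


# Usage: items 0 and 1 correlate strongly; item 2 correlates only weakly.
corr = np.array([[1.0, 0.8, 0.3],
                 [0.8, 1.0, 0.3],
                 [0.3, 0.3, 1.0]])
n_factors, uniqueness, flagged = kaiser_factors(corr)
```

Here a single factor is retained, and the weakly related item 2 is flagged because most of its variance lies outside that factor.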
Item Selection
To identify items best representing each dimension, a combination of classical psychometric analysis and Rasch analysis was used (see Supplemental Methods found at https://doi.org/10.1016/j.jval.2022.12.002 for further details on item selection methods). Classical psychometric criteria were applied to each item in the CFQ-R using the following metrics: level of missing data, distribution of response across categories (floor and ceiling effects), correlation of item to its own dimension, and responsiveness (standardized response mean [SRM]) to change over time based on improved ppFEV1.
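Two of the classical metrics listed above reduce to short formulas: floor and ceiling effects are the percentages of responses in the worst and best categories, and the standardized response mean (SRM) is the mean of paired change scores divided by their standard deviation. The sketch below assumes 4-category items coded 1 to 4 and takes already-selected paired scores. The rule used to define "improved ppFEV1" and any percentage thresholds for floor or ceiling effects are not stated here, so none are built in. Function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def floor_ceiling(responses: Sequence[int], worst: int = 1, best: int = 4) -> tuple[float, float]:
    """Percentage of responses in the worst (floor) and best (ceiling) category."""
    n = len(responses)
    floor = 100 * sum(r == worst for r in responses) / n
    ceiling = 100 * sum(r == best for r in responses) / n
    return floor, ceiling


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """SRM = mean of paired change scores / SD of those change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Usage: paired item scores for participants whose ppFEV1 improved (toy data).
srm = standardized_response_mean([1, 2, 3, 4], [2, 4, 4, 5])
```

Because the SRM scales mean change by the variability of change, it is unitless, which makes it possible to compare responsiveness across items with different response formats.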
Rasch analysis was used to assess the performance of individual items. Items that did not fit the model, did not cover the full range of severity, had disordered response choices, or had differential item functioning were candidates for exclusion from the health state classification system.
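To make "disordered response choices" concrete, the sketch below uses the partial credit parameterization of the polytomous Rasch model. This is an assumption, because the text does not say whether a rating scale or partial credit model was fitted in RUMM2030. Category probabilities depend on cumulative differences between person location θ and the item thresholds. When a threshold is not higher than the one before it, the category between them can never be the single most probable response. Function names are hypothetical.

```python
import math
from typing import Sequence


def category_probabilities(theta: float, thresholds: Sequence[float]) -> list[float]:
    """Partial credit model: P(X = k) for k = 0..m at person location theta."""
    # Category k has log-numerator sum_{j<=k} (theta - delta_j); category 0 has 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    peak = max(logits)                             # subtract max for stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_disordered(thresholds: Sequence[float]) -> bool:
    """True if any threshold fails to exceed the one before it."""
    return any(b <= a for a, b in zip(thresholds, thresholds[1:]))


# Usage: at theta equal to the first threshold, categories 0 and 1 tie.
probs = category_probabilities(-1.0, [-1.0, 0.0, 1.0])
```

Ordered thresholds ([-1, 0, 1]) give each category a range of θ where it is modal, whereas a pattern such as [-1, 1, 0] signals that respondents do not use the middle categories as intended.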
Item wording was required to be suitable for TTO valuation (eg, responses such as “somewhat true” vs “somewhat false” were not concrete enough, and items that combine concepts, such as rating walking function by level of tiredness, were not sufficiently independent).
Assessment of Face Validity of the Classification System
The face validity of the proposed items and dimensions was assessed in interviews with clinicians and individuals with CF to ensure that selected items and dimensions were important, were relevant, and represented dimensions that may change after an effective treatment. Four clinicians practicing in Australia, Canada, the United Kingdom, and the United States and 5 individuals with CF (2 from the United Kingdom, 2 from Australia, and 1 from Canada) participated in the validation process.
Health State Selection and Valuation
Not all possible health states from the classification system could be valued because of the large number of possible combinations of item levels; therefore, a subset of health states was valued and used to model the utilities for the complete classification system. An orthogonal array was generated using IBM SPSS Statistics version 21, which selected 32 health states for valuation, including the best state (“full health”). The “full health” state, the combination of items in the CFQ-R classification system with no problems recorded in any dimension, was anchored at 1, leaving 31 states to be valued. The worst health state was valued by all participants. Each state was valued by multiple respondents, but asking each respondent to value all states would have been excessively burdensome. Therefore, the states were allocated to 4 sets containing mild, moderate, and severe health states, with each respondent valuing 1 set of 8 or 9 health states. To avoid bias, no reference to CF was made in the interview. Once the health states for valuation were selected, cognitive debriefing interviews were conducted with 5 members of the UK general population to evaluate the face validity of the states and understanding of the task.
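The set sizes follow from simple arithmetic. With the worst state in every set, the other 30 states split 8, 8, 7, and 7 across 4 sets, giving sets of 9, 9, 8, and 8. The sketch below checks this under stated assumptions. States are coded as 8-digit level strings, like the 11111111 full-health code used later. Severity is approximated by the sum of levels, and states are dealt round-robin in severity order so each set mixes mild, moderate, and severe states. The authors' actual allocation rule is not described beyond that mix, and `allocate_states` is a hypothetical name.

```python
def allocate_states(states: list[str], n_sets: int = 4) -> list[list[str]]:
    """Split health states into sets; the most severe state goes into every set.

    Each state is a string of dimension levels (hypothetical coding); severity
    is approximated by the sum of levels, an illustrative assumption.
    """
    def severity(state: str) -> int:
        return sum(int(level) for level in state)

    ordered = sorted(states, key=severity)
    worst = ordered.pop()                        # valued by all respondents
    sets: list[list[str]] = [[] for _ in range(n_sets)]
    for i, state in enumerate(ordered):          # round-robin mixes severities
        sets[i % n_sets].append(state)
    for block in sets:
        block.append(worst)
    return sets
```

With 31 distinct states, this yields 2 sets of 9 and 2 of 8, matching the "8 or 9 health states" per respondent reported above.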
The valuation sample was recruited from a UK general population research panel, aiming to reflect the most recent UK census population demographics.
The interviews were conducted face to face by 20 trained interviewers across 5 regions of the United Kingdom (Birmingham, Glasgow, Manchester, London, Swansea) in 2018. At the start of the interview, participants were shown the items and an example health state. Afterward, participants completed visual analog scale tasks and TTO tasks, first for 2 practice health states and then for the assigned CFQ-R health state set. If the interviewer felt a participant did not understand or engage with the practice task, they did not continue to complete the main study health state exercise. For the visual analog scale task, participants were asked to rate the presented health states, plus the best state and “dead,” from 0 (very worst or least preferred) to 100 (very best) to familiarize themselves with the states they were valuing. Participants then completed the TTO, a standardized interview method for valuing health states to generate utility estimates.
The method was designed to determine the point at which participants considered 10 years in the target health state to be equivalent (or they were indifferent) to the prospect of x years in full health. Time in full health was varied between high and low values, changing by 6-month intervals, until this point of indifference was reached. If a participant indicated that they believed that being dead was preferable to any time living in a health state, the interviewer switched to a lead-time TTO exercise,
which asked participants whether they would prefer to live for 10 years in full health followed by 10 years in a health state or to live for x years of full health (where x < 10). This lead-time procedure allowed the participant to trade more years of life to determine how much worse than dead they considered the health state to be and to estimate a utility below zero (worse than dead). Participants also provided sociodemographic information and their experience of illness (their own or that of family and friends) and completed the EQ-5D-5L (scored using UK population weights).
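The utilities implied by the two TTO variants follow directly from the indifference point x. In the conventional task, 10 years in the state is equivalent to x years in full health, so U = x/10. In the lead-time task, 10 years of full health followed by 10 years in the state is equivalent to x years in full health, so 10 + 10U = x and U = (x − 10)/10. This is why the lowest attainable value is −1, the censoring point used in the regression models. The sketch assumes no discounting, and the helper name is hypothetical.

```python
def tto_utility(x_years: float, lead_time: bool = False, duration: float = 10.0) -> float:
    """Utility implied by a TTO indifference point (hypothetical helper).

    Conventional TTO: `duration` years in the state ~ x years in full health,
    so U = x / duration.
    Lead-time TTO: `duration` years in full health followed by `duration` years
    in the state ~ x years in full health, so duration + duration * U = x,
    giving U = (x - duration) / duration, which lies in [-1, 0].
    """
    if not 0 <= x_years <= duration:
        raise ValueError("x must lie between 0 and the state duration")
    if lead_time:
        return (x_years - duration) / duration
    return x_years / duration


# Usage: indifferent at 5.5 years (conventional) and at 7.5 years (lead time).
better_than_dead = tto_utility(5.5)                  # 0.55
worse_than_dead = tto_utility(7.5, lead_time=True)   # -0.25
```

Because time in full health moved in 6-month steps, utilities from either variant fall on a grid of 0.05.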
Before the valuation study was conducted, the study protocol was reviewed and determined to be exempt from ethical review requirements by an independent review board in the United States (Western Institutional Review Board); nevertheless, informed consent was obtained before interview participation.
Development of Scoring Algorithm
Data were reviewed before analysis and flagged for exclusion if responses were considered to reflect a lack of understanding or engagement or if there were inconsistencies (ie, more severe states were given higher utilities than milder states). Responses were flagged if participants assigned (1) the same value to all states (except where all values were 1 [ie, full health]), (2) the highest TTO value to the worst state, or (3) values representing worse than dead (< 0) to all states. Estimation was undertaken with exclusions, and the robustness of results was assessed using the full data set without exclusions.
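The three flags can be expressed as checks on one respondent's set of TTO values. This is a sketch under stated assumptions. Values are held in a dictionary keyed by hypothetical state labels, and "highest value" for the worst state is read as "equal to or above every other value", consistent with the exclusions reported in the Results. The full-health exception applies only to flag 1.

```python
def exclusion_flags(values: dict[str, float], worst_state: str) -> set[str]:
    """Return the data-quality flags raised by one respondent's TTO values."""
    vals = list(values.values())
    others = [v for state, v in values.items() if state != worst_state]
    flags: set[str] = set()
    # (1) Same value for every state, unless every value is 1 (full health).
    if len(set(vals)) == 1 and vals[0] != 1.0:
        flags.add("same value for all states")
    # (2) The worst state received the respondent's highest value.
    if others and values[worst_state] >= max(others):
        flags.add("worst state given highest value")
    # (3) Every state valued as worse than dead.
    if all(v < 0 for v in vals):
        flags.add("all states worse than dead")
    return flags


# Usage: identical nonfull-health values raise flags (1) and (2) together.
flags = exclusion_flags({"A": 0.5, "B": 0.5, "W": 0.5}, worst_state="W")
```

As the usage example shows, one respondent can meet more than 1 criterion under this coding.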
To produce utilities for every health state defined by the classification system, the utilities were modeled using regression analysis. The standard specification was:

$$y_{ij} = \beta' x_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} = u_j + e_{ij}$$

where i = 1, 2, …, n represented individual health states and j = 1, 2, …, m represented respondents. The dependent variable $y_{ij}$ was the TTO disutility (1 − TTO value) for health state i valued by respondent j, and $x_{ij}$ was a vector of dummy explanatory variables $x_{\delta\lambda}$ for each level λ of dimension δ of the classification system. Level λ = 1 acted as the baseline for each dimension, and $\varepsilon_{ij}$ was the error term, which could be divided into $u_j$, the individual random effect, and $e_{ij}$, the random error term for the ith health state valuation of the jth individual. There was no constant term because full health was defined as the best CFQ-R health state (ie, 11111111).
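Because there is no constant and level 1 is the reference, a fitted model of this form scores any health state as 1 minus the sum of the decrements for its non-reference levels. The sketch below assumes that predicted utility is exactly 1 − β′x. The decrement values are made up for a toy 2-dimension system and are not the CFQ-R-8D coefficients in Table 4; the Stata code in the Supplement remains the reference implementation. The function name is hypothetical.

```python
def additive_utility(state: str, decrements: dict[tuple[int, int], float]) -> float:
    """Utility = 1 - summed disutility of each dimension's level (hypothetical helper).

    state: one digit per dimension; level 1 means no problems (zero decrement).
    decrements: {(dimension_index, level): disutility} for levels >= 2.
    """
    total = 0.0
    for dim, ch in enumerate(state):
        level = int(ch)
        if level > 1:                 # level 1 is the reference category
            total += decrements[(dim, level)]
    return 1.0 - total


# Illustrative decrements for a toy 2-dimension system (NOT published values).
toy_decrements = {(0, 2): 0.05, (0, 3): 0.12, (1, 2): 0.03}
full_health = additive_utility("11", toy_decrements)   # 1.0
mixed_state = additive_utility("32", toy_decrements)   # 1 - (0.12 + 0.03)
```

This structure also explains why the best state ("11111111" in the CFQ-R-8D) scores exactly 1.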
Models were estimated using tobit models with robust standard errors clustered on participants and censoring at −1. This accounts for censoring of the data at −1: respondents may have wanted to assign a value below −1, but this was not possible in the lead-time TTO (LT-TTO) protocol used. Use of robust standard errors clustered on participants accounted for multiple observations per respondent and affected the standard errors but not the coefficients. Random effects (RE) tobit models were also estimated to account for differences at the individual level, given that these models appropriately deal with the structure of the data, in which each respondent had multiple observations.
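On the disutility scale, a TTO value of −1 corresponds to y = 1 − (−1) = 2, so censoring at −1 means right-censoring at 2. The sketch below shows the tobit log-likelihood that such a model maximizes, using only the standard library. Uncensored observations contribute a normal density term. Censored observations contribute the probability that the latent disutility lies at or above 2. It does not reproduce the clustered robust standard errors or the authors' estimation code, and the function name is hypothetical.

```python
import math
from typing import Sequence


def tobit_loglik(y: Sequence[float], mu: Sequence[float], sigma: float,
                 upper: float = 2.0) -> float:
    """Log-likelihood of a tobit model right-censored at `upper`.

    y: observed disutilities (1 - TTO); TTO = -1 gives y = 2, the censoring point.
    mu: linear predictions beta'x for each observation.
    sigma: standard deviation of the latent error term.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    ll = 0.0
    for yi, mi in zip(y, mu):
        if yi >= upper:
            # Censored: P(latent >= upper) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
            z = (upper - mi) / sigma
            ll += math.log(0.5 * math.erfc(z / math.sqrt(2)))
        else:
            # Uncensored: log of the normal density of y given mu and sigma.
            z = (yi - mi) / sigma
            ll += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return ll
```

An ordinary least-squares fit would treat the piled-up values at −1 as exact. The censored term instead credits the model for predicting disutility at or beyond the bound.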
Mean-level models using tobit with robust standard errors clustered on participants were also estimated, given that they reduce the impact of outliers present in individual-level health state utilities. Tobit models that accounted for heteroscedasticity were also estimated, given that TTO data typically have larger variance for more severe states; a test for heteroscedasticity in the linear model confirmed this. Coefficients for adjacent severity levels of a dimension were considered inconsistent if health deterioration (eg, moving from “sometimes” to “often” experiencing a health problem) led to a higher utility, contrary to expectations. To address this issue, models were also estimated in which inconsistent adjacent severity levels were merged, ensuring that a health deterioration leads to a lower utility score.
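The merging step can be sketched as two small functions. The first finds adjacent levels whose disutility falls as severity rises. The second builds a recoding map so that merged levels share one dummy variable in the re-estimated model. The coefficient values in the usage example are hypothetical, and the heteroscedastic part of the model is not shown.

```python
def inconsistent_pairs(coefs: dict[int, float]) -> list[tuple[int, int]]:
    """Adjacent (lower, higher) severity levels whose disutility decreases.

    coefs: {level: disutility} for one dimension; level 1 (reference) is 0.
    """
    levels = sorted(coefs)
    return [(a, b) for a, b in zip(levels, levels[1:]) if coefs[b] < coefs[a]]


def merge_map(coefs: dict[int, float]) -> dict[int, int]:
    """Map each level to the level whose dummy it shares after merging."""
    mapping = {level: level for level in coefs}
    # Pairs come in ascending order, so chains (eg, 2>3>4) collapse onto one level.
    for a, b in inconsistent_pairs(coefs):
        mapping[b] = mapping[a]
    return mapping


# Usage: level 4 has a smaller disutility than level 3 (hypothetical values).
role_coefs = {1: 0.0, 2: 0.05, 3: 0.12, 4: 0.10}
recode = merge_map(role_coefs)   # {1: 1, 2: 2, 3: 3, 4: 3}
```

After recoding, the model is re-estimated with one dummy per merged group. Because new inconsistencies can appear after refitting, the check may need to be repeated until the coefficients are ordered.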
Performance of regression models was assessed using the number of significant (P < .05) and nonsignificant coefficients, the consistency of the coefficients with the classification system, and mean absolute error (MAE) at the health state level. MAE was generated using the difference between observed and predicted utilities at the health state level, and models with a lower MAE were preferred. The Akaike information criterion and Bayesian information criterion were also examined, with lower values indicating a more preferred model. Predicted values, observed values, and errors by health states were plotted and examined for patterns that would indicate predictive inconsistency across states. The final model was selected through consideration of both logical consistency of coefficients and predictive performance.
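The model-comparison statistics named above reduce to short formulas: the health state–level MAE is the mean absolute difference between observed and predicted mean values per state, AIC = 2k − 2 ln L, and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The sketch assumes per-state means have already been computed. Which n the authors used for BIC (observations or respondents) is not stated, and the function names are hypothetical.

```python
import math
from statistics import mean


def health_state_mae(observed: dict[str, float], predicted: dict[str, float]) -> float:
    """Mean absolute error across health states (state-level mean values)."""
    return mean(abs(observed[s] - predicted[s]) for s in observed)


def count_large_errors(observed: dict[str, float], predicted: dict[str, float],
                       threshold: float = 0.05) -> int:
    """Number of health states whose absolute error exceeds `threshold`."""
    return sum(abs(observed[s] - predicted[s]) > threshold for s in observed)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: lower is preferred."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: lower is preferred."""
    return k * math.log(n) - 2 * loglik


# Usage with toy state-level disutilities (not study data).
obs = {"state_a": 0.50, "state_b": 0.60}
pred = {"state_a": 0.52, "state_b": 0.50}
```

BIC penalizes extra parameters more heavily than AIC once ln n exceeds 2, so it tends to favor the more parsimonious, merged-level models.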
Results
Selection of CFQ-R Classification System
Baseline participant demographics and clinical characteristics are presented in Table 1. The 3 participant data sets used for the main analysis (EVOLVE, TRAFFIC, and TRANSPORT) were similar in terms of age, sex distribution, and ppFEV1, whereas EXPAND had a higher mean age and slightly higher percentage of female participants.
Table 1. Baseline demographics and clinical characteristics of participants in each clinical trial (pooled treatment arms).
As summarized in Table 2, factor analysis (Appendix Tables 1 and 2 and Appendix Fig. 1 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002) identified 10 independent factors. Rasch and psychometric analyses plus consideration of item wording reduced these to 8 independent factors and the item pool for consideration from 50 items to 12 items: physical functioning (1 item, phys4), vitality (2 items, vital9 and vital11), emotional functioning (2 items, emot7 and emot12), role functioning (2 items, role36 and role37), respiratory symptoms (2 items, resp41 and resp45), body image (1 item, body26), digestive symptoms (1 item, digest49), and treatment burden (1 item, treat15). Results from the psychometric and Rasch analyses are presented in Appendix Tables 3 to 13 and Appendix Figure 2 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002.
Table 2. Summary of findings from factor, psychometric, and Rasch analysis and assessment of wording.
Items labeled based on original measure; prefix represents original dimension, and number represents order in the CFQ-R: phys, physical functioning; vital, vitality; emotion, emotional functioning; role, role functioning; social, social functioning; resp, respiratory symptoms; eat, eating problems; body, body image; digest, digestive symptoms; weight, weight; treat, treatment burden; health, health perceptions.
| Item | Item question | Response options | Dimension | Psychometrics | Rasch | Wording | For consideration |
|---|---|---|---|---|---|---|---|
| Physical functioning | | | | | | | |
| phys1 | Performing vigorous activities such as running or playing sports | A lot, some, a little, no difficulty | Physical | ✓✓ | ✓ | X | X |
| phys2 | Walking as fast as others | A lot, some, a little, no difficulty | Physical | ✓✓ | X | X | X |
| phys3 | Carrying or lifting heavy things such as books, shopping, or school bags | | | | | | |
† Included in the health state classification.
‡ Item was not included for consideration of dimension because it is a global item and cannot be included in a classification system.
In summary, the psychometric analysis found that the level of missing data in the data sets was low; therefore, this criterion was not used for item selection. Evidence of ceiling effects was observed for some items, but none had floor effects. Most items (all but 1) had strong correlations (r > 0.5) with their related dimension score across all dimensions. SRMs were generally very small (≤ 0.2), indicating that the items were not responsive. The main exceptions were in the respiratory symptoms dimension, where SRMs were small to moderate (> 0.2 to 0.6) for all respiratory symptoms items in the confirmatory analysis conducted on the EXPAND data. Correlations between the items and ppFEV1, number of pulmonary exacerbations, and body mass index were generally weak and were therefore not used to support item selection. The CFRSD mainly focuses on respiratory symptoms, with 1 item on tiredness. Correlations between the CFRSD and CFQ-R respiratory symptoms items ranged from r = 0.37 to 0.63, and those between the CFRSD and CFQ-R vitality items ranged from r = 0.43 to 0.52.
Of the 12 items identified for consideration, 9 were selected for clinician and participant validation. The role functioning “impact on daily activities” item (role36) was selected over the “goals” item (role37), and vitality “exhaustion” (vital11) was selected over “tiredness” (vital9) because of the more concrete concepts referenced in the former items, despite the role36 item not having the strongest psychometric performance of all role items. Furthermore, the treatment burden item was removed because of conceptual overlap with the role functioning item. Given that worry (emot7) and sadness (emot12) were both considered relevant and had the same response options, these items were combined to represent the emotion dimension (in line with the EQ-5D anxiety/depression dimension). Finally, the response option wording for the body image item (body26; very true, somewhat true, somewhat false, very false) was judged to be conceptually complex and was therefore dichotomized to a true or false response.
Assessment of Face Validity
The selection of the 9 of 12 items outlined earlier was endorsed by all individuals with CF and clinicians, and all proposed items in the classification outlined in Appendix Table 14 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002 (see example health state in Fig. 1) were considered valid and relevant by CF clinicians and participants interviewed. Because cough and breathing difficulties were judged to be relatively independent, these items were treated as independent concepts for the valuation, resulting in 8 dimensions: physical functioning, vitality, emotion, role functioning, breathing difficulty, cough, abdominal pain, and body image.
Figure 1. Example health state. One example of the 31 health states that were valued.
After cognitive debriefing interviews (n = 5) to check the understanding and interpretability of the health states, a total of 400 TTO interviews were conducted (see Appendix Figure 3 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002). Of the total sample, 14 TTO interviews were excluded for valuing all states as identical (but not at full health); 38 respondents were excluded for valuing the worst health state as equivalent to their highest TTO value; and 7 respondents were excluded for valuing all health states as worse than being dead. The main analyses focused on 345 respondents with robustness analyses using the full sample; data for the practice states were not analyzed.
The analysis sample did not meaningfully differ, based on measured characteristics, from the overall sample and was comparable with the most recent UK population census data in terms of sex, age, and ethnicity (Table 3).
The TTO values for all health states that were directly valued are presented in Appendix Table 15 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002. Mean and median values did not vary substantially across states, apart from the best and worst health states that were valued. The SDs were large, indicating variation in the individual valuations.
Development of the Scoring Algorithm
Estimates of preference weights based on tobit, RE tobit, and mean-level models using tobit with and without accounting for heteroscedasticity are presented in Table 4. Models with a constant term were also tested, but the constants were not statistically significant and did not improve the model fit statistics. The coefficients were all positive as expected, indicating that less than full health resulted in an increase in disutility. Regression coefficients were logically consistent in most dimensions (ie, disutility values increased as severity increased), but where the levels within a dimension were disordered (eg, levels 3 and 4 for role functioning in the RE tobit model), levels were combined to generate a logically consistent (ordered) model.
Table 4. Models estimating preference weights based on disutility.
Models (columns): tobit, tobit ordered, RE tobit, RE tobit ordered, mean tobit, mean tobit ordered, tobit het, tobit het ordered. Rows: disutility by dimension level (eg, physical functioning; reference level: “you have no difficulty climbing 1 flight of stairs”).
AIC indicates Akaike information criterion; BIC, Bayesian information criterion; het, heteroscedasticity; MAE, mean absolute error; RE, random effects.
All models predicted mean disutility more accurately for health states at the more severe end than for those at the milder end, indicating that prediction error varied with health state severity (Fig. 2). MAEs ranged from 0.025 to 0.039. Across the 31 health states, 5 to 12 (depending on the model) had absolute errors > 0.05, and only 1 was > 0.1, indicating small errors at the health state level and thus good predictive ability (Table 4).
Figure 2. Mean observed and predicted disutility by health state.
The final preferred model, selected through consideration of logical consistency of coefficients, predictive performance, and the ability to reflect variation at the individual level, was the tobit heteroscedastic–ordered model, with a predicted range of health state utilities from 0.236 to 1 for the CFQ-R-8-dimension scoring algorithm (CFQ-R-8D). Based on the worst level, the dimension with the largest disutility was breathing difficulty (0.1268) whereas body image had the smallest disutility (0.028). The Stata code used to generate utility values is provided in the Supplement found at https://doi.org/10.1016/j.jval.2022.12.002.
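Because the model has no constant term and level 1 of each dimension is the reference, the reported range implies a simple check on the coefficients. Assume that the lower bound of 0.236 corresponds to the worst state (worst level on all 8 dimensions) and that predicted utility is 1 minus the summed level decrements $\beta_{\delta,\lambda}$ from the specification in the Methods. Then the worst-level decrements must sum to 0.764. Given the reported breathing difficulty (0.1268) and body image (0.028) values, the other 6 dimensions would account for about 0.609, subject to rounding of the published figures:

$$
U(\text{worst}) = 1 - \sum_{\delta=1}^{8} \beta_{\delta,\lambda_{\max}} = 0.236
\;\Rightarrow\;
\sum_{\delta=1}^{8} \beta_{\delta,\lambda_{\max}} = 0.764,
\qquad
0.764 - (0.1268 + 0.028) = 0.6092
$$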
With the full sample, the tobit heteroscedastic model had more nonsignificant coefficients than with the main analysis sample (3 vs 1; Appendix Table 16 in Supplemental Materials found at https://doi.org/10.1016/j.jval.2022.12.002), although this pattern was reversed for other models (eg, the tobit model), and there was also some evidence of slightly more inconsistencies.
Discussion
Here we describe the development of the CFQ-R-8D, a novel preference-based scoring algorithm that allows utilities to be estimated for use in economic evaluations based on patient-reported CFQ-R data. The final classification system has 8 dimensions: physical functioning, vitality, emotion, role functioning, breathing difficulty, cough, abdominal pain, and body image, representing a broad range of the HRQOL impacts in CF included in the CFQ-R adolescent/adult version and endorsed as meaningful and relevant by people with CF and clinicians.
In TTO interviews, most of the states valued were considered to be better than dead, with only 5.5% of the TTO values < 0 and 6% at 0. These results are comparable with the recent EQ-5D-5L England valuation, where 5.1% were valued as worse than dead, and the US valuation where 5.1% were valued at 0.
Not all participants were shown both the worse-than-dead and better-than-dead procedures; participants were shown the worse-than-dead procedure only if their preferences took them there. It is unknown whether this affected responses, but it is difficult to see why knowing how a health state is valued as worse than dead would affect participants’ choices in the better-than-dead or worse-than-dead TTO tasks. Overall, a few outliers were seen in the TTO data, but at the health state level, mean TTO values did not vary widely, with most values between 0.5 and 0.6. This lack of variability may reflect the mix of severity levels across dimensions in each health state that was valued. It may also reflect the challenge of valuing 8 dimensions using TTO; nevertheless, international protocols for longer and more complex approaches have been successfully implemented (eg, EORTC QLU-C10D).
The tobit models that were estimated all had coefficients with the expected sign and most were statistically significant at the 10% level, but all had inconsistencies. The tobit heteroscedastic–ordered model was selected because it addressed the problem of heteroscedasticity and only had 1 inconsistency. The values ranged from 0.236 to 1, which is a smaller range than the UK EQ-5D-3L (−0.594 to 1).
Before development of the CFQ-R-8D, a study was undertaken to estimate utilities in CF based on a mapping algorithm linking CFQ-R data to the 3-level version of the EQ-5D (EQ-5D-3L).
Evidence from this study suggested that the EQ-5D-3L may not be sensitive to meaningful changes in health status in the CF population: the core respiratory domain of the CFQ-R was not a significant predictor of EQ-5D-3L utility and was therefore not included in the mapping algorithm. Perhaps not surprisingly, utilities estimated from the algorithm showed limited ability to discriminate between groups classified by lung function. By contrast, the CFQ-R-8D reflects a broad range of the CF-specific health dimensions included in the CFQ-R, a well-validated and widely used measure for evaluating treatment benefit in CF. Although the CFQ-R-8D scoring algorithm uses only a subset of CFQ-R items, as is generally necessary for such algorithms, this does not imply that the full CFQ-R need not be administered; capturing the full impact of CF on HRQOL provides important evidence beyond economic evaluation.
Several limitations of this work should be noted. Four trial data sets were used to select items for the classification system; although using multiple data sets and a larger combined sample was advantageous in this context, all 4 samples comprised clinical trial participants whose CF severity may have differed from that of a typical CF population because of study inclusion criteria. In addition, most items showed only small standardized response means (SRMs) in these samples, the notable exceptions being the respiratory items, and their responsiveness may differ in other CF populations. The selected items should therefore ideally be validated in another setting, such as a registry or observational study. The TTO sample was drawn from the UK population to reflect UK societal values, as recommended by agencies including the National Institute for Health and Care Excellence.
To use this classification system in another country, it may be desirable to repeat the TTO valuation and algorithm estimation with a local population; nevertheless, UK valuations for utility measures may be acceptable where local weights are unavailable.
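The SRM referred to in the limitations is usually computed as the mean within-person change in a score divided by the standard deviation of that change. A minimal Python sketch follows; the paired-score inputs and the 0.2/0.5/0.8 cut-offs for small, moderate, and large responsiveness are conventional assumptions borrowed from effect-size practice, not thresholds reported in this study, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def srm(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Standardized response mean: mean change / SD of change (paired data)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least 2 paired observations")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    sd = statistics.stdev(changes)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("SD of change is zero; SRM is undefined")
    return statistics.mean(changes) / sd


def srm_label(value: float) -> str:
    """ASSUMED Cohen-style cut-offs applied to |SRM|."""
    a = abs(value)
    if a >= 0.8:
        return "large"
    if a >= 0.5:
        return "moderate"
    if a >= 0.2:
        return "small"
    return "trivial"


if __name__ == "__main__":
    value = srm([1, 2, 3, 4], [2, 2, 4, 6])  # toy item scores, not study data
    print(round(value, 3), srm_label(value))
```

Because the SRM scales change by its own variability, an item can show a small SRM either because scores barely move or because change is highly heterogeneous across participants, which is why responsiveness estimated in trial samples may not carry over to other populations.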
Interviewers were trained, but no specific interviewer quality checks were built into the data collection process, and the number of interviewers and the variability in the number of interviews each conducted may have affected data quality. Notably, 5 interviewers recorded no values < 0; however, because they likewise recorded few values at 0 (1 of these interviewers recorded 3 values at 0, and the others recorded none), this was not accounted for in the modeling. Instead, data quality issues were addressed by excluding participants whose TTO responses indicated poor understanding of the task. A substantial number of TTO participants (n = 55; 13.8%) were excluded from the analysis because of low data quality, suggesting that some participants had difficulty with the valuation exercises that was not identified in the cognitive interviews or by the interviewers. In addition, some respondents may have found it difficult to assess 8 dimensions at once, which may have led them to adopt heuristics such as focusing on some dimensions and disregarding others. Nevertheless, the full TTO sample and the analysis set were comparable in measured demographic characteristics, and the analysis set was comparable with the most recent UK population census data. Given the number of health states requiring TTO valuation, each participant valued only 1 of the 4 blocks of health states, which may have added systematic variability; however, asking participants to value all states would have risked data quality, and the blocked approach mirrors that used in previous valuations (eg, EQ-5D). Finally, evidence is needed on how the new measure compares with measures such as EQ-5D in economic evaluation and quality-adjusted life-year estimation.
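The exclusion criteria are described above only as TTO responses indicating poor understanding of the task. As a purely hypothetical illustration of how response-based checks can be automated, the Python sketch below flags a respondent who gives every state the same value, or who values a state that is no worse on every dimension (a dominating state) lower than a state it dominates. These 2 rules, the tuple encoding of health states, and the function names are assumptions chosen for illustration, not the study's criteria.

```python
from itertools import combinations
from typing import Dict, Tuple

State = Tuple[int, ...]  # ASSUMED encoding: one level per dimension, 1 = best


def dominates(a: State, b: State) -> bool:
    """True if a is no worse than b on every dimension and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def quality_flags(responses: Dict[State, float]) -> Dict[str, object]:
    """HYPOTHETICAL data-quality checks on one respondent's TTO values."""
    values = list(responses.values())
    all_identical = len(values) > 1 and len(set(values)) == 1
    violations = 0
    for a, b in combinations(responses, 2):
        # A logically better state should not receive a lower value.
        if dominates(a, b) and responses[a] < responses[b]:
            violations += 1
        elif dominates(b, a) and responses[b] < responses[a]:
            violations += 1
    return {"all_identical": all_identical, "dominance_violations": violations}


if __name__ == "__main__":
    toy = {(1, 1): 0.9, (2, 2): 0.95, (3, 1): 0.5}  # toy responses, not study data
    print(quality_flags(toy))
```

Checks like these only catch logically implausible patterns; they cannot distinguish a respondent who misunderstood the task from one with genuinely unusual preferences, which is one reason exclusion rules in valuation studies are usually kept conservative.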
Conclusions
The CFQ-R-8D allows direct estimation of CF-specific utilities from the CFQ-R, a well-validated measure that is used widely in CF clinical trials and clinical practice, thus enabling utilities for use in cost-effectiveness analyses to be generated from any existing or future CFQ-R data set. The ability to adequately capture the HRQOL in this population using a metric suitable for economic evaluation is essential to demonstrating the potential benefit and value of new CF treatments. An evaluation of the psychometric performance of the CFQ-R-8D compared with the generic EQ-5D-3L and Short Form-6 Dimension survey is ongoing.
Analysis and interpretation of data: Acaster, Mukuria, Rowen, Brazier, Quon, Duckers, Quittner, Lou, Sosnay, McGarry
Drafting of the manuscript: Acaster, Mukuria, Rowen, Wainwright, Quon, Duckers, Quittner, Sosnay
Critical revision of the paper for important intellectual content: Mukuria, Rowen, Brazier, Wainwright, Quon, Duckers, Quittner, Sosnay, McGarry
Statistical analysis: Acaster, Mukuria, Lou
Supervision: McGarry
Obtaining funding: Brazier, McGarry
Administrative, technical, or logistic support: Quittner
Other (provision of CFQ-R): Quittner
Conflict of Interest Disclosures: All authors reported receiving nonfinancial support (assistance with manuscript preparation) from ArticulateScience LLC, which received funding from Vertex Pharmaceuticals Incorporated. Ms Acaster reported employment with Acaster Lloyd Consulting Ltd, which received payment from Vertex Pharmaceuticals Incorporated for their contribution to the design, management, and undertaking of all aspects of the study reported in this manuscript; in addition, Dr Acaster is an associate editor for Value in Health and had no role in the peer-review process of this article. Dr Mukuria reported receiving grants from Vertex Pharmaceuticals and membership in the EuroQol Group Association. Dr Rowen reported receiving grants from Vertex Pharmaceuticals Incorporated. Dr Wainwright reported advisory board membership for Vertex Pharmaceuticals Incorporated; consulting fees from Vertex Pharmaceuticals Incorporated; editor duties for Respirology (associate editor) and Thorax (deputy editor); honoraria from BMJ, DKBmed, Gilead, In Vivo Academy, Novartis, Thorax, University of Miami, and Vertex Pharmaceuticals Incorporated; research grant from Novo Nordisk; being a study investigator for Boehringer Ingelheim and Vertex Pharmaceuticals Incorporated; and travel expenses from Vertex Pharmaceuticals Incorporated. Dr Quon reported receiving grants and personal fees from Vertex Pharmaceuticals Incorporated. Dr Duckers reported receiving personal fees from Vertex Pharmaceuticals Incorporated, Chiesi Pharmaceuticals, and Insmed. Dr Quittner reported research grants from the NIH, CF Foundation, and American Cochlear Implant Alliance and consulting fees from Insmed and Vertex Pharmaceuticals Incorporated. Drs Lou, Sosnay, and McGarry reported employment with Vertex Pharmaceuticals Incorporated and may own stock or stock options in that company. 
Dr Quittner was an employee of Behavioral Health Systems Research at the time the study was completed but is currently an employee of the Cystic Fibrosis and Pulmonary Center at Joe DiMaggio Children’s Hospital (Hollywood, FL). No other disclosures were reported.
Funding/Support: This study was supported by Vertex Pharmaceuticals Incorporated.
Role of the Funder/Sponsor: Vertex Pharmaceuticals Incorporated contributed to the design of the study and the review of the manuscript. Data collection and analysis were undertaken by Acaster Lloyd Consulting Ltd, which received funding from Vertex Pharmaceuticals.