
EQ-5D-5L versus EQ-5D-3L: The Impact on Cost Effectiveness in the United Kingdom

Open Archive. Published: October 17, 2017. DOI: https://doi.org/10.1016/j.jval.2017.09.004

      Abstract

      Objectives

      To model the relationship between the three-level (3L) and the five-level (5L) EuroQol five-dimensional questionnaire and examine how differences have an impact on cost effectiveness in case studies.

      Methods

      We used two data sets that included the 3L and 5L versions from the same respondents. The EuroQol Group data set (n = 3551) included patients with different diseases and a healthy cohort. The National Data Bank data set included patients with rheumatoid disease (n = 5205). We estimated a system of ordinal regressions in each data set using copula models to link responses of the 3L instrument to those of the 5L instrument and its UK tariff, and vice versa. Results were applied to nine cost-effectiveness studies.

      Results

      Best-fitting models differed between the EuroQol Group and the National Data Bank data sets in terms of the explanatory variables, copulas, and coefficients. In both cases, the coefficients of the covariates and latent factors between the 3L and the 5L instruments were significantly different, indicating that moving between instruments is not simply a uniform re-alignment of the response levels for most dimensions. In the case studies, moving from the 3L to the 5L caused a decrease of up to 87% in incremental quality-adjusted life-years gained from effective technologies in almost all cases. Incremental cost-effectiveness ratios increased, often substantially. Conversely, one technology with a significant mortality gain saw increased incremental quality-adjusted life-years.

      Conclusions

      The 5L shifts mean utility scores up the utility scale toward full health and compresses them into a smaller range, compared with the 3L. Improvements in quality of life are valued less using the 5L than using the 3L. The 3L and the 5L can produce substantially different estimates of cost effectiveness. There is no simple proportional adjustment that can be made to reconcile these differences.

      Introduction

      The EuroQol five-dimensional questionnaire (EQ-5D) comprises a descriptive system of health-related quality of life and associated tariffs or “utility” scores. The descriptive system covers five dimensions of health: mobility, ability to self-care, ability to undertake usual activities, pain/discomfort, and anxiety/depression. The original version of the EQ-5D allows respondents to indicate the degree of impairment on each dimension according to three levels (no problems, some problems, and extreme problems). This is the three-level EQ-5D (3L). The five-level EQ-5D (5L) is a new version of the instrument, which includes five levels of severity for each dimension (no problems, slight problems, moderate problems, severe problems, and extreme problems), with the intention of improving the instrument’s sensitivity and reducing ceiling effects [Herdman M, Gudex C, Lloyd A, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L)]. Tariffs are anchored around 1 for full health and 0 for states considered equivalent to death, on the basis of estimates from samples of the general population. For the 3L version, these tariffs were based on a time trade-off valuation method. In the United Kingdom, these tariffs range from 1 for full health to −0.594. Thirty-five percent (84 of 243) of the health states are valued with a negative score. There is a gap between full health and the next level of impairment valued at 0.883. Tariffs for the 5L version are now available for England [Devlin N, Shah K, Feng Y, et al. Valuing health-related quality of life: an EQ-5D-5L value set for England], Canada [Xie F, Pullenayegum E, Gaebel K, et al. A time trade-off-derived value set of the EQ-5D-5L for Canada], Japan [Shiroiwa T, Ikeda S, Noto S, et al. Comparison of value set based on DCE and/or TTO data: scoring for EQ-5D-5L health states in Japan], Uruguay [Augustovski F, Rey-Ares L, Irazola V, et al. An EQ-5D-5L value set based on Uruguayan population preferences], The Netherlands [Versteegh MM, Vermeulen KM, Evers SM, et al. Dutch tariff for the five-level version of EQ-5D], and Korea [Kim SH, Ahn J, Ock M, et al. The EQ-5D-5L valuation study in Korea]. The valuation methods for the 5L used a combination of updated “lead time” time trade-off methods and discrete-choice experiments [Feng Y, Devlin N, Shah K, et al. New methods for modelling EQ-5D-5L value sets: an application to English data. Available from: https://www.ohe.org/publications/new-methods-modelling-eq-5d-5l-value-sets-application-english-data. Accessed May 23, 2017]. In England, this has led to a smaller range of values (from 1 to −0.281), a smaller gap at the upper end of the distribution (0.951 is the next score after 1), and fewer values less than 0 (153 of 3125 [5%]).
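The health-state counts quoted above follow directly from the structure of the descriptive system: five dimensions with three or five levels each. As a minimal sketch (the function and variable names are illustrative and not taken from the article), the totals and the shares of negatively valued states can be reproduced as follows:

```python
from itertools import product

def enumerate_states(levels, dimensions=5):
    """Return every health-state profile as a string such as '11111'."""
    return ["".join(str(l) for l in combo)
            for combo in product(range(1, levels + 1), repeat=dimensions)]

states_3l = enumerate_states(3)   # 3L: three levels per dimension
states_5l = enumerate_states(5)   # 5L: five levels per dimension

# Counts of negatively valued states are the figures quoted in the text.
share_negative_3l = 84 / len(states_3l)
share_negative_5l = 153 / len(states_5l)

print(len(states_3l), len(states_5l))
print(f"{share_negative_3l:.1%} vs {share_negative_5l:.1%} valued below 0")
```

The first profile in each list, `11111`, is full health; the last (`33333` or `55555`) is the worst state in each system.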
      The EQ-5D is one of the most widely used instruments underpinning economic evaluations conducted in terms of cost-per quality-adjusted life-year (QALY), calculated from the tariff scores. It is therefore essential to understand the implications of using the new 5L version of the instrument compared with using the 3L version. This article provides information on how the two versions of the EQ-5D relate to each other, using the UK/English tariffs. It should be noted that this 5L tariff could be subject to change as it progresses through the peer review process. We used two data sets in which respondents filled in both the 3L and the 5L instruments. We estimated the joint distribution of responses to the two instruments. This model was then used in nine cost-effectiveness studies to compare results when using directly observed 3L values with estimated 5L results.

      Methods

       Data

      We used two data sets. The first was provided by the EuroQol Group (the EQG data). Between August 2009 and September 2010, the EuroQol Group coordinated and partly funded a data collection study. Its main aim was to collect data on both versions of the EQ-5D, the 3L and the 5L, to compare them in terms of their measurement properties and to generate an interim value set for the 5L using a mapping (or cross-walk) approach. The questionnaire introduced the 5L version first, followed by a few background questions (age, sex, education, etc.), and then the 3L version, the EQ-5D visual analogue scale, a set of five dimension-specific rating scales, and finally the World Health Organization (five) Well-Being Index. The study was carried out in six countries (Denmark, England, Italy, The Netherlands, Poland, and Scotland) and included eight broad patient groups (cardiovascular disease, respiratory disease, depression, diabetes, liver disease, personality disorders, arthritis, and stroke) and a student cohort (healthy population). Each country used the official EQ-5D language versions, and data were mainly collected through specialist hospitals/centers and patient recruitment agencies. All countries used paper-and-pencil questionnaires, apart from England, which used an online version. In all countries, except Italy, sampling methods ensured a wide range of severity across all the 5L and 3L dimensions.
      The National Data Bank (NDB) for rheumatic diseases is a register of patients with rheumatoid disease, primarily recruited by referral from US and Canadian rheumatologists. Information supplied by participants is validated by direct reference to records held by hospitals and physicians. (A minority of cases come by self-referral, with medical details obtained by the NDB in the same way.) Full details of the recruitment process are given by Wolfe and Michaud [Wolfe F, Michaud K. The National Data Bank for rheumatic diseases: a multi-registry rheumatic disease data bank]. The EQ-5D responses and other patient-supplied data are collected by various means, primarily postal and Web-based questionnaires completed directly by patients. Data collection began in 1998 and continues to the present, in waves administered in January and July of each year. In 2011, there was a switch from the 3L to the 5L version of the EQ-5D and both versions were included in the January 2011 wave. The NDB questionnaire is 27 pages long and it includes many general as well as specific questions on rheumatoid arthritis. The 5L and 3L versions are on pages 11 and 22 of the questionnaire, respectively. This wave is used to estimate the model.

       Statistical Analysis

      The aim is to estimate the relationship between the two instruments. Hernandez and Pudney [Hernandez Alava M, Pudney S. Econometric modelling of multiple self-reports of health states: The switch from EQ-5D-3L to EQ-5D-5L in evaluating drug therapies for rheumatoid arthritis] have previously developed a flexible model that allows analysis of the joint responses to the 3L and 5L versions. Full details are provided there. Responses to the 3L and 5L versions are ordinal. The model reflects this in a system of 10 ordinal regressions, each of which is used to estimate the response level for one of the two versions of the EQ-5D conditional on covariates (age and sex), arranged into the five health domains. The model reflects any tendency for an individual to give more or less positive responses across domains via a latent factor representing background response behavior. A copula approach is used to specify the bivariate distribution of each “3L, 5L” pair of responses. This captures the strong association between 3L and 5L responses within each health domain, without necessarily assuming that the strength of the association is the same in all parts of the health distribution. For example, someone who has experienced extreme pain may answer the pain questions in a more focused and coherent way than someone without experience of chronic pain. Five different copulas were examined (Gaussian, Clayton, Frank, Gumbel, and Joe), which reflect different types and strengths of dependence at different parts of the distribution, with the data informing the most appropriate final choice of copula.
      Such statistical models are sensitive to the distributional assumptions, the usual one being normality. Mis-specification of the joint residual distribution may lead to significant bias in the estimated coefficients of the covariates, in addition to giving a distorted picture of the dependence. For this reason, mixture distributions are used to allow for non-normality in the residuals and the latent factor representing the individual’s response behavior.
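To make the copula construction concrete, the sketch below builds the joint probability table for one “3L, 5L” response pair from two marginal cumulative distributions and a copula. It is an illustration only, not the authors' Stata implementation: it omits the covariates, latent factor, and mixture distributions, and the marginal probabilities and dependence parameter are hypothetical values chosen for the example.

```python
import math

def frank_copula(u, v, theta):
    """Frank copula CDF C(u, v); theta must be nonzero."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def clayton_copula(u, v, theta):
    """Clayton copula CDF C(u, v); theta must be positive."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def joint_pmf(cdf_3l, cdf_5l, copula, theta):
    """Joint probabilities P(3L = j, 5L = k) via the rectangle formula.

    cdf_3l and cdf_5l are cumulative level probabilities ending in 1.0.
    """
    f3 = [0.0] + list(cdf_3l)
    f5 = [0.0] + list(cdf_5l)
    return [[copula(f3[j], f5[k], theta) - copula(f3[j - 1], f5[k], theta)
             - copula(f3[j], f5[k - 1], theta)
             + copula(f3[j - 1], f5[k - 1], theta)
             for k in range(1, len(f5))]
            for j in range(1, len(f3))]

# Hypothetical marginals for one dimension (not estimates from the article).
cdf_3l = [0.45, 0.95, 1.0]
cdf_5l = [0.35, 0.60, 0.85, 0.97, 1.0]

table_frank = joint_pmf(cdf_3l, cdf_5l, frank_copula, theta=8.0)
table_clayton = joint_pmf(cdf_3l, cdf_5l, clayton_copula, theta=2.0)

for row in table_frank:
    print(" ".join(f"{p:.3f}" for p in row))
```

Whatever copula is chosen, the row and column sums of the table return the two marginal distributions; only the way probability is spread across the cells changes, which is what allows dependence to differ between the mild and severe ends of a dimension.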

       Cost-Effectiveness Case Studies

      We used the copula mapping models in nine cost-effectiveness case studies. All were economic evaluations based on individual patient-level data using the 3L version. We made a pragmatic decision in selecting case studies. We sought collaborators who had previously completed suitable studies using the 3L instrument and who were willing and able to replicate their study substituting predicted utility scores for the 5L instrument using a bespoke Stata command (StataCorp, College Station, TX). The included studies are as follows:
      1. The Combination of Anti-Rheumatic Drugs in Early Rheumatoid Arthritis (CARDERA) trial was a double-blind, factorial designed, placebo-controlled randomized trial that compared the benefits of adding cyclosporine, high-dose step-down prednisolone, or both to methotrexate monotherapy [Choy EHS, Smith CM, Farewell V, et al. Factorial randomised controlled trial of glucocorticoids and combination disease modifying drugs in early rheumatoid arthritis]. The 3L version was administered to patients at baseline and at 6, 12, 18, and 24 months [Wailoo A, Hernandez Alava M, Scott I, et al. Cost-effectiveness of treatment strategies using combination disease-modifying anti-rheumatic drugs and glucocorticoids in early rheumatoid arthritis].
      2. The Cost-effectiveness of Aphasia Computer Treatment Compared to Usual Stimulation (CACTUS) pilot randomized controlled trial tested the feasibility of comparing self-managed computer therapy combined with usual stimulation (such as participation in normal language stimulation activities and support groups) with usual stimulation alone in people with aphasia [Latimer NR, Dixon S, Palmer R. Cost-utility of self-managed computer therapy for people with aphasia]. The 3L version was completed at baseline and at 3 and 8 months.
      3. The Risk Adjustment in Neurocritical care (RAIN) trial compared 1) management in a dedicated neurocritical care unit versus a combined neuro/general critical care unit and 2) “early” transfer to a neuroscience center versus “no or late” transfer, for patients who initially present at a non-neuroscience center and do not require urgent neurosurgery and for patients with acute traumatic brain injury. The 3L version was completed at 3 months.
      4. The Immediate Management of Patients with Rupture: Open Versus Endovascular Repair (IMPROVE) trial compared endovascular repair with open repair of ruptured abdominal aortic aneurysm [IMPROVE Trial Investigators. Endovascular or open repair strategy for ruptured abdominal aortic aneurysm: 30 day outcomes from IMPROVE randomised trial]. The 3L version was administered at 3 and 12 months.
      5. The COUGAR-02 randomized, controlled, open-labeled trial compared docetaxel chemotherapy plus active symptom control (DXL + ASC) and ASC-only in patients in the United Kingdom with advanced adenocarcinoma of the esophagus, esophagogastric junction, or stomach [Ford HER, Marshall A, Bridgewater JA, et al. COUGAR-02 Investigators. Docetaxel versus active symptom control for refractory oesophagogastric adenocarcinoma (COUGAR-02): an open-label, phase 3 randomised controlled trial]. Patients completed the EQ-5D at baseline, during clinic visits at weeks 3, 6, 9, and 12, then every 6 weeks for up to 1 year, and then every 3 months until death.
      6. The Attenuated dose Rituximab with ChemoTherapy in Chronic lymphocytic leukemia (ARCTIC) study was a multicenter, randomized, controlled, open, phase IIB noninferiority trial conducted in previously untreated patients with chronic lymphocytic leukemia [Hillmen P, Milligan D, Schuh A, et al. Results of the randomised phase II NCRI ARCTIC (Attenuated dose Rituximab with ChemoTherapy In CLL) trial of low dose rituximab in previously untreated CLL; Howard DR, Munir T, McParland L, et al. Clinical effectiveness and cost-effectiveness results from the randomised, phase IIB trial in previously untreated patients with chronic lymphocytic leukaemia (CLL) to compare fludarabine, cyclophosphamide and rituximab (FCR) with fludarabine, cyclophosphamide, mitoxantrone and low dose rituximab (FCM-miniR): the Attenuated dose Rituximab with ChemoTherapy In CLL (ARCTIC) trial]. It compared the combination of fludarabine, cyclophosphamide, and rituximab, which is considered conventional frontline therapy, with that of fludarabine, cyclophosphamide, mitoxantrone, and low-dose rituximab. The 3L version was completed at baseline, after three cycles of therapy, at the end of therapy, 3 months after the end of therapy, and then every 3 months after the end of therapy until 24 months postrandomization (i.e., at 6, 9, 12, 18, and 24 months postrandomization).
      7. The Self-Help and Relapse Prevention in Smoking for Health (SHARPISH) trial [Blyth A, Maskrey V, Notley C, et al. Effectiveness and economic evaluation of self-help educational materials for the prevention of smoking relapse: randomised controlled trial] sought to estimate the effectiveness and cost effectiveness of self-help booklets versus a single leaflet to prevent smoking relapse in people who had stopped smoking for 4 weeks. The 3L version was administered at baseline and at 2 months and 11 months postrandomization.
      8. The Weight-Reduction Activity Program (WRAP) [Ahern AL, Aveyard PN, Halford JC, et al. Weight loss referrals for adults in primary care (WRAP): protocol for a multi-centre randomised controlled trial comparing the clinical and cost-effectiveness of primary care referral to a commercial weight loss provider for 12 weeks, referral for 52 weeks, and a brief self-help intervention (ISRCTN82857232)] was a multicenter, nonblinded, three-arm parallel-group randomized controlled trial of two commercial weight loss programs, compared with a brief intervention in overweight adults. The 3L version was administered at baseline and at 3, 12, and 24 months.
      9. The Complete versus Lesion-only Revascularization for Myocardial Infarction (CvLPRIT) trial [Gershlick AH, Khan JN, Kelly DJ, et al. Randomized trial of complete versus lesion-only revascularization in patients undergoing primary percutaneous coronary intervention for STEMI and multivessel disease: the CvLPRIT trial] randomized patients presenting with ST-segment elevation myocardial infarction with bystander stenosis to an infarct-only strategy (only treat the blocked artery that caused the heart attack) versus complete revascularization (treat the blocked artery and also treat any narrowed arteries that may cause heart attacks in future). The 3L version was administered immediately before discharge and at 12 months postdischarge.
      We used the UK value sets for the 3L instrument and the English value set for the 5L instrument [Dolan P. Modeling valuations for EuroQol health states; Devlin N, Shah K, Feng Y, et al. Valuing Health Related Quality of Life: An EQ-5D-5L Value Set for England. Technical Report 16.02].

      Results

       Data Sets

      After exclusion of missing values, there were final estimation samples of 3551 and 5205 respondents in the EQG and NDB data sets, respectively. The EQG sample was younger and contained more males than the NDB sample (see Table 1).
      Table 1. Descriptive statistics in the EQG and NDB estimation samples

      | Characteristic | EQG sample | NDB sample |
      |---|---|---|
      | Age (y), mean (95% CI) | 51.23 (50.57–51.89) | 63.32 (62.99–63.65) |
      | Age (y), median (95% CI) | 54 (54–56) | 64.13 (63.78–64.46) |
      | Age (y), SD | 20.11 | 12.31 |
      | Age (y), minimum | 13 | 16.66 |
      | Age (y), maximum | 99 | 95.20 |
      | Proportion female | 0.53 | 0.81 |

      | Utility | EQG: EQ-5D-3L | EQG: EQ-5D-5L | NDB: EQ-5D-3L | NDB: EQ-5D-5L |
      |---|---|---|---|---|
      | Mean (95% CI) | 0.628 (0.617–0.639) | 0.712 (0.703–0.722) | 0.681 (0.674–0.688) | 0.779 (0.773–0.784) |
      | Median (95% CI) | 0.691 (0.691–0.725) | 0.802 (0.792–0.816) | 0.725 (0.725–0.727) | 0.823 (0.817–0.829) |
      | SD | 0.333 | 0.278 | 0.254 | 0.191 |
      | Minimum | −0.594 | −0.281 | −0.594 | −0.226 |
      | Maximum | 1 | 1 | 1 | 1 |
      | No. of health states (percentage out of possible health states) | 123 (50.62) | 660 (21.12) | 86 (35.39) | 524 (16.77) |

      CI, confidence interval; EQG, EuroQol Group; NDB, National Data Bank; EQ-5D-3L, three-level EuroQol five-dimensional questionnaire; EQ-5D-5L, five-level EQ-5D.
      Figure 1 shows histograms of the response distributions for each dimension of the 3L and 5L versions of the EQ-5D in both data sets. There are differences both across the dimensions and between the data sets. Four distinct distributional shapes can be identified:
      • 1.
        Decreasing profile with a dominant mode at the first category: This distributional shape can be seen in the self-care dimension of both the 3L and the 5L versions and in the mobility and usual activities dimensions of the 5L version in the EQG data set and on the self-care and anxiety/depression dimensions of both versions of the EQ-5D in the NDB data set.
      • 2.
        Decreasing profile with a heavier central section: In the EQG data set, the pattern can be seen in the mobility dimension (3L) and in the pain/discomfort and anxiety/depression dimensions (5L). In the NDB data set, the mobility and usual activities dimensions for both the versions exhibit this shape.
      • 3.
        A strong mode in the center of the distribution: This shape can be found in the pain/discomfort dimension in the 3L version in the EQG data set and in both the versions in the NDB data set.
      • 4.
        A mode in the center of the distribution and an almost as large first category: This distributional shape is similar to shape 2 in that they both exhibit a decreasing profile, but shape 4 has less central concentration. This shape can be found only in the EQG data set in the usual activities and anxiety/depression dimensions of the 3L version.
      Fig. 1. Response histograms for EQ-5D-3L and EQ-5D-5L in the EQG data set and the NDB data set. For the 3L, level 1 = “no problems,” 2 = “some problems,” 3 = “extreme problems/unable to do.” For the 5L, level 1 = “no problems,” 2 = “slight problems,” 3 = “moderate problems,” 4 = “severe problems,” 5 = “extreme problems/unable to do.” EQ-5D-3L, three-level EuroQol five-dimensional questionnaire; EQ-5D-5L, five-level EQ-5D; EQG, EuroQol Group; NDB, National Data Bank.
      In the NDB data set, both versions of the EQ-5D display the same pattern within each dimension but different shapes across dimensions: shape 1 in both the self-care and anxiety/depression dimensions, shape 2 in the mobility and usual activities dimensions, and shape 3 in the pain/discomfort dimension. In contrast, in the EQG data set, only the self-care dimension shows the same shape of distribution in both the 3L and 5L versions. In the EQG data set, the distributional shapes for all the dimensions of the 5L version are similar, displaying a decreasing profile corresponding to either shape 1 or shape 2. The 3L distributions in the EQG data set exhibit all four distributional shapes and differ more across dimensions than the 5L distributions do. The variation in shape highlights the need to use flexible model specifications that do not impose the same model structure across dimensions or data sets.
      Figure 2 shows kernel estimates of the distributions of utility scores in both data sets. The 3L versions in both data sets exhibit the typical characteristics documented in the literature: a large mass of observations at 1 (full health), a gap of no observations between full health and the next feasible value (0.883), and a multimodal distribution. In both data sets, the distributions are smoother for the 5L version, especially toward the top of the distribution. The number of individuals in full health is reduced by using the 5L version, and the mode at the bottom of the distribution around the value of 0 in the 3L distribution disappears in the 5L distribution. The mean and median of the 5L version are higher than the corresponding mean and median of the 3L version in both data sets (see Table 1). The range of the 5L version is smaller because the worst state has a utility score of −0.281 compared with −0.594 of the 3L version.
      Fig. 2. Smoothed empirical distribution functions of EQ-5D-3L and EQ-5D-5L in the EQG and NDB data sets. EQ-5D-3L, three-level EuroQol five-dimensional questionnaire; EQ-5D-5L, five-level EQ-5D; EQG, EuroQol Group; NDB, National Data Bank.

       Statistical Model Results

      The initial specification had sex, age, and the square of age as covariates. The square of age was significant when the model was estimated with EQG data, but grossly insignificant when estimated with NDB data. The preferred specification for the EQG data set has age, age squared, and sex as covariates in all 10 ordinal regressions, whereas the model for the NDB data set excludes the square of age.
      Table 2 presents the results for the two data sets. There are several differences between the models from the two data sets. The best-fitting model in the EQG data set chooses the same copula, Frank, in all dimensions of the EQ-5D. In contrast, the best-fitting model in the NDB data set selects a Gaussian copula for the mobility, usual activities, and pain/discomfort dimensions; a Clayton copula for the self-care dimension; and a Frank copula for the anxiety/depression dimension. The Gaussian and Frank copulas are similar in that both allow for positive or negative dependence, symmetric in both tails, but the Frank form generates dependence weaker in the tails and stronger in the center of the distribution. The Clayton copula allows only positive dependence, with strong left tail dependence and relatively weak right tail dependence; thus, if two variables are strongly correlated at low values but less so at high values, then the Clayton copula is a good choice. Therefore, in the EQG data set, the patterns of residual dependence between the 3L and 5L versions of the EQ-5D are similar across all dimensions, indicating symmetric dependence and weak dependence on the tails. In the NDB data set, a Frank copula was also selected for the anxiety/depression dimension and the parameter of dependence was very similar to that estimated in the EQG data set. In contrast, the Gaussian copulas in the mobility, usual activities, and pain/discomfort dimensions indicate symmetric dependence as well but stronger dependence on the tails of the distribution than the Frank copula selected in the EQG data set. The copula chosen in the self-care dimension using the NDB data set, the Clayton copula, displays a very different pattern of dependence compared with the Frank copula chosen in the EQG data set. It exhibits asymmetric dependence on the tails, with strong dependence at lower values and weak dependence at high values.
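The contrast between the Frank and Clayton copulas described above can be checked numerically. The following sketch evaluates the probability that one response falls in a tail given that the other does, C(q, q)/q for the lower tail and (1 − 2q + C(q, q))/(1 − q) for the upper tail. The dependence parameters are hypothetical, chosen only to illustrate the shapes; they are not the estimates behind Table 2.

```python
import math

def frank_copula(u, v, theta):
    """Frank copula CDF; theta must be nonzero."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def clayton_copula(u, v, theta):
    """Clayton copula CDF; theta must be positive."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def lower_tail_concentration(copula, theta, q):
    """P(V <= q | U <= q) for a small quantile q."""
    return copula(q, q, theta) / q

def upper_tail_concentration(copula, theta, q):
    """P(V > q | U > q) for a quantile q close to 1."""
    return (1.0 - 2.0 * q + copula(q, q, theta)) / (1.0 - q)

# Hypothetical dependence parameters for illustration.
frank_lower = lower_tail_concentration(frank_copula, 8.0, 0.01)
frank_upper = upper_tail_concentration(frank_copula, 8.0, 0.99)
clayton_lower = lower_tail_concentration(clayton_copula, 2.0, 0.01)
clayton_upper = upper_tail_concentration(clayton_copula, 2.0, 0.99)

print(f"Frank:   lower {frank_lower:.3f}, upper {frank_upper:.3f}")
print(f"Clayton: lower {clayton_lower:.3f}, upper {clayton_upper:.3f}")
```

Under these assumed parameters the Frank values are equal and small in both tails, whereas the Clayton lower-tail value is close to its theoretical limit of 2^(−1/θ) and its upper-tail value is near zero, matching the symmetric-versus-asymmetric description in the text.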
      Table 2. Summary of final model results

      | | EQG | NDB |
      |---|---|---|
      | Log likelihood | −23,891.83 | −33,621.04 |
      | No. of parameters | 78 | 68 |
      | Observations | 3551 | 5205 |
      | Type of mixture in copula | Single mixture | Single mixture |
      | Mobility: copula | Frank | Gaussian |
      | Mobility: equality of coefficients (covariates) | 7.12 (P = 0.10) | 11.86 (P = 0.01) |
      | Mobility: equality of coefficients (latent factor) | 8.37 (P = 0.01) | 10.64 (P = 0.01) |
      | Mobility: equality of coefficients (covariates and factor) | 12.19 (P = 0.05) | 26.49 (P = 0.01) |
      | Self-care: copula | Frank | Clayton |
      | Self-care: equality of coefficients (covariates) | 8.53 (P = 0.05) | 1.21 |
      | Self-care: equality of coefficients (latent factor) | 3.68 (P = 0.10) | 0.09 |
      | Self-care: equality of coefficients (covariates and factor) | 9.39 (P = 0.10) | 1.35 |
      | Usual activities: copula | Frank | Gaussian |
      | Usual activities: equality of coefficients (covariates) | 3.29 | 0.67 |
      | Usual activities: equality of coefficients (latent factor) | 5.62 (P = 0.05) | 8.24 (P = 0.01) |
      | Usual activities: equality of coefficients (covariates and factor) | 0.04 (P = 0.05) | 9.11 (P = 0.05) |
      | Pain/discomfort: copula | Frank | Gaussian |
      | Pain/discomfort: equality of coefficients (covariates) | 0.57 | 34.36 (P = 0.01) |
      | Pain/discomfort: equality of coefficients (latent factor) | 9.36 (P = 0.01) | 19.99 (P = 0.01) |
      | Pain/discomfort: equality of coefficients (covariates and factor) | 11.95 (P = 0.05) | 50.74 (P = 0.01) |
      | Anxiety/depression: copula | Frank | Frank |
      | Anxiety/depression: equality of coefficients (covariates) | 5.60 | 4.94 (P = 0.10) |
      | Anxiety/depression: equality of coefficients (latent factor) | 1.23 | 1.94 |
      | Anxiety/depression: equality of coefficients (covariates and factor) | 7.08 | 6.19 |

      EQG, EuroQol Group; NDB, National Data Bank. Significance levels (P = 0.10, P = 0.05, P = 0.01) are shown in parentheses after the test statistic; statistics without a level did not reach significance at these levels.
      There are statistically significant differences in the coefficients of the covariates and the latent factors between the 3L and the 5L versions in most dimensions. The statistics in Table 2 test the hypothesis that the underlying relationship between covariates and/or latent variables and the EQ-5D is the same for the 3L and 5L versions. Rejection of the hypothesis indicates that the effect of moving from the 3L to the 5L is not just a uniform re-alignment of the response levels. The exceptions are the anxiety/depression dimension in both data sets and the self-care dimension in the NDB data set.

       Cost-Effectiveness Results

      Table 3 and Figure 3 report headline results for all the case studies. In almost all cases, the switch from the 3L to the 5L causes a decrease in the incremental QALY gain from effective health technologies. This is true whether the estimation of the 5L is based on the EQG or the NDB data, with one exception.
      Table 3. Incremental QALYs and ICERs for 3L, 5L (EQG), and 5L (NDB) across all case studies

      | Study | QALYs: 3L | QALYs: 5L (EQG) | % Change | QALYs: 5L (NDB) | % Change | ICER: 3L | ICER: 5L (EQG) | % Change | ICER: 5L (NDB) | % Change |
      |---|---|---|---|---|---|---|---|---|---|---|
      | CARDERA 1 | 0.145 | 0.113 | −21.8% | 0.111 | −23.2% | 4648 | 5940 | 27.8% | 6054 | 30.3% |
      | CARDERA 2 | 0.084 | 0.075 | −10.4% | 0.077 | −8.0% | 13,666 | 15,252 | 11.6% | 14,846 | 8.6% |
      | CARDERA 3 | 0.082 | 0.054 | −33.5% | 0.043 | −47.6% | 15,929 | 23,940 | 50.3% | 30,418 | 91.0% |
      | CACTUS | 0.150 | 0.050 | −66.7% | 0.020 | −86.7% | 3058 | 9481 | 210.0% | 23,022 | 652.8% |
      | RAIN a | 0.020 | 0.005 | −75.0% | 0.003 | −85.0% | 184,700 | 738,800 | 300.0% | 1,231,333 | 566.7% |
      | RAIN b | 0.051 | 0.021 | −58.8% | 0.021 | −58.8% | 294,137 | 714,333 | 142.9% | 714,333 | 142.9% |
      | IMPROVE | 0.052 | 0.046 | −11.5% | 0.042 | −19.2% | 44,617* | 48,113 | 7.8% | 54,742 | 22.7% |
      | COUGAR-02 | 0.115 | 0.119 | 3.5% | 0.118 | 2.6% | 27,180 | 26,434 | −2.7% | 26,484 | −2.6% |
      | ARCTIC | 0.059 | 0.043 | −27.1% | 0.046 | −22.0% | 112,193 | 162,774 | 45.1% | 152,130 | 35.6% |
      | SHARPISH | 0.000 | 0.003 | NA | 0.003 | NA | NA† | | | | |
      | WRAP-CP12 | 0.062 | 0.047 | −23.7% | 0.039 | −36.2% | 1812 | 2373 | 31.0% | 2840 | 56.7% |
      | WRAP-CP52 | 0.044 | 0.044 | 0.0% | 0.036 | −19.0% | 4305 | 4312 | 0.2% | 5316 | 23.5% |
      | CvLPRIT | 0.020 | 0.010 | −52.5% | 0.009 | −53.0% | 21,496 | 46,761 | 117.5% | 47,521 | 121.1% |

      Note. CARDERA 1 = MTX vs. MTX + CS; CARDERA 2 = MTX vs. MTX + PNS; CARDERA 3 = MTX + CS + PNS vs. MTX.
      3L, three-level EuroQol five-dimensional questionnaire; 5L, five-level EuroQol five-dimensional questionnaire; ARCTIC, Attenuated dose Rituximab with ChemoTherapy in Chronic lymphocytic leukemia; CACTUS, Cost-effectiveness of Aphasia Computer Treatment Compared to Usual Stimulation; CARDERA, Combination of Anti-Rheumatic Drugs in Early Rheumatoid Arthritis; CS, cyclosporine; CvLPRIT, Complete- compared to Lesion-Only Revascularization for Myocardial Infarction trial; EQG, EuroQol Group; ICER, incremental cost-effectiveness ratio; IMPROVE, Immediate Management of Patients with Rupture: Open Versus Endovascular Repair; MTX, methotrexate; NA, not available; NDB, National Data Bank; PNS, prednisolone; QALY, quality-adjusted life-year; RAIN, Risk Adjustment in Neurocritical care; SHARPISH, Self-Help and Relapse Prevention in Smoking for Health; WRAP, Weight-Reduction Activity Program.
      * In the IMPROVE study, the technology of interest (endovascular aneurysm repair) was cost-saving.
      † Incremental QALYs near 0 meant that the calculation of the ICER may be misleading and was therefore not reported.
      Fig. 3. Histogram of incremental QALYs by 3L, 5L (EQG), and 5L (NDB) for all case studies. CARDERA 1 = MTX vs. MTX + CS, CARDERA 2 = MTX vs. MTX + PNS, CARDERA 3 = MTX + CS + PNS vs. MTX. 3L, three-level EuroQol five-dimensional questionnaire; 5L, five-level EuroQol five-dimensional questionnaire; ARCTIC, Attenuated dose Rituximab with ChemoTherapy in Chronic lymphocytic leukemia; CACTUS, Cost-effectiveness of Aphasia Computer Treatment Compared to Usual Stimulation; CARDERA, Combination of Anti-Rheumatic Drugs in Early Rheumatoid Arthritis; CS, cyclosporine; CvLPRIT, Complete- compared to Lesion-Only Revascularization for Myocardial Infarction trial; EQG, EuroQol Group; IMPROVE, Immediate Management of Patients with Rupture: Open Versus Endovascular Repair; MTX, methotrexate; NDB, National Data Bank; PNS, prednisolone; QALY, quality-adjusted life-year; RAIN, Risk Adjustment in Neurocritical care; SHARPISH, Self-Help and Relapse Prevention in Smoking for Health; WRAP, Weight-Reduction Activity Program.
      In COUGAR-02, incremental QALYs increase as a result of shifting from the 3L to the 5L. The increase is small but is apparent for both versions of the 5L estimates. In COUGAR-02, mortality is a very substantial driver of cost effectiveness. Median overall survival in the DXL + ASC group was 5.2 months (95% confidence interval [CI] 4.1–5.9) versus 3.6 months (95% CI 3.3–4.4) in the ASC-only group [Wolfe F, Michaud K. The National Data Bank for rheumatic diseases: a multi-registry rheumatic disease data bank]. Here, the value of improved survival is greater because utility values are increased when using the 5L. It is worth noting that although the RAIN study also included patients with a substantial mortality rate (∼25% mortality within 6 months), this was substantially lower than in COUGAR-02 (6-month mortality of approximately 75% in the control group and 60% in the DXL arm [Wolfe F, Michaud K. The National Data Bank for rheumatic diseases: a multi-registry rheumatic disease data bank]), and in RAIN the mortality effect did not outweigh the morbidity effect.
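      The mechanism can be illustrated with a deliberately crude sketch. It assumes QALYs are simply utility multiplied by time alive, treats the reported median survival times as if they were mean durations, and applies one utility to both arms. The utility values 0.50 and 0.60 are hypothetical and are not taken from COUGAR-02 or from either mapping.

```python
# Crude illustration: when the benefit is extra survival, a higher utility
# score makes each added month worth more QALYs.
# Assumptions (not from the study): utilities 0.50 and 0.60 are hypothetical,
# and median survival is treated as if it were mean time alive.

MONTHS_PER_YEAR = 12.0


def qalys(utility: float, months_alive: float) -> float:
    """QALYs accrued at a constant utility over a period given in months."""
    return utility * months_alive / MONTHS_PER_YEAR


def incremental_qalys(utility: float, months_treated: float,
                      months_control: float) -> float:
    """Incremental QALYs when both arms share one utility and differ only in survival."""
    return qalys(utility, months_treated) - qalys(utility, months_control)


# Median overall survival in months, as quoted in the text above.
survival_dxl, survival_asc = 5.2, 3.6

gain_lower_utility = incremental_qalys(0.50, survival_dxl, survival_asc)
gain_higher_utility = incremental_qalys(0.60, survival_dxl, survival_asc)

print(f"Incremental QALYs at utility 0.50: {gain_lower_utility:.4f}")
print(f"Incremental QALYs at utility 0.60: {gain_higher_utility:.4f}")
```

      Under these assumptions the survival gain is identical in both runs; only the utility attached to the extra time alive differs, so the higher score yields the larger incremental QALY gain.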
      The responses people give to the 5L instrument and the changed tariff have the combined effect of shifting mean utility scores further up the utility scale toward full health, and compressing them into a smaller range. Thus, improvements in quality of life tend to be valued less using the 5L instrument compared with the same clinical change measured with the 3L instrument.
      In six of the nine reported comparisons, the incremental QALY gain is greater when measured using the 5L and the EQG data set compared with using the 5L and the NDB data set. One of the three remaining comparisons showed no difference.
      In those studies in which the 5L (EQG) lowered incremental QALYs, the impact ranged from a reduction of 10.4% (CARDERA comparison of methotrexate with methotrexate plus prednisolone) to 75% (RAIN comparison of dedicated neurocritical care unit with combined neuro/general critical care unit). The comparable range when using mapping on the basis of NDB data was 8% (CARDERA as before) to 87% (CACTUS).
      The impact of these changes on incremental cost-effectiveness ratios (ICERs) is also substantial in several cases. In CARDERA, the ICER for the comparison of triple therapy with disease-modifying antirheumatic drug monotherapy changes from approximately £16,000 using the 3L to more than £24,000 using the 5L (EQG data) and more than £30,000 using the 5L (NDB data). CACTUS changes from a highly cost-effective central estimate using the 3L (£3,058) to one that is more borderline (£23,022) using the 5L (NDB data). CvLPRIT changes from an ICER of just more than £20,000/QALY to in excess of £45,000/QALY when using either estimate of the 5L health utility. Other case studies demonstrate changes in cost effectiveness that may not span the boundaries of typically cited cost-effectiveness thresholds but are, nevertheless, very substantial.
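      The arithmetic linking a fall in incremental QALYs to a rise in the ICER can be sketched as follows. The sketch assumes that incremental costs are unaffected by the choice of instrument, so only the QALY denominator changes. The function names and the £10,000 cost and 0.5 QALY inputs are illustrative assumptions; the 10.4%, 75%, and 87% reductions are the figures quoted above.

```python
def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if incremental_qalys == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return incremental_cost / incremental_qalys


def icer_multiplier(qaly_reduction: float) -> float:
    """Factor by which the ICER rises if incremental QALYs fall by the given
    fraction (0 <= qaly_reduction < 1) while incremental cost stays fixed."""
    if not 0 <= qaly_reduction < 1:
        raise ValueError("qaly_reduction must be in [0, 1)")
    return 1.0 / (1.0 - qaly_reduction)


# Hypothetical technology: 10,000 extra cost for 0.5 extra QALYs.
base = icer(10_000, 0.5)
# Same cost, but the QALY gain is cut by 75%.
after = icer(10_000, 0.5 * (1 - 0.75))

print(f"ICER before: {base:,.0f}; after a 75% QALY reduction: {after:,.0f}")
for reduction in (0.104, 0.75, 0.87):
    print(f"{reduction:.1%} fewer incremental QALYs -> "
          f"ICER x {icer_multiplier(reduction):.2f}")
```

      Because the ratio becomes unstable as the denominator approaches zero, this simple calculation is uninformative when incremental QALYs are near 0, which is consistent with the table footnote on ICERs that were not reported.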

      Conclusions

      We have shown that the 3L and 5L versions can produce substantially different estimates of cost effectiveness in a series of case studies spanning different health conditions, severities, and health technologies. Technologies that improve quality of life have those benefits valued more highly, in terms of health utility, when using the 3L instrument compared with the 5L instrument. This is because of the combined effect of the changed descriptive system and how individuals respond to it compared with the 3L (which we demonstrated is not the same across each health dimension) and the changed valuation system. The result is that, in almost all cases, it is estimated that the ICER of a clinically effective technology would be higher (i.e., the technology would appear less cost-effective) if the 5L instrument had been used in place of the 3L instrument. When the cost effectiveness of a technology is substantially driven by mortality rather than by morbidity gains, shifting to the 5L may lower ICERs (improve cost effectiveness). Consistent with our findings, a recent study that also used the EQG data set reported that the 5L leads to higher values overall and across all the health conditions in the EQG data set [Mulhern B, Feng Y, Shah K, et al. Comparing the UK EQ-5D-3L and the English EQ-5D-5L value sets. OHE Research Paper 17/02].
      In this sense, estimates of health gain from the 3L and 5L are not consistent with each other. There is not a simple proportional adjustment that can be made to reconcile differences between the 3L and the 5L. Changes do not impact equally across the distribution of health and therefore different technologies are affected to a different degree by the shift from one instrument to another.
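      A toy example makes the point concrete. The mapping below is hypothetical: a concave curve chosen only because it shifts scores upward and compresses them. It is not the copula-based mapping estimated in this article, and the direction in which it compresses is arbitrary. It shows that the same 0.10 gain on the 3L scale is scaled by a different factor depending on where in the distribution of health it occurs, so no single multiplier reconciles the two scales.

```python
def toy_5l_from_3l(u3: float) -> float:
    """Hypothetical monotone mapping for 3L utilities in [0, 1].

    Raises scores toward full health and compresses their range.
    For illustration only; NOT the model estimated in the article.
    """
    return 0.3 + 0.7 * (1.0 - (1.0 - u3) ** 1.5)


def retained_share(u3_before: float, u3_after: float) -> float:
    """Share of a 3L utility gain that survives the toy mapping."""
    gain_3l = u3_after - u3_before
    gain_5l = toy_5l_from_3l(u3_after) - toy_5l_from_3l(u3_before)
    return gain_5l / gain_3l


# The same 0.10 improvement on the 3L scale at two starting points.
poor_health = retained_share(0.2, 0.3)
good_health = retained_share(0.7, 0.8)

print(f"Share of gain retained from 0.2 to 0.3: {poor_health:.2f}")
print(f"Share of gain retained from 0.7 to 0.8: {good_health:.2f}")
```

      If the relationship were a simple proportional one, the two shares would be equal; under any nonlinear relationship they differ, which is why technologies acting at different points on the scale are affected to different degrees.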
      It is feasible to adjust 3L evidence to its 5L equivalent, as has been done in this article. The validity of this approach is, in part, dependent on the data on which it is based. We have demonstrated this method in two separate data sets and shown that they give substantially different results. Further investigation of the reasons for these differences is required. In particular, the NDB includes only patients with rheumatoid disease and may not be generalizable to other populations. Nevertheless, the design of the NDB questionnaires included much more separation between the completion of the 3L and the 5L and may, therefore, offer observations that are less affected by recall of previous responses than those in the EQG studies. The NDB study is also predominantly conducted in English. Although there is some evidence that the ranking of levels 4 and 5 (“severe” and “extreme” problems) may not be as expected in the English valuation study [Devlin N, Shah K, Feng Y, et al. Valuing health-related quality of life: an EQ-5D-5L value set for England], this is less likely to be an issue affecting the descriptive system when respondents are provided with all five levels in their expected order. Therefore, we do not feel there is a rationale to prefer English-speaking samples for the mapping work. Both data sets are also limited by their size and coverage of relevant health states. In the EQG data, only 119 of the possible 233 3L utility values are observed. That figure is 83 for the NDB data. We know that most of the 233 health states do appear in real patient records. For example, in the UK 2010 to 2014 data for knee replacement procedures (n = 320,000), we find 189 out of 233 possible utility values. There is a pressing need for well-designed, large-scale data collection to extend this work.
      There are a number of implications for policy in the light of these results. Given the differences between the 3L and 5L instruments, consistency in decision making will be difficult to achieve. Consideration must be given to the value of any cost-effectiveness threshold (or thresholds) or other means for making adjustments between the two instruments. Mapping can help achieve this, and the copula-based method is a sophisticated development of “response mapping” that obtains consistent and accurate results. A single approach to mapping between the 3L and 5L instruments would aid consistent decision making. Additional data collection would also permit extended validation of the method and comparison against the EuroQol “cross-walk” that provides a link between 5L responses and 3L responses [Van Hout B, Janssen MF, Feng Y, et al. Interim scoring for the EQ-5D-5L: mapping the EQ-5D-5L to EQ-5D-3L value sets]. Decision-making bodies, such as the National Institute for Health and Care Excellence in the United Kingdom, should endorse the use of either the 3L instrument or the 5L instrument and a set of methods that allow evidence to be linked from one to the other. The 5L instrument is increasingly being used in studies of clinical effectiveness, but it is unlikely to entirely replace existing evidence based on the 3L instrument, which will remain relevant to many economic evaluations for many years to come.

      Acknowledgments

      We thank Fred Wolfe and Kaleb Michaud for providing data from the NDB; the EuroQol Group for providing data and helpful feedback on a presentation at the International Society for Pharmacoeconomics and Outcomes Research, Vienna, 2016, in particular Paul Kind, Bas Janssen, and Andrew Lloyd; and Jenny Dunn (School of Health and Related Research) for providing administrative support.
      Source of financial support: This work was supported by a grant from the Medical Research Council Methodology Research Programme (grant no. MR/L022575/1) and the National Institute for Health and Care Excellence through its Decision Support Unit. The views represented are those of the authors alone.

      References

        Herdman M, Gudex C, Lloyd A, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011; 20: 1727-1736
        Devlin N, Shah K, Feng Y, et al. Valuing health-related quality of life: an EQ-5D-5L value set for England. OHE Research Paper 16/01, Office of Health Economics, London, 2016
        Xie F, Pullenayegum E, Gaebel K, et al. A time trade-off-derived value set of the EQ-5D-5L for Canada. Med Care. 2016; 54: 98-105
        Shiroiwa T, Ikeda S, Noto S, et al. Comparison of value set based on DCE and/or TTO data: scoring for EQ-5D-5L health states in Japan. Value Health. 2016; 19: 648-654
        Augustovski F, Rey-Ares L, Irazola V, et al. An EQ-5D-5L value set based on Uruguayan population preferences. Qual Life Res. 2016; 25: 323-333
        Versteegh MM, Vermeulen KM, Evers SM, et al. Dutch tariff for the five-level version of EQ-5D. Value Health. 2016; 19: 343-352
        Kim SH, Ahn J, Ock M, et al. The EQ-5D-5L valuation study in Korea. Qual Life Res. 2016; 25: 1845-1852
        Feng Y, Devlin N, Shah K, et al. New methods for modelling EQ-5D-5L value sets: an application to English data. Available from: https://www.ohe.org/publications/new-methods-modelling-eq-5d-5l-value-sets-application-english-data. [Accessed May 23, 2017].
        Wolfe F, Michaud K. The National Data Bank for rheumatic diseases: a multi-registry rheumatic disease data bank. Rheumatology. 2011; 50: 16-24
        Hernandez Alava M, Pudney S. Econometric modelling of multiple self-reports of health states: The switch from EQ-5D-3L to EQ-5D-5L in evaluating drug therapies for rheumatoid arthritis. J Health Econ. 2017; 55: 139-152
        Choy EHS, Smith CM, Farewell V, et al. Factorial randomised controlled trial of glucocorticoids and combination disease modifying drugs in early rheumatoid arthritis. Ann Rheum Dis. 2008; 67: 656-663
        Wailoo A, Hernandez Alava M, Scott I, et al. Cost-effectiveness of treatment strategies using combination disease-modifying anti-rheumatic drugs and glucocorticoids in early rheumatoid arthritis. Rheumatology (Oxford). 2014; 53: 1773-1777
        Latimer NR, Dixon S, Palmer R. Cost-utility of self-managed computer therapy for people with aphasia. Int J Technol Assess Health Care. 2013; 29: 402-409
        IMPROVE Trial Investigators. Endovascular or open repair strategy for ruptured abdominal aortic aneurysm: 30 day outcomes from IMPROVE randomised trial. BMJ. 2014; 348: f7661
        Ford HER, Marshall A, Bridgewater JA, et al. COUGAR-02 Investigators. Docetaxel versus active symptom control for refractory oesophagogastric adenocarcinoma (COUGAR-02): an open-label, phase 3 randomised controlled trial. Lancet Oncol. 2014; 15: 78-86
        Hillmen P, Milligan D, Schuh A, et al. Results of the randomised phase II NCRI ARCTIC (Attenuated dose Rituximab with ChemoTherapy In CLL) trial of low dose rituximab in previously untreated CLL. Blood. 2013; 122: 1639
        Howard DR, Munir T, McParland L, et al. Clinical effectiveness and cost-effectiveness results from the randomised, phase IIB trial in previously untreated patients with chronic lymphocytic leukaemia (CLL) to compare fludarabine, cyclophosphamide and rituximab (FCR) with fludarabine, cyclophosphamide, mitoxantrone and low dose rituximab (FCM-miniR): the Attenuated dose Rituximab with ChemoTherapy In CLL (ARCTIC) trial. Health Technol Assess. 2017; 21: 1-374
        Blyth A, Maskrey V, Notley C, et al. Effectiveness and economic evaluation of self-help educational materials for the prevention of smoking relapse: randomised controlled trial. Health Technol Assess. 2015; 19: 1-70
        Ahern AL, Aveyard PN, Halford JC, et al. Weight loss referrals for adults in primary care (WRAP): protocol for a multi-centre randomised controlled trial comparing the clinical and cost-effectiveness of primary care referral to a commercial weight loss provider for 12 weeks, referral for 52 weeks, and a brief self-help intervention [ISRCTN82857232]. BMC Public Health. 2014; 14: 620
        Gershlick AH, Khan JN, Kelly DJ, et al. Randomized trial of complete versus lesion-only revascularization in patients undergoing primary percutaneous coronary intervention for STEMI and multivessel disease: the CvLPRIT trial. J Am Coll Cardiol. 2015; 65: 963-972
        Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997; 35: 1095-1108
        Devlin N, Shah K, Feng Y, et al. Valuing Health Related Quality of Life: An EQ-5D-5L Value Set for England. Technical Report 16.02. Health Economics and Decision Science, University of Sheffield, Sheffield, UK, 2016
        Mulhern B, Feng Y, Shah K, et al. Comparing the UK EQ-5D-3L and the English EQ-5D-5L value sets. OHE Research Paper 17/02. Available from: https://www.ohe.org/publications/comparing-uk-eq-5d-3l-and-english-eq-5d-5l-value-sets. [Accessed June 20, 2017].
        Van Hout B, Janssen MF, Feng Y, et al. Interim scoring for the EQ-5D-5L: mapping the EQ-5D-5L to EQ-5D-3L value sets. Value Health. 2012; 15: 708-715