Valuing Health Gain from Composite Response Endpoints for Multisystem Diseases

Objectives: This study aimed to demonstrate how to estimate the value of health gain after patients with a multisystem disease achieve a condition-specific composite response endpoint. Methods: Data from patients treated in routine practice with an exemplar multisystem disease (systemic lupus erythematosus) were extracted from a national register (British Isles Lupus Assessment Group Biologics Register). Two bespoke composite response endpoints (Major Clinical Response and Improvement) were developed in advance of this study. Difference-in-differences regression compared health utility values (3-level version of EQ-5D; UK tariff) over 6 months for responders and nonresponders. Bootstrapped regression estimated the incremental quality-adjusted life-years (QALYs), the probability of QALY gain after achieving the response criteria, and the population monetary benefit of response. Results: Within the sample (n = 171), 18.1% achieved Major Clinical Response and 49.1% achieved Improvement at 6 months. Incremental health utility values were 0.0923 for Major Clinical Response and 0.0454 for Improvement. Expected incremental QALY gain at 6 months was 0.020 for Major Clinical Response and 0.012 for Improvement. Probability of QALY gain after achieving the response criteria was 77.6% for Major Clinical Response and 72.7% for Improvement. Population monetary benefit of response was £1 106 458 for Major Clinical Response and £649 134 for Improvement. Conclusions: Bespoke composite response endpoints are becoming more common to measure treatment response for multisystem diseases in trials and observational studies. Health technology assessment agencies face a growing challenge to establish whether these endpoints correspond with improved health gain. Health utility values can generate this evidence to enhance the usefulness of composite response endpoints for health technology assessment, decision making, and economic evaluation.
monetary benefit, multisystem disease, systemic lupus erythematosus.


Introduction
Condition-specific composite response endpoints are being developed to improve the ability to determine whether patients with multisystem diseases benefit from treatment within clinical trials and observational studies, 1,2 but health technology assessment (HTA) agencies and decision makers face a challenge to ascertain whether these endpoints correspond with improved health gain. 3,4 This challenge can be resolved by demonstrating a clear link between achieving a composite response endpoint and the likely health gain expressed in terms of an HTA agency's preferred instrument to measure health state utility. 5,6 Multidimensional health-related quality of life instruments with tariffs informed by population preferences are an underused resource to value the health gain from composite response endpoints. 7 Evidence of this health gain is vital information to justify a composite response endpoint's usefulness for decision making and should be considered alongside its clinical relevance and statistical properties.
HTA agencies are experiencing a rise in treatments for multisystem diseases, which are characterized by heterogeneous activity across more than one organ system. 8,9 Composite response endpoints are explicit criteria that combine 2 or more independent outcomes into a single dichotomous outcome to determine whether patients respond to treatment over time ("responder" or "nonresponder"). 4 These composite response endpoints are thought to better capture how multisystem diseases improve after effective treatment. 10 By contrast, response endpoints defined by a single instrument may not be sensitive to change if they do not measure all important improvements in disease activity across different organ systems. 11 One common justification for developing a bespoke composite response endpoint is that improvement may be characterized best by reference to multiple outcomes. 12 Combining these outcomes into a single index also reduces the sample size needed to achieve adequate statistical power. 12 Nevertheless, cost-effectiveness analyses are performed within a Bayesian decision analysis framework, 13 and accordingly, the corresponding changes in health state utility can be evaluated in terms of their distribution, their expected value, and the probability that patients categorized as "responders" will experience health gain. The EQ-5D (3L and 5L versions), with accompanying population preference weights, is often favored by decision makers internationally to estimate health state utility values and calculate quality-adjusted life-years (QALYs). 5 Therefore, the value of health gain from a composite response endpoint can be quantified by outcomes such as the expected incremental improvement in EQ-5D utility score, QALYs, and, equivalently, the expected monetary benefit of responding to treatment.
Expressing health gain in terms of outcomes preferred by HTA agencies, rather than in condition-specific outcomes alone, helps justify the importance of a composite response endpoint for decision making. Failing to provide this evidence ultimately reduces the usefulness of pivotal studies that use a bespoke composite response endpoint to estimate treatment efficacy or effectiveness.
The multisystem disease systemic lupus erythematosus (SLE) provides a good exemplar to illustrate the value of expressing the health gain associated with bespoke disease-specific composite response endpoints in terms of outcomes favored by decision makers. 14 Clinical trials in SLE have remained challenging, with only 2 treatments achieving regulatory approval in the past 60 years. 10,15 In 2005, the US Food and Drug Administration produced initial guidance (finalized in 2010) that recommended composite response endpoints within SLE trials to capture whether outcomes are nondeteriorating across more than one domain within the same measure. 16 Two recent examples of composite response endpoints created for SLE, informed by this guidance, are the British Isles Lupus Assessment Group-based Composite Lupus Assessment (BICLA), first used in the phase IIb study of epratuzumab, 17 and the Systemic Lupus Erythematosus Responder Index 4 (SRI-4), developed from the phase II trial data for belimumab. 18 The BICLA and the SRI-4 both incorporate observations from 3 separate condition-specific instruments that measure disease activity (the British Isles Lupus Assessment Group [BILAG] 2004 Index, 19 the Systemic Lupus Erythematosus Disease Activity Index 2000 [SLEDAI-2K], 20 and the Physician Global Assessment 21 ) to classify patients dichotomously as either a "responder" or a "nonresponder." These composite response endpoints have certain limitations: for example, the BICLA is less sensitive for patients with high baseline organ involvement, and the SRI-4 does not detect partial improvement in an organ system. 11 Nevertheless, they have enhanced the quantification of disease improvement across a patient population with heterogeneous clinical manifestations and contributed to the recent success of trials in SLE. 10
Defining treatment outcomes according to composite response endpoints is therefore likely to remain the standard for the foreseeable future. HTA agencies are now confronted with the challenge of comparing evidence from studies that use different composite response endpoints and evaluating the importance of these endpoints according to their preferred instrument to value health gain. To address this growing challenge across diseases that affect more than one organ system, this study aims to demonstrate how to estimate the value of health gain after patients with a multisystem disease achieve a condition-specific composite response endpoint.

Methods
This study uses regression analyses of data collected as part of routine (nontrial) practice within a national patient register. A case study is presented to illustrate how health state utility values (derived from 3-level EQ-5D [EQ-5D-3L] profiles valued with general population preference weights) were used to value the health gain associated with 2 bespoke composite response endpoints for a multisystem disease.

Case Study
SLE is an autoimmune disease in which disease activity across different organ systems produces heterogeneous symptoms among patients. Globally, the incidence of SLE ranges between 1.5 and 11 per 100 000 person-years. 22 People with SLE have a 2- to 3-fold higher mortality risk than the general population. 22 Treatment with antimalarials, immunosuppressants, biologic agents, and glucocorticoids is intended to reduce active disease and prevent organ damage. 23 The "Maximizing SLE Therapeutic Potential by Application of Novel and Stratified Approaches" program was a Medical Research Council Precision Medicine Consortium that aimed to develop statistical algorithms to predict the likelihood of response to treatments used routinely to manage SLE. 24 As part of this program, 2 bespoke condition-specific composite response endpoints for SLE were developed a priori to define criteria for achieving "Major Clinical Response" or "Improvement" after treatment. Patients who are prescribed a biologic treatment for SLE in England are required to enroll in a national patient register as a condition of reimbursement. 25,26 This national register collects disease-specific outcomes and health-related quality of life (EQ-5D-3L) data. The routine collection of these data alongside evidence of response to treatments prescribed in clinical practice provided an opportunity to estimate the value of the health gain associated with the 2 bespoke composite response endpoints.

Data Source
The data for this study were obtained from the British Isles Lupus Assessment Group Biologics Register (BILAG-BR). 27 The BILAG-BR records clinical and demographic data from all patients in England who received a biologic treatment for SLE (belimumab, rituximab) and from some patients who received nonbiologic immunosuppressant treatment (mycophenolate mofetil), at baseline (before receiving treatment) and defined follow-up periods. Data were collected within routine clinical practice from 59 specialist centers across the UK between 2010 and 2019.

Instruments Measured
Two disease-specific instruments measured disease activity at baseline and 6 months. The BILAG-2004 Index measured SLE-specific disease activity across 9 organ systems. 19 Activity in each system was scored on an ordinal scale from A to E, where A means severe active disease requiring high-dose systemic treatment, B means moderate disease activity requiring low-dose corticosteroid or antimalarial treatment, C means mild disease activity, D means inactive disease that was previously active, and E means the system has never been active. The SLEDAI-2K measured global disease activity in 9 organ systems across 24 items, weighted by severity, and has a score between 0 and 105. 20 The 6-month follow-up reflected the timing of treatment continuation decisions in clinical practice, which are informed by disease improvement at 6 months. 25,26
Health-related quality of life was recorded at baseline and 6 months with the EQ-5D-3L. 28 The EQ-5D-3L has 5 domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), each with 3 levels (no problems, some problems, and extreme problems), which together describe 243 unique health states. The EQ-5D-3L tariff for the general population in the United Kingdom was used to estimate health state utility values on a scale anchored at 1 (perfect health) and 0 (dead), where states worse than dead (< 0) were possible. 29 Patients' demographics (age and female sex) and initial treatment were recorded at baseline. Oral corticosteroid dose (mg) was recorded at baseline and 6 months.
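As a sketch of how EQ-5D-3L profiles become utility values, the Python below enumerates the 3^5 = 243 health states and scores a 5-digit profile. It assumes that the cited UK tariff is the Dolan (1997) time trade-off value set (a constant, level-specific decrements per dimension, and an "N3" decrement whenever any dimension is at level 3); the function and constant names are illustrative only.

```python
from itertools import product

# Assumption: UK EQ-5D-3L time trade-off tariff (Dolan, 1997), N3 model.
CONSTANT = 0.081  # subtracted if any dimension is worse than level 1
N3 = 0.269        # subtracted if any dimension is at level 3
DECREMENTS = {
    # dimension: (level 2 decrement, level 3 decrement)
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}
DIMENSIONS = list(DECREMENTS)  # dict order = EQ-5D dimension order


def eq5d3l_utility(profile: str) -> float:
    """Convert a 5-digit EQ-5D-3L profile such as '21123' to a UK utility value."""
    levels = [int(ch) for ch in profile]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"invalid EQ-5D-3L profile: {profile!r}")
    if all(level == 1 for level in levels):
        return 1.0  # full health
    utility = 1.0 - CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            utility -= DECREMENTS[dim][level - 2]
    if 3 in levels:
        utility -= N3
    return round(utility, 3)


# Every combination of 5 dimensions x 3 levels: 3**5 = 243 health states.
ALL_STATES = ["".join(map(str, p)) for p in product((1, 2, 3), repeat=5)]
```

For example, `eq5d3l_utility("11112")` gives 0.848 and the worst state, `"33333"`, gives -0.594, which is why utility values below 0 were possible.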

Composite Response Endpoint Criteria
Two bespoke composite response endpoints (Major Clinical Response and Improvement) were developed by clinicians and patient experts who were part of the Maximizing SLE Therapeutic Potential by Application of Novel and Stratified Approaches Consortium. The criteria for both endpoints comprised 3 items: the change in each patient's (1) BILAG score, (2) SLEDAI-2K score, and (3) corticosteroid dose between baseline and 6 months (Table 1). For each endpoint, patients who met the criteria were classified as "responders" and all other patients as "nonresponders." Because the criteria for Major Clinical Response were stricter than those for Improvement, Major Clinical Response was hypothesized to correspond with a greater health gain.

Estimation Sample
The estimation sample comprised patients with (1) high baseline disease activity before treatment (at least 1 BILAG A score, at least 2 BILAG B scores, or a SLEDAI-2K score ≥ 6), (2) sufficient data to determine whether both composite response endpoints were achieved at 6 months, and (3) complete EQ-5D-3L observations at baseline and 6 months.
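The three inclusion criteria can be expressed as a simple filter. In this sketch the record layout (`Patient` and its field names) is hypothetical rather than the BILAG-BR data dictionary, and criterion 1 is read as at least 1 BILAG A grade, at least 2 BILAG B grades, or a SLEDAI-2K score of 6 or more.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Patient:
    """Hypothetical register record; field names are illustrative only."""
    bilag_baseline: Sequence[str]            # one BILAG-2004 grade (A-E) per organ system
    sledai2k_baseline: int                   # SLEDAI-2K score, 0-105
    major_clinical_response: Optional[bool]  # None if the endpoint cannot be determined
    improvement: Optional[bool]              # None if the endpoint cannot be determined
    eq5d_baseline: Optional[float]           # EQ-5D-3L utility; None if missing
    eq5d_6m: Optional[float]


def high_disease_activity(p: Patient) -> bool:
    """Criterion 1: >=1 BILAG A, >=2 BILAG B, or SLEDAI-2K >= 6 at baseline."""
    grades = list(p.bilag_baseline)
    return grades.count("A") >= 1 or grades.count("B") >= 2 or p.sledai2k_baseline >= 6


def in_estimation_sample(p: Patient) -> bool:
    """Apply criteria 1-3: high activity, both endpoints known, complete EQ-5D-3L."""
    endpoints_known = p.major_clinical_response is not None and p.improvement is not None
    eq5d_complete = p.eq5d_baseline is not None and p.eq5d_6m is not None
    return high_disease_activity(p) and endpoints_known and eq5d_complete


if __name__ == "__main__":
    eligible = Patient(list("BBCDDEEEE"), 4, False, True, 0.52, 0.62)
    missing_eq5d = Patient(list("ACDDEEEEE"), 10, True, True, 0.34, None)
    print(in_estimation_sample(eligible), in_estimation_sample(missing_eq5d))  # True False
```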

Analysis
Descriptive statistics summarized the baseline characteristics of the estimation sample in terms of demographics, treatment received in routine practice, and disease activity (mean SLEDAI-2K score and percentage of patients with at least 1 BILAG A or 2 BILAG B scores). The percentage of patients who fulfilled the Improvement or Major Clinical Response endpoint criteria at 6 months was also reported.
For each composite response endpoint, the sample was then stratified into those who achieved the endpoint ("responders") and those who did not ("nonresponders"). To understand how health state utility values changed after achieving each endpoint, the mean EQ-5D-3L utility scores at baseline and 6 months and the mean difference between baseline and 6 months were reported separately for the responder and nonresponder subgroups. Two-period difference-in-differences ordinary least squares regressions on the pooled baseline and 6-month observations estimated the incremental change in utility score associated with achieving each composite response endpoint (relative to not achieving the endpoint). 30 The functional form of each regression is

$$Y = \beta_0 + \beta_1 R + \beta_2 T + \beta_3 (RT) + \varepsilon \qquad (1)$$

where Y is the EQ-5D-3L utility score, R is a binary variable equal to one if the patient achieved the specific composite response endpoint and zero otherwise, T is a binary variable equal to one if the observation was measured at 6 months and zero if at baseline, RT is the interaction term, and $\varepsilon$ is the error term.
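Because the two-period difference-in-differences model contains only two binary regressors and their interaction, it is saturated, so its OLS point estimates equal simple contrasts of the four responder-by-period cell means. The dependency-free Python sketch below uses that identity to recover the intercept, the two main effects, and the interaction coefficient from pooled observations. It returns point estimates only; standard errors, which would need to account for each patient contributing two observations, are not computed. The function name and toy data are hypothetical.

```python
from statistics import mean


def did_coefficients(y, r, t):
    """Point estimates of Y = b0 + b1*R + b2*T + b3*(R*T) from pooled two-period data.

    y: EQ-5D-3L utilities; r: 1 = responder, 0 = nonresponder;
    t: 1 = 6-month observation, 0 = baseline observation.
    """
    def cell_mean(resp, period):
        values = [yi for yi, ri, ti in zip(y, r, t) if ri == resp and ti == period]
        if not values:
            raise ValueError(f"no observations with R={resp}, T={period}")
        return mean(values)

    nonresp_base, nonresp_6m = cell_mean(0, 0), cell_mean(0, 1)
    resp_base, resp_6m = cell_mean(1, 0), cell_mean(1, 1)
    b0 = nonresp_base                # nonresponders at baseline
    b1 = resp_base - nonresp_base    # baseline gap (responders minus nonresponders)
    b2 = nonresp_6m - nonresp_base   # 6-month change for nonresponders
    b3 = (resp_6m - resp_base) - b2  # difference-in-differences
    return b0, b1, b2, b3


if __name__ == "__main__":
    # Hypothetical toy data: 2 nonresponders and 2 responders, each observed twice.
    y = [0.50, 0.60, 0.60, 0.60, 0.30, 0.40, 0.50, 0.60]
    r = [0, 0, 0, 0, 1, 1, 1, 1]
    t = [0, 0, 1, 1, 0, 0, 1, 1]
    print([round(b, 3) for b in did_coefficients(y, r, t)])  # [0.55, -0.2, 0.05, 0.15]
```

A general OLS routine on the same four columns (intercept, R, T, RT) would return the same point estimates; the cell-mean form simply makes the interpretation of each coefficient explicit.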
The estimated coefficient on the interaction term ($\beta_3$) is most relevant for this study because it equals the incremental change in health state utility for patients who achieve the endpoint: the change in EQ-5D-3L utility from baseline to 6 months for responders minus the corresponding change for nonresponders. The remaining coefficients are interpreted as follows: $\beta_0$ is the mean EQ-5D-3L utility for nonresponders at baseline, $\beta_0 + \beta_1$ is the mean EQ-5D-3L utility for responders at baseline, and $\beta_2$ is the estimated change in EQ-5D-3L utility between baseline and 6 months for nonresponders.
QALYs were calculated for each patient by the trapezium rule. 31 For each endpoint, the incremental QALY gain at 6 months for responders versus nonresponders was estimated by ordinary least squares regression controlling for baseline utility scores, with 10 000 bootstrap replications. 31 The density function of the incremental QALY gain across replications was plotted separately for each composite response endpoint (kernel density estimator) to compare the distributions in terms of expected values and parameter uncertainty. The probability that patients experience a positive QALY gain after achieving each composite response endpoint was estimated as the proportion of replications greater than zero. The incremental health gain was converted to monetary units (monetary benefit of response) by multiplying the expected incremental QALY gain by the cost-effectiveness threshold value (k). 32 A threshold of £20 000 per QALY gained was used because this is the lower bound of the range used by the National Institute for Health and Care Excellence in its health technology appraisal process. 33 Per-patient estimates of incremental QALY gain ($\Delta Q$) and monetary benefit were multiplied by the number of incident patients over 1 year (I) to provide population-level estimates.
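The QALY and bootstrap steps can be sketched in dependency-free Python. Assumptions: the 6-month horizon is 0.5 years; the baseline-adjusted OLS coefficient on the responder indicator is obtained via the Frisch-Waugh-Lovell theorem rather than a general regression routine; resamples with no identifying variation (for example, no responders) are redrawn; and all names are hypothetical. This is not the Stata code used in the study.

```python
import random
from statistics import mean


def qaly_trapezium(u0, u6, years=0.5):
    """QALYs between baseline and 6 months: area under the utility line (trapezium rule)."""
    return (u0 + u6) / 2 * years


def adjusted_increment(qalys, responder, u0):
    """OLS coefficient on `responder` in QALY ~ 1 + responder + baseline utility.

    Frisch-Waugh-Lovell: residualize the responder indicator on baseline utility,
    then regress QALYs on those residuals.
    """
    r_bar, u_bar = mean(responder), mean(u0)
    s_uu = sum((u - u_bar) ** 2 for u in u0)
    if s_uu < 1e-12:
        raise ValueError("no variation in baseline utility")
    slope = sum((r - r_bar) * (u - u_bar) for r, u in zip(responder, u0)) / s_uu
    resid = [(r - r_bar) - slope * (u - u_bar) for r, u in zip(responder, u0)]
    s_rr = sum(e * e for e in resid)
    if s_rr < 1e-12:
        raise ValueError("responder indicator has no variation net of baseline utility")
    return sum(e * q for e, q in zip(resid, qalys)) / s_rr


def bootstrap_increment(u0, u6, responder, reps=10_000, seed=1):
    """Nonparametric bootstrap (resampling patients) of the adjusted incremental QALY."""
    rng = random.Random(seed)
    qalys = [qaly_trapezium(a, b) for a, b in zip(u0, u6)]
    n = len(qalys)
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            draws.append(adjusted_increment([qalys[i] for i in idx],
                                             [responder[i] for i in idx],
                                             [u0[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample; draw another
    return draws


def summarize(draws, k=20_000):
    """Expected increment, probability of a positive QALY gain, monetary benefit at k."""
    expected = mean(draws)
    return {
        "expected_qaly_gain": expected,
        "prob_gain": sum(d > 0 for d in draws) / len(draws),
        "monetary_benefit": expected * k,
    }


if __name__ == "__main__":
    # Hypothetical toy data: responders gain 0.08 utility over 6 months.
    u0 = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
    responder = [1, 0, 1, 0, 1, 0]
    u6 = [u + 0.08 * r for u, r in zip(u0, responder)]
    print(summarize(bootstrap_increment(u0, u6, responder, reps=1000)))
```

In the toy data the QALY outcome is exactly 0.5 × baseline utility + 0.02 × responder, so every non-degenerate resample returns an increment of 0.02 and the probability of gain is 1; with real register data the draws would spread out, and a kernel density plot of them corresponds to the distributions described above.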
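The population monetary benefit of response ($\mathrm{PMBR}_r$) was plotted separately for Improvement and Major Clinical Response over the range of possible values for the proportion of patients in the incident population who achieve each endpoint (r):

$$\mathrm{PMBR}_r = r \times I \times \Delta Q \times k \qquad (2)$$

where I is the number of incident patients over 1 year, $\Delta Q$ is the per-patient incremental QALY gain, and k is the cost-effectiveness threshold. All analyses were performed in Stata version 16.1 (StataCorp LLC, College Station, TX). 34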
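The population calculation, PMBR = r × I × ΔQ × k, is simple arithmetic, shown below as a small Python function. The incidence inputs (4.91 per 100 000 and 56.55 million people) are taken from the Results; truncating rather than rounding their product (about 2776.6) is an assumption chosen because it reproduces the reported 2776 incident patients. Function and constant names are hypothetical.

```python
def population_monetary_benefit(r, incident, delta_q, k=20_000):
    """PMBR_r = r * I * delta_Q * k, in pounds sterling."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a proportion between 0 and 1")
    return r * incident * delta_q * k


# Incident population: 4.91 per 100 000 applied to 56.55 million people (about 2776.6).
INCIDENT = int(4.91 / 100_000 * 56.55e6)  # truncated to 2776, matching the Results

if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 0.75, 1.0):  # proportion of incident patients responding
        mcr = population_monetary_benefit(r, INCIDENT, 0.020)
        imp = population_monetary_benefit(r, INCIDENT, 0.012)
        print(f"r={r:.2f}  MCR £{mcr:,.0f}  Improvement £{imp:,.0f}")
```

With the rounded ΔQ of 0.020, r = 1 gives £1 110 400 rather than the reported £1 106 458; the reported figure implies an unrounded ΔQ of about 1 106 458 / (2776 × 20 000) ≈ 0.01993.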

Ethics
Ethical approval for the BILAG-BR was obtained from the National Research Ethics Service (NRES) Committee North West-Greater Manchester West (REC: 09/H1014/64) and the local research and development departments at participating sites. All patients provided written informed consent at the time of registration.

Results
There were 776 individuals in the register with available data to determine whether the composite response endpoints were achieved at 6 months. Of these, the estimation sample comprised the 171 patients with complete EQ-5D-3L observations at baseline and 6 months. The baseline (pretreatment) descriptive statistics of the estimation sample and the percentage who achieved the Improvement and Major Clinical Response endpoints are presented in Table 2. On average, the sample had severe active disease (mean SLEDAI-2K: 8.51); more than half of patients had at least 1 BILAG A score, and 48.5% of patients had at least 2 BILAG B scores before receiving treatment. Most patients received a biologic treatment at baseline (rituximab, 85.4%; belimumab, 5.3%). At 6 months, 49.1% of patients achieved the Improvement endpoint and 18.1% achieved the Major Clinical Response endpoint. For each composite response endpoint, Table 3 reports the mean health state utility values at baseline and 6 months and the difference over time, stratified by whether patients achieved the endpoint (responders) or not (nonresponders). For both endpoints, patients who achieved the response criteria had, on average, a lower baseline health state utility value than patients who did not (Improvement, 0.449 vs 0.520; Major Clinical Response, 0.344 vs 0.517). Mean health state utility values increased over 6 months irrespective of whether patients met either endpoint, but the increase was larger for patients who achieved each endpoint. The increase in mean health state utility for patients who achieved Major Clinical Response (difference, 0.102) was approximately twice that for patients who achieved Improvement (difference, 0.050).
The estimated coefficients from the difference-in-differences regressions for each composite response endpoint are presented in Table 4. The positive coefficient on the interaction term indicates that, for both endpoints, the incremental gain in health state utility was positive for patients who achieved the respective endpoint compared with those who did not. The estimated incremental gain in health state utility at 6 months for patients who achieved Major Clinical Response ($\beta_3$ = 0.0923) was larger than that for Improvement ($\beta_3$ = 0.0454).
After controlling for baseline health state utility, the 6-month per-patient incremental QALY gain for response was estimated as 0.020 QALYs for Major Clinical Response and 0.012 QALYs for Improvement. The 6-month per-patient incremental monetary benefit of response was £398 for Major Clinical Response and £233 for Improvement. The density function of the simulated incremental QALY gains from achieving each composite response endpoint (relative to being a nonresponder) is illustrated in Figure 1. The sampled distribution of the incremental QALY gain was wider for Major Clinical Response than for Improvement, indicating greater parameter uncertainty about the true incremental QALY gain for patients who achieved Major Clinical Response. The distribution for Major Clinical Response also lay further to the right than that for Improvement, indicating a higher expected incremental QALY gain. Based on the proportion of bootstrap replications greater than zero, the probability that achieving each endpoint was associated with a positive QALY gain was 77.6% for Major Clinical Response and 72.7% for Improvement. The estimated number of new SLE cases per year was 2776, based on Rees et al's estimated incidence of 4.91 per 100 000 from national electronic health record data in the UK (Clinical Practice Research Datalink) 35 and an assumed general population of 56.55 million individuals. 36 Figure 2 illustrates how the 6-month population monetary benefit of response for this incident population changes as the proportion of patients who achieve each endpoint increases. If all patients achieve Major Clinical Response, the population benefit of response will be equivalent to 55.32 QALYs or £1 106 458.
Similarly, if all patients achieve Improvement, there will be a population benefit of response equivalent to 32.46 QALYs or £649 134. Based on the proportion of patients who currently achieve each endpoint in routine practice (see Table 2), the expected population monetary benefit of response for Major Clinical Response is £200 600 (or 10.03 QALYs) and for Improvement is £318 854 (or 15.94 QALYs).
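As an arithmetic check, the expected population values follow from multiplying the all-responder totals by the observed response proportions. This assumes 31 and 84 responders out of 171, which is what 18.1% and 49.1% imply, giving proportions of about 0.1813 and 0.4912:

```latex
\begin{aligned}
\text{Major Clinical Response:}\quad & 0.1813 \times \pounds 1\,106\,458 \approx \pounds 200\,600,
  & 0.1813 \times 55.32 \approx 10.03 \text{ QALYs} \\
\text{Improvement:}\quad & 0.4912 \times \pounds 649\,134 \approx \pounds 318\,854,
  & 0.4912 \times 32.46 \approx 15.94 \text{ QALYs}
\end{aligned}
```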

Discussion
This study demonstrated how bespoke disease-specific composite response endpoints can be strengthened by linking them with improvements in health gain according to outcomes that are relevant to HTA agencies. In the case study, patients who achieved the Improvement and Major Clinical Response endpoints at 6 months had higher increases in health state utility, QALYs, and monetary benefit of response than patients who did not achieve these endpoints. Valuing the health gain of a composite response endpoint can guide the criteria for meeting these endpoints as they are developed, help inform trial design by selecting the most valuable composite response endpoint from a range of competing alternatives, and improve the usefulness of future research studies that use the endpoint within the context of informing HTA and decision making.
The need to transform disease-specific instruments into health state utility values to support decision making has long been recognized in the mapping literature. 37 However, composite response endpoints measure the change in disease activity over time after treatment (a measure of effectiveness), whereas mapping studies estimate a cross-sectional association between a disease-specific instrument and health state utility in the same time period. To value the health gain associated with a composite response endpoint, analysts will first require patient-level baseline and follow-up data. At each time period, these data should include observations of health state utility and an indicator variable for whether the composite response endpoint was met at the defined follow-up. Regression analyses that control for baseline utility with nonparametric bootstrapping, equivalent to the approach for within-trial economic evaluations, 31 can then estimate the incremental health gain of meeting the endpoint compared with not responding. Large observational studies (patient registers, cohort studies) and clinical trials can both provide the patient-level data needed for these analyses.
Figure 1. Kernel density estimator of 10 000 bootstrap replications.
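The data requirement described above amounts to one record per patient per period. A minimal Python sketch of reshaping a one-row-per-patient extract into the long format used by the difference-in-differences regression follows; the input keys (`patient_id`, `responder`, `utility_baseline`, `utility_6m`) are hypothetical.

```python
from typing import Dict, List


def to_long(records: List[Dict]) -> List[Dict]:
    """Reshape one row per patient into one row per patient-period.

    Each output row carries the utility (Y), the period indicator (T: 0 = baseline,
    1 = follow-up) and the endpoint indicator (R), which is constant within a patient.
    """
    rows = []
    for rec in records:
        for period, key in ((0, "utility_baseline"), (1, "utility_6m")):
            rows.append({
                "patient_id": rec["patient_id"],
                "T": period,
                "R": int(rec["responder"]),
                "Y": rec[key],
            })
    return rows


if __name__ == "__main__":
    wide = [
        {"patient_id": "P01", "responder": True, "utility_baseline": 0.34, "utility_6m": 0.62},
        {"patient_id": "P02", "responder": False, "utility_baseline": 0.52, "utility_6m": 0.57},
    ]
    for row in to_long(wide):
        print(row)
```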
Changes in health state utility values and QALYs are typically essential inputs for cost-effectiveness analyses. A recent review of economic evaluations in SLE found that utility values were obtained from small samples or were reported incompletely. 38 The present study builds on this finding by providing health state utility values associated with achieving the Improvement and Major Clinical Response endpoints, which can be used as a source of input parameters in future cost-effectiveness analyses. Parameter uncertainty in these utility values can be propagated within a probabilistic analysis by reference to the reported standard errors. In turn, this evidence can help improve the robustness of future analyses that aim to estimate the relative cost-effectiveness of new and existing treatment strategies for SLE.
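One way to propagate this uncertainty is to draw the incremental utility from a normal distribution centered on the point estimate with its standard error; a normal rather than a beta distribution is assumed here because an incremental utility can be negative. The standard error in this Python sketch is a hypothetical placeholder, not a value from this study, and the function name is illustrative.

```python
import random
from statistics import mean


def sample_incremental_utility(estimate, std_error, n_draws=5000, seed=2021):
    """Draws for a probabilistic analysis, assuming a normal sampling distribution."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    rng = random.Random(seed)  # fixed seed so the analysis is reproducible
    return [rng.gauss(estimate, std_error) for _ in range(n_draws)]


HYPOTHETICAL_SE = 0.03  # placeholder only; substitute the standard error reported by the study

if __name__ == "__main__":
    draws = sample_incremental_utility(0.0923, HYPOTHETICAL_SE)  # Major Clinical Response estimate
    print(f"mean draw: {mean(draws):.4f}")
    print(f"share of draws above zero: {sum(d > 0 for d in draws) / len(draws):.3f}")
```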
Previous studies have compared responders and nonresponders, defined by existing composite response endpoints for SLE, using other health-related quality of life instruments. […] than nonresponding patients. If there is a positive correlation between the SF-36 and EQ-5D-3L in people with SLE, 41 these findings could suggest that achieving existing endpoints may also be associated with improved EQ-5D scores. Direct comparative evidence of how existing composite response endpoints perform according to instruments preferred by HTA agencies will be valuable to support future decision making.
One limitation of this study was that the composite response endpoints were measured at 6 months only. The duration of treatment response in SLE varies among individuals, and retreatment can be effective for patients who relapse. 42 Beyond a 6-month time horizon, the health gain attributable to the composite response endpoints will be determined by the proportion of patients who sustain their 6-month outcomes and the effectiveness of retreatment in relapsing cases. A second limitation was that fewer patients achieved the Major Clinical Response endpoint (18.1%) than the Improvement endpoint (49.1%), which may increase parameter uncertainty around the expected QALY gain. Nevertheless, the analysis still showed that Major Clinical Response was associated with a greater expected incremental QALY gain than Improvement, and future analyses can account for this parameter uncertainty by sampling values from a relevant statistical distribution. A third limitation was that most patients in the estimation sample were treated with rituximab at baseline, which may limit the generalizability of the findings to patients who receive other treatments. The use of rituximab for SLE varies worldwide, 43 and new treatments are likely to enter the healthcare system soon. 44 It is plausible that the incremental improvement in health state utility associated with achieving either composite endpoint is independent of the treatment prescribed at baseline. This assumption can be investigated within cohorts who do not receive biologic treatment, or when new treatments receive marketing authorization for SLE and their data become available in the national patient register.
Figure 2. Population monetary benefit of response (£, million) calculated assuming an incident population of 2776 patients and a cost-effectiveness threshold of £20 000 per QALY gained.
Future research could replicate this analysis using EQ-5D-3L tariffs for other countries to value the health gain of achieving the composite response endpoints in different decision-making jurisdictions. The methods for this study could also be used to estimate the QALY and monetary benefit of response for existing composite response endpoints in SLE, 17,18 from completed clinical trial data sets, to compare the relative health gains of achieving these different endpoints. Similarly, these methods can be applied when developing a bespoke composite response endpoint for other complex multisystem diseases to ensure that it reflects improvements in health state utility and outcomes that are valuable to decision makers. The cost of strategies to achieve these endpoints could also be compared with the monetary benefit of response in a full economic evaluation to estimate the net monetary benefit of treatment. 32 Finally, after the introduction of value sets for the EQ-5D-5L, 45 it will be possible to estimate the incremental health gain associated with achieving each composite response endpoint for SLE and compare the findings with the results based on the EQ-5D-3L value set.

Conclusion
Composite response endpoints are becoming more common to determine whether patients with multisystem diseases respond to treatment within clinical trials and observational studies. HTA agencies and decision makers require empirical evidence to establish whether these endpoints correspond with improved health gain. Increases in health state utility, QALYs, and monetary benefit of response provide a way to estimate this health gain and demonstrate the usefulness of these endpoints for HTA. Establishing the value of health gains associated with composite response endpoints, in addition to their clinical significance, will ensure they correspond with outcomes most relevant to decision makers, patients, and improving the cost-effectiveness of care.