Full length article | Volume 21, Issue 6, P732–741, June 01, 2018

# Evaluation of Split Version and Feedback Module on the Improvement of Time Trade-Off Data

Open Archive. Published: December 18, 2017

## Abstract

### Background

Previous EQ-5D-5L valuation studies reported many inconsistent responses in time trade-off (TTO) data. Several factors, including ordering effects of the valuation tasks, mistakes at the sorting question, and interviewer (learning) effects, may contribute to this inconsistency.

### Objectives

This study aimed to evaluate the effect of two modifications on the consistency of TTO data in The Netherlands (NL) and Hong Kong (HK): (1) separating the valuation of Better than Dead (BTD) and Worse than Dead (WTD) states; and (2) implementing a feedback (FB) module that offers respondents an opportunity to review their TTO responses.

### Methods

A crossover design with two study arms was used to test the effect of the modifications. In each jurisdiction, six interviewers were involved: half started with the standard version and the other half with the split version. The groups switched versions after every 25 (NL) or 30 (HK) interviews until 400 interviews had been completed.

### Results

In NL and HK, 404 and 403 respondents participated, respectively. With the FB module, the proportion of respondents with inconsistent responses fell from 17.8% to 10.6% (P < 0.001) in NL and from 31.8% to 22.3% (P = 0.003) in HK. The effect of separating the valuation of BTD and WTD states was mixed: it reduced the inconsistency rate in NL but not in HK.

### Conclusions

The results support implementing the FB module to promote data consistency. Separating the BTD and WTD tasks is not supported.

## Introduction

A standard protocol and interviewer training materials developed by the EuroQol Group can be used to create value sets for the EQ-5D-5L questionnaire uniformly across jurisdictions, with the aid of computer-assisted personal interview software, the EuroQol Valuation Technology (EQ-VT) [Oppe M, Devlin NJ, van Hout B, et al. A program of methodological research to arrive at the new international EQ-5D-5L valuation protocol; Oppe M, Rand-Hendriksen K, Shah K, et al. EuroQol protocols for time trade-off valuation of health outcomes]. In valuation tasks, some respondents consider (or can at least imagine) certain health states to be worse than being dead. However, the valuation of worse than dead (WTD) states is controversial, and a number of competing methods for obtaining WTD values have been proposed [Patrick DL, Starks HE, Cain KC, et al. Measuring preferences for health states worse than death; Devlin NJ, Buckingham K, Shah K, et al. A comparison of alternative variants of the lead and lag time TTO; Tilling C, Devlin N, Tsuchiya A, Buckingham K. Protocols for time trade-off valuations of health states worse than dead: a literature review; Augustovski F, Rey-Ares L, Irazola V, et al.]. Composite time trade-off (cTTO), which has been presented as one of the primary health state valuation approaches, uses conventional TTO for states considered better than dead (BTD) and, for WTD states, a lead-time TTO with 10 years of lead time and 10 years in the health state (a 1:1 ratio) [Oppe M, Rand-Hendriksen K, Shah K, et al. EuroQol protocols for time trade-off valuation of health outcomes; Janssen B, Oppe M, Versteegh MM, Stolk EA. Introducing the composite time trade-off: a test of feasibility and face validity]. Before cTTO was introduced, there was no theoretical lower bound on utilities for WTD states. A state the respondent considers equivalent to "full health" has a utility of 1, and a state considered "equal to dead" has a utility of 0; for WTD states, however, utilities could approach –∞ [Lamers LM. The transformation of utilities for health states worse than death]. When WTD utilities are transformed to bound the negative values, the choice of –1 as the lower bound has been motivated by giving positive and negative values an equal range [Patrick DL, Starks HE, Cain KC, et al. Measuring preferences for health states worse than death]. To assess the cognitive burden of the BTD/WTD distinction, debriefing questions were included after the 10 cTTO tasks to evaluate whether the instructions were clear to respondents [Janssen B, Oppe M, Versteegh MM, Stolk EA. Introducing the composite time trade-off: a test of feasibility and face validity]. Responses to the debriefing statements, together with the average number of steps in the iterative procedure, were analyzed, and the results supported the validity of measuring WTD values and the feasibility of cTTO.
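The bounding problem can be made concrete with the indifference equations. The sketch below is a simplified illustration, not the EQ-VT implementation. It assumes a 10-year frame, values BTD states as x/10 (x years in full health judged equal to 10 years in the state), and values WTD states with the lead-time formula (x - 10)/10 (x years in full health judged equal to 10 years of lead time followed by 10 years in the state). The unbounded comparison uses a hypothetical conventional WTD variant in which immediate death is judged equal to y years in the state followed by 10 - y years in full health.

```python
TOTAL_YEARS = 10.0  # assumed 10-year time frame


def ctto_btd_value(x_full_health: float) -> float:
    """Conventional TTO part: x years in full health ~ 10 years in the state."""
    if not 0.0 <= x_full_health <= TOTAL_YEARS:
        raise ValueError("x must lie between 0 and 10 years")
    return x_full_health / TOTAL_YEARS


def ctto_wtd_value(x_full_health: float) -> float:
    """Lead-time TTO part: x years in full health ~ 10 years lead time + 10 years in the state."""
    if not 0.0 <= x_full_health <= TOTAL_YEARS:
        raise ValueError("x must lie between 0 and 10 years")
    # Indifference: x = 10 + 10 * u, so u = (x - 10) / 10, never below -1.
    return (x_full_health - TOTAL_YEARS) / TOTAL_YEARS


def conventional_wtd_value(y_in_state: float) -> float:
    """Hypothetical unbounded variant: death ~ y years in the state, then 10 - y years in full health."""
    if not 0.0 < y_in_state <= TOTAL_YEARS:
        raise ValueError("y must lie in (0, 10] years")
    # Indifference: 0 = y * u + (10 - y), so u = -(10 - y) / y, unbounded as y -> 0.
    return -(TOTAL_YEARS - y_in_state) / y_in_state


if __name__ == "__main__":
    print(ctto_wtd_value(0.0))           # -1.0, the lowest possible cTTO value
    print(conventional_wtd_value(0.25))  # -39.0 with quarter-year steps
```

As y shrinks, the conventional value falls without limit, whereas the lead-time formula stays within [–1, 0], matching the equal positive and negative ranges that motivate a lower bound of –1.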
Unfortunately, the first series of EQ-5D-5L valuation studies reported several common quality issues in their cTTO data, one of which is a large number of inconsistent responses [Devlin NJ, Shah KK, Feng Y. Valuing health-related quality of life: an EQ-5D-5L value set for England; Ramos-Goñi J, Rivero-Arias O, Errea M, et al. Dealing with the health state ‘dead’ when using discrete choice experiments to obtain values for EQ-5D-5L health states; Versteegh MM, Attema AE, Oppe M, et al. Time to tweak the TTO: results from a comparison of alternative specifications of the TTO; Xie F, Pullenayegum E, Bansback N, et al. The Canadian EQ-5D-5L valuation study: an exploratory analysis; Luo N, Li M, Stolk EA, Devlin NJ. The effects of lead time and visual aids in TTO valuation: a study of the EQ-VT framework]. In the EQ-5D-5L descriptive system, a health state is defined by taking one level from each of the five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), and each state is described by a five-digit code. For example, the code for "no problems" in all five dimensions is "11111," and the code for "extreme problems" in all dimensions is "55555." Logically, every other health state dominates the worst state (55555). A response is defined as inconsistent if a health state that is better in at least one dimension and no worse in any other receives a lower value than the state it logically dominates. Yet in each of those valuation studies, at least 20% of respondents valued at least one health state lower than a state it logically dominates [Devlin NJ, Shah KK, Feng Y. Valuing health-related quality of life: an EQ-5D-5L value set for England; Ramos-Goñi J, Rivero-Arias O, Errea M, et al. Dealing with the health state ‘dead’ when using discrete choice experiments to obtain values for EQ-5D-5L health states; Versteegh MM, Attema AE, Oppe M, et al. Time to tweak the TTO: results from a comparison of alternative specifications of the TTO; Xie F, Pullenayegum E, Bansback N, et al. The Canadian EQ-5D-5L valuation study: an exploratory analysis; Luo N, Li M, Stolk EA, Devlin NJ. The effects of lead time and visual aids in TTO valuation: a study of the EQ-VT framework]. In addition, about 10% of respondents gave lower values to very mild health states than to more severe, logically worse states. Moreover, about half (8%) of the inconsistencies involving the worst state involved large utility differences (>0.5) [Shah K, Rand-Hendriksen K, Ramos JM, et al. Improving the quality of data collected in EQ-5D-5L valuation studies: a summary of the EQ-VT research methodology programme]. These problems were thought to reflect, at least partially, implementation issues: ordering effects in the valuation of states considered BTD or WTD, mistakes at the "sorting question" that identifies whether a state is considered BTD or WTD, interviewees' strategic behavior, and interviewers' learning effects. Although the standard protocol already includes real-time data monitoring with quality control software (the QC tool), an example task (the state of being "in a wheelchair") plus three practice cTTO tasks, and confirmatory pop-ups, inconsistency in TTO data persists. Because this problem can be resolved at least partially, an update of the protocol for EQ-5D-5L valuation studies is warranted.
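The dominance rule used to define an inconsistent response can be written as a short check on the five-digit codes. This is an illustrative sketch with hypothetical function names (Python 3.9+); following the "lower value" wording of the definition, ties are treated as consistent.

```python
def parse_state(code: str) -> tuple[int, ...]:
    """Turn a five-digit EQ-5D-5L code such as '12111' into a tuple of levels."""
    if len(code) != 5 or any(ch not in "12345" for ch in code):
        raise ValueError(f"not a valid EQ-5D-5L code: {code!r}")
    return tuple(int(ch) for ch in code)


def dominates(a: str, b: str) -> bool:
    """True if state a is no worse than b on every dimension and better on at least one."""
    la, lb = parse_state(a), parse_state(b)
    return la != lb and all(x <= y for x, y in zip(la, lb))


def inconsistent_pairs(values: dict[str, float]) -> list[tuple[str, str]]:
    """(better, worse) pairs in which the dominating state received a strictly lower value."""
    return [
        (a, b)
        for a in values
        for b in values
        if dominates(a, b) and values[a] < values[b]
    ]


if __name__ == "__main__":
    responses = {"12111": 0.9, "43555": -0.2, "55555": -0.1}  # made-up values
    print(inconsistent_pairs(responses))  # [('43555', '55555')]
```

Because 12111 and 11122 each win on a different dimension, neither dominates the other, so no ordering of their values can count as inconsistent.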
Perfect consistency may not be expected because the differences between EQ-5D-5L health states are subtle and respondents are likely to be uncertain about the values they provide [Devlin NJ, Krabbe P. The development of new research methods for the valuation of EQ-5D-5L]. However, the proportion of respondents whose responses contained severe inconsistencies has been considered unacceptably large and is unlikely to be the product of random error or uncertainty per se. The individual-level cTTO data were therefore closely scrutinized, and severe inconsistencies could often be traced to inconsistent behavior at the sorting question, which routes respondents to value a state as BTD or WTD in the cTTO task [Janssen B, Oppe M, Versteegh MM, Stolk EA. Introducing the composite time trade-off: a test of feasibility and face validity]. This led to the hypothesis that the order in which states considered BTD and WTD are valued influences the responses. In addition, the BTD/WTD sorting question may be error prone: respondents may mistakenly give a –1 response when intending to give a 0 response, or vice versa. It may also simply be inherently difficult to answer and hence provoke inconsistent responses, which quickly become severe inconsistencies because responses are distributed differently on the BTD and WTD parts of the scale, so two comparable health states sorted to opposite sides can end up far apart in value. This supports a second hypothesis: mistakes at the BTD/WTD sorting question may have contributed to the inconsistency rate.
This study investigated the effect of two modifications to the implementation: (1) separating the valuation of BTD and WTD states by moving all WTD valuations to the end of the cTTO task, and (2) implementing a feedback (FB) module that offers respondents the opportunity to review their responses and flag any they consider wrong. Both modifications were tested for their effect on the consistency of TTO data in The Netherlands (NL) and Hong Kong (HK).

## Methods

### Study Design

We compared the standard version of the protocol [Oppe M, Rand-Hendriksen K, Shah K, et al. EuroQol protocols for time trade-off valuation of health outcomes] with a modified version of the cTTO task (hereafter, the "split version") using a crossover design in NL and HK. In each jurisdiction, a stable team of six interviewers was recruited and randomly divided into two groups of three. Block sampling (blocks of 25 interviews in NL and 30 in HK) was adopted to minimize selection bias of respondents between the two groups. Group A started with the standard version (control arm) for the first 25 (NL) or 30 (HK) interviews and then switched to the split version, which separated the BTD and WTD tasks within the cTTO exercise (experimental arm). Group B started with the split version and then switched to the standard version in the same manner. The groups switched versions again after each further block of 25 (NL) or 30 (HK) interviews until the study's target sample of 400 was reached. Randomizing individual respondents to either version was considered as an alternative design but was rejected because of possible spillover effects: interviewers switching between the two methods from one interview to the next might muddle up the protocols. After the TTO values had been collected with either version of the cTTO task, the FB module was offered to all respondents in both arms, giving them the opportunity to indicate changes. The study design is shown in Figure 1.
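The block alternation can be sketched as a function of the interview number. This assumes interviews are counted per group (the text does not say whether the block counter runs per group or per interviewer); the function name is hypothetical.

```python
def version_for(group: str, interview_number: int, block_size: int) -> str:
    """Version used by a group for its n-th interview (1-based) under block alternation.

    Group A starts with the standard version, group B with the split version,
    and both switch after every completed block (25 in NL, 30 in HK).
    """
    if group not in ("A", "B"):
        raise ValueError("group must be 'A' or 'B'")
    if interview_number < 1 or block_size < 1:
        raise ValueError("interview_number and block_size must be positive")
    block_index = (interview_number - 1) // block_size  # 0 for the first block
    in_starting_block = block_index % 2 == 0
    starts_standard = group == "A"
    return "standard" if in_starting_block == starts_standard else "split"


if __name__ == "__main__":
    # First three NL blocks for group A.
    print([version_for("A", n, 25) for n in (1, 26, 51)])  # ['standard', 'split', 'standard']
```

Because the two groups are always on opposite versions, each block contributes equally to both arms, which is what balances the arms at 200 respondents each.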

#### Control arm

In the control arm, respondents completed the health state valuation tasks using the standard protocol [Oppe M, Devlin NJ, van Hout B, et al. A program of methodological research to arrive at the new international EQ-5D-5L valuation protocol; Oppe M, Rand-Hendriksen K, Shah K, et al. EuroQol protocols for time trade-off valuation of health outcomes], which includes multiple training and quality control components. The training tasks aimed to ensure that respondents understood the concept of TTO: the interviewer first demonstrated how TTO works using the example state of being "in a wheelchair," followed by three practice tasks in which the respondent valued health states of varying severity. The QC tool was used to monitor protocol compliance and interviewer performance. The valuation task was completed in a single sitting, regardless of whether the respondent indicated that the health state under evaluation was BTD or WTD.

#### Feedback module

The FB module was presented after completion of the cTTO task in both study arms. It showed the rank ordering of health states implied by the respondent's cTTO responses. All 10 health states were presented as vignettes (five bullet points, one for each dimension of health), consistent with the way they had been presented in the preceding cTTO exercise. They were shown on one screen, sorted by obtained TTO value, with the highest-valued state at the top and the lowest-valued state at the bottom; states given an equal value were presented side by side. Respondents were asked to review their responses and indicate, by clicking on the relevant health state(s), whether on reflection they considered any to be in the wrong position. They were then invited to leave an optional text comment explaining their response. No attempt was made to re-elicit the TTO value; instead, the response was flagged, allowing it to be filtered out in subsequent analyses. Figure 1 presents a screenshot of the feedback module.
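The FB screen logic amounts to a descending sort with tied states grouped into one row, plus a flag that leaves the stored value untouched. A minimal sketch; the dictionary layout and function names are assumptions, not the EQ-VT code.

```python
from itertools import groupby


def feedback_rows(values: dict[str, float]) -> list[list[str]]:
    """Order states from highest to lowest TTO value; states with equal values share a row."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [
        [state for state, _ in row]
        for _, row in groupby(ordered, key=lambda item: item[1])
    ]


def flag_responses(values: dict[str, float], clicked: set[str]) -> dict[str, bool]:
    """Mark clicked states as flagged; values are kept as is and never re-elicited."""
    unknown = clicked - set(values)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    return {state: state in clicked for state in values}


if __name__ == "__main__":
    tto = {"12111": 0.95, "11122": 0.95, "42321": 0.4, "55555": -0.6}  # made-up values
    print(feedback_rows(tto))  # [['12111', '11122'], ['42321'], ['55555']]
```

Keeping flags separate from values mirrors the design choice described above: analyses can include or exclude flagged responses without losing the original answer.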

### Data Collection

In NL, data were collected at one selected specialist day hospital in Rotterdam using convenience sampling. Respondents received a small incentive of 10 euros for their participation. In HK, data were collected alongside the EQ-5D-5L national valuation study [Wong EL, Yeoh EK, Slaap B, et al. Validation and valuation of the preference-based health index using EQ-5D-5L in the Hong Kong Population], for which a stratified sampling strategy based on the age and gender profile of HK was adopted. A sample of 400 per jurisdiction (200 respondents per arm) was used to evaluate the separation of the BTD and WTD tasks, so a total sample of 800 was available for evaluating the impact of the two modifications.

#### Health states selection

In HK, the study of the effects of separating the BTD and WTD tasks was appended to the national EQ-5D-5L valuation study, in which respondents were randomized to one of 10 blocks of health states, as described by Oppe et al. [Oppe M, Devlin NJ, van Hout B, et al. A program of methodological research to arrive at the new international EQ-5D-5L valuation protocol]. The first 400 respondents who entered the valuation study made up the sample for the current study. In NL, the study was set up as a randomized controlled experiment that included only one of the blocks (block 10). Hence, all Dutch respondents valued the same 10 health states: 12111, 11122, 42321, 13224, 35311, 34232, 52335, 24445, 43555, and 55555.

#### Interviewer training and data quality control

All interviewers were asked to follow the script that guides investigators of valuation studies in their data collection, and they were required to attend a training session. The training explained the rationale for health state valuation studies, the valuation methods used in the standard and modified versions, the role of the interviewers and its importance in obtaining high-quality data, and the objectives of this methodological study. To safeguard data quality, the QC process described by Ramos-Goñi et al. [Ramos-Goñi JM, Rand-Hendriksen K. Does the introduction of the ranking task in valuation studies improve data quality and reduce inconsistencies? The case of the EQ-5D-5L] was adopted: the principal investigator reviewed all conducted interviews weekly and then gave the interviewers feedback on their performance.

### Statistical Analysis

The impact of the two modifications was evaluated through interview duration, the proportion of respondents with WTD responses, the consistency of observed values for the health states, and the TTO values from the cTTO task. Logistic regression was further used to explore the relationship between inconsistent outcomes and several independent variables: interviewer, study arm, and learning effects. Learning effects were captured by including a variable in the regression that counted the number of interviews previously conducted by the interviewer. A P value < 0.05 was considered statistically significant. To evaluate consistency, the number of illogically ordered pairs of health states was counted. For example, health state 25555 is logically better than 45555 (it is at least as good on each of the five dimensions and better on mobility); thus, a response was considered inconsistent if the respondent valued 45555 higher. The following inconsistency rates were explored:
1. The percentage of respondents with at least one inconsistent response
2. The percentage of respondents with at least one inconsistent response involving state 55555
3. The percentage of respondents considered severely inconsistent involving state 55555, defined by a difference on the utility scale of ≥0.5 (i.e., if they gave 55555 a value that was ≥0.5 greater than the value given to any other health state)
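The three definitions can be computed per respondent and then averaged. A minimal sketch with hypothetical names, assuming each respondent's responses are stored as a mapping from five-digit codes to TTO values and that ties count as consistent.

```python
WORST = "55555"


def dominates(a: str, b: str) -> bool:
    """State a is no worse than b on every dimension and differs from b."""
    return a != b and all(int(p) <= int(q) for p, q in zip(a, b))


def inconsistency_flags(values: dict[str, float], severe_gap: float = 0.5) -> tuple[bool, bool, bool]:
    """Return (any inconsistency, any involving 55555, severe involving 55555)."""
    violations = [
        (a, b) for a in values for b in values
        if dominates(a, b) and values[a] < values[b]
    ]
    any_flag = bool(violations)
    # 55555 is dominated by every other state, so it can only appear as the worse member.
    worst_flag = any(b == WORST for _, b in violations)
    severe_flag = WORST in values and any(
        values[WORST] - values[s] >= severe_gap for s in values if s != WORST
    )
    return any_flag, worst_flag, severe_flag


def inconsistency_rates(respondents: list[dict[str, float]]) -> tuple[float, ...]:
    """Percentage of respondents meeting definitions 1, 2, and 3."""
    flags = [inconsistency_flags(r) for r in respondents]
    return tuple(100 * sum(f[i] for f in flags) / len(flags) for i in range(3))
```

With the made-up respondents in the check below, three of four have some inconsistency, two involve 55555, and one is severe, giving rates of 75%, 50%, and 25%.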
To further explore the nature of inconsistencies, a distance score was estimated for each pair of illogically ordered health states, following the approach introduced by Bonsel et al. [Bonsel G, Oppe M, Janssen B. Non-additive impact of dimensions on the index values of health states: an analysis of interaction effects to avoid misspecification of the value function with unsaturated valuation datasets. A pilot]. The distance score for two compared health states, X and Y, is the sum of the squared differences in level on each domain (mobility, MO; self-care, SC; usual activities, UA; pain/discomfort, PD; anxiety/depression, AD), as in the formula below. This score represents the "stress" that would be implied by a direct comparison of the two health states in each of the 25 logically ordered pairs.
$$\text{Distance} = (MO_X - MO_Y)^2 + (SC_X - SC_Y)^2 + (UA_X - UA_Y)^2 + (PD_X - PD_Y)^2 + (AD_X - AD_Y)^2$$

In the EQ-5D-5L descriptive system, this score has a maximum value of 80 ($5 \times 4^2$), for the comparison of the best health state (11111) with the worst (55555). In the subset of 10 health states used in NL in this study, the maximum distance score was 73 ($4 \times 4^2 + 3^2$), for the comparison of the mild state 12111 with the worst health state (55555).
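The distance score and the pair count quoted above can be checked directly against the Dutch block. This short sketch (names are illustrative) applies the formula, enumerates the logically ordered pairs among the 10 block-10 states, and recovers the maxima of 80 and 73.

```python
from itertools import permutations

BLOCK_10 = ["12111", "11122", "42321", "13224", "35311",
            "34232", "52335", "24445", "43555", "55555"]


def levels(code: str) -> list[int]:
    return [int(ch) for ch in code]


def distance(x: str, y: str) -> int:
    """Sum of squared level differences over MO, SC, UA, PD, and AD."""
    return sum((a - b) ** 2 for a, b in zip(levels(x), levels(y)))


def dominates(a: str, b: str) -> bool:
    """State a is no worse than b on every dimension and differs from b."""
    return a != b and all(p <= q for p, q in zip(levels(a), levels(b)))


# Every ordered pair (better, worse) with a logical ordering in block 10.
logical_pairs = [(a, b) for a, b in permutations(BLOCK_10, 2) if dominates(a, b)]

if __name__ == "__main__":
    print(len(logical_pairs))                                # 25
    print(max(distance(a, b) for a, b in logical_pairs))     # 73
    print(distance("11111", "55555"))                        # 80
```

The most frequent Dutch inconsistencies sit at small distances: 43555 versus 55555 scores 5, and 42321 versus 52335 scores 18.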

## Results

### Characteristics of Respondents

In NL, a total of 404 respondents participated in the study, of whom 205 were assigned to the control arm and 199 to the experimental arm. In HK, 403 respondents completed the study, with 200 and 203 assigned to the control and experimental arms, respectively. Compared with the general population, males were underrepresented and older individuals overrepresented in both jurisdictions. Table 1 shows the respondents' demographics by jurisdiction and by study arm within each jurisdiction. The respondent profiles were broadly similar across the two jurisdictions, but the proportion of Dutch respondents with experience of serious illness in their families was higher (P < 0.001), and the proportion of respondents aged 35 years or older was higher in NL than in HK (P < 0.001). Within each jurisdiction, the two study arms did not differ significantly on any observed background characteristic.
Table 1. Characteristics of respondents in The Netherlands and in Hong Kong across two experimental study arms (values are percentages)

| Characteristic | NL control arm (n = 205) | NL BTD/WTD split arm (n = 199) | NL P value | Overall in NL (n = 404) | HK control arm (n = 200) | HK BTD/WTD split arm (n = 203) | HK P value | Overall in HK (n = 403) | NL vs HK P value |
|---|---|---|---|---|---|---|---|---|---|
| Gender | | | 0.838 | | | | 0.529 | | 0.496 |
| Female | 65.4 | 66.3 | | 65.9 | 60.0 | 63.1 | | 61.5 | |
| Age (years) | | | 0.973 | | | | 0.812 | | < 0.001 |
| <35 | 21.0 | 18.6 | | 19.8 | 31.5 | 28.6 | | 30.0 | |
| 35–54 | 30.7 | 31.2 | | 30.9 | 21.0 | 22.2 | | 21.6 | |
| ≥55 | 48.3 | 50.3 | | 49.4 | 47.5 | 49.3 | | 48.4 | |
| Experience of illness | | | | | | | | | |
| In self | 23.9 | 21.1 | 0.501 | 22.7 | 33.0 | 28.1 | 0.283 | 30.5 | 0.266 |
| In family | 75.1 | 75.4 | 0.953 | 75.3 | 42.0 | 35.5 | 0.178 | 38.7 | < 0.001 |
| In caring for others | 50.2 | 49.7 | 0.921 | 50.1 | 50.5 | 45.3 | 0.298 | 47.9 | 0.717 |

BTD, better than dead; HK, Hong Kong; NL, The Netherlands; WTD, worse than dead. Chi-square tests were performed for the comparison between the respondents in The Netherlands and in Hong Kong and also across the two experimental study arms in each jurisdiction.

### Descriptive Analysis of Health State Values

#### Duration of interview

In the control arm, the mean amount of time taken by respondents to complete the TTO task was 35.2 minutes (NL) and 41.6 minutes (HK). In the split arm with the separation of the BTD and WTD tasks, the mean time taken was significantly longer: 38.2 minutes (NL: P = 0.007) and 44 minutes (HK: P < 0.001).

#### TTO values by study arm

Around 70% of respondents provided at least one WTD response (73.7% in NL, 69.2% in HK), and more than a quarter of all responses had a negative value (25.7% in NL, 31.6% in HK). The observed TTO values were lower in HK than in NL (Figure 2). In neither jurisdiction did mean health state values differ significantly across study arms. In the split version, respondents reconsidered their initial classification of a state as WTD in 5.2% of cases in NL and 8.9% in HK.

#### Impact of the FB module

Because of a system bug, the FB module results of 31 Dutch respondents were lost, leaving results of 373 Dutch respondents and 403 HK respondents for analysis. Completing the FB module took, on average, 2 minutes in NL and 2.4 minutes in HK. In NL, 35.9% of respondents (134 of 373) flagged one or more responses for removal, versus 28.0% (113 of 403) in HK. Dutch respondents most frequently flagged the responses for states 13224 and 52335 (31 and 34, respectively, of a total of 188). A similar statistic was not produced for HK because more blocks of health states were used there. In total, 5.3% of responses were flagged for removal in NL and 4.8% in HK. Including or excluding flagged responses did not produce significantly different mean values for any health state, but in both jurisdictions, removing the flagged responses seemed to slightly raise the values of mild states and slightly lower the values of poor states. This effect of removing inconsistencies was anticipated because the variance in observed values is restricted by the boundaries of the TTO scale (e.g., mild states could be valued much too low, but not much too high, because all values had an upper bound of 1). The majority of respondents who removed responses had no inconsistent responses (71.6% in NL, 52.2% in HK). Moreover, around half of the respondents with inconsistencies did not use the FB module to remove any responses (42.4% in NL, 57.8% in HK). When a reason for removing responses was stated, it was most often that the respondent considered the rank position of one state relative to the others to be wrong.

### Consistency Analysis of Health State Values

In NL, 17.8% of respondents provided one or more inconsistent responses, compared with 31.8% in HK. The most frequent inconsistencies in NL involved health states 55555/43555 (n = 15) and 52335/42321 (n = 16); the former was more often resolved through the FB module. Similar metrics were not computed for HK because the numbers of observations per block were too low to support conclusions. The FB module improved consistency in both jurisdictions: the overall inconsistency rate and the inconsistency rate involving health state 55555 fell to 10.6% and 2.7%, respectively, in NL, and to 22.3% and 4.7%, respectively, in HK. These reductions were statistically significant in both jurisdictions (Table 2). The two jurisdictions, however, showed opposite results for the split version: in NL, separating the BTD and WTD tasks promoted data consistency, whereas in HK, the most consistent results were found in the control arm (Figure 3).
Table 2. Inconsistency rates in The Netherlands and Hong Kong by study arm, before and after use of the feedback module

| Jurisdiction | Study arm | Inconsistency rate before FB module (%) | Inconsistency rate after FB module (%) | P value | Rate involving 55555 before FB module (%) | Rate involving 55555 after FB module (%) | P value |
|---|---|---|---|---|---|---|---|
| The Netherlands | Overall | 17.8 | 10.6 | < 0.001 | 5.7 | 2.7 | < 0.001 |
| | Control arm | 19.5 | 13.2 | < 0.001 | 6.3 | 3.4 | < 0.001 |
| | BTD/WTD split | 16.1 | 8.0 | < 0.001 | 5.0 | 2.0 | < 0.001 |
| Hong Kong | Overall | 31.8 | 22.3 | 0.003 | 9.7 | 4.7 | 0.006 |
| | Control arm | 28.0 | 16.0 | 0.004 | 10.5 | 4.5 | 0.023 |
| | BTD/WTD split | 35.5 | 28.6 | 0.136 | 8.7 | 4.9 | 0.117 |

BTD, better than dead; FB, feedback; HK, Hong Kong; NL, The Netherlands; WTD, worse than dead. Proportion tests were performed for the comparison between the inconsistency before and after using the feedback module in The Netherlands and in Hong Kong.
Figures 4 and 5 show the frequency of inconsistent responses by distance score in NL and HK for three comparisons: (1) before vs after FB module use; (2) before FB module use, by study arm; and (3) after FB module use, by study arm. The patterns were similar in both jurisdictions, and no large systematic differences between the two study arms were detected in the type or number of inconsistencies observed. As expected, illogical ordering of health states was more likely when the distance between states was smaller. In HK, the negative correlation between distance scores and inconsistency rates was stronger in the experimental arm (separate valuation of BTD and WTD tasks) than in the control arm, a difference more pronounced than in NL (Figures 4B and 5B). After FB module use, the number of inconsistencies decreased in both study arms. However, the pattern across distance scores was similar before and after FB module use, suggesting that the FB module did not target any specific type of inconsistency.

### Correlation Analysis of Inconsistency

Logistic regression analysis of Dutch data suggested that interviewer learning effects lowered the inconsistency rate (odds ratio [OR] 0.97; P value = 0.003). The impact of learning effects appeared to be smaller in the experimental arm when the BTD and WTD tasks were separated. No significant interviewer effects were observed. Study arm was a statistically significant predictor in a model (OR 0.24; P value = 0.013) that included an interaction term between learning effects and the study arm, but no statistically significant difference in performance was observed between the main effects model and the one including the interaction. In HK, no statistically significant predictor was observed in a similar model, and none of the independent variables was correlated with the inconsistency rate.
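The learning-effect covariate and the reported odds ratio can be connected in a few lines. A minimal sketch, assuming a chronologically ordered list of interviewer IDs (a hypothetical data layout) and assuming the OR of 0.97 applies per additional prior interview with a linear effect on the log-odds scale. It describes odds, not probabilities, because no baseline rate is given here.

```python
from collections import defaultdict


def prior_interview_counts(interviewers: list[str]) -> list[int]:
    """Learning-effect covariate: interviews each interviewer had completed before this one."""
    done = defaultdict(int)
    counts = []
    for interviewer in interviewers:
        counts.append(done[interviewer])
        done[interviewer] += 1
    return counts


def odds_multiplier(odds_ratio: float, prior_interviews: int) -> float:
    """Factor by which the odds of an inconsistent respondent change after k prior interviews."""
    # Logistic regression is additive in log-odds, so k units multiply the odds by OR ** k.
    return odds_ratio ** prior_interviews


if __name__ == "__main__":
    print(prior_interview_counts(["I1", "I2", "I1", "I3", "I1"]))  # [0, 0, 1, 0, 2]
    print(round(odds_multiplier(0.97, 25), 3))                    # 0.467
```

Under these assumptions, an interviewer's 26th interview would carry roughly half the odds of inconsistency of their first, other covariates held equal.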

## Discussion

This study presented the findings on two proposed modifications of the protocol for EQ-5D-5L valuation studies. We explored whether the quality of cTTO data was improved (1) by moving all valuations of WTD states to the end of the cTTO task (separating the valuation of BTD and WTD states), and (2) by offering respondents the opportunity to review their responses and flag any they considered wrong, using an FB module. Implementation of the FB module significantly improved the consistency of cTTO data in both NL and HK. Separating the valuation of BTD and WTD states did not have a clear effect, lowering inconsistency rates in NL and increasing them in HK (although the effects were not statistically significant in either jurisdiction).
The effect of the FB module on the resulting inconsistency rate reached statistical significance, but it may have been influenced by other factors. In particular, the inconsistency metric used potentially overestimated the effect of the FB module on data consistency: removing any cTTO response lowers the number of possible inconsistencies, so even randomly removing responses would improve measured consistency. We also learned that implementing the FB module was not straightforward. The FB module displayed complex information, which respondents already suffering from fatigue might not always have been able to handle, and it was up to the interviewer to guide respondents through the task. In debriefing our interviewers, we found that they might have had a notion of which answers were right and wrong based on logical consistency but struggled to find the words to get a respondent to reconsider such inconsistencies without actually telling them that a particular response was "wrong." Better initial guidance of interviewers might have strengthened the FB module's effect on inconsistency rates.
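The claim that removal alone lowers measured inconsistency can be checked mechanically: dropping states only removes pairs, so the count of violated pairs can never rise. A minimal sketch with hypothetical helper names; the respondent values in the check are made up, not study data.

```python
import random


def dominates(a: str, b: str) -> bool:
    """State a is no worse than b on every dimension and differs from b."""
    return a != b and all(int(p) <= int(q) for p, q in zip(a, b))


def violation_count(values: dict[str, float]) -> int:
    """Number of logically ordered pairs valued in the wrong order."""
    return sum(
        1 for a in values for b in values
        if dominates(a, b) and values[a] < values[b]
    )


def drop_random(values: dict[str, float], k: int, rng: random.Random) -> dict[str, float]:
    """Remove k responses chosen at random, ignoring their content (a hypothetical 'random FB')."""
    removed = set(rng.sample(sorted(values), k))
    return {state: v for state, v in values.items() if state not in removed}
```

Any removal, random or deliberate, yields a subset of the original pairs, which is why the metric cannot by itself separate genuine corrections from arbitrary deletions.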
Nevertheless, the value of the FB module does not rest on the inconsistency metric alone: the majority of respondents who removed responses using the FB module had no inconsistent responses (i.e., violated no logical ordering). In most cases, respondents merely wanted to indicate that they considered two adjacent states to be in the wrong rank position; the FB module was thus used to correct errors or to overcome learning effects. Given the complexity of the task and the uncertainty surrounding observed values, it makes sense to always allow people to review their own responses and identify the ones they consider wrong.
Contradictory results were obtained in NL and HK with regard to separating the valuation of BTD and WTD states in the cTTO task: results in NL were promising, but those in HK were not. Because the conditions required for successfully separating the valuation of BTD and WTD states are not known, further study is recommended to explain this discrepancy. It may be hypothesized that the effect of the separation is associated with interviewers’ learning curves, culturally shaped thinking processes in the general populations of different jurisdictions, and quality control (QC). As mentioned above, a newly developed QC tool was used in the present study to address data quality issues and interviewer effects [Ramos-Goñi J.M., Oppe M., Slaap B., et al. Quality control process for EQ-5D-5L valuation studies.]. In HK, biweekly meetings were held with the interviewers to evaluate their performance and their compliance in using the QC software, and feedback on both the control and experimental arms was obtained. Comments on the split version were generally positive, as respondents perceived the cTTO tasks as easier to understand. However, frustration was also observed because separating the valuation of the BTD and WTD states extended the interview, which might partially explain the increased inconsistencies in HK. Additionally, one could hypothesize that obtaining two value judgments for a single state, using both the BTD and WTD approaches, would provide a measure of test–retest reliability, which may vary across study contexts when an interviewer-assisted valuation task is performed. Any potential benefit of initially avoiding the complex WTD task may then be forgone, because the health state descriptions have to be re-read and judged again.
The relevance of further study of this separation depends on the objectives. If the main objective is to prevent poor-quality data from being collected in any setting, more weight should be given to the negative results obtained in HK. If the objective is to obtain the highest-quality data possible in each setting, it may be worth scrutinizing the merits of separating the BTD and WTD tasks. The FB module is not intended to make the cTTO questions easier or to help respondents arrive at a valuation for a given health state; it simply helps respondents identify problematic responses ex post. The BTD–WTD split version, by contrast, is intended to make the task easier, but these results do not make clear whether it had the desired effect.
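The test–retest idea raised above can be sketched numerically. This study did not compute such a measure; the sketch assumes that each respondent provides two values for the same health state (one per approach) and summarizes their agreement with the Shrout and Fleiss ICC(2,1), a two-way random-effects, absolute-agreement, single-measurement intraclass correlation. Absolute agreement is chosen here, as an assumption, because a systematic shift between the two approaches should count as disagreement. The function name and the example values are hypothetical.

```python
def icc_2_1(pairs: list[tuple[float, float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Each tuple holds two values for the same health state from one respondent,
    e.g. (value from the first approach, value from the second approach).
    Requires at least two pairs with some variation between them.
    """
    n, k = len(pairs), 2  # n subjects (rows), k = 2 measurements (columns)
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, measurements, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values: the second approach is uniformly 0.1 higher, so the
# rank order is preserved but absolute agreement falls below 1.
print(round(icc_2_1([(0.2, 0.3), (0.5, 0.6), (0.8, 0.9)]), 3))  # 0.947
```

A consistency-type ICC would ignore the uniform 0.1 shift in this example and return 1; whether agreement or consistency is the more relevant notion would depend on how the two value judgments are meant to be used.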
On the basis of the results of this study, the EuroQol Group released an updated protocol for EQ-5D-5L valuation studies. In this version of the protocol, the FB module has been implemented to allow people to review the responses they give in the cTTO task. At the time of writing, five countries (England, China, Canada, NL, and Spain) have completed their valuation studies using the first version of the standard protocol, and six others (Indonesia, South Korea, Thailand, Japan, Uruguay, and Singapore) have completed their studies using the standard version [Oppe M., Rand-Hendriksen K., Shah K., et al. EuroQol protocols for time trade-off valuation of health outcomes.]. Data collection using the modified version of the protocol has been completed in Germany and Indonesia and is underway in several other countries, including Hong Kong. All protocol updates thus far have aimed at capturing better data; no fundamental changes to the cTTO task itself have been implemented. Thus, while the updates are intended to reduce random error and bias, comparability across studies has been maintained.

## Conclusions

This study indicated that implementing the FB module promotes data quality, justifying the subsequent decision to update the protocol for EQ-5D-5L valuation studies. By offering a detailed description of the FB module and of how it affected the results, we aim to promote the transparency of EQ-5D-5L valuation data and to make the accumulated research expertise of the EuroQol Group generally accessible.

## Funding

This study was supported by the EuroQol Research Foundation and the Health & Medical Research Fund from the Food and Health Bureau of Hong Kong, Hong Kong Special Administrative Region (HKSAR) (grant number: HMRF11120491). The study sponsors had no role in the analysis and interpretation of the data, in the writing of the report, or in the decision to submit the article for publication.

## Conflict of Interest

The authors have indicated that they have no conflicts of interest with regard to the content of this article.

## Acknowledgements

We gratefully acknowledge Dr. Bernhard Slaap (Executive Director) and the scientific team of the EuroQol Research Foundation. We thank the district officers from 18 districts in Hong Kong for their help with recruitment. We are also grateful to Mr. Dicken Chan, Ms. Nicole Huang, Ms. Yeung Yeung Pan, Mr. Ringo Sze, and Mr. Sky Chan for their assistance during data collection. Finally, we are grateful to all Hong Kong residents who participated in the survey. Without their participation and engagement, the study would not have succeeded.

## References

• Oppe M.
• Devlin N.J.
• van Hout B.
• et al.
A program of methodological research to arrive at the new international EQ-5D-5L valuation protocol.
Value Health. 2014; 17: 445-453
• Oppe M.
• Rand-Hendriksen K.
• Shah K.
• et al.
EuroQol protocols for time trade-off valuation of health outcomes.
Pharmacoeconomics. 2016; 34: 993-1004
• Patrick D.L.
• Starks H.E.
• Cain K.C.
• et al.
Measuring preferences for health states worse than death.
Med Decis Making. 1994; 14: 9-18
• Devlin N.J.
• Buckingham K.
• Shah K.
• et al.
A comparison of alternative variants of the lead and lag time TTO.
Health Econ. 2013; 22: 517-532
• Tilling C.
• Devlin N.
• Tsuchiya A.
• Buckingham K.
Protocols for time trade-off valuations of health states worse than dead: a literature review.
Med Decis Making. 2010; 30: 610-619
• Augustovski F.
• Rey-Ares L.
• Irazola V.
• et al.
Eur J Health Econ. 2013; 14: 25-31
• Janssen B.
• Oppe M.
• Versteegh M.M.
• Stolk E.A.
Introducing the composite time trade-off: a test of feasibility and face validity.
Eur J Health Econ. 2013; 14: S5-S13
• Lamers L.M.
The transformation of utilities for health states worse than death.
Med Care. 2007; 45: 238-244
• Devlin N.J.
• Shah K.K.
• Feng Y.
Valuing health-related quality of life: an EQ-5D-5L value set for England.
Health Econ. 2017 Aug 22 [Epub ahead of print]. https://doi.org/10.1002/hec.3564
• Ramos-Goñi J.
• Rivero-Arias O.
• Errea M.
• et al.
Dealing with the health state ‘dead’ when using discrete choice experiments to obtain values for EQ-5D-5L health states.
Eur J Health Econ. 2013; 14: S33-S42
• Versteegh M.M.
• Attema A.E.
• Oppe M.
• et al.
Time to tweak the TTO: results from a comparison of alternative specifications of the TTO.
Eur J Health Econ. 2013; 14: S43-S51
• Xie F.
• Pullenayegum E.
• Bansback N.
• et al.
The Canadian EQ-5D-5L valuation study: an exploratory analysis.
Proceedings of the 30th Scientific Plenary Meeting of the EuroQol Group, 2013: 1-17
• Luo N.
• Li M.
• Stolk E.A.
• Devlin N.J.
The effects of lead time and visual aids in TTO valuation: a study of the EQ-VT framework.
Eur J Health Econ. 2013; 14: S15-S24
• Shah K.
• Rand-Hendriksen K.
• Ramos J.M.
• et al.
Improving the quality of data collected in EQ-5D-5L valuation studies: a summary of the EQ-VT research methodology programme.
Proceedings of the 31st Scientific Plenary Meeting of the EuroQol Group, 2014: 1-18
• Devlin N.J.
• Krabbe P.
The development of new research methods for the valuation of EQ-5D-5L.
Eur J Health Econ. 2013; 14: S1-S3
• Wong E.L.
• Yeoh E.K.
• Slaap B.
• et al.
Validation and valuation of the preference-based health index using EQ-5D-5L in the Hong Kong Population.
Value Health. 2015; 18: A27
• Ramos-Goñi J.M.
• Rand-Hendriksen K.