## Abstract

### Background

### Objectives

### Methods

### Results

### Conclusions

## Keywords

## Introduction

*Uncertainty* is frequently used instead of variance. Uncertainty can also be used in a wide sense to refer to any measure of variability of HE model outcomes. According to Briggs et al. [

## Methods

### Using the Statistical Methods Right

*not* be used, but 83% to 84% confidence intervals should [

### Using the Right Statistical Methods

### Formalizing Concepts: Defining a Quantitative Measure for Operational Validity

Let $X_1, \ldots, X_{100}$ denote the observations of a certain outcome (e.g., hospital length of stay in weeks). The mean, SD, and standard error (SE) of the outcome can be estimated from these 100 observations (e.g., $\overline{X} = 0.479$, $\mathrm{SD}_{X} = 0.149$, $\mathrm{SE}_{X} = 0.0149$), and thus a 95% confidence interval for the mean is $\mathrm{CI}_{X} = 0.450$ to $0.508$. A cohort HE model, for which a PSA has been run to address parametric uncertainty, results in, say, 250 means ($\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_{250}$) and an estimate of the sample mean obtained from these 250 replications (e.g., $\overline{Y} = 0.478$). The SD obtained from the 250 replications is in fact the SE of the mean ($\mathrm{SE}_{\overline{Y}}$), and a 95% uncertainty range for the mean of the outcome, usually given by the simulated 2.5% and 97.5% percentiles, is 0.446 to 0.505. Although the empirical confidence interval and the model uncertainty range are usually compared for overlap, it is important to emphasize that comparing (as in a formal *t* test) the observed outcomes $X_1, \ldots, X_{100}$ with the means of the PSA samples $\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_{250}$ is technically incorrect. The 100 values for $X$ represent individual observations, whereas the values for $Y$ represent 250 means that might have been obtained if input values had been slightly different. An extended version of this section can be found in Appendix A, in which the concepts introduced here are discussed for both cohort and patient-level models.

Instead, each simulated mean can be checked against the empirical confidence interval $\mathrm{CI}_{X}$. The number of times (or proportion) that the simulated value is within the target confidence interval would give a quantitative measure of the validity of the model outcome. This notion is the basis for our new method presented in the next section.
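The counting idea above can be sketched in a few lines of Python. The data here are synthetic stand-ins chosen only to resemble the section's example (empirical mean ≈ 0.479, SD ≈ 0.149; 250 PSA means around 0.478); neither the numbers nor the normality assumption come from the original model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the empirical data: 100 observations with mean and SD
# close to the section's example. Illustrative assumption, not the real data.
x = rng.normal(0.479, 0.149, size=100)
se_x = x.std(ddof=1) / np.sqrt(len(x))
ci_x = (x.mean() - 1.96 * se_x, x.mean() + 1.96 * se_x)  # empirical 95% CI

# Synthetic stand-in for the PSA output: 250 simulated means of the outcome.
y_bar = rng.normal(0.478, 0.015, size=250)

# Proportion of simulated means inside the empirical CI: the quantitative
# measure of operational validity described in the text.
proportion_valid = np.mean((y_bar >= ci_x[0]) & (y_bar <= ci_x[1]))
```

The proportion is exactly the count of indicator events used by the Bayesian method introduced below.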

### An Alternative Bayesian Method for Validating HE Model Outcomes

Suppose that empirical data are available for $k > 1$ patients. The outcome of interest is denoted by $X_1, X_2, \ldots, X_k$. The average over the $k$ patients is then $\overline{X} = \frac{1}{k}\sum_{j=1}^{k} X_j$. Suppose also that, on the basis of the same empirical data, an interval containing this average and reflecting the required level of accuracy for the HE model results can be set, for example, by the person evaluating the validation status of the model (such as a decision maker). Such an interval will be referred to as the *accuracy interval* and will be denoted by $\mathrm{AI}_{X}$. When empirical data are collected from a clinical trial, the confidence interval for the empirical data is a reasonable choice for the accuracy interval. Nevertheless, such a confidence interval may not always be available, for example, when input data are derived from the combination of several published sources that did not report empirical confidence intervals. In that situation, an alternative accuracy interval has to be provided. A simple “what if” scenario allowing a certain deviation (e.g., 1%, 5%, or 10%) from the empirical average $\overline{X}$ could be applied.

Suppose now that the HE model outcome is simulated $n$ times ($n > 1$) from a PSA, so that $\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_n$ denote the $n$ simulated mean values for our outcome of interest. The validation rule proposed is that a decision maker would regard a model result as valid only if a realization of the model outcome $\overline{y}_i$ ($i = 1, \ldots, n$) is in the interval $\mathrm{AI}_{X}$. We will denote this as $A_i = I_{\{\overline{y}_i \in \mathrm{AI}_{X}\}}$, where $I$ denotes the indicator function, so that $A_i = 1$ if $\overline{y}_i \in \mathrm{AI}_{X}$ and $A_i = 0$ otherwise. Assuming that a realization of $A_i$ can be considered the result of a Bernoulli trial, we can write:

$$A_i \sim \mathrm{Bernoulli}(p),$$

where $p$ is the probability that the HE model outcome will be regarded as valid by the decision maker. We then assume that $p$ is a random variable following a prior beta distribution with parameters $\alpha$ and $\beta$. If we assume that the result of a full PSA can be regarded as a binomial process of size $n$ in which we observe $s$ successes ($s$ times the model result is considered valid) and $n - s$ failures, the posterior distribution of $p$ is then also beta, but with updated parameters $\alpha + s$ and $\beta + n - s$ [].
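Because the beta prior is conjugate to the binomial likelihood, the posterior update is a one-line computation. The sketch below uses illustrative counts (n = 250 replications, s = 140 inside the accuracy interval) that are assumptions, not the paper's case study:

```python
from scipy import stats

# Conjugate beta-binomial update for p, the probability that the decision
# maker regards the model outcome as valid. Counts are illustrative.
alpha_prior, beta_prior = 1, 1   # beta(1, 1) prior: uniform belief about p
n, s = 250, 140                  # n PSA replications; s fall inside AI_X

alpha_post = alpha_prior + s          # posterior alpha = alpha + s
beta_post = beta_prior + n - s        # posterior beta  = beta + n - s

posterior_mean = alpha_post / (alpha_post + beta_post)
lower, upper = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)  # 95% CrI
```

No sampling is needed: the posterior is available in closed form, which is what makes this validation measure cheap to compute alongside any PSA.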

### Case Study

## Results

### New Method in Practice

Suppose a prior beta distribution with $\alpha = 1$ and $\beta = 1$ is chosen, which means that our prior belief is that the probability that the model outcome is valid for the decision maker is 0.5, with high uncertainty (represented by the 95% credible interval 0.025 to 0.975). After a PSA is run, the probability that the model outcome is valid is updated to 0.58, and the uncertainty is reduced: the posterior 95% credible interval (0.308 to 0.833) is much narrower than the prior one.

| Prior beta (α = 1, β = 1) | $\overline{y}_1$ | $\overline{y}_2$ | $\overline{y}_3$ | $\overline{y}_4$ | $\overline{y}_5$ | $\overline{y}_6$ | $\overline{y}_7$ | $\overline{y}_8$ | $\overline{y}_9$ | $\overline{y}_{10}$ | Posterior beta (α′ = 7, β′ = 5) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P[A = 1] = α/(α + β) = 0.5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | P[A = 1 \| S = 6] = α′/(α′ + β′) = 0.58 |
| 95% CI = 0.025–0.975 |  |  |  |  |  |  |  |  |  |  | 95% CI = 0.308–0.833 |

*Notes.* For each PSA replication, $A_i = 1$ if $\overline{y}_i \in \mathrm{AI}_{X}$ and $A_i = 0$ otherwise, that is, $A = 1$ when the outcome is considered valid. $S$, number of valid PSA outcomes; $\overline{y}_i$, PSA outcome.
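The table's numbers can be reproduced directly from the conjugate update, using the ten indicator values shown:

```python
from scipy import stats

# The ten indicator values from the table: A_i = 1 when the ith PSA mean
# falls inside the accuracy interval AI_X.
a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
s, n = sum(a), len(a)            # s = 6 successes out of n = 10

alpha_post = 1 + s               # beta(1, 1) prior -> alpha' = 7
beta_post = 1 + n - s            # -> beta' = 5

posterior_mean = alpha_post / (alpha_post + beta_post)   # 7/12, i.e., ~0.58
lower, upper = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)
```

This recovers the posterior beta(7, 5), its mean 0.58, and the 95% credible interval of roughly 0.308 to 0.833 reported in the table.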

### Case Study

$\beta$ = 1 from the prior distribution). This also sets the upper limit for the *x*-axis in Figure 2, in which the posterior probability that the model outcome is considered valid (with 95% credible intervals) has been plotted for accuracy intervals ranging from 0% to 80% deviation from the observed value (we chose 80% to show that beyond 75% nothing changes). This confirms that a high probability of a valid outcome is indeed associated with relatively wide accuracy intervals. Whether this represents a problem for decision making depends on the implications of the current outcome (i.e., “number of patients with diabetes who are on dialysis or with end-stage renal disease [ESRD]”) for the model's main outcome, usually the incremental cost-effectiveness ratio (ICER). This will, for example, depend on the costs, utility, and life-years lost associated with this outcome.

| Deviation from observed number of patients (%) | AI lower limit | AI upper limit | α′ | β′ | Expected (posterior) validity | Posterior validity 95% CI |
|---|---|---|---|---|---|---|
| 1 | 274.23 | 279.77 | 8 | 293 | 0.027 | 0.012–0.047 |
| 5 | 263.15 | 290.85 | 36 | 265 | 0.120 | 0.085–0.159 |
| 10 | 249.30 | 304.70 | 83 | 218 | 0.276 | 0.227–0.328 |
| 25 | 207.75 | 346.25 | 203 | 98 | 0.674 | 0.621–0.726 |
| 50 | 138.50 | 415.50 | 287 | 14 | 0.953 | 0.927–0.974 |
| 75 | 69.25 | 484.75 | 300 | 1 | 0.997 | 0.988–1.000 |
| 76 | 66.48 | 487.52 | 300 | 1 | 0.997 | 0.988–1.000 |
| 80 | 55.40 | 498.60 | 300 | 1 | 0.997 | 0.988–1.000 |

$\alpha'$, number of PSA results within the accuracy interval + 1 (note that “+1” is added because a beta with parameters $\alpha = 1$ and $\beta = 1$ was chosen as the prior distribution, but this is just one possibility); $\beta'$, number of PSA results outside the accuracy interval + 1; CI, credible interval; ESRD, end-stage renal disease; PSA, probabilistic sensitivity analysis.
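The table mechanics can be sketched as follows. The observed number of patients (277) is inferred here from the table's 1% row (274.23 = 277 × 0.99) and is an assumption for illustration, as is the tiny dummy PSA sample in the usage line:

```python
# Case-study table mechanics: an accuracy interval as a symmetric percentage
# deviation around the observed value, and the resulting beta posterior.
OBSERVED = 277.0  # assumed; inferred from the table's AI limits

def accuracy_interval(observed: float, deviation_pct: float) -> tuple[float, float]:
    """Accuracy interval: observed value +/- a percentage deviation."""
    delta = observed * deviation_pct / 100.0
    return observed - delta, observed + delta

def posterior_params(psa_results, ai, alpha_prior=1, beta_prior=1):
    """Beta posterior (alpha', beta') from counting PSA results inside the AI."""
    lo, hi = ai
    s = sum(1 for y in psa_results if lo <= y <= hi)
    return alpha_prior + s, beta_prior + len(psa_results) - s

ai_1 = accuracy_interval(OBSERVED, 1)    # (274.23, 279.77), matching the table
ai_75 = accuracy_interval(OBSERVED, 75)  # (69.25, 484.75)
```

With the 300 PSA results of the case study, `posterior_params` would yield each table row's α′ and β′; the expected validity and credible interval then follow from the beta posterior as before.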


## Discussion

## Conclusions

## Supplemental Materials


## References

- Unravelling drug reimbursement outcomes: a comparative study of the role of pharmacoeconomic evidence in Dutch and Swedish reimbursement decision making. *Pharmacoeconomics.* 2013; 31: 781-797
- Methodological quality of economic evaluations of new pharmaceuticals in the Netherlands. *Pharmacoeconomics.* 2012; 30: 219-227
- Questionnaire to assess relevance and credibility of modeling studies for informing health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. *Value Health.* 2014; 17: 174-182
- Improving model validation in health technology assessment: comments on guidelines of the ISPOR-SMDM Modeling Good Research Practices Task Force. *Value Health.* 2013; 16: 1106-1107
- Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. *Med Decis Making.* 2012; 32: 733
- Validation of health economic models: the example of EVITA. *Value Health.* 2003; 6: 551-559
- A health policy model of CKD: 1. model construction, assumptions, and validation of health consequences. *Am J Kidney Dis.* 2010; 55: 452-462
- Systematic validation of disease models for pharmacoeconomic evaluations. *J Eval Clin Pract.* 1999; 5: 283-295
- The German Cervical Cancer Screening Model: development and validation of a decision-analytic model for cervical cancer screening in Germany. *Eur J Public Health.* 2006; 16: 185-192
- The Elements of Statistical Learning. Springer, New York, NY; 2001
- Model parameter estimation and uncertainty: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-6. *Value Health.* 2012; 15: 835-842
- Handling uncertainty in cost-effectiveness models. *Pharmacoeconomics.* 2000; 17: 479-500
- Costs, effects and C/E-ratios alongside a clinical trial. *Health Econ.* 1994; 3: 309-319
- The role of value-of-information analysis in a health care research priority setting: a theoretical case study. *Med Decis Making.* 2013; 33: 472-489
- Validation of an echo-Doppler decision model to predict left ventricular filling pressure in patients with heart failure independently of ejection fraction. *Eur J Echocardiogr.* 2010; 11: 703-710
- Validation of the health ABC heart failure model for incident heart failure risk prediction: the Cardiovascular Health Study. *Circ Heart Fail.* 2010; 3: 495-502
- Uncertainty and validation of health economic decision models. *Health Econ.* 2010; 19: 43-55
- Validation of the IMS CORE Diabetes Model. *Value Health.* 2014; 17: 714-724
- Prediction of mortality and macrovascular complications in type 2 diabetes: validation of the UKPDS outcomes model in the Casale Monferrato Survey, Italy. *Diabetologia.* 2013; 56: 1726-1734
- Computer modeling of diabetes and its complications: a report on the Fourth Mount Hood Challenge Meeting. *Diabetes Care.* 2007; 30: 1638-1646
- Validation of the CORE Diabetes Model against epidemiological and clinical studies. *Curr Med Res Opin.* 2004; 20: S27-S40
- Computer modeling of diabetes and its complications: a report on the Fifth Mount Hood Challenge Meeting. *Value Health.* 2013; 16: 670-685
- Validation of a decision model for preventive pharmacological strategies in postmenopausal women. *Eur J Epidemiol.* 2005; 20: 89-101
- Validation of the Economic and Health Outcomes Model of Type 2 Diabetes Mellitus (ECHO-T2DM). *Pharmacoeconomics.* 2017; 35: 375
- Empirically evaluating decision-analytic models. *Value Health.* 2010; 13: 667-674
- Contrasting diversity values: statistical inferences based on overlapping confidence intervals. *PLoS One.* 2013; 8: e56794
- Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance? *J Insect Sci.* 2003; 3: 34
- A subcutaneous insulin pharmacokinetic model for computer simulation in a diabetes decision support role: validation and simulation. *J Diabetes Sci Technol.* 2008; 2: 672-680
- Policy evaluation in diabetes prevention and treatment using a population-based macro simulation model: the MICADO model. *Diabet Med.* 2015; 32: 1580-1587
- The development and validation of a decision-analytic model representing the full disease course of acute myeloid leukemia. *Pharmacoeconomics.* 2013; 31: 605-621
- Law AM, McComas MG. How to build valid and credible simulation models. In: Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proceedings of the 2001 Winter Simulation Conference. IEEE, New York; 2001: 22-29
- Sargent RG. Verification, validation, and accreditation of simulation models. In: Joines JA, Barton RR, Kang K, Fishwick PA, eds. Proceedings of the 2000 Winter Simulation Conference. IEEE, New York; 2000: 50-59
- Sargent RG. Some approaches and paradigms for verifying and validating simulation models. In: Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proceedings of the 2001 Winter Simulation Conference. IEEE, New York; 2001: 106-114
- Sargent RG. Validation and verification of simulation models. In: Ingalls RG, Rossetti MD, Smith JS, Peters BA, eds. Proceedings of the 2004 Winter Simulation Conference. IEEE, New York; 2004: 17-28
- Bayesian Statistics: An Introduction. Wiley, Hoboken, NJ; 2012
- Statistical Report 2007. Renal Registry RENINE, The Netherlands; 2007
- AdViSHE: a validation-assessment tool of health-economic models for decision makers and model users. *Pharmacoeconomics.* 2016; 34: 349-361
- The frequency of cervical cancer screening: comparison of a mathematical model with empirical data. *Cancer.* 1987; 60: 1117-1122
- The Missing Stakeholder Group: why patients should be involved in health economic modelling. *Appl Health Econ Health Policy.* 2016; 14: 129-133
- Principles of good practice of decision analytic modeling in health care evaluation: report of the ISPOR Task Force on Good Research Practices—modeling studies. *Value Health.* 2003; 6: 9-17
