## Abstract

### Background

### Objectives

### Methods

### Results

### Conclusions

## Keywords

## Introduction

*Uncertainty* is frequently used instead of variance. Uncertainty can also be used in a wide sense to refer to any measure of variability of HE model outcomes. According to Briggs et al. [

## Methods

### Using the Statistical Methods Right

*not* be used, but 83% to 84% confidence intervals should [

### Using the Right Statistical Methods

### Formalizing Concepts: Defining a Quantitative Measure for Operational Validity

Let $X_1, \ldots, X_{100}$ denote the observations of a certain outcome (e.g., hospital length of stay in weeks). The mean, SD, and standard error (SE) of the outcome can be estimated from these 100 observations (e.g., $\overline{X} = 0.479$, $\mathrm{SD}_{X} = 0.149$, $\mathrm{SE}_{X} = 0.0149$), and thus a 95% confidence interval for the mean is $\mathrm{CI}_{X} = 0.450$ to $0.508$. A cohort HE model, for which a PSA has been run to address parametric uncertainty, results in, say, 250 means ($\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_{250}$) and an estimate of the sample mean obtained from these 250 replications (e.g., $\overline{Y} = 0.478$). The SD obtained from the 250 replications is in fact the SE of the mean ($\mathrm{SE}_{\overline{Y}}$), and a 95% uncertainty range for the mean of the outcome, usually given by the simulated 2.5% and 97.5% percentiles, is 0.446 to 0.505. Although the empirical confidence interval and the model uncertainty range are usually compared for overlap, it is important to emphasize that comparing (as in a formal *t* test) the observed outcomes $X_1, \ldots, X_{100}$ with the means of the PSA samples $\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_{250}$ is technically incorrect. The 100 values for $X$ represent individual observations, whereas the values for $Y$ represent 250 means that might have been obtained if input values had been slightly different. An extended version of this section can be found in Appendix A, in which the concepts introduced here are discussed for both cohort and patient-level models.

Instead, each simulated mean can be checked against the empirical confidence interval $\mathrm{CI}_{X}$. The number of times (or proportion) that the simulated value is within the target confidence interval would give a quantitative measure of the validity of the model outcome. This notion is the basis for our new method presented in the next section.
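The counting idea above can be sketched in a few lines of Python. The data here are synthetic stand-ins chosen only to resemble the section's example (empirical mean ≈ 0.479, SD ≈ 0.149; 250 PSA means around 0.478); neither the numbers nor the normality assumption come from the original model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the empirical data: 100 observations with mean and SD
# close to the section's example. Illustrative assumption, not the real data.
x = rng.normal(0.479, 0.149, size=100)
se_x = x.std(ddof=1) / np.sqrt(len(x))
ci_x = (x.mean() - 1.96 * se_x, x.mean() + 1.96 * se_x)  # empirical 95% CI

# Synthetic stand-in for the PSA output: 250 simulated means of the outcome.
y_bar = rng.normal(0.478, 0.015, size=250)

# Proportion of simulated means inside the empirical CI: the quantitative
# measure of operational validity described in the text.
proportion_valid = np.mean((y_bar >= ci_x[0]) & (y_bar <= ci_x[1]))
```

The proportion is exactly the count of indicator events used by the Bayesian method introduced below.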

### An Alternative Bayesian Method for Validating HE Model Outcomes

Suppose that empirical data are available for $k > 1$ patients. The outcome of interest is denoted by $X_1, X_2, \ldots, X_k$. The average over the $k$ patients is then $\overline{X} = \frac{1}{k}\sum_{j=1}^{k} X_j$. Suppose also that, on the basis of the same empirical data, an interval containing this average and reflecting the required level of accuracy for the HE model results can be set, for example, by the person evaluating the validation status of the model (such as a decision maker). Such an interval will be referred to as the *accuracy interval* and will be denoted by $\mathrm{AI}_{X}$. When empirical data are collected from a clinical trial, the confidence interval for the empirical data is a reasonable choice for the accuracy interval. Nevertheless, such a confidence interval may not always be available, for example, when input data are derived from the combination of several published sources that did not report empirical confidence intervals. In that situation, an alternative accuracy interval has to be provided. A simple “what if” scenario allowing a certain deviation (e.g., 1%, 5%, or 10%) from the empirical average $\overline{X}$ could be applied.

Suppose now that the HE model outcome is simulated $n$ times ($n > 1$) from a PSA, so that $\overline{Y}_1, \overline{Y}_2, \ldots, \overline{Y}_n$ denote the $n$ simulated mean values for our outcome of interest. The validation rule proposed is that a decision maker would regard a model result as valid only if a realization of the model outcome $\overline{y}_i$ ($i = 1, \ldots, n$) is in the interval $\mathrm{AI}_{X}$. We will denote this as $A_i = I_{\{\overline{y}_i \in \mathrm{AI}_{X}\}}$, where $I$ denotes the indicator function, so that $A_i = 1$ if $\overline{y}_i \in \mathrm{AI}_{X}$ and $A_i = 0$ otherwise. Assuming that a realization of $A_i$ can be considered the result of a Bernoulli trial, we can write:

$$A_i \sim \mathrm{Bernoulli}(p),$$

where $p$ is the probability that the HE model outcome will be regarded as valid by the decision maker. We then assume that $p$ is a random variable following a prior beta distribution with parameters $\alpha$ and $\beta$. If we assume that the result of a full PSA can be regarded as a binomial process of size $n$ in which we observe $s$ successes ($s$ times the model result is considered valid) and $n - s$ failures, the posterior distribution of $p$ is then also beta, but with updated parameters $\alpha + s$ and $\beta + n - s$ [].
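Because the beta prior is conjugate to the binomial likelihood, the posterior update is a one-line computation. The sketch below uses illustrative counts (n = 250 replications, s = 140 inside the accuracy interval) that are assumptions, not the paper's case study:

```python
from scipy import stats

# Conjugate beta-binomial update for p, the probability that the decision
# maker regards the model outcome as valid. Counts are illustrative.
alpha_prior, beta_prior = 1, 1   # beta(1, 1) prior: uniform belief about p
n, s = 250, 140                  # n PSA replications; s fall inside AI_X

alpha_post = alpha_prior + s          # posterior alpha = alpha + s
beta_post = beta_prior + n - s        # posterior beta  = beta + n - s

posterior_mean = alpha_post / (alpha_post + beta_post)
lower, upper = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)  # 95% CrI
```

No sampling is needed: the posterior is available in closed form, which is what makes this validation measure cheap to compute alongside any PSA.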

### Case Study

## Results

### New Method in Practice

Suppose a prior beta distribution with $\alpha = 1$ and $\beta = 1$ is chosen, which means that our prior belief is that the probability that the model outcome is valid for the decision maker is 0.5, with high uncertainty (represented by the 95% credible interval 0.025 to 0.975). After a PSA is run, the probability that the model outcome is valid is updated to 0.58, and the uncertainty is reduced: the posterior 95% credible interval (0.308 to 0.833) is much narrower than the prior one.

| Prior beta (α = 1, β = 1) | $\overline{y}_1$ | $\overline{y}_2$ | $\overline{y}_3$ | $\overline{y}_4$ | $\overline{y}_5$ | $\overline{y}_6$ | $\overline{y}_7$ | $\overline{y}_8$ | $\overline{y}_9$ | $\overline{y}_{10}$ | Posterior beta (α′ = 7, β′ = 5) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P[A = 1] = α/(α + β) = 0.5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | P[A = 1 \| S = 6] = α′/(α′ + β′) = 0.58 |
| 95% CI = 0.025–0.975 |  |  |  |  |  |  |  |  |  |  | 95% CI = 0.308–0.833 |

*Notes.* For each PSA replication, $A_i = 1$ if $\overline{y}_i \in \mathrm{AI}_{X}$ and $A_i = 0$ otherwise, that is, $A = 1$ when the outcome is considered valid. $S$, number of valid PSA outcomes; $\overline{y}_i$, PSA outcome.
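The table's numbers can be reproduced directly from the conjugate update, using the ten indicator values shown:

```python
from scipy import stats

# The ten indicator values from the table: A_i = 1 when the ith PSA mean
# falls inside the accuracy interval AI_X.
a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
s, n = sum(a), len(a)            # s = 6 successes out of n = 10

alpha_post = 1 + s               # beta(1, 1) prior -> alpha' = 7
beta_post = 1 + n - s            # -> beta' = 5

posterior_mean = alpha_post / (alpha_post + beta_post)   # 7/12, i.e., ~0.58
lower, upper = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)
```

This recovers the posterior beta(7, 5), its mean 0.58, and the 95% credible interval of roughly 0.308 to 0.833 reported in the table.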

### Case Study

$\beta$ = 1 from the prior distribution). This also sets the upper limit for the *x*-axis in Figure 2, in which the posterior probability that the model outcome is considered valid (with 95% credible intervals) has been plotted for accuracy intervals ranging from 0% to 80% deviation from the observed value (we chose 80% to show that beyond 75% nothing changes). This confirms that a high probability of a valid outcome is indeed associated with relatively wide accuracy intervals. Whether this represents a problem for decision making depends on the implications of the current outcome (i.e., “number of patients with diabetes who are on dialysis or with end-stage renal disease [ESRD]”) for the model's main outcome, usually the incremental cost-effectiveness ratio (ICER). This will, for example, depend on the costs, utility, and life-years lost associated with this outcome.

| Deviation from observed number of patients (%) | AI lower limit | AI upper limit | α′ | β′ | Expected (posterior) validity | Posterior validity 95% CI |
|---|---|---|---|---|---|---|
| 1 | 274.23 | 279.77 | 8 | 293 | 0.027 | 0.012–0.047 |
| 5 | 263.15 | 290.85 | 36 | 265 | 0.120 | 0.085–0.159 |
| 10 | 249.30 | 304.70 | 83 | 218 | 0.276 | 0.227–0.328 |
| 25 | 207.75 | 346.25 | 203 | 98 | 0.674 | 0.621–0.726 |
| 50 | 138.50 | 415.50 | 287 | 14 | 0.953 | 0.927–0.974 |
| 75 | 69.25 | 484.75 | 300 | 1 | 0.997 | 0.988–1.000 |
| 76 | 66.48 | 487.52 | 300 | 1 | 0.997 | 0.988–1.000 |
| 80 | 55.40 | 498.60 | 300 | 1 | 0.997 | 0.988–1.000 |

$\alpha'$, number of PSA results within the accuracy interval + 1 (note that “+1” is added because a beta with parameters $\alpha = 1$ and $\beta = 1$ was chosen as the prior distribution, but this is just one possibility); $\beta'$, number of PSA results outside the accuracy interval + 1; CI, credible interval; ESRD, end-stage renal disease; PSA, probabilistic sensitivity analysis.
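The table mechanics can be sketched as follows. The observed number of patients (277) is inferred here from the table's 1% row (274.23 = 277 × 0.99) and is an assumption for illustration, as is the tiny dummy PSA sample in the usage line:

```python
# Case-study table mechanics: an accuracy interval as a symmetric percentage
# deviation around the observed value, and the resulting beta posterior.
OBSERVED = 277.0  # assumed; inferred from the table's AI limits

def accuracy_interval(observed: float, deviation_pct: float) -> tuple[float, float]:
    """Accuracy interval: observed value +/- a percentage deviation."""
    delta = observed * deviation_pct / 100.0
    return observed - delta, observed + delta

def posterior_params(psa_results, ai, alpha_prior=1, beta_prior=1):
    """Beta posterior (alpha', beta') from counting PSA results inside the AI."""
    lo, hi = ai
    s = sum(1 for y in psa_results if lo <= y <= hi)
    return alpha_prior + s, beta_prior + len(psa_results) - s

ai_1 = accuracy_interval(OBSERVED, 1)    # (274.23, 279.77), matching the table
ai_75 = accuracy_interval(OBSERVED, 75)  # (69.25, 484.75)
```

With the 300 PSA results of the case study, `posterior_params` would yield each table row's α′ and β′; the expected validity and credible interval then follow from the beta posterior as before.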


## Discussion

## Conclusions

## Supplemental Materials


## References

- Unravelling drug reimbursement outcomes: a comparative study of the role of pharmacoeconomic evidence in Dutch and Swedish reimbursement decision making. *Pharmacoeconomics.* 2013; 31: 781-797
- Methodological quality of economic evaluations of new pharmaceuticals in the Netherlands. *Pharmacoeconomics.* 2012; 30: 219-227
- Questionnaire to assess relevance and credibility of modeling studies for informing health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. *Value Health.* 2014; 17: 174-182
- Improving model validation in health technology assessment: comments on guidelines of the ISPOR-SMDM Modeling Good Research Practices Task Force. *Value Health.* 2013; 16: 1106-1107
- Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. *Med Decis Making.* 2012; 32: 733
- Validation of health economic models: the example of EVITA. *Value Health.* 2003; 6: 551-559
- A health policy model of CKD: 1. model construction, assumptions, and validation of health consequences. *Am J Kidney Dis.* 2010; 55: 452-462
- Systematic validation of disease models for pharmacoeconomic evaluations. *J Eval Clin Pract.* 1999; 5: 283-295
- The German Cervical Cancer Screening Model: development and validation of a decision-analytic model for cervical cancer screening in Germany. *Eur J Public Health.* 2006; 16: 185-192
- The Elements of Statistical Learning. Springer, New York, NY; 2001
- Model parameter estimation and uncertainty: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-6. *Value Health.* 2012; 15: 835-842
- Handling uncertainty in cost-effectiveness models. *Pharmacoeconomics.* 2000; 17: 479-500
- Costs, effects and C/E-ratios alongside a clinical trial. *Health Econ.* 1994; 3: 309-319
- The role of value-of-information analysis in a health care research priority setting: a theoretical case study. *Med Decis Making.* 2013; 33: 472-489
- Validation of an echo-Doppler decision model to predict left ventricular filling pressure in patients with heart failure independently of ejection fraction. *Eur J Echocardiogr.* 2010; 11: 703-710
- Validation of the health ABC heart failure model for incident heart failure risk prediction: the Cardiovascular Health Study. *Circ Heart Fail.* 2010; 3: 495-502
- Uncertainty and validation of health economic decision models. *Health Econ.* 2010; 19: 43-55
- Validation of the IMS CORE Diabetes Model. *Value Health.* 2014; 17: 714-724
- Prediction of mortality and macrovascular complications in type 2 diabetes: validation of the UKPDS outcomes model in the Casale Monferrato Survey, Italy. *Diabetologia.* 2013; 56: 1726-1734
- Computer modeling of diabetes and its complications: a report on the Fourth Mount Hood Challenge Meeting. *Diabetes Care.* 2007; 30: 1638-1646
- Validation of the CORE Diabetes Model against epidemiological and clinical studies. *Curr Med Res Opin.* 2004; 20: S27-S40
- Computer modeling of diabetes and its complications: a report on the Fifth Mount Hood Challenge Meeting. *Value Health.* 2013; 16: 670-685
- Validation of a decision model for preventive pharmacological strategies in postmenopausal women. *Eur J Epidemiol.* 2005; 20: 89-101
- Validation of the Economic and Health Outcomes Model of Type 2 Diabetes Mellitus (ECHO-T2DM). *Pharmacoeconomics.* 2017; 35: 375
- Empirically evaluating decision-analytic models. *Value Health.* 2010; 13: 667-674
- Contrasting diversity values: statistical inferences based on overlapping confidence intervals. *PLoS One.* 2013; 8: e56794
- Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance? *J Insect Sci.* 2003; 3: 34
- A subcutaneous insulin pharmacokinetic model for computer simulation in a diabetes decision support role: validation and simulation. *J Diabetes Sci Technol.* 2008; 2: 672-680
- Policy evaluation in diabetes prevention and treatment using a population-based macro simulation model: the MICADO model. *Diabet Med.* 2015; 32: 1580-1587
- The development and validation of a decision-analytic model representing the full disease course of acute myeloid leukemia. *Pharmacoeconomics.* 2013; 31: 605-621
- Law AM, McComas MG. How to build valid and credible simulation models. In: Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proceedings of the 2001 Winter Simulation Conference. IEEE, New York; 2001: 22-29
- Sargent RG. Verification, validation, and accreditation of simulation models. In: Joines JA, Barton RR, Kang K, Fishwick PA, eds. Proceedings of the 2000 Winter Simulation Conference. IEEE, New York; 2000: 50-59
- Sargent RG. Some approaches and paradigms for verifying and validating simulation models. In: Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proceedings of the 2001 Winter Simulation Conference. IEEE, New York; 2001: 106-114
- Sargent RG. Validation and verification of simulation models. In: Ingalls RG, Rossetti MD, Smith JS, Peters BA, eds. Proceedings of the 2004 Winter Simulation Conference. IEEE, New York; 2004: 17-28
- Bayesian Statistics: An Introduction. Wiley, Hoboken, NJ; 2012
- Statistical Report 2007. Renal Registry RENINE, The Netherlands; 2007
- AdViSHE: a validation-assessment tool of health-economic models for decision makers and model users. *Pharmacoeconomics.* 2016; 34: 349-361
- The frequency of cervical cancer screening: comparison of a mathematical model with empirical data. *Cancer.* 1987; 60: 1117-1122
- The Missing Stakeholder Group: why patients should be involved in health economic modelling. *Appl Health Econ Health Policy.* 2016; 14: 129-133
- Principles of good practice of decision analytic modeling in health care evaluation: report of the ISPOR Task Force on Good Research Practices—modeling studies. *Value Health.* 2003; 6: 9-17
