Themed Section: Assessing the Value of Next-Generation Sequencing | Volume 21, Issue 9, P1048-1053, September 2018


Using “Big Data” in the Cost-Effectiveness Analysis of Next-Generation Sequencing Technologies: Challenges and Potential Solutions

Open Archive | Published: August 17, 2018 | DOI: https://doi.org/10.1016/j.jval.2018.06.016

      Abstract

      Next-generation sequencing (NGS) is considered to be a prominent example of “big data” because of the quantity and complexity of data it produces and because it presents an opportunity to use powerful information sources that could reduce clinical and health economic uncertainty at a patient level. One obstacle to translating NGS into routine health care has been a lack of clinical trials evaluating NGS technologies, which could be used to populate cost-effectiveness analyses (CEAs). A key question is whether big data can be used to partially support CEAs of NGS. This question has been brought into sharp focus with the creation of large national sequencing initiatives. In this article we summarize the main methodological and practical challenges of using big data as an input into CEAs of NGS. Our focus is on the challenges of using large observational datasets and cohort studies and linking these data to the genomic information obtained from NGS, as is being pursued in the conduct of large genomic sequencing initiatives. We propose potential solutions to these key challenges. We conclude that the use of genomic big data to support and inform CEAs of NGS technologies holds great promise. Nevertheless, health economists face substantial challenges when using these data and must be cognizant of them before big data can be confidently used to produce evidence on the cost-effectiveness of NGS.

      Introduction

Next-generation sequencing (NGS) techniques have resulted in new genomic-based tests that can inform the diagnosis of rare genetic diseases and guide treatment decisions for certain types of cancer. However, few phase III clinical trials have been undertaken in the NGS space. This limits the availability of data to inform cost-effectiveness analyses (CEAs) and presents a barrier to translating NGS into routine clinical practice [1]. An alternative approach could be to use "big data" (often described as having five V's: volume, velocity, variety, veracity, and value) to inform economic evaluations of NGS technologies [2]. Emerging uses for big data in health care include the development of prediction (diagnostic or prognostic) models and observational studies comparing health care interventions [3].
NGS is considered to be a prominent example of big data because of the quantity and complexity of the data it produces [4] and because it presents an opportunity to use powerful information sources that could reduce clinical and health economic uncertainty at a patient level [5]. Specifically, NGS data might make it possible to study the cost-effectiveness of identifying rare disease variants in settings where a randomized controlled trial (RCT) would be unlikely to be conducted. In the past, a value of information analysis following an early-stage CEA might have indicated that reducing uncertainty in the incremental cost-effectiveness ratio for using NGS to identify a rare disease would be worthwhile, but the cost of an RCT to achieve this might have been prohibitively high. If this uncertainty can instead be reduced using a (relatively) low-cost, real-world big data study, value could be added to a health care system.
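To make this reasoning concrete, the following is a minimal sketch of an expected value of perfect information (EVPI) calculation of the kind alluded to above; the strategies, distributions, and willingness-to-pay threshold are illustrative assumptions, not values from any actual CEA.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 10_000
wtp = 30_000  # illustrative willingness-to-pay per QALY

# Hypothetical probabilistic inputs for two strategies:
# standard diagnostics vs. NGS-based testing (all values invented).
cost_std = rng.normal(5_000, 500, n_sims)
qaly_std = rng.normal(5.0, 0.3, n_sims)
cost_ngs = rng.normal(8_000, 1_500, n_sims)  # wide uncertainty: no RCT data
qaly_ngs = rng.normal(5.2, 0.5, n_sims)

# Net monetary benefit of each strategy in each simulation.
nmb = np.column_stack([
    wtp * qaly_std - cost_std,
    wtp * qaly_ngs - cost_ngs,
])

# EVPI = E[max over strategies of NMB] - max over strategies of E[NMB].
evpi_per_patient = nmb.max(axis=1).mean() - nmb.mean(axis=0).max()
print(f"EVPI per patient: {evpi_per_patient:,.0f}")
```

If the EVPI (scaled to the affected population) exceeds the cost of further research, reducing uncertainty is worthwhile; the argument above is that a linked big data study may achieve this at a fraction of the cost of an RCT.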
      Is it therefore feasible and practical to use this big data in CEAs of NGS technologies to partially overcome a lack of evidence from clinical trials?
This question has been brought into sharp focus with the creation of large national sequencing initiatives such as the UK 100,000 Genomes Project [6] and the Cancer 2015 study in Australia [7,8]. Both programs are linking information on patients' genomic variants with clinical information and health care resource databases, and the recently established All of Us Research Program in the United States is planning a similar linked database design. The data produced by these initiatives could allow regression-based modeling with matched controls and decision-analytic modeling to play a role in evaluating the costs and outcomes of NGS technologies.
In this article, we summarize the potential opportunities for using big data when evaluating the cost-effectiveness of NGS tests. We present an overview of the key methodological, technical, and practical challenges of using big data in this context and propose potential solutions. Our specific focus is on "big datasets" that link observational data and cohort studies with genomic data obtained from NGS tests. These datasets are being created around the world and are starting to be applied in health care. Some important challenges are already emerging, such as issues with data quality and difficulty in applying a causal interpretation to studies [9]. Nevertheless, it is unclear whether these challenges are unique to NGS or simply magnified in the NGS context.

      Opportunities for Using Big Data in CEAs of Genomic Tests

The opportunities for using big data to inform CEAs of genomic tests have expanded in the past 5 years. The UK 100,000 Genomes Project is the largest genomic sequencing initiative to routinely collect linked data at a scale that could facilitate CEA. This project is linking whole-genome sequencing data with patient-level longitudinal data extracted from routine UK databases, including secondary care records (Hospital Episode Statistics; HES), primary care records (Clinical Practice Research Datalink), disease registries, pharmacy data, and mortality data for rare disease and cancer patients. The Cancer 2015 study in Australia is also a prospective, longitudinal, genomic cohort study, which aimed to enroll more than 10,000 newly diagnosed cancer patients. Recruited patients consented to have their tumor biopsies and blood screened using an NGS cancer panel (tumor sequencing). Similar clinical and health care resource use data are being collected as in the 100,000 Genomes Project, plus health-related quality-of-life (HRQoL) data. The All of Us Research Program in the United States will use NGS for 1,000,000 individuals across a wide range of diseases, both common and rare, as well as healthy individuals [10]. It is expected that the program will also create an electronic core dataset containing participant-provided information on sociodemographic variables, the results of biospecimen assays, electronic health record data, and potentially mobile and digital health data.
The big data generated by these sequencing programs will provide evidence on the benefits of NGS for health care systems and can be used to populate cost-effectiveness models assessing alternative NGS technologies (such as whole-genome sequencing and panels), presenting health economists with an unprecedented opportunity to conduct detailed analyses of the economic value of these technologies. The addition of genomic data to other linked datasets could help to explain additional heterogeneity in costs and outcomes and therefore allow health economists to generate more accurate estimates of cost-effectiveness. For example, when assessing the cost-effectiveness of multiplex testing in cancer, we would ideally model the tested population by mutation profile, with survival, costs, and utility values specific to each profile. Without genomically linked data, we would be unable to stratify costs or outcomes by mutation profile and would instead have to rely on average values from the whole population, as illustrated in the sketch below.
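As a toy illustration of this point, the following sketch compares profile-specific and pooled incremental cost-effectiveness ratios (ICERs); the mutation profiles, population shares, costs, and QALY values are entirely hypothetical.

```python
import numpy as np

# Hypothetical mutation profiles: (population share, QALYs and costs with
# genomically guided treatment, QALYs and costs with usual care).
profiles = {
    "profile_A": (0.15, 2.8, 60_000, 1.9, 45_000),
    "profile_B": (0.25, 1.6, 52_000, 1.5, 44_000),
    "no_actionable_variant": (0.60, 1.4, 48_000, 1.4, 43_000),
}

# Profile-specific ICERs: possible only when outcomes are linked to genomics.
for name, (share, q1, c1, q0, c0) in profiles.items():
    if q1 > q0:
        print(f"{name}: ICER = {(c1 - c0) / (q1 - q0):,.0f} per QALY")
    else:
        print(f"{name}: extra cost {c1 - c0:,.0f} with no QALY gain")

# Pooled ICER: all that is available without genomically linked data.
v = np.array(list(profiles.values()))
shares, dq, dc = v[:, 0], v[:, 1] - v[:, 3], v[:, 2] - v[:, 4]
print(f"Pooled ICER: {shares @ dc / (shares @ dq):,.0f} per QALY")
```

With these invented numbers, testing looks highly cost-effective in profile_A, marginal in profile_B, and dominated in patients with no actionable variant, yet the pooled ICER averages all of this away.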

      Challenges in Using Big Data in CEA of NGS Tests

These opportunities are, however, accompanied by several challenges, which can be broadly split into three categories: data collection, data management, and data analysis. We describe these challenges and suggest some solutions; a summary is provided in Table 1.
Table 1. Key challenges in the use of a big data approach to evaluating NGS, and potential solutions

Data collection
Challenge: Limited sample sizes for some patient groups (e.g., rare diseases or specific genomic subgroups).
Potential solutions: Combine data collected from sequencing initiatives in multiple countries; apply N-of-1 trials with appropriate aggregation methods to inform resource allocation decisions.
Challenge: Absence of health-related quality-of-life data collected alongside sequencing initiatives.
Potential solutions: Use existing sources (e.g., Tufts Cost-Effectiveness Analysis Registry); conduct small bespoke studies in relevant patients; involve health economists early in the design of data collection.

Data management
Challenge: The linking process can be slow because multiple stakeholders employ different data housing platforms with diverse access and formatting requirements.
Potential solution: A central data warehouse hosted on secure servers (e.g., the Inuvika Open Virtual Desktop used to access datasets and software for the UK 100,000 Genomes Project).
Challenge: Linked data can be incomplete, as only a proportion of patients may have all data elements.
Potential solution: Use multiple imputation where data are missing for a subset of patients to avoid dropping observations and introducing bias; a complete case analysis should also be presented.
Challenge: Privacy and reidentification risks when researchers access linked patient-level data.
Potential solution: Preapproved protocols that limit data access to the minimum dataset required to answer the research question.
Challenge: Questionable accuracy of time-stamped data elements (because of measurement error and de-identification) and diagnosis codes (because of coder misclassification and clinician misdiagnosis).
Potential solution: Visualize patient pathways using schematics detailing the sequence of main events over time, allowing analysts to identify potential issues before analysis.
Challenge: Heterogeneity in event coding (e.g., differentiating routine follow-up events from diagnostic testing events to appropriately cost diagnostic odysseys).
Potential solution: Select a small sample of patients and use expert clinical input to differentiate event types, with the potential to develop an algorithm to apply to a larger sample.
Challenge: Linked data require extensive cleaning to ensure accuracy and reliability.
Potential solution: Standardize the cleaning steps for each individual dataset; share these steps on publication to ensure reproducibility.

Data analysis
Challenge: Linked data can contain missing variables, depending on the composition of the employed data sources.
Potential solution: Review initial linkages to ensure appropriate depth and breadth of data capture; if appropriate and possible, increase the number of linkages to capture missing variables.
Challenge: The scope of available data sources can result in endogeneity problems and thus sample selection bias when estimating treatment effects.
Potential solution: Use genomic variants as instrumental variables to address endogeneity, as there is no causal pathway from genomic variants to the outcome other than via the exposure.
Challenge: The lack of a counterfactual when using linked data can make it difficult to estimate treatment effects.
Potential solutions: Natural experiments; difference-in-difference analysis; historical controls with propensity score matching.
Challenge: Large quantities of data on health care resource use can have a high frequency of zero observations.
Potential solutions: Collapsed data structures and econometric models that account for excess zeros.

      Data Collection Challenges

      Small Number of Observations

Population sequencing initiatives provide an opportunity to maximize the volume of data that can be obtained for specific conditions. This may be helpful for specific cancer types, but the incidence and prevalence of rare diseases are much lower, and patterns in disease progression and treatment response are difficult to identify even when recruiting from population-based studies. Robust evidence linking clinical outcomes and health management for rare diseases is essential for CEAs of NGS technologies but may be difficult to obtain.

      Potential Solutions

Some have recommended the use of N-of-1 trials in these situations [11]. In practice, very few N-of-1 trials have been conducted [12,13], and as far as we are aware none has been carried out and linked to routinely collected administrative health care datasets. Methods to aggregate outcomes from multiple N-of-1 trials in a way that is not subject to allocation or confounding bias are required if such data are to inform CEAs [14]. In the meantime, another possible solution for addressing issues related to small sample sizes would be to combine data from sequencing initiatives conducted in multiple countries [15]. However, aggregating data across different health systems could increase the risk of unknown confounding. Furthermore, such aggregation may increase the amount of missing data and may ultimately be restricted by ethical issues and prohibitive data access requirements.

      Lack of Outcomes Data for CEAs

Information on health outcomes (such as HRQoL data) is essential for evaluating the cost-effectiveness of NGS technologies. Unfortunately, such data are rarely captured in routinely collected administrative health care datasets, and few of the existing big data projects in genomics are collecting this information. This means that other data sources must be identified for economic evaluations of NGS technologies to comply with best practice cost-effectiveness guidelines [16].

      Potential Solutions

In the short term, bespoke outcomes studies could be conducted, for instance, collecting EuroQol five-dimension questionnaire (EQ-5D) data from patients who have undergone NGS testing and from patients who have not. Published data sources could also be used, although little utility data exist for rare diseases [17]. In the longer term, it is crucial that health economists are involved at the design stage of future genomically linked cohort studies to ensure that appropriate HRQoL questionnaires and other outcome measurement tools are administered to study participants. Data on patient preferences and utilities (measured using the EQ-5D) were collected in the pilot for Cancer 2015, but not in the 100,000 Genomes Project, and there are currently no plans to collect these data in the All of Us Research Program.

      Data Management Challenges

      Data Linkage

Big data for evaluating genomic tests are primarily obtained by linking sequencing data to multiple data sources from several settings (e.g., secondary care, primary care, disease registries, and mortality datasets). These data are often maintained by different stakeholders and produced from multiple platforms with different access and formatting requirements. This heterogeneity in data access can create bottlenecks in generating useful evidence from big data. Ideally, linked data would be processed in real time, but stakeholders often impose different time boundaries on data requests (e.g., data may be provided only up to six months before the request date). Furthermore, stakeholders commonly take more than a month to release data because data linkage is performed in house (owing to privacy and legislative requirements). Thus, the linked datasets received by researchers may be outdated on receipt, introducing uncertainty into CEAs of NGS technologies. Linking multiple data sources from different stakeholders can also understate real-world variability, because only a proportion of patients may have all the desired data elements. For example, in the analysis of pilot data from Cancer 2015, only 621 of 922 patients (67%) with resource use data also had sequencing data. This limited the analyses to these data subsets, potentially biasing any economic evaluations that use these data [8].

      Potential Solutions

The challenges associated with having multiple stakeholders in data linkage processes can be mitigated by using a central data warehouse hosted on secure servers. Health economists can then request data and perform economic analyses on secure computers, potentially located onsite. However, this requires the coordination of multiple stakeholders, some of whom may not have a research mandate. Even if such systems were in place, many routinely collected databases have a time lag between the occurrence of a health care event and the availability of the corresponding data in the dataset (e.g., for HES data in the UK there is a delay of approximately six months).
Current best practice where data are missing for a subset of patients is to employ imputation methods (such as multiple imputation) that can operate with partial data, avoiding the need to drop observations and introduce bias [18]. Such approaches could help to resolve missing data issues caused by linkage problems. Nevertheless, multiple imputation may not be appropriate if entire datasets or data elements are missing, because the imputation equations require information on related variables. Furthermore, some imputation approaches (e.g., predictive mean matching [19]) may not work well when patient subgroups are very small, for example, in rare diseases.
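As a minimal sketch of how chained-equations imputation might be applied to a partially linked dataset, the following uses scikit-learn's IterativeImputer; the variables and values are invented, and a real analysis would pool full regression estimates under Rubin's rules rather than simple means.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Illustrative linked dataset: some patients have resource-use data but lack
# sequencing-derived variables, and vice versa (coded here as NaN).
df = pd.DataFrame({
    "age": [64, 71, 58, 66, 49, 75],
    "annual_cost": [12_000, 20_500, 9_800, np.nan, 7_400, 31_000],
    "mutation_count": [3, np.nan, 1, 4, np.nan, 6],
})

# Chained equations: each incomplete variable is modelled on the others;
# repeating with different seeds yields multiple imputed datasets.
imputed = [
    pd.DataFrame(
        IterativeImputer(random_state=seed, sample_posterior=True)
        .fit_transform(df),
        columns=df.columns,
    )
    for seed in range(5)
]

# Analyse each completed dataset, then pool the estimates across imputations.
estimates = [d["annual_cost"].mean() for d in imputed]
print(f"Pooled mean annual cost: {np.mean(estimates):,.0f}")
```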
With respect to privacy, limiting researchers' data access to certain variables using a preapproved protocol can minimize reidentification risks [8]; such access can be framed as a "minimum dataset" needed to answer a research question. Approaches of this kind are commonly used to access routinely collected administrative health care datasets. This is particularly important because, even with consent, patients may not be aware of what their data are being used for.

      Large Number of Zero Observations

A common property of health care resource use data is a large number of zero observations. This creates problems for econometric models attempting to explain variation in resource use and costs in CEA, and is a particular problem if the population of interest is relatively healthy. For example, health care resource use before a cancer diagnosis is likely to be minimal, given that most individuals are relatively healthy and may only begin to consume health care resources after receiving a cancer diagnosis [8].

      Potential Solution

One possible solution is to organize ("collapse") the data into longer periods to reduce the occurrence of zero observations. Nevertheless, this coarsens the data and can obscure interesting or important patterns. Where the proportion of zero observations is particularly high (>50%), collapsing data is unlikely to be helpful. Instead, alternative econometric models that account for excess zeros, such as zero-inflated and hurdle models, may be more informative.
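A minimal sketch of fitting a zero-inflated count model to simulated resource use data, using statsmodels; the data-generating process and covariate are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2_000

# Simulate pre-diagnosis resource use: most people consume nothing
# (structural zeros); a minority generate Poisson-distributed contact counts.
x = rng.normal(size=n)                       # e.g., a comorbidity score
uses_care = rng.random(n) < 0.3              # only 30% ever touch the system
counts = np.where(uses_care, rng.poisson(np.exp(0.5 + 0.8 * x)), 0)

# Zero-inflated Poisson: one equation for the excess zeros, one for counts.
X = sm.add_constant(x)
zip_model = sm.ZeroInflatedPoisson(counts, X, exog_infl=X).fit(
    maxiter=200, disp=False)
print(zip_model.summary())
```

A hurdle model is the natural alternative when all zeros are assumed to come from a single "no use" process rather than a mixture.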

      Difficulties in Setting Up Big Datasets for Health Economic Analysis

      The datasets linked to create big data are usually not designed for research, and their veracity and validity for use in CEAs requires health economists to overcome several obstacles. First, time-stamped data elements may not accurately capture the timing of a reported event, either because of measurement error or because detailed information on timings cannot be shared for de-identification reasons. This makes it difficult to identify when a disorder was diagnosed or understand detailed and accurate sequences of events for a patient.
Second, diagnostic information may be lacking because of differences in nomenclature. The accuracy of diagnosis codes depends both on the validity of the recorded diagnosis relative to the condition the patient actually has and on the fidelity of the assigned code relative to the documented diagnosis. These factors may differ across physicians and coders, further complicating the use of diagnostic data from big datasets.
Third, diagnostic information can be obscured by a lack of differentiation in event coding. To quantify the cost of a diagnostic odyssey, analysts must first differentiate between events that would be routine even if the patient had received an earlier diagnosis and events associated with an unknown diagnosis. If such events are not differentiated, analysts may overestimate the cost of a diagnostic odyssey, which could in turn bias estimates of the relative cost-effectiveness of NGS.
Finally, routinely collected administrative health care datasets are often incomplete and inconsistent, and contain errors. To produce big data of acceptable veracity, these data must be cleaned. However, data cleaning requires several distinct steps, each of which may further reduce the veracity of the final dataset [20]. For example, assigning unit costs to HES data requires analysts to remove duplicates, organize the data into unique finished consultant episodes, and then adjust the variables to enable linkage to Healthcare Resource Group (HRG) codes and National Reference Costs. Furthermore, the coding of such datasets often evolves over time [21].
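A hedged sketch of what such a costing pipeline might look like in pandas; the file names and columns (epistat, epikey, epiend, hrg_code) are hypothetical stand-ins, and the actual HES field definitions and reference cost structures differ in detail.

```python
import pandas as pd

# Hypothetical extract of HES-like episode records (column names illustrative).
episodes = pd.read_csv("hes_extract.csv")

# 1. Remove exact duplicate records.
episodes = episodes.drop_duplicates()

# 2. Keep one row per finished consultant episode (FCE), assuming an episode
#    identifier and an episode-status flag exist in the extract.
fces = (episodes[episodes["epistat"] == "finished"]
        .sort_values("epiend")
        .drop_duplicates(subset="epikey", keep="last"))

# 3. Attach unit costs via an HRG-code lookup built from reference costs.
unit_costs = pd.read_csv("nhs_reference_costs.csv")  # hrg_code, unit_cost
costed = fces.merge(unit_costs, on="hrg_code", how="left")

# Flag unmatched HRG codes for review rather than silently dropping them.
print(costed["unit_cost"].isna().sum(), "episodes lack a reference cost")
```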

      Potential Solutions

One way to enhance big datasets to better inform economic analyses is to develop a set of clinical rules based on reviews of carefully selected standard cases for different phenotypes of interest. A first step in such a review could be to visualize patient pathways using schematics detailing the main events for individual patients over the entire period of available data. Algorithms developed using this approach would need to be thoroughly validated to ensure veracity. To avoid issues related to coding changes over time, it may be necessary to standardize data to a common year to facilitate linkage to unit costs; this would ensure that cost variations across years reflect changes in utilization rather than changes in coding.
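As one possible realization of such a schematic, the following matplotlib sketch draws a timeline of events per patient; the event log is invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative event log for three patients (dates and event types invented).
events = pd.DataFrame({
    "patient": ["A", "A", "A", "B", "B", "C"],
    "date": pd.to_datetime(["2015-01-10", "2015-03-02", "2016-07-19",
                            "2015-06-05", "2015-06-30", "2016-01-12"]),
    "event": ["GP referral", "MRI", "diagnosis",
              "GP referral", "diagnosis", "GP referral"],
})

fig, ax = plt.subplots(figsize=(8, 3))
for i, (patient, grp) in enumerate(events.groupby("patient")):
    ax.plot(grp["date"], [i] * len(grp), "o-")  # one timeline per patient
    for _, row in grp.iterrows():
        ax.annotate(row["event"], (row["date"], i),
                    textcoords="offset points", xytext=(0, 8), fontsize=8)
ax.set_yticks(range(events["patient"].nunique()))
ax.set_yticklabels(sorted(events["patient"].unique()))
ax.set_xlabel("Date")
ax.set_title("Patient pathway schematic")
plt.tight_layout()
plt.show()
```

Even this simple view makes implausible sequences (e.g., treatment recorded before diagnosis) visible before any analysis is run.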

      Data Analysis Challenges

      How to Identify and Deal with Selection Bias and Confounding

The databases selected for linkage may not be representative of the population of interest, which can create issues of selection bias and confounding. For example, HES data are available only for England, not the entire UK. Cancer 2015 used both national- and state-level datasets, but the state-level dataset may have introduced variability. Simply using more data of the same type to improve prediction does not address bias from missing variables or selection bias in the data. This issue is not specific to the evaluation of NGS, but it becomes more acute when observational databases are used as a critical source of information for estimating causal relationships in CEAs of NGS.

      Potential Solutions

One potential solution is to use instrumental variable analysis. The relationship between an exposure (e.g., a specific disease) and an outcome (e.g., health care costs) can be confounded by observed or unobserved factors, so obtaining an unbiased estimate of the causal effect of an exposure on an outcome from observational data can be difficult. However, the increased availability of genomic data linked to longitudinal data on health care resource use and costs provides an opportunity to better understand such causal relationships. Because the allocation of genomic variants within a population is random in many disease contexts (i.e., they can be assumed to be statistically independent of the regression error term), the effect of an endogenous variable (disease) on an outcome (health care costs) can be identified from the exogenous variation induced by genomic variants using instrumental variable analysis [22]. Genomic variants are potentially useful instruments because they can be associated with a disease but not with potential confounders, and they plausibly affect costs only through the disease itself.
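A minimal sketch of this Mendelian randomization logic using two-stage least squares (here via the linearmodels package, one of several suitable tools); the variant effect sizes and cost structure are simulated for illustration.

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(1)
n = 5_000

# Simulated data: a genomic variant g influences disease; an unobserved
# confounder u affects both disease and costs (all magnitudes invented).
g = rng.binomial(2, 0.3, n)                 # variant dosage (0/1/2 alleles)
u = rng.normal(size=n)                      # unobserved confounder
disease = (0.4 * g + u + rng.normal(size=n)) > 1.0
costs = 2_000 * disease + 1_500 * u + rng.normal(0, 500, n)

df = pd.DataFrame({"costs": costs, "disease": disease.astype(float), "g": g})
df["const"] = 1.0

# Naive OLS of costs on disease is biased by u; 2SLS with the variant as
# instrument recovers the causal effect, provided g affects costs only
# through disease (the exclusion restriction).
iv = IV2SLS(df["costs"], df[["const"]], df[["disease"]], df[["g"]]).fit()
print(iv.params["disease"])                 # close to the true effect, 2,000
```

The exclusion restriction itself is untestable from the data alone, which is why the biological plausibility argument in the text matters.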
A second potential solution is to make use of synthetic controls. Social scientists are often interested in the effects of events or policy interventions on aggregate entities, such as companies or schools [23]. Comparative case studies using synthetic controls estimate the evolution of outcomes (such as mortality or crime rates) in aggregate entities relative to control groups [24]. For example, in their study of the economic impact of terrorism in the Basque Country, Abadie and Gardeazabal [25] used a weighted combination of two Spanish regions to approximate the economic growth that the Basque Country would have experienced in the absence of terrorism.
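A minimal sketch of the synthetic control idea: choose non-negative weights, summing to one, that make a weighted average of control units track the treated unit over the pre-intervention period. All series here are simulated.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Illustrative pre-intervention outcome series: five candidate control units
# observed over 10 periods, and a treated unit built from units 0 and 3.
controls = rng.normal(10, 2, size=(5, 10))
treated = 0.6 * controls[0] + 0.4 * controls[3] + rng.normal(0, 0.1, 10)

# Minimize the pre-period gap subject to weights in [0, 1] summing to 1.
def loss(w):
    return np.sum((treated - w @ controls) ** 2)

res = minimize(loss, x0=np.full(5, 0.2),
               bounds=[(0, 1)] * 5,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print(np.round(res.x, 2))   # weights concentrate on units 0 and 3
```

Post-intervention, the gap between the treated unit and its synthetic counterpart estimates the treatment effect.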
A third potential solution is to apply propensity score matching. This approach can be used to make the design of an observational study analogous to that of a randomized experiment, because the matching is performed without access to information on outcome variables. Since the propensity score is a function of covariates rather than outcomes, repeated analyses attempting to balance covariate distributions across treatment groups should not bias estimates of the treatment effect on outcomes [26]. Residual biases after matching may be addressed through regression approaches such as difference-in-difference or lagged dependent variable models.
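A minimal sketch of propensity score matching on simulated data, estimating scores from covariates alone and then matching on the nearest score; the variable names and data-generating process are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 1_000

# Simulated observational data: sicker patients are more likely to be tested,
# so a naive cost comparison confounds testing with severity.
severity = rng.normal(size=n)
tested = rng.random(n) < 1 / (1 + np.exp(-severity))
costs = 10_000 + 3_000 * severity + 2_000 * tested + rng.normal(0, 500, n)
df = pd.DataFrame({"severity": severity, "tested": tested, "costs": costs})

# 1. Estimate propensity scores from covariates only; outcomes stay untouched,
#    which is what keeps the design analogous to a randomized experiment.
ps = LogisticRegression().fit(df[["severity"]], df["tested"]).predict_proba(
    df[["severity"]])[:, 1]

# 2. Match each tested patient to the untested patient with the closest score.
mask = df["tested"].to_numpy()
nn = NearestNeighbors(n_neighbors=1).fit(ps[~mask].reshape(-1, 1))
_, idx = nn.kneighbors(ps[mask].reshape(-1, 1))

naive = df.loc[mask, "costs"].mean() - df.loc[~mask, "costs"].mean()
matched = (df.loc[mask, "costs"].to_numpy()
           - df.loc[~mask, "costs"].to_numpy()[idx.ravel()]).mean()
print(f"Naive difference: {naive:,.0f}; matched estimate: {matched:,.0f}")
```

The naive difference overstates the true testing effect (2,000 in this simulation) because it absorbs the severity confounding that matching removes.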

      Lack of a Counterfactual

CEA requires a comparator intervention; however, it is not always straightforward to identify a counterfactual for NGS technologies in a big data context. Smith and Pell [27] suggest that this is not necessarily an issue, noting that there are no RCTs of parachutes partly because the effect size would be so large (near-certain death in a no-parachute comparator arm) that an RCT would be unnecessary. This observation has been used to argue that RCTs are sometimes impractical or unnecessary. Nevertheless, Hayes et al. [28] found that few studies have compared actual medical practices to the equivalent of a parachute; of those that do, many refer to a practice that has actually been tested in an RCT, and half refer to an outcome less important than survival.

      Potential Solutions

One potential solution for costs in CEA is to identify a matched cohort to differentiate the costs associated with a particular genetic disease from the costs of routine health care services. For example, in the context of rare diseases in children in Canada, the direct health care costs associated with the care of children with selected genetic diseases were compared with those of three matched cohorts: two cohorts of children with chronic disease (asthma and diabetes) and one cohort of children from the general population. The index event date for patients with a genetic disease was defined as the date of diagnosis (determined from a chart review). The comparison cohorts were matched to the genetic disease cohort on sex, date of birth, income quintile, rural versus urban household at birth, and index event date [29].
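A toy sketch of this kind of exact matching in pandas, using hypothetical cohort tables matched on sex and birth year only (the study described above matched on additional factors, including income quintile and index event date).

```python
import pandas as pd

# Hypothetical case and population tables (all values invented).
cases = pd.DataFrame({
    "case_id": [1, 2],
    "sex": ["F", "M"],
    "birth_year": [2006, 2009],
    "annual_cost_case": [18_400, 27_100],
})
population = pd.DataFrame({
    "control_id": range(100, 106),
    "sex": ["F", "F", "M", "M", "F", "M"],
    "birth_year": [2006, 2006, 2009, 2007, 2006, 2009],
    "annual_cost_ctrl": [2_100, 1_800, 2_600, 1_500, 2_300, 2_900],
})

# Exact matching: join every case to all controls sharing its matching factors.
matched = cases.merge(population, on=["sex", "birth_year"])

# Excess annual cost per case relative to the mean of its matched controls.
excess = (matched.groupby("case_id")["annual_cost_case"].first()
          - matched.groupby("case_id")["annual_cost_ctrl"].mean())
print(excess)
```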
A second potential solution is to conduct "umbrella" and "basket" trials, which are nonrandomized alternatives to RCTs [30]. Umbrella trials enroll patients with a single tumor type or histology, who are then treated according to the molecular characterization of each case. These trials involve substudies connected through a central infrastructure that oversees patient identification and screening. This approach is particularly beneficial for low-prevalence markers and allows the testing of new drugs and biomarkers. Basket trials also have a central screening and treatment infrastructure, but in contrast to umbrella trials, they facilitate the study of multiple molecular subpopulations across different tumor or histologic types within a single study and can include very rare cancers that would be difficult to study in RCTs [30]. The basket design has the flexibility to open and close study arms; hence, several drugs for many different diseases can be screened.

      Conclusions

Harnessing big data is central to the successful research application of NGS technologies. The use of genomic big data to support and inform CEAs of NGS technologies holds great promise for addressing questions of health care sustainability, but health economists face substantial challenges when using these data. Some of these challenges are common to all CEAs using observational big data, but the importance of big data to the success of NGS provides health economists with valuable opportunities for methods development. We hope that, by characterizing these challenges and proposing some solutions, we have put researchers in a stronger position to use big data to evaluate the costs, outcomes, and cost-effectiveness of NGS technologies.

      Acknowledgments

      S.W., B.D., and J.B. are partly funded by the Oxford National Institute for Health Research, Biomedical Research Centre, Oxford, UK. D.M. and D.R. received travel support from Illumina to attend a past working group meeting in Boston, MA.

      References

1. Schwarze K, Buchanan J, Taylor JC, et al. Are whole-exome and whole-genome sequencing approaches cost-effective? A systematic review of the literature. Genet Med. 2018; [Epub ahead of print].
2. Mehta N, Pandit A. Concurrence of big data analytics and healthcare: a systematic review. Int J Med Inform. 2018;114:57-65.
3. Collins B. Big data and health economics: strengths, weaknesses, opportunities and threats. Pharmacoeconomics. 2016;34:101-106.
4. Phillips KA, Trosman JR, Kelley RK, et al. Genomic sequencing: assessing the health care system, policy, and big-data implications. Health Aff. 2014;33:1246-1253.
5. Chen Y, Guzauskas GF, Gu C, et al. Precision health economics and outcomes research to support precision medicine: big data meets patient heterogeneity on the road to value. J Pers Med. 2016;6(pii: E20).
6. Genomics England. The 100,000 Genomes Project. London: Department of Health & Social Care; 2018.
7. Parisot J, Thorne H, Fellowes A, et al. "Cancer 2015": a prospective, population-based cancer cohort—phase 1: feasibility of genomics-guided precision medicine in the clinic. J Pers Med. 2015;5:354.
8. Lorgelly PK, Doble B, Knott RJ, et al. Realising the value of linked data to health economic analyses of cancer care: a case study of Cancer 2015. Pharmacoeconomics. 2016;34:139-154.
9. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2:3.
10. National Institutes of Health. About the All of Us Research Program. Washington, DC: Department of Health and Human Services; 2018.
11. Lillie EO, Patay B, Diamant J, et al. The n-of-1 clinical trial: the ultimate strategy for individualizing medicine? Pers Med. 2011;8:161-173.
12. Kravitz RL, Duan N, Niedzinski EJ, et al. What ever happened to N-of-1 trials? Insiders' perspectives and a look to the future. Milbank Q. 2008;86:533-555.
13. Wegman AC, van der Windt DA, Stalman WA, et al. Conducting research in individual patients: lessons learnt from two series of N-of-1 trials. BMC Fam Pract. 2006;7:54.
14. Doble B, Harris A, Thomas DM, et al. Multiomics medicine in oncology: assessing effectiveness, cost-effectiveness and future research priorities for the molecularly unique individual. Pharmacogenomics. 2013;14:1405-1417.
15. European Commission. EU countries will cooperate in linking genomic databases across borders. Brussels: European Commission; April 2018.
16. National Institute for Health and Care Excellence. Guide to the Methods of Technology Appraisal 2013—Process and Methods. London: NICE; 2013.
17. Tufts Medical Center. Cost-Effectiveness Analysis Registry. Boston, MA: Tufts Medical Center; 2018.
18. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30:377-399.
19. Little RJA. Missing-data adjustments in large surveys. J Bus Econ Stat. 1988;6:287-296.
20. van Walraven C, Austin P. Administrative database research has unique characteristics that can risk biased results. J Clin Epidemiol. 2012;65:126-131.
21. Head RF, Byrom A, Ellison GT. A qualitative exploration of the production of Hospital Episode Statistics in a Guernsey hospital: implications for regional comparisons of UK health data. Health Serv Manage Res. 2008;21:178-184.
22. Dixon P, Davey Smith G, von Hinke S, et al. Estimating marginal healthcare costs using genetic variants as instrumental variables: Mendelian randomization in economic evaluation. Pharmacoeconomics. 2016;34:1075-1086.
23. Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies: estimating the effect of California's Tobacco Control Program. J Am Stat Assoc. 2010;105:493-505.
24. Craig P, Cooper C, Gunnell D, et al. Using natural experiments to evaluate population health interventions: new MRC guidance. J Epidemiol Community Health. 2012;66:1182-1186.
25. Abadie A, Gardeazabal J. The economic costs of conflict: a case study of the Basque Country. Am Econ Rev. 2003;93:113-132.
26. Rubin DB. Using propensity scores to help design observational studies: application to the tobacco litigation. Health Serv Outcomes Res Methodol. 2001;2:169-188.
27. Smith GC, Pell JP. Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ. 2003;327:1459-1461.
28. Hayes MJ, Kaestner V, Mailankody S, et al. Most medical practices are not parachutes: a citation analysis of practices felt by biomedical authors to be analogous to parachutes. CMAJ Open. 2018;6:E31-E38.
29. Marshall DA, Benchimol EI, MacKenzie A, et al. Direct health care costs for children diagnosed with genetic diseases are significantly higher than for children with other chronic diseases. Genet Med. (under review).
30. Rashdan S, Gerber DE. Going into BATTLE: umbrella and basket clinical trials to accelerate the study of biomarker-based therapies. Ann Transl Med. 2016;4:529.
