## Abstract

### Objective

To demonstrate why meta-analytic methods need modification before they can be used to aggregate rates or effect sizes in outcomes research, under the constraint of no common underlying effect or rate.

### Methods

Studies are presented that require different types of risk adjustment. First, we demonstrate using rates that external risk adjustment through standardization can be achieved using modified meta-analytic methods, but only with a model that allows input of user-defined weights. Next, we extend these observations to internal risk adjustment of comparative effect sizes.

### Results

We show that this procedure produces identical results to conventional age standardization if a rate is being standardized for age. We also demonstrate that risk adjustment of effect sizes can be achieved with this modified method but cannot be done using standard meta-analysis.

### Conclusions

We conclude that this method allows risk adjustment to be performed in situations in which currently the fixed- or random-effects methods of meta-analysis are inappropriately used. The latter should be avoided when the underlying aim is risk adjustment rather than meta-analysis.

## Introduction

The term *standardization* refers to the process of facilitating comparison of summary measures of burden or risk of disease across populations. Such standardization can be done in two ways.

*Internal* standardization refers to the process of ensuring that summary measures adequately reflect the distribution of burden or risk of disease within subpopulations of the same overall population. If the subpopulations are investigated independently of each other (e.g., in separate studies), the overall summary measure for the total population needs to take account of the actual size of each subpopulation within the overall population structure so that the overall summary reflects the actual population meaningfully. For example, a summary measure of mortality reported in different subpopulations by age can be combined into a summary measure for the total population after standardizing against the actual distribution of ages. When such standardized rates are compared across different populations, they can be interpreted as the mortality rate for an average member of each specific population. Thus, mortality rates in Australia versus India, obtained from different subpopulations, need to be internally standardized against the actual population structure in Australia and India, respectively, if a summary for each country is to be compared.

*External* standardization can also be done by replacing the internal standard described above with a common external standard against which subpopulations from different populations are standardized, thus removing the subpopulation effect completely. This is of particular importance to studies of quality improvement, in which many estimates of disease burden or risk are strongly dependent on subpopulation, with rates of incidence or mortality being much higher or lower between different subpopulations. In this situation, the differences between populations independent of the confounding by sizes of subpopulations with different risks can be determined by standardizing against a common external standard. In this sort of standardization, the standardized rate is in itself useful only for comparison and has no intrinsic interpretation.

The process of *risk adjustment* encompasses both standardization and other procedures for accounting for the effects of subpopulations with different risks. In this article, risk adjustment and standardization will both refer to methods of adjustment based on weighted averages in which the weights are chosen to provide an “appropriate” basis for the comparison (i.e., a “standard”). The latter is generally either the subpopulation sizes from each of the populations in the comparison or from a relevant external population. A common method in epidemiology for this purpose has been direct standardization because it can be applied on the basis of any subpopulation distribution, for example, on the basis of age, geographical clusters, or cancer incidence. Direct standardization is simply a process of weighted averaging of the subpopulation-specific rates to arrive at a standardized estimate that reflects a given subpopulation structure. The distribution of the “standard” provides the weights and usually represents the *current* or *most common* subpopulation structure for internal and external standardization, respectively, and could represent subpopulation sizes based on age or geographic cluster or any other distribution of whatever standard is to be applied. This provides, for each population, one risk-adjusted or standardized rate that reflects the appropriate contribution of the subpopulation-specific risk or rates to the standard. In this article, we demonstrate, using two examples, that risk adjustment through direct standardization can be achieved using modified meta-analytic methods, but only with our model [1] that allows input of user-defined weights. The advantage here is that this method can now be extended to any standard and any effect size (ES) other than rates.

## Methods

A modification was undertaken of the quality-effects model [1] of meta-analysis that allows moving the model from meta-analysis to risk adjustment. This model uses a risk-of-bias weighting scheme in addition to inverse variance weighting, and the modification entailed removing inverse variance weights and replacing bias weights with normalized subpopulation weights from a standard population. The subpopulation weights are applied using a modification of our bias adjustment procedure in meta-analysis [1, 2] for each subpopulation to come up with a weighted average that represents the single risk-adjusted or standardized estimate across the subpopulations. This weighted averaging procedure *does not* use inverse variance weights and thus is *not* a meta-analysis. Therefore, if subpopulation rates are being combined, it would give an equivalent result to direct standardization used in epidemiology. We do not use log-transformed rates because back transformation would result in pooled estimates that depart from those computed using the standard method. The standard method used for the computation of the directly standardized rate (DSR) is given by:

$\mathrm{DSR}=\frac{1}{\sum_{j=1}^{k}{w}_{j}}\times \sum_{j=1}^{k}\frac{{w}_{j}{O}_{j}}{{n}_{j}}$ (1)

where ${O}_{j}$ is the observed number of events in subpopulation (age group) $j$, ${n}_{j}$ is the number of individuals in subpopulation (age group) $j$ (or the population person-years at risk), and ${w}_{j}$ is the weight based on the number, or proportion of the total, of individuals in age-group subpopulation $j$. The computations for the variance and confidence intervals of this DSR are outlined in Table 1.

However, this estimate can be derived by using a different procedure. If weights are given by ${w}_{j}^{\alpha}={Q}_{j}+{\widehat{\tau}}_{j}$ (see Table 1 for the computation of ${\widehat{\tau}}_{j}$), where ${Q}_{j}={N}_{j}/{N}_{\mathrm{max}}$, ${N}_{j}$ is the subpopulation size, and ${\mathrm{ES}}_{j}$ is the subpopulation effect estimate of interest, which could be an ES, rate, or proportion, the directly standardized effect estimate (DSE) is given by

$\mathrm{DSE}=\frac{\sum \left({w}_{j}^{\alpha}\times {\mathrm{ES}}_{j}\right)}{\sum {w}_{j}^{\alpha}}$ (2)

In the computation of this DSE, rates can also be one of the effect estimates standardized, and in this special case, zero rates are imputed to have variances based on a single observed event as a continuity correction (see Table 1). The same method can be used by substituting ${\mathrm{ES}}_{j}$ for any other ES. For the odds ratios, however, careful consideration should be given to whether the marginal or the conditional odds ratios are of interest in a particular analysis, given the mathematical fact that the marginal and conditional odds ratios are nonequivalent [3].

Two examples are given below of the application of this procedure to risk adjustment in outcomes research. In the first example, external risk adjustment is done via the new procedure and compared with the direct method of age standardization to demonstrate equivalence. In the second example, this is then extended to internal risk adjustment of a relative risk (RR) measure using the incidence rates of cancer in each subpopulation as the weights, demonstrating how this may be extended beyond risk adjustment for rates.

A simulation was also run (for example 1) under sampling variability by allowing ${O}_{j}\sim \mathrm{Poisson}\left({O}_{j}\right)$ after replacing any ${O}_{j}=0$ with 1. Thus, within each of the 18 age-group subpopulations, ${O}_{j}$ was now generated from a Poisson distribution with mean ${O}_{j}$. A thousand iterations of each set of rates were run using Ersatz version 1.3 (Epigear International Pty Ltd., Brisbane, Australia). Coverage of the confidence interval and percent bias were then computed as described by Burton et al. [4].

## Results: Some Examples of Risk Adjustment

### Example 1: External Risk Adjustment Across Age Groups and a Simulation Study

Individual death records with multiple causes of death were the primary source of data and were accessed through the Australian Bureau of Statistics for 1999 and 2006. Deaths were coded according to the *International Statistical Classification of Diseases, 10th Revision* by using the automated Mortality Medical Data System, and results have been reported previously [5]. The Australian population age distribution in 2006 was used as the external standard population for the purpose of risk adjustment. To examine mortality trends and differentials across time, we had created three estimates of a risk-adjusted mortality rate from renal failure due to diabetes using standard methods as follows [5]: 1) risk-adjusted rates (underlying cause rate) for diabetic renal disease based on deaths coded to diabetic nephropathy; 2) risk-adjusted rates (multiple cause rate 1) for diabetic renal disease based on 1) above and additional deaths coded to diabetes without complications but with renal failure as a multiple cause; and 3) risk-adjusted rates (multiple cause rate 2) for diabetic renal disease based on 1) and 2) above and additional deaths coded to diabetes with other complications (except nephropathy) but with renal failure as a multiple cause.

The risk-adjusted cause-of-death rate via our new procedure is obtained as follows: 1) compute the cause rate of each age subgroup of patients; 2) create a standardized weight (${N}_{j}/{N}_{\mathrm{max}}$) from the age composition of the external standard population, adopted as the 2006 population in our case; and 3) apply the weighting procedure above to obtain the age-standardized rate. Tables 1 and 2 set out the standard computation alongside the results of the modified meta-analytic procedure. The pooled rates are identical because the process in both cases is weighted averaging. The confidence intervals differ marginally even though the process for risk adjustment here is completely different from the standard computation for direct standardization of rates.
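The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are ours, the three-stratum data are hypothetical (the source uses 18 age groups that are not reproduced here), and the weight computation follows Equations 10 to 12 of Table 1 as we read them.

```python
def directly_standardized_rate(events, person_years, std_sizes, per=100_000):
    """Equation 1: weighted average of stratum rates, weights from the standard."""
    total = sum(std_sizes)
    weighted = sum(w * o / n for w, o, n in zip(std_sizes, events, person_years))
    return per * weighted / total


def modified_weights(std_sizes):
    """Weights w_j = Q_j + tau_hat_j (our reading of Equations 10 to 12).

    Needs at least two strata because of the (k - 1) divisor.
    """
    k = len(std_sizes)
    n_max = max(std_sizes)
    q = [n / n_max for n in std_sizes]              # Q_j = N_j / N_max
    tau = [(1 - qj) / (k - 1) for qj in q]          # Equation 12
    sum_q, sum_tau = sum(q), sum(tau)
    if any(qj < 1 for qj in q):                     # Equation 11, first branch
        q_prime = [tj * sum_q / ((k - 1) * sum_tau) + qj
                   for tj, qj in zip(tau, q)]
    else:                                           # all strata equal in size
        q_prime = list(q)
    sum_q_prime = sum(q_prime)
    tau_hat = [sum_tau * k * qp / sum_q_prime - tj  # Equation 10
               for qp, tj in zip(q_prime, tau)]
    return [qj + th for qj, th in zip(q, tau_hat)]


def directly_standardized_estimate(effects, weights):
    """Equation 2: weighted mean of the stratum effect estimates."""
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical data for three age strata (not taken from the article).
events = [2, 10, 40]                      # observed deaths O_j
person_years = [50_000, 40_000, 20_000]   # n_j in the study population
std_sizes = [60_000, 30_000, 10_000]      # N_j in the external standard

rates = [100_000 * o / n for o, n in zip(events, person_years)]
weights = modified_weights(std_sizes)
dsr = directly_standardized_rate(events, person_years, std_sizes)
dse = directly_standardized_estimate(rates, weights)
print(f"DSR = {dsr:.3f}, DSE = {dse:.3f} per 100,000")
```

Working through Equations 10 to 12 as written, the weights appear to reduce to $w_j^{\alpha} = k{Q}_{j}/\sum {Q}_{j}$, that is, they stay proportional to the standard subpopulation sizes. That is consistent with the statement that the pooled rates from the two procedures are identical.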

Table 1. Variance of the conventional directly standardized rate (DSR) and of the new directly standardized effect estimate (DSE).

**DSR**

The 100(1–α)% confidence limits for the directly standardized rate (DSR) are given by:

$\mathrm{DSR}_{lower}=\mathrm{DSR}+\sqrt{\frac{\mathrm{var}(\mathrm{DSR})}{\mathrm{var}(O)}}\times \left({O}_{lower}-O\right)$ (3)

$\mathrm{DSR}_{upper}=\mathrm{DSR}+\sqrt{\frac{\mathrm{var}(\mathrm{DSR})}{\mathrm{var}(O)}}\times \left({O}_{upper}-O\right)$ (4)

Using Byar's method [11], the 100(1–α)% confidence limits for the observed number of events are given by:

${O}_{lower}=O\times {\left(1-\frac{1}{9O}-\frac{z}{3\sqrt{O}}\right)}^{3}$ (5)

${O}_{upper}=\left(O+1\right)\times {\left(1-\frac{1}{9(O+1)}+\frac{z}{3\sqrt{O+1}}\right)}^{3}$ (6)

The variances of the observed count $O$ and the DSR are estimated by:

$\mathrm{var}(O)=\sum_{j=1}^{k}{O}_{j}$ (7)

$\mathrm{var}(\mathrm{DSR})=\frac{1}{{\left(\sum_{j=1}^{k}{w}_{j}\right)}^{2}}\times \sum_{j=1}^{k}\frac{{w}_{j}^{2}{O}_{j}}{{n}_{j}^{2}}$ (8)

**DSE**

The variance of this directly standardized estimate (DSE) is given by‡:

$\mathrm{var}(\mathrm{DSE})=\sum {\omega}_{j}^{2}{\left(\frac{{w}_{j}^{\alpha}}{\sum {w}_{j}^{\alpha}}\right)}^{2}$ (9)

The weight is given by ${w}_{j}^{\alpha}={Q}_{j}+{\widehat{\tau}}_{j}$ as previously described [2], with the modification (compared to meta-analysis) that, for the purpose of generating the weights only, study variance is assumed to be 1 for all $j$.

The computation of ${\widehat{\tau}}_{j}$ is as follows:

${\widehat{\tau}}_{j}=\left[\left(\sum {\tau}_{j}\right)k\frac{{Q'}_{j}}{\sum {Q'}_{j}}\right]-{\tau}_{j}$ (10)

where

${Q'}_{j}=\begin{cases}\left[\frac{{\tau}_{j}\sum {Q}_{j}}{(k-1)\sum {\tau}_{j}}\right]+{Q}_{j} & \mathrm{if}\ \exists\, {Q}_{j}<1\\ {Q}_{j} & \mathrm{otherwise}\end{cases}$ (11)

and

${\tau}_{j}=\frac{1-{Q}_{j}}{k-1}$ (12)

Where $O$ is the **total** observed count of events in the local or subject population; ${O}_{lower}$ and ${O}_{upper}$ are the lower and upper confidence limits for the observed count of events; $\mathrm{var}(O)$ is the variance of the **total** observed count $O$; $\mathrm{var}(\mathrm{DSR})$ is the variance of the directly standardized rate.

† Where $z$ is the 100(1–α/2)th percentile value from the standard normal distribution. For example, for a 95% confidence interval, α = 0.05 and $z$ = 1.96 (i.e., the 97.5th percentile value from the standard normal distribution).

‡ Where ${\omega}_{j}^{2}={\sigma}_{j}^{2}+{\gamma}^{2}$, in which ${\sigma}_{j}^{2}$ is the sampling variability for the $j$th category and ${\gamma}^{2}$ is the method-of-moments estimate of between-category variance [12], this process representing a quasi-likelihood approach to overdispersion correction. For the specific case of the rate, we can base the category-specific sampling variance on a normal approximation to the Poisson distribution, ${\sigma}_{j}^{2}={O}_{j}\times {\left(M/{P}_{j}\right)}^{2}$, in which ${O}_{j}$ are the observed events, ${P}_{j}$ is the person-time of observation for the $j$th subpopulation, where $j=1,2,\dots ,k$, and $M$ is a constant multiplier (e.g., $M$ = 1000 if rates are expressed per 1000). For all other effect sizes, the usual variance formulation is used (after transformation if that is required for normality).

Table 2. Age-standardized death rates from renal disease due to diabetes for Australia (1999 and 2006): Comparison of effects of calculating rates using two different multiple cause–based definitions with those using an underlying cause only.

| Age-standardized death rates (per 100,000) | Australia, 1999 | Australia, 2006 |
|---|---|---|
| Underlying cause rate (UCR) | 0.671 (0.554–0.807) | 0.636 (0.532–0.754) |
| Via the modified procedure | 0.671 (0.496–0.847) | 0.636 (0.472–0.800) |
| Coverage of 95% CI | 99.5% | 99.5% |
| Percent bias | 5.2% | 4.4% |
| Multiple cause rate 1 (MCR1) | 2.494 (2.259–2.748) | 3.436 (3.188–3.699) |
| Via the modified procedure | 2.494 (2.135–2.854) | 3.436 (2.985–3.887) |
| Coverage of 95% CI | 99.4% | 99.9% |
| Percent bias | 1.3% | 0.5% |
| Multiple cause rate 2 (MCR2) | 3.171 (2.906–3.454) | 4.256 (3.979–4.547) |
| Via the modified procedure | 3.171 (2.743–3.600) | 4.256 (3.746–4.767) |
| Coverage of 95% CI | 99.8% | 99.9% |
| Percent bias | 0.8% | 0.6% |

CI, confidence interval.

Standardized to the Australian population in 2006.
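The conventional confidence limits in the first row of each block (Equations 3 to 8 of Table 1) can be sketched in Python as below. The function names and the three-stratum data are hypothetical and chosen only for illustration; the sketch assumes at least one observed event in total, because Byar's lower limit is undefined for a zero count.

```python
import math


def byar_limits(observed, z=1.96):
    """Equations 5 and 6: Byar's approximate limits for a Poisson count."""
    lower = observed * (1 - 1 / (9 * observed)
                        - z / (3 * math.sqrt(observed))) ** 3
    upper = (observed + 1) * (1 - 1 / (9 * (observed + 1))
                              + z / (3 * math.sqrt(observed + 1))) ** 3
    return lower, upper


def dsr_with_ci(events, person_years, std_weights, z=1.96, per=100_000):
    """Directly standardized rate with the limits of Equations 3 and 4."""
    w_sum = sum(std_weights)
    strata = list(zip(std_weights, events, person_years))
    dsr = sum(w * o / n for w, o, n in strata) / w_sum               # Equation 1
    var_o = sum(events)                                              # Equation 7
    var_dsr = sum(w ** 2 * o / n ** 2 for w, o, n in strata) / w_sum ** 2  # Eq. 8
    o_total = sum(events)
    o_lower, o_upper = byar_limits(o_total, z)
    scale = math.sqrt(var_dsr / var_o)
    lower = dsr + scale * (o_lower - o_total)                        # Equation 3
    upper = dsr + scale * (o_upper - o_total)                        # Equation 4
    return per * dsr, per * lower, per * upper


# Hypothetical three-stratum data (not taken from the article).
events = [2, 10, 40]
person_years = [50_000, 40_000, 20_000]
std_weights = [60_000, 30_000, 10_000]

rate, low, high = dsr_with_ci(events, person_years, std_weights)
print(f"DSR = {rate:.2f} ({low:.2f} to {high:.2f}) per 100,000")
```

Because Byar's limits are asymmetric around the observed count, the resulting interval is asymmetric around the DSR, which matches the shape of the conventional intervals reported in Table 2.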

Results of the simulation are also shown in Table 2 and demonstrate excellent coverage of the confidence interval by this method and reaffirm the appropriateness of the use of the normal approximation to the Poisson distribution for the generation of the variance of the subpopulation rates even when there are low event rates. Also, as expected with empirical weights, the estimator is biased because of covariance between the effect and the weights. This was demonstrated in the case of rates, however, to be a very small percentage of the magnitude of the effect and can therefore be ignored (Table 2).
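A resampling check of this kind can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reproduction of the Ersatz run: the data are hypothetical, the weights are held fixed, the Poisson sampler is a simple standard-library routine suited to small means, and the variance uses only the sampling term $\sigma_j^2$ of Table 1 (the between-category term $\gamma^2$ is left out). Its coverage and bias figures will therefore not match those in Table 2.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, count, product = math.exp(-mean), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def standardized_rate(events, person_years, weights, per):
    total = sum(weights)
    return per * sum(w * o / n
                     for w, o, n in zip(weights, events, person_years)) / total


def sampling_variance(events, person_years, weights, per):
    """Sum of (w_j / sum w)^2 * O_j * (M / P_j)^2, zero counts treated as one event."""
    total = sum(weights)
    return sum((w / total) ** 2 * max(o, 1) * (per / n) ** 2
               for w, o, n in zip(weights, events, person_years))


def simulate(events, person_years, weights, per=100_000,
             iterations=1000, z=1.96, seed=1):
    rng = random.Random(seed)
    means = [max(o, 1) for o in events]          # replace any O_j = 0 with 1
    truth = standardized_rate(means, person_years, weights, per)
    covered, running_total = 0, 0.0
    for _ in range(iterations):
        draw = [poisson_draw(rng, m) for m in means]
        estimate = standardized_rate(draw, person_years, weights, per)
        half_width = z * math.sqrt(
            sampling_variance(draw, person_years, weights, per))
        covered += estimate - half_width <= truth <= estimate + half_width
        running_total += estimate
    coverage = covered / iterations
    percent_bias = 100 * (running_total / iterations - truth) / truth
    return coverage, percent_bias


# Hypothetical three-stratum data (not taken from the article).
events = [2, 10, 40]
person_years = [50_000, 40_000, 20_000]
std_sizes = [60_000, 30_000, 10_000]

coverage, percent_bias = simulate(events, person_years, std_sizes)
print(f"coverage = {coverage:.3f}, percent bias = {percent_bias:.2f}%")
```

Adding the between-category variance would widen the intervals, which may be one reason the coverage reported in Table 2 is above the nominal 95%.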

### Example 2: Internal Standardization and Extension Beyond Rates

We reanalyzed the results of various meta-analyses of risks of different types of cancers reported in a recent article on cancer risk in diabetes [6]. Subpopulation weights within the studies in this article were the incidence figures in the United States of each cancer type for the population (defined by male sex). Where there was more than one report of the *same* cancer type, a meta-analysis was used to arrive at a single pooled estimate for that cancer type. Next, population standardization by this modified procedure was done. It should be emphasized that because we are averaging patient subpopulation RRs to arrive at a summary RR for the whole population, it does not matter if all RRs come from one study or from multiple studies (each providing a subpopulation RR, as in this example). In addition, standard subpopulation sizes are derived separately and need not be derived from the same source. Thus, unlike a meta-analysis, in which all studies are being pooled to achieve the best estimate of an underlying reality, here, the standardized estimate is simply a weighted mean of a series of subpopulation RRs, using the subpopulation sizes from the same population as the weighting scheme.

Results are shown in Figure 1 and were computed with the new procedure in an Excel sheet. It is evident that, for men, the risks are lowest for prostate cancer, and because this forms the largest subpopulation by incidence (reflected by the size of the boxes in Fig. 1), its ES dominates after risk adjustment. Ignoring subpopulation weights would lead to the false impression that the overall cancer risk was increased, although in reality, the overall cancer risk was not increased; rather, the smallest (lowest incidence) subpopulation groups (cancer types) had increased risk. If this were to be compared with the population of females (not shown), there would be a higher risk in the female population because protective effects of diabetes are not seen for any of the cancers in women.
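The contrast between ignoring and applying subpopulation weights can be illustrated with a short Python sketch. All numbers below are hypothetical and are not the estimates from reference [6] or Figure 1; the RRs are averaged on their natural scale as Equation 2 is written, and the source does not settle whether a log-scale average would be preferred.

```python
def standardized_effect(effects, sizes):
    """Weighted mean of subpopulation effects, weights proportional to size."""
    total = sum(sizes)
    return sum(n * e for n, e in zip(sizes, effects)) / total


# Hypothetical relative risks of cancer with diabetes, by cancer type.
relative_risks = {"prostate": 0.85, "liver": 2.00, "pancreas": 1.80}
# Hypothetical incidence per 100,000 men, used as subpopulation sizes.
incidence = {"prostate": 150.0, "liver": 10.0, "pancreas": 12.0}

types = list(relative_risks)
effects = [relative_risks[t] for t in types]
sizes = [incidence[t] for t in types]

unweighted = sum(effects) / len(effects)
weighted = standardized_effect(effects, sizes)
print(f"unweighted mean RR = {unweighted:.2f}")
print(f"incidence-weighted RR = {weighted:.2f}")
```

In this made-up example the unweighted mean suggests an increased overall risk, whereas the incidence-weighted summary sits close to 1 because the most common cancer type carries most of the weight.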

## Discussion

The new procedure we outline for risk adjustment allows any ES or rate or proportion of interest to be standardized against any relevant subpopulation distribution. This would not have been easy to do manually, which is why many studies of burden of disease default to meta-analysis when risk adjustment is actually being sought [7, 8]. Until now, there was no effective method of introducing subpopulation weights into a meta-analytic procedure, and thus these researchers erroneously used conventional models of meta-analysis when, in reality, risk adjustment was required. This procedure differs from meta-analysis in that there is no common underlying “true effect,” making inverse variance weights redundant during the risk adjustment procedure, a change that no longer allows us to call this procedure a meta-analysis.

Assigning population weights to subpopulation ESs is a means of correcting for disproportionate selection, whether it is by age or by geographic clustering. Thus, although each estimate in a group represents a sample of its own subpopulation, the group as a whole may not be representative of the entire population. The process of risk adjustment of rates is a classic epidemiological method that removes the confounding effect of the subpopulation structure on rates that differ in populations we wish to compare over time. It also provides an easy-to-use whole-population summary measure that can be useful for decision makers. This is the most common use of risk adjustment and is applied in two ways. First, rates of disease across countries can be *internally* standardized to arrive at an estimate of burden in the region. The second key role of this procedure is in adjusting effect measures that reflect the quality of health care among providers, for example, the use of *externally* risk-adjusted death rates in determining hospital quality. Risk adjustment is thus of importance in reconciling key differences among patient subpopulations, thus permitting comparisons of like with like. In both instances, failure to adjust appropriately for patient risk produces comparisons that are flawed, misleading, and, sometimes, meaningless.

Although standardization has usually meant using a common external standard, we use the terms *internal and external standardization* to distinguish the new category of standardization that we term *internal standardization*. Burden of risk is the main target of internal standardization or internal risk adjustment. The aim here is to produce a valid comparison of whole-population risk from a risk factor that affects subpopulations differently. Thus, if men and women are two different populations that contain different sizes of the same cancer subpopulations, a comparison of the effect of the risk factor (diabetes in our example) on overall cancer risk requires risk adjustment based on cancer incidence. The RR represents the RR of cancer with diabetes versus no diabetes in each subpopulation, and the risk-adjusted RR tells us what the overall burden of risk is on the basis of which cancers are common or rare. We demonstrate that, overall, there is no net increase in the burden of cancer risk in men because protective effects are seen in some cancers. This is not so in women, so if we were comparing the burden of risk from diabetes across genders, the risk-adjusted RR gives the real difference: no net change in men, whereas a similar analysis in women (not shown) would have demonstrated a net increase in cancer risk.

Some researchers have attempted to apply population weights within meta-analysis [9] in an attempt at internal standardization of subgroup differences across the population. The methodology used by these authors, however, does not seem to be valid because they made the mistake of combining meta-analysis and risk adjustment. We had previously advocated a method for doing so [2], but this was incorrect [10], and as we demonstrate here, this has to be done outside the framework of meta-analysis per se. We demonstrate using the estimates for cancer risk in diabetes how this should have been done (Fig. 1), and the results can sometimes differ markedly from those produced by common practice. Indeed, studies that have looked at regional burden of disease may generate point estimates from diverse populations using fixed-effects models or random-effects models [7, 8]. This would obviously be wrong because the aim here is *risk adjustment* as opposed to estimating the true underlying effect; the inverse variance weights or random-effects weights based on study size therefore have no real implication and simply create bias.

## Conclusions

Both meta-analysis and risk adjustment share the common property of weighted averaging. An ES may be adjusted for one or more compositional characteristics of a population by treating ESs as subgroups of a standard distribution with specific weights. These weights are representative of the distribution of the characteristic for which adjustment is desired. Under both the direct method of risk adjustment and this modified meta-analytic procedure, the adjusted ES is a weighted average of the subgroup ESs, with the weights given by the distribution of subgroups in either a hypothetical external standard or an internal standard. We advocate the use of this modified meta-analytic procedure in these situations when risk adjustment is sought. We also plan to introduce this procedure as software that can help researchers standardize across any ES and any internal or external standard distribution (coming soon on www.epigear.com).

## Acknowledgments

We are extremely grateful to three anonymous referees for helpful comments on an earlier draft. The responsibility for the content of the article and views expressed is ours. We are also grateful to Professor Annette Dobson for providing critical insights to the revised article.

Source of financial support: The authors have no other financial relationships to disclose.

## References

1. Methods for the bias adjustment of meta-analyses of published observational studies. *J Eval Clin Pract.* 2013; 19: 653-657
2. Meta-analysis of heterogeneous clinical trials: an empirical example. *Contemp Clin Trials.* 2011; 32: 288-298
3. Confounding and collapsibility in causal inference. *Stat Sci.* 1999; 14: 29-46
4. The design of simulation studies in medical statistics. *Stat Med.* 2006; 25: 4279-4292
5. Mortality from diabetic renal disease: a hidden epidemic. *Eur J Public Health.* 2012; 22: 280-284
6. Diabetes and cancer, I: risk, survival, and implications for screening. *Cancer Causes Control.* 2012; 23: 967-981
7. Worldwide stroke incidence and early case fatality reported in 56 population-based studies: a systematic review. *Lancet Neurol.* 2009; 8: 355-369
8. Burden of disease and circulating serotypes of rotavirus infection in sub-Saharan Africa: systematic review and meta-analysis. *Lancet Infect Dis.* 2009; 9: 567-576
9. Calculating prevalence of hepatitis B in India: using population weights to look for publication bias in conventional meta-analysis. *Ind J Pediatr.* 2009; 76: 1247-1257
10. Erratum: meta-analysis of heterogeneous clinical trials: an empirical example. *Contemp Clin Trials.* 2013; 34: 35
11. Statistical Methods in Cancer Research. Lyon: International Agency for Research on Cancer, World Health Organization. 1987
12. Meta-analysis in clinical trials. *Control Clin Trials.* 1986; 7: 177-188

## Article info

### Publication history

Published online: June 23, 2014

### Copyright

© 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.
