# Impact Story: Using innovative statistical approaches to provide the most reliable treatment outcomes information to patients and clinicians

Using Bayesian hierarchical models, CDER statisticians are improving our understanding of how drugs affect different groups of patients.

When evaluating drug treatments, the question of how, and to what extent, a drug works in different patient subgroups can be addressed by statistical approaches that use results from every subgroup to estimate the treatment effect in any one subgroup.

To make this concept more concrete, consider a physician who has experience using a new drug in two males and two females. The physician then prescribes the drug to a fifth patient—a female. To predict how she may respond to the drug, the physician would probably consider information from all four previous patients, although the outcomes from the earlier two females may be a little more relevant than the outcomes from males.

Over time, as the physician prescribes the drug more often, he or she will gain more information about outcomes from the drug. For a future female patient, what information would be considered in determining what the patient can expect from this drug? How relevant the data from male patients is depends on how similar the results from males are to the results from females, as well as the amount of data available on females. If we have very precise estimates from just the female data, the male data would be less relevant than it was earlier for determining what outcomes female patients can expect.

This is where a statistical approach known as “shrinkage estimation” can be useful. Shrinkage estimation is applied to decide how to weigh data in different subgroups in a way that increases the precision of the subgroup estimates. Data from patients in the subgroup of interest are more relevant than data from patients outside that subgroup. Conversely, data from a patient outside the subgroup of interest becomes less relevant as the number of patients in the subgroup of interest grows.

## Drug Trials Snapshots

Clinical trials study the average treatment effect across patients. However, a drug’s treatment effect will often vary in clinically meaningful ways across patients and across groups of patients with specific traits (e.g., age, sex, or race). Given these differences, patients and clinicians may need to decide whether a drug that has been proven beneficial based on its overall effect in a clinical trial is appropriate for them. One way they can find this information is by consulting the FDA’s Drug Trials Snapshots.

Drug Trials Snapshots (DTS) provide this information to individual patients and patient subgroups. DTS describe not only who participated in clinical trials for new molecular entities and original biologics, but also the study design, the results of efficacy and safety studies, and whether there were differences in efficacy and side effects among various subgroups defined by sex, race, and age. Published on the FDA website 30 days after a drug’s approval, the snapshots use consumer-friendly language, focus on subgroup data and analyses, and link to both the package insert (which provides information on prescribing) and reviews in Drugs@FDA.

Currently, the subgroup efficacy results in DTS are based on an analysis of each subgroup individually. For example, in Table 1, we see the percent of responders stratified by race for the HIV drug Genvoya compared to the HIV drug Stribild. However, summarizing results using only the data from within a subgroup is often not the best way to provide relevant information for patients and providers. CDER’s statisticians are working with their colleagues across the agency to improve the information provided in the snapshots.

Table 1. Percent of responders to treatment with Genvoya vs Stribild according to race.

| Race | Genvoya | Stribild |
|---|---|---|
| White | 214/235 (91.1%) | 217/243 (89.3%) |
| Black | 115/129 (89.1%) | 40/44 (90.9%) |
| Asian | 17/20 (85%) | 113/132 (85.6%) |
| Other | 44/47 (93.6%) | 13/16 (81.3%) |

From Drug Trials Snapshots.
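The subgroup-by-subgroup analysis described above can be sketched in a few lines. The counts below are copied from Table 1; the function names and the use of a simple Wald standard error are illustrative assumptions, not part of the Drug Trials Snapshots methodology. The sketch is meant only to show how much less precise the estimates become in the smaller subgroups.

```python
import math

# Responder counts (responders, total) copied from Table 1.
TABLE_1 = {
    "White": {"Genvoya": (214, 235), "Stribild": (217, 243)},
    "Black": {"Genvoya": (115, 129), "Stribild": (40, 44)},
    "Asian": {"Genvoya": (17, 20), "Stribild": (113, 132)},
    "Other": {"Genvoya": (44, 47), "Stribild": (13, 16)},
}


def rate_and_se(responders, total):
    """Return the sample response rate and its Wald standard error."""
    p = responders / total
    return p, math.sqrt(p * (1 - p) / total)


def subgroup_differences(table):
    """Difference in response rates (Genvoya minus Stribild) per subgroup."""
    results = {}
    for race, arms in table.items():
        p_g, se_g = rate_and_se(*arms["Genvoya"])
        p_s, se_s = rate_and_se(*arms["Stribild"])
        # Standard error of a difference of two independent proportions.
        results[race] = (p_g - p_s, math.sqrt(se_g**2 + se_s**2))
    return results


if __name__ == "__main__":
    for race, (diff, se) in subgroup_differences(TABLE_1).items():
        print(f"{race:6s} difference = {diff:+.3f}, standard error = {se:.3f}")
```

Each subgroup is analyzed here using only its own data, so the standard error for a small subgroup such as "Other" is several times larger than for the largest subgroup.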

## Borrowing from the data to improve estimation

Figure 1. Variability in the sample estimates of treatment effects when underlying rates do not (top panel) or do (lower panel) depend on sex. The four simulations (two in each panel) represent ten random drawings of 100 patients from large male or female populations in which a given percentage are responders (essentially the same as repeated blind draws of 100 balls from a jar containing a much larger number of balls of which a certain percentage are purple or green). In the top panel 50% of male and female patients are responders. In the lower panel 55% of males are responders and 50% of females are responders.

It may seem logical to analyze each subgroup of patients individually and to conclude that the best estimate of the effect of a drug in a subgroup is the average of the responses in that group. Surprisingly, when analyzing several groups, this is not the best approach. This is because sample-estimated treatment effects vary across subgroups more than the corresponding true treatment effects do. We see this in Figure 1 (top panel), which shows random, simulated response rate results from 10 hypothetical studies, each with data from 100 male and 100 female patients. The true response rate for both sexes is 50%, and the simulated response rates range from 43% to 54%, with the absolute differences in response rate ranging from 1% to 8%. Because the underlying response rates are equal for both sexes, all the variability is within-subgroup variability.

In the lower panel of Figure 1, the underlying (true) response rate for males has increased to 55%. Simulated response rates range from 38% to 63%, and the absolute difference in response rates across studies ranges from 2% to 19%. Here, the overall variability in the difference in response rates equals the variability of the underlying true response rates between sexes plus the within-subgroup variability due to random sampling. As these examples illustrate, sample-estimated subgroup treatment effects are susceptible to random highs and lows. Shrinkage estimation quantitatively addresses these highs and lows to produce estimates that are closer to the true values.
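A simulation in the spirit of Figure 1 can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: it assumes each patient responds independently with the true rate for their sex (a close approximation to drawing from a very large population), and the function names and seed are hypothetical. Because the draws are random, the ranges it prints will differ from those quoted in the text.

```python
import random


def simulate_studies(p_male, p_female, n_per_group=100, n_studies=10, seed=0):
    """Simulate sample response rates for males and females in repeated studies.

    Each patient responds independently with the true rate for their sex.
    Returns a list of (male_rate, female_rate) tuples, one per study.
    """
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        males = sum(rng.random() < p_male for _ in range(n_per_group))
        females = sum(rng.random() < p_female for _ in range(n_per_group))
        studies.append((males / n_per_group, females / n_per_group))
    return studies


def summarize(studies):
    """Range of sample rates and of absolute male-female differences."""
    rates = [r for pair in studies for r in pair]
    diffs = [abs(m - f) for m, f in studies]
    return min(rates), max(rates), min(diffs), max(diffs)


if __name__ == "__main__":
    for label, p_male in [("equal true rates", 0.50), ("males at 55%", 0.55)]:
        lo, hi, dmin, dmax = summarize(simulate_studies(p_male, 0.50))
        print(f"{label}: rates {lo:.2f} to {hi:.2f}, "
              f"absolute differences {dmin:.2f} to {dmax:.2f}")
```

Even when the true rates are identical, the simulated male and female rates differ from study to study, which is the within-subgroup variability the text describes.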

## Shrinkage estimation using Bayesian hierarchical models

Shrinkage estimation can be done using Bayesian¹ hierarchical models, which are statistical models with multiple levels of detail. In the three-level model shown in Figure 2, studies are at the top level, subgroups are at the middle level, and patients are at the bottom level. For a given subgroup, information from other subgroups is also used to estimate its treatment effect. Outcomes from all patients are relevant, but an outcome from a patient in the given subgroup is more relevant than the outcome of a patient outside it.

Figure 2. A hierarchical model structure with three levels: studies, subgroups, and patients.

As more information is used, the shrinkage estimates are more precise, and closer to the true subgroup treatment effects than the sample estimates. Furthermore, the variance in the collection of shrinkage-estimated treatment effects across subgroups is close to what we believe is the variance in the true treatment effects across subgroups.
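One common textbook form of this idea is the normal-normal hierarchical model, written out here as an illustration; the source does not state the exact model CDER uses, so the symbols below are assumptions. Let $y_i$ be the sample estimate for subgroup $i$, $s_i^2$ its within-subgroup (sampling) variance, $\theta_i$ the true subgroup effect, $\mu$ the overall mean effect, and $\tau^2$ the between-subgroup variance:

$$
y_i \mid \theta_i \sim N(\theta_i,\ s_i^2), \qquad \theta_i \sim N(\mu,\ \tau^2)
$$

If $\mu$ and $\tau^2$ are treated as known, the shrinkage estimate is a weighted average of the overall mean and the subgroup's own estimate:

$$
E[\theta_i \mid y_i, \mu, \tau^2] = B_i\,\mu + (1 - B_i)\,y_i, \qquad B_i = \frac{s_i^2}{s_i^2 + \tau^2}
$$

$$
\operatorname{Var}[\theta_i \mid y_i, \mu, \tau^2] = (1 - B_i)\,s_i^2 \le s_i^2
$$

The last line shows why the shrinkage estimate is at least as precise as the sample estimate under these assumptions. In a fully Bayesian analysis, $\mu$ and $\tau^2$ are given prior distributions and estimated from the data, so the resulting uncertainty is somewhat larger than this simplified expression suggests.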

Recently, CDER statisticians used shrinkage estimation based on a Bayesian hierarchical model to estimate treatment effects across regions in the Liraglutide Effect and Action in Diabetes: Evaluation of Cardiovascular Outcome Results (LEADER) trial, which compared liraglutide (an antidiabetic medication) to placebo in patients with type 2 diabetes at high risk for cardiovascular disease. Their analyses (see Table 2) help to clarify the extent to which the effect of this drug differed across regions, including Asia, Europe, North America, and the rest of the world.

Table 2. Revised estimates of the hazard ratio (liraglutide to placebo) in four geographic regions based on a Bayesian hierarchical model†

| Region | Sample estimates, HR (95% CI) | Shrinkage estimates, HR (95% CI) |
|---|---|---|
| Asia | 0.62 (0.37, 1.04) | 0.80 (0.59, 1.09) |
| Europe | 0.82 (0.68, 0.98) | 0.84 (0.71, 0.98) |
| North America | 1.01 (0.83, 1.22) | 0.94 (0.79, 1.12) |
| The Rest of the World | 0.83 (0.68, 1.03) | 0.85 (0.72, 1.00) |

†Source: Rothmann, Applying Hierarchical models when evaluating treatment effects across regions. HR = Hazard Ratio; CI = confidence interval.
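To give a feel for how shrinkage moves the regional estimates, the sketch below applies a simple empirical-Bayes approximation to the sample estimates in Table 2. This is not the Bayesian hierarchical model used in the source and it will not reproduce the shrinkage column exactly. It assumes the regional log hazard ratios are independent and approximately normal, backs out standard errors from the reported 95% intervals, and plugs in a DerSimonian-Laird moment estimate of the between-region variance instead of placing priors on the model parameters. All function names are hypothetical.

```python
import math

# Sample hazard ratios and 95% CI limits copied from Table 2.
SAMPLE_ESTIMATES = {
    "Asia": (0.62, 0.37, 1.04),
    "Europe": (0.82, 0.68, 0.98),
    "North America": (1.01, 0.83, 1.22),
    "Rest of the World": (0.83, 0.68, 1.03),
}


def to_log_scale(hr, lower, upper):
    """Log hazard ratio and an approximate standard error backed out of the CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * 1.96)


def between_region_variance(y, se):
    """DerSimonian-Laird moment estimate of tau^2, truncated at zero."""
    w = [1 / s**2 for s in se]
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


def shrink(estimates):
    """Return {region: (shrunken HR, weight given to the overall mean)}."""
    regions = list(estimates)
    y, se = zip(*(to_log_scale(*estimates[r]) for r in regions))
    tau2 = between_region_variance(y, se)
    # Overall mean, weighting each region by its total (within + between) variance.
    w = [1 / (s**2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    out = {}
    for r, yi, s in zip(regions, y, se):
        b = s**2 / (s**2 + tau2)  # weight on the overall mean
        out[r] = (math.exp(b * mu + (1 - b) * yi), b)
    return out


if __name__ == "__main__":
    for region, (hr, weight) in shrink(SAMPLE_ESTIMATES).items():
        print(f"{region}: shrunken HR = {hr:.2f}, weight on overall = {weight:.2f}")
```

Under these assumptions Asia, which has the widest interval, receives the largest weight on the overall mean and therefore moves the most, which matches the qualitative pattern in Table 2.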

CDER statisticians are applying Bayesian hierarchical models to other critical areas in drug evaluation as well, such as the evaluation of treatments for children. Considering the adult and pediatric data together improves the quality of decision-making in the pediatric setting by borrowing from the adult results. The amount of borrowing from adults (the weight that can be given to the adult data) is based on an evaluation of all available data and depends on the ratio of the variability within the adult and pediatric data to the variability between subgroups. Such an approach can be especially helpful for pediatric indications, where recruiting pediatric patients for clinical trials can be difficult.

## How does this research advance drug evaluation and public health?

Clinical trials are typically conducted in patients who differ by sex, age, race, and genetics, and results may vary by patient subgroup. CDER scientists are advancing statistical approaches to analyzing trial results that can provide more predictive, accurate estimates of the effects of drugs and biologics in different patient subgroups. When conveyed to patients and medical professionals, for example through the Drug Trials Snapshots Program, these estimates help ensure that treatment decisions are made with a complete understanding of the available evidence.

Figure 3. Shrinkage estimation, two scenarios. As explained in the text, shrinkage of the subgroup results from a clinical trial using a Bayesian hierarchical model is based on a weighted average of the response for all patients together and the response for the individual subgroup. The relative weights of the two are determined by the ratio of within-subgroup variability to between-subgroup variability. As this ratio increases, more weight is given to the overall estimate and the revised shrinkage estimate moves closer to the overall average.

In the hypothetical results shown at the top, within-subgroup variability (represented here by the colored bars showing 95% confidence intervals) is high relative to the between-subgroup variability (essentially, the distance between estimates). This high within-subgroup variability would be expected in a trial with only a small number of patients in each group because of random sampling effects. With weak evidence that the groups are different, it makes sense to give weight to the overall percentage of responders calculated from all patients, and shrinkage would shift the individual estimates towards the overall estimate (in this way, data from all patients are used to construct the best estimate for each subgroup).

In the second example, because the sample for each subgroup is much larger, within-subgroup variability is very low while the observed between-subgroup differences are large. The narrow confidence intervals and the widely separated subgroup averages increase our certainty that the groups are truly different. Although response rates from the trial would still be expected to exaggerate the true differences between the subgroups (Figure 1), this exaggeration would likely be very small given the low within-subgroup variability. Here, little weight is given to the overall estimate and there would be very little shrinkage of the subgroup results toward the overall result.
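The two scenarios can be made concrete with a small numerical sketch. The variances and response rates below are hypothetical values chosen only to mirror the two panels, and the weight formula is the simple normal-model form, assumed here for illustration; it is not taken from the figure itself.

```python
def shrinkage_estimate(subgroup_mean, overall_mean, within_var, between_var):
    """Weighted average of the overall and subgroup means.

    The weight on the overall mean grows with the ratio of
    within-subgroup variance to between-subgroup variance.
    """
    weight_overall = within_var / (within_var + between_var)
    estimate = weight_overall * overall_mean + (1 - weight_overall) * subgroup_mean
    return estimate, weight_overall


# Scenario 1 (hypothetical): small subgroups, so sampling variance is
# large relative to the spread between subgroups.
small_trial = shrinkage_estimate(0.60, 0.50, within_var=0.010, between_var=0.001)

# Scenario 2 (hypothetical): large subgroups, so sampling variance is
# small relative to the spread between subgroups.
large_trial = shrinkage_estimate(0.60, 0.50, within_var=0.0001, between_var=0.010)

if __name__ == "__main__":
    for name, (estimate, weight) in [("small trial", small_trial),
                                     ("large trial", large_trial)]:
        print(f"{name}: weight on overall = {weight:.3f}, estimate = {estimate:.3f}")
```

In the first scenario nearly all of the weight falls on the overall mean, so a subgroup estimate of 0.60 is pulled most of the way toward 0.50. In the second, the subgroup estimate is left almost unchanged.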

¹ See Alosh, Mohamed, et al. "Statistical considerations on subgroup analysis in clinical trials." Statistics in Biopharmaceutical Research 7.4 (2015): 286-303, for a brief definition of Bayesian statistics and its application to subgroup analyses.