# An adaptive trial design for testing the bioequivalence of generics of highly variable drugs

*CDER statisticians have developed and evaluated an adaptive trial design for demonstrating bioequivalence for highly variable drugs.*

Under federal law and regulations, developers of proposed generic products are typically not required to test their drug for efficacy as is necessary for a new drug, but they must show that their product is bioequivalent to a formulation of a drug that has undergone such testing (the reference drug). Bioequivalence (BE) requires that there not be significant differences between the test and reference drugs in parameters that reflect bioavailability at the site of action, for example, time to reach maximum concentration or the area under the concentration-time curve.^{1} (In practice, when testing for BE, one typically works with the log-transformed values of such measurements.) One clinical study design that may be appropriate for establishing bioequivalence is a cross over design in which each subject in a trial receives either the test or the reference drug, and then after a reasonable delay is switched to the other drug. (In a replicated cross over study, the subject would be switched to the alternative treatment more than once so that replicate measures of the pharmacokinetic (PK) parameter for a given drug in the same subject are obtained.)

Based on the observed values for the differences in log-transformed responses in each subject, one can derive a probability distribution for the magnitude of the average difference, which can be converted into the geometric mean ratio (GMR)^{2} of the parameters for the test drug and the reference drug in the population from which the sample comes. The distribution allows one to assign a probability that the true value of the GMR falls within the interval shown in figure 1 and is the basis of two statistical tests to determine if BE is demonstrated: in an approach called average bioequivalence (ABE) testing, the probabilities that 1) the GMR of test to reference values is less than the lower threshold and 2) the ratio is greater than the higher threshold must each be less than 5%.
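As a rough illustration of the two one-sided tests described above, the sketch below checks whether a 90% confidence interval for the mean log difference lies inside ln(0.80) and ln(1.25), the 80% and 125% thresholds cited in the Figure 1 caption. It is a simplification, not a regulatory procedure: it assumes one paired log difference per subject, and it uses a normal quantile from the Python standard library where a Student's t quantile would normally be used for small samples. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

LOWER, UPPER = math.log(0.80), math.log(1.25)  # BE limits on the log scale

def abe_passes(log_diffs, alpha=0.05):
    """Return (passes, gmr, ci) for paired log(test) - log(reference) values."""
    n = len(log_diffs)
    d_bar = mean(log_diffs)
    se = stdev(log_diffs) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha)  # assumption: normal, not t, quantile
    lo, hi = d_bar - z * se, d_bar + z * se
    # Both one-sided tests succeed exactly when the interval is inside the limits.
    passes = LOWER < lo and hi < UPPER
    return passes, math.exp(d_bar), (math.exp(lo), math.exp(hi))

# Hypothetical data: both sets average to zero (GMR = 1) and differ only in spread.
narrow = [-0.10, 0.05, 0.10, -0.05, 0.08, -0.08, 0.02, -0.02]
wide = [-0.60, 0.45, 0.70, -0.55, 0.50, -0.50, 0.30, -0.30]

for label, data in (("narrow", narrow), ("wide", wide)):
    passes, gmr, ci = abe_passes(data)
    print(label, passes, round(gmr, 3), tuple(round(x, 3) for x in ci))
```

With these made-up values the low-variability data set passes and the high-variability one does not, even though both have the same point estimate of the GMR.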

The yellow line in figure 1 traces a probability distribution for the difference in log-transformed PK parameters for a drug for which BE has been demonstrated. However, in an identically designed study, the second drug fails (blue line). The averages of the differences and the GMRs from the trial data for the two drugs are the same (0 and 1, respectively), but for the failed drug the probability distribution of possible values is much wider, and the probability that the true GMR exceeds one of the regulatory limits is higher. Because of the wide probability distribution, the trial of the second drug was likely lacking in power (i.e., the probability that the test would conclude that the two drugs have the required similarity in terms of PK response when in fact they do). If more subjects had been enrolled in the second study, the drug might well have been shown to be bioequivalent: information from more subjects would likely narrow the probability distribution so that the portions of the probability outside the BE limits would be small enough for a conclusion of BE.
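The link between sample size and power can be approximated in closed form under strong simplifying assumptions that are not part of the article: the true GMR is exactly 1, the standard deviation of the per-subject log differences (`sd_diff`) is known, and a normal approximation holds. Under those assumptions the 90% interval falls inside the limits when the observed mean difference is smaller in absolute value than ln(1.25) minus the interval half-width. The function name and the value 0.54 are hypothetical.

```python
import math
from statistics import NormalDist

def approx_abe_power(sd_diff, n, alpha=0.05, limit=math.log(1.25)):
    """Approximate P(pass) when the true GMR is 1 and sd_diff is known."""
    nd = NormalDist()
    se = sd_diff / math.sqrt(n)
    # Room left for the observed mean once the interval half-width is removed.
    margin = limit - nd.inv_cdf(1 - alpha) * se
    if margin <= 0:
        return 0.0  # the interval is wider than the BE window: cannot pass
    return 2 * nd.cdf(margin / se) - 1

for n in (8, 16, 32, 64):
    print(n, round(approx_abe_power(0.54, n), 3))
```

The approximation returns zero power for the smallest sample and rises as subjects are added, which mirrors the narrowing of the distribution described above.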

## The challenge of highly variable drugs

Developers of generic drug candidates that are designated as highly variable (HV), i.e., drugs with high within-subject variability in PK values, frequently encounter the second scenario in figure 1. The width of the probability distribution in a cross over study for establishing bioequivalence is driven by the within-subject variability (the magnitude of differences among measurements in a given subject), the number of subjects, and the number of replicate measurements in each subject.^{3} If the within-subject variability is not known accurately in advance, or if faulty assumptions have been made about the true PK parameters, developers of generics may find that their trial is underpowered for establishing BE. One could recruit a very large number of subjects, or perform more measurements, but such approaches are costly, and there are ethical concerns related to exposing individuals to unproven treatments. Because an HV drug typically has a wide therapeutic index, or range of doses in which the brand name drug has been shown to be safe and effective, FDA has introduced a modification to traditional BE testing called reference scaling, in which the BE limits are effectively widened according to the degree of within-subject variability observed in the trial.
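To make the widening of the limits concrete, the sketch below computes implied BE limits under one commonly cited form of reference scaling, in which the log-scale limit ln(1.25) is multiplied by the ratio of the observed within-subject standard deviation of the reference drug (`s_wr`) to a regulatory constant (`sigma_w0`). The value 0.25 for that constant is an assumption here; it does not appear in this article and should be checked against the applicable FDA guidance. The formal scaled test is carried out on a confidence bound for a scaled criterion, not by comparing an interval with these implied limits.

```python
import math

SIGMA_W0 = 0.25  # assumed regulatory constant; verify against current guidance

def implied_limits(s_wr, sigma_w0=SIGMA_W0):
    """Implied BE limits (ratio scale) for a reference within-subject SD s_wr."""
    half_width = math.log(1.25) * s_wr / sigma_w0
    return math.exp(-half_width), math.exp(half_width)

for s_wr in (0.25, 0.30, 0.40, 0.50):
    lo, hi = implied_limits(s_wr)
    print(f"s_wr={s_wr:.2f}: {lo:.3f} to {hi:.3f}")
```

Under this assumed form, the implied limits coincide with the usual 80% to 125% window when `s_wr` equals the constant and widen smoothly as variability grows.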

**Figure 1**. The two probability distributions are obtained from identically designed hypothetical cross over studies with identical numbers of subjects. The distributions are of the possible values of the geometric mean of the log-transformed ratios of test to reference values for the subjects in the trial. In these distributions, the area between any two values corresponds to the probability that this geometric mean lies somewhere between them. The drug corresponding to the yellow distribution has passed the BE test: the probability that the geometric mean is outside each of the regulatory limits is less than 0.05 (the value of alpha used in two one-sided tests). In a second study (blue distribution), BE has not been demonstrated, but there is not strong evidence that the test drug is not bioequivalent. This study may have been underpowered to demonstrate bioequivalence due to the larger variability of measurements in a given subject. The regulatory thresholds correspond to GMRs for untransformed PK values of 80 and 125%. *The figure is adapted from that of Davit, BM et al. (2012) Implementation of a reference-scaled average bioequivalence approach for highly variable generic drug products by the US Food and Drug Administration. AAPS J. 14:915-24.*

## The MSABE approach

The approach described above was developed for systemically acting drug products that can be compared through a PK study of plasma concentrations of drug metabolites. The high variability reflects within-individual variation in metabolism. Most locally acting drugs, however, require a different type of study to provide useful information about the bioavailability at the site of action. Variability observed in certain of those studies is due to the measurement process itself.

Acyclovir cream is a topical antiviral drug for which FDA has recommended a specific approach that incorporates reference scaling. The PK parameters for this drug are derived from drug concentrations measured in skin samples^{4} from human donors, and PK data obtained this way are highly variable. In the recommended Mixed Scaled Average Bioequivalence (MSABE) approach, one first calculates the within-subject variability in PK measurements of the reference drug. If this variability is below a certain threshold, then average bioequivalence testing is performed. If it is above the threshold, then a reference-scaled test is performed in which the window for bioequivalence is widened as a function of the estimated within-subject variability for the reference drug. In this case, however, the calculated GMR itself must still fall within the traditional ABE thresholds.
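The branching described above can be sketched as a small decision function. This shows only the control flow: the statistics are passed in precomputed, and the switching threshold of 0.294 and the convention that a scaled-criterion upper bound at or below zero means "pass" are assumptions that do not come from this article. The function and parameter names are hypothetical.

```python
def msabe_decision(s_wr, gmr, abe_ci, scaled_upper_bound, s_wr_cutoff=0.294):
    """Mixed rule: choose ABE or the reference-scaled test based on s_wr.

    s_wr: estimated within-subject SD of the reference drug.
    gmr: point estimate of the geometric mean ratio.
    abe_ci: (low, high) confidence interval for the GMR on the ratio scale.
    scaled_upper_bound: upper confidence bound of the scaled criterion;
        a value <= 0 is assumed to mean the scaled test passes.
    s_wr_cutoff: assumed switching threshold, not stated in this article.
    """
    if s_wr <= s_wr_cutoff:
        low, high = abe_ci
        return "ABE", 0.80 <= low and high <= 1.25
    # High variability: scaled test plus the point-estimate constraint.
    point_ok = 0.80 <= gmr <= 1.25
    return "scaled", scaled_upper_bound <= 0 and point_ok

print(msabe_decision(0.20, 1.02, (0.90, 1.15), 0.5))
print(msabe_decision(0.45, 1.10, (0.70, 1.60), -0.1))
print(msabe_decision(0.45, 1.30, (0.90, 1.80), -0.1))
```

The third call illustrates the last sentence of the paragraph: a product can satisfy the scaled criterion and still fail because its point estimate lies outside the traditional thresholds.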

Recent work at CDER* provides insight into how MSABE performs in a range of scenarios that generic drug developers might encounter. After deriving an equation for the passing rate of MSABE in terms of the within-subject variance and the geometric mean ratio in the population studied, CDER statisticians performed Monte Carlo simulations that developers could apply to estimate the number of subjects needed, based on what is known about these parameters for the generic drug.

## An adaptive trial design to establish bioequivalence for highly variable drugs (adaptive-MSABE)

Adaptive clinical trials, i.e., trials that allow for prospectively planned modifications to one or more aspects of the design based on accumulating data from subjects in the trial, have the potential to be more flexible, informative, and efficient than traditional trials. One hindrance to their wider adoption is the difficulty in knowing how they might perform in practice (their operating characteristics). In the study mentioned above, the statisticians incorporated MSABE into an adaptive trial design that could potentially reduce the number of underpowered BE trials, and they carefully simulated its performance.

In the proposed design, bioequivalence is first evaluated in a modest number of subjects (perhaps as few as four) and a fixed number of replicates per subject (perhaps as few as six) using the MSABE approach with an alpha level of 0.0294,^{5} regardless of the power consideration. If the drug passes, the testing stops. If the drug fails, one evaluates whether the test achieved sufficient (80%) power to conclude that the GMR exceeds the regulatory limits. If the test is found to be powered at this level, the drug fails. However, if the first stage of the trial is not powerful enough, additional participants are enrolled to increase the power to 80% based on an alpha of 0.0294, and the drug is tested a second time on data from both stages of the trial using this alpha.
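The two-stage flow in the preceding paragraph can be written as a short control-flow sketch. The statistical pieces are deliberately left as caller-supplied stand-ins (`be_test`, `power_of`, `enroll_more`), all hypothetical names: the article's design uses MSABE and a power-based sample size calculation for them, and this sketch does not implement either. Only the alpha of 0.0294 and the 80% power target come from the text.

```python
ALPHA_STAGE = 0.0294   # per-stage alpha quoted in the text
TARGET_POWER = 0.80    # power target quoted in the text

def adaptive_trial(stage1, be_test, power_of, enroll_more):
    """Two-stage flow; the three callables are caller-supplied stand-ins.

    be_test(data, alpha) -> bool            the BE test (MSABE in the article)
    power_of(data, alpha) -> float          estimated power of that test
    enroll_more(data, alpha, power) -> list stage 2 observations
    """
    if be_test(stage1, ALPHA_STAGE):
        return "pass at stage 1"
    if power_of(stage1, ALPHA_STAGE) >= TARGET_POWER:
        return "fail at stage 1"  # adequately powered and still failed
    stage2 = enroll_more(stage1, ALPHA_STAGE, TARGET_POWER)
    pooled = list(stage1) + list(stage2)  # second test uses both stages
    if be_test(pooled, ALPHA_STAGE):
        return "pass at stage 2"
    return "fail at stage 2"

# Toy stand-ins, purely to exercise the flow (not statistical methods).
toy_test = lambda data, alpha: len(data) >= 6
toy_power = lambda data, alpha: 0.5
toy_enroll = lambda data, alpha, power: [0.0] * 4

print(adaptive_trial([0.0] * 4, toy_test, toy_power, toy_enroll))
```

Keeping the test, the power estimate, and the enrollment rule as separate arguments makes it easy to swap in a real implementation of each piece without touching the stopping logic.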

To understand how the adaptive design would perform in scenarios generic drug developers encounter, the researchers performed extensive trial simulations in which the GMR of the PK parameters of test and reference products and the within-subject variability were systematically varied while using a specific number of donors and skin samples per donor (Figure 2). The passing rates, whether the trial proceeded to a second stage, and the size of the sample calculated for the second stage of the trial were determined for each scenario. The statisticians also examined the effects of using a different approach to controlling type I error (the Bonferroni method, which is preferred to using 0.0294 at each stage when the numbers of subjects in the two stages of the trial are not the same). Overall, they found that this approach worked effectively to control type I error and did not cause a marked increase in calculated sample sizes. A case study, in which random samples of a real data set for an acyclovir generic that had passed a BE study were used to simulate trials using the adaptive-MSABE design, suggested that the adaptive trial could be successful in reducing the number of required donors.
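The general idea of estimating a passing rate by simulation can be illustrated with a much simpler stand-in than the one used in the study. The sketch below substitutes a single-stage average bioequivalence test with a normal-quantile confidence interval for the adaptive MSABE procedure, and it draws one log difference per subject from a normal distribution. All names, the seed, and the scenario values are hypothetical, and the numbers it produces say nothing about the published results.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def passing_rate(true_gmr, sd_diff, n, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo passing rate of a simplified ABE test (stand-in for MSABE)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = math.log(0.80), math.log(1.25)
    mu = math.log(true_gmr)
    passed = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(mu, sd_diff) for _ in range(n)]
        d_bar = mean(diffs)
        se = stdev(diffs) / math.sqrt(n)
        if lower < d_bar - z * se and d_bar + z * se < upper:
            passed += 1
    return passed / n_sims

# Vary the true GMR and the variability, as in a simulation grid.
for gmr in (1.00, 1.10, 1.25):
    for sd in (0.2, 0.4):
        print(gmr, sd, passing_rate(gmr, sd, n=24))
```

At a true GMR equal to a regulatory limit, the passing rate is an estimate of the type I error of the test being simulated. A per-stage alpha can be explored through the `alpha` argument, for example 0.0294 or the equal-split Bonferroni value of 0.05 / 2 = 0.025.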

An R package, adaptIVPT, containing code, data, and documentation that CDER developed in a standardized collection format, has been made available so that the drug development community can perform their own analyses. As CDER researchers and their collaborators advance novel methods, expanded versions of this online resource are expected to support researchers and drug developers. One promising direction of future research may be exploring the use of Bayesian statistical approaches in the context of adaptive trials of highly variable drugs.

**How can this work advance generic drug development?**

CDER researchers and collaborators are developing new ways to statistically compare generic drugs with their corresponding brand name product, including complex innovative designs such as adaptive trials with novel in-vitro measurement techniques. These study designs and statistical methods will lead to more efficient use of subjects’ measurements, thus cutting development time and costs. This will, in turn, make more safe, effective, and affordable generic drug products available to the American public.

*Lim, Daeyoung, Elena Rantou, Jessica Kim, Sungwoo Choi, Nam Hee Choi, and Stella Grosser. "Adaptive designs for IVPT data with mixed scaled average bioequivalence." Pharmaceutical Statistics (2023).

**Figure 2**. CDER statisticians and collaborators conducted extensive simulations to determine the passing rate of a generic product in their adaptive-MSABE trial under various scenarios. The plots show the passing rates and the terminal sample sizes when the initial number of donors was 4 (with six replicates for each donor). The left panel shows passing rates for values of the true geometric mean ratio (GMR) and within-subject variability (standard deviation) for the reference drug (*s_{wr}*, color scale at right). Note that at a GMR near the regulatory limits, the MSABE rejects fewer drugs as intrasubject variability increases. The panel at right shows the sample sizes used to reach a bioequivalence conclusion with MSABE for different combinations of within-subject variability and GMR.