FOOD AND DRUG ADMINISTRATION

  ADVISORY COMMITTEE FOR PHARMACEUTICAL SCIENCE

  8:30 a.m.

Thursday, November 29, 2001

  Conference Room

5630 Fishers Lane

Food and Drug Administration

Rockville, Maryland 20857

ATTENDEES

COMMITTEE MEMBERS:

VINCENT H.L. LEE, PH.D., Acting Chair

Department of Pharmaceutical Sciences

School of Pharmacy

University of Southern California

1985 Zonal Avenue

Los Angeles, California 90033

NANCY CHAMBERLIN, PHARM.D., Executive Secretary

Advisors and Consultants Staff

Center for Drug Evaluation and Research

Food and Drug Administration (HFD-21)

5600 Fishers Lane

Rockville, Maryland 20857

GLORIA L. ANDERSON, PH.D., Consumer Representative

Fuller E. Callaway Professor of Chemistry

Morris Brown College

643 Martin Luther King Jr. Drive, N.W.

Atlanta, Georgia 30314-4140

JOSEPH BLOOM, PH.D.

University of Puerto Rico

School of Pharmacy

4th Floor, Office 416

P.O. Box 365067

San Juan, Puerto Rico 00935-5067

JUDY BOEHLERT, PH.D.

President, Boehlert Associates, Inc.

102 Oak Avenue

Park Ridge, New Jersey 07656-1325

JOHN DOULL, M.D., PH.D.

Professor Emeritus of Pharmacology and

Toxicology and Therapeutics

University of Kansas Medical Center

3901 Rainbow Boulevard

Kansas City, Kansas 66160-7471

WILLIAM J. JUSKO, PH.D.

Professor of Pharmaceutics

Department of Pharmaceutics

School of Pharmacy

State University of New York at Buffalo

Buffalo, New York 14260


ARTHUR H. KIBBE, PH.D.

Chair and Professor

Department of Pharmaceutical Sciences

Nesbitt School of Pharmacy

Wilkes University

176 Franklin Avenue

Wilkes-Barre, Pennsylvania 18766

MARVIN C. MEYER, PH.D.

Professor, Chair and Associate Dean

for Research and Graduate Programs

Department of Pharmaceutical Science

University of Tennessee

847 Union Avenue, Room 5

Memphis, Tennessee 38163

JURGEN VENITZ, M.D., PH.D.

Department of Pharmaceutics

School of Pharmacy

Medical College of Virginia Campus

Virginia Commonwealth University

Box 980533, MCV Station

Room 450B, R.B. Smith Building

410 North 12th Street

Richmond, Virginia 23298-0533

 

SGES/CONSULTANTS:

WILLIAM H. BARR, PHARM.D., PH.D.

Executive Director, Center for Drug Studies

Medical College of Virginia

MCV West Hospital

Room 12-410

1200 East Broad Street

Virginia Commonwealth University

Richmond, Virginia 23298

STEPHEN R. BYRN, PH.D.

Charles B. Jordan Professor

Head, Department of Industrial & Physical Pharmacy

Purdue University

1336 Robert E. Heine Pharmacy Building

West Lafayette, Indiana 47907


LLOYD E. KING, M.D., PH.D.

Professor of Medicine

Dermatology Division

Vanderbilt University

1301 22nd Avenue North

Nashville, Tennessee 37232-5227

KATHLEEN R. LAMBORN, PH.D.

Professor, Department of Neurological Surgery

University of California, San Francisco

350 Parnassus Street, Room 805, Box 0372

San Francisco, California 94143

LEMUEL MOYE, M.D., PH.D., M.S.

Associate Professor of Biometry

University of Texas School of Public Health

University of Texas

1200 Herman Pressler Street

Suite E815

Houston, Texas 77030

 

GUESTS/SPEAKER PARTICIPANTS:

LESLIE Z. BENET, PH.D. (participating by telephone)

University of California, San Francisco

Department of Biopharmaceutical Science

533 Parnassus Avenue Z-68

San Francisco, California 94143-0446

SANFORD BOLTON, PH.D.

67 Phelps Avenue

Cresskill, New Jersey 07626

THOMAS J. FRANZ, M.D.

3417 Barrington Drive

West Linn, Oregon 97068

LYNN K. PERSHING, PH.D.

Research Associate Professor

Department of Dermatology

University of Utah Health Science Center

50 North Medical Drive, 4B454 SOM

Salt Lake City, Utah 84132


GUEST/INDUSTRY PARTICIPANTS:

LEON SHARGEL, PH.D., R.PH.

Vice President, Biopharmaceutics

Eon Labs Manufacturing, Inc.

227-15 North Conduit Avenue

Laurelton, New York 11413

EFRAIM SHEK, PH.D.

Divisional Vice President

Pharmaceutical and Analytical Research and Development

Abbott Laboratories

Dept. 04R-1-NCA4-4

1401 Sheridan Road

North Chicago, Illinois

AVI YACOBI, PH.D.

Taro Pharmaceuticals

5 Skyline Drive

Hawthorne, New York 10532

NEVINE ZARIFFA, PH.D.

Therapy Area Director, Cardiovascular and Urology

Biomedical Data Sciences

GlaxoSmithKline Pharmaceuticals

mail code UP4205

1250 South Collegeville Road

P.O. Box 5089

Collegeville, Pennsylvania 19426

 

FDA PARTICIPANTS:

MEI-LING CHEN, PH.D.

DALE CONNER, PHARM.D.

MAMATA GOKHALE, PH.D.

AJAZ S. HUSSAIN, PH.D.

LARRY LESKO, PH.D.

STELLA MACHADO, PH.D.

RABI PATNAIK, PH.D.

JONATHAN WILKIN, M.D.

HELEN N. WINKLE


ALSO PRESENT:

CHARLES BON

Biostatistician

AAI International

2320 Scientific Park Drive

Wilmington, North Carolina 28405

LASZLO ENDRENYI, PH.D.

Department of Pharmacology

University of Toronto

Medical Sciences Building

Room 4207

8 Taddle Creek Road

Toronto, Ontario M5S 1A8 Canada

CHRIS HENDY, PH.D.

President and CEO

Novum Pharmaceutical Research Services

5900 Penn Avenue

Pittsburgh, Pennsylvania 15206

KAMAL K. MIDHA, C.M., PH.D., D.SC.

346 111 Research Drive

Saskatoon, SK S7N 3R2 Canada

M. MOHAN SONDHI, PH.D.

Consultant (formerly of Bell Labs)

105 Intervale Road

Mountain Lakes, New Jersey 07046

K.L. SPEAR, M.D.

President

Spear Pharmaceuticals

Fort Myers, Florida

MARIO TANGUAY, B.PHARM, PH.D.

Associate Director, Pharmacokinetics

MDS Pharma Services

Saint-Laurent, Quebec, Canada

C O N T E N T S

AGENDA ITEM PAGE

CONFLICT OF INTEREST STATEMENT

by Dr. Nancy Chamberlin 11

 

DERMATOPHARMACOKINETICS

Introduction to the Issues

by Dr. Dale Conner 15

Data Presentations

by Dr. Lynn Pershing 31

by Dr. Thomas Franz 47

by Dr. Mamata Gokhale 61

Introduction to Discussion Questions

by Dr. Dale Conner 71

Committee Discussion 75

 

OPEN PUBLIC HEARING PRESENTATIONS

by Dr. K.L. Spear 109

by Dr. Chris Hendy 113

by Dr. M. Mohan Sondhi 120

by Dr. Laszlo Endrenyi 123

by Mr. Charles Bon 128

by Dr. Mario Tanguay 132

by Dr. Kamal K. Midha 135


INDIVIDUAL BIOEQUIVALENCE

Introduction to the Topic and Discussion Topics

by Dr. Lawrence Lesko 143

Background & Concepts of Individual Bioequivalence

by Dr. Mei-Ling Chen 152

Results from Replicate Design Studies

in NDAs and FDA Database

by Dr. Mei-Ling Chen 162

Results from Replicate Design Studies in ANDAs

by Dr. Rabi Patnaik 173

Individual Bioequivalence: Have the Opinions

of the Scientific Community Changed?

by Dr. Leslie Benet 185

FDA Research Plan

by Dr. Stella Machado 198

Discussion by Committee Members and Invited Guests 203

P R O C E E D I N G S

(8:30 a.m.)

DR. LEE: Good morning. I don't think you can see me, but I am Vincent Lee. I am acting chair of this committee. I'm also professor and chair at the University of Southern California.

I'd like to go around the table and have the cast introduce themselves, and please identify according to whether you are a guest or committee member or some other capacity. Bill?

DR. BARR: Bill Barr, Virginia Commonwealth University.

DR. LEE: Are you here as a guest?

DR. BARR: I'm here I guess as a special consultant.

DR. LAMBORN: Kathleen Lamborn, University of California, San Francisco. I guess I'm here as a consultant, too.

DR. MOYE: Lem Moye, University of Texas, Houston. I think I'm a prospective committee member.

DR. BYRN: Steve Byrn, Purdue. I'm an "ex-spective" -- I don't know what word we would use -- a retiring member of the committee and a special consultant.

DR. LEE: Actually Steve was the past chair, and he will step in in case I falter.

DR. JUSKO: William Jusko from the University at Buffalo. I'm a regular committee member.

DR. DOULL: John Doull, University of Kansas Medical Center, regular member.

DR. BLOOM: Joseph Bloom, University of Puerto Rico, regular member.

DR. ANDERSON: Gloria Anderson, Morris Brown College, Atlanta, member.

DR. BOEHLERT: Judy Boehlert, private consultant to the industry, member.

DR. KIBBE: Art Kibbe, Wilkes University School of Pharmacy, member.

DR. CHAMBERLIN: Nancy Chamberlin, Executive Secretary.

DR. VENITZ: Jurgen Venitz, Virginia Commonwealth University, regular member.

DR. MEYER: Marvin Meyer, emeritus professor at University of Tennessee, member.

DR. KING: Lloyd King, consultant, Vanderbilt dermatology.

DR. WILKIN: Jonathan Wilkin, Director of the Division of Dermatologic and Dental Drug Products, FDA.

DR. WINKLE: Helen Winkle, Office of Pharmaceutical Science, CDER.

DR. HUSSAIN: Ajaz Hussain, Office of Pharmaceutical Science, CDER.

DR. CONNER: Dale Conner, Director of Division of Bioequivalence, OGD, FDA. Speaker.

DR. SHEK: Efraim Shek, Abbott Laboratories, industrial representative.

DR. SHARGEL: Leon Shargel, Eon Laboratories, industrial participant.

DR. FRANZ: Tom Franz, dermatologist. Here as a speaker.

DR. PERSHING: Lynn Pershing, University of Utah, speaker.

DR. LEE: Thank you. I call on Nancy Chamberlin to read the conflict of interest.

DR. CHAMBERLIN: We will have a few members joining us by phone today. Patrick DeLuca, Nair Rodriguez-Hornedo, Mary Berg, and this afternoon we'll have Les Benet.

The following announcement addresses the issue of conflict of interest with respect to this meeting and is made a part of the record to preclude even the appearance of such at this meeting.

Since the issues to be discussed at this meeting will not have a unique impact on any particular product or firm, but rather may have widespread implications with respect to an entire class of products, in accordance with 18 U.S.C., section 208(b)(3), all committee participants with current interests in pharmaceutical firms have been granted a general matters waiver, which permits them to participate in today's discussion.

A copy of these waiver statements may be obtained by submitting a written request to the agency's Freedom of Information Office, room 12A-30 of the Parklawn Building.

With respect to FDA's invited guests, there are reported interests which we believe should be made public to allow the participants to objectively evaluate their comments. Thomas Franz, M.D., is a stockholder in DermTech International, a contract research organization that conducts research in clinical trials for companies developing drugs and products for use on the skin. Dr. Franz also receives consulting fees from Connetics Corporation.

Laszlo Endrenyi, Ph.D., has consulted with several pharmaceutical companies, both brand name and generic, on an ad hoc basis.

Lynn Pershing, Ph.D., has consulted on dermatopharmacokinetic issues for Aesgen, Clay-Park Labs, Alpharma, Biomedical Development Corporation, Taro Pharmaceuticals, and DPT Labs. She also has consulted with several pharmaceutical companies, for example Dermik Labs, GlaxoSmithKline, Roche, and Baker Norton Pharmaceutical, on other matters.

Leslie Z. Benet, Ph.D., and his spouse are stockholders in Alteon, Pfizer, Watson Pharmaceuticals, Allergan, American Home Products, Elan Corporation, Schering-Plough, Amgen, Bristol-Myers Squibb, Cell Genesys, Genzyme Transgenics, Genzyme Biosurgery, GlaxoSmithKline, Eli Lilly, Merck, Pharmacia Corporation, Procter & Gamble, Quintiles, Sangstat Medical, Valentis Inc., and Walgreens.

Dr. Benet is also involved in contracts and grants from R.W. Johnson, CV Therapeutics, Amgen Pharmaceuticals, Daiichi Pharmaceuticals, and Fujisawa Health Care. He also serves as a consultant for AvMax, Incorporated, Roche Biosciences, Amgen, Wyeth-Ayerst, Fujisawa, AstraZeneca, Searle, R.W. Johnson, and IMPAX.

In addition, Dr. Benet has received compensation from Finnegan, Henderson, Farabow, Garrett & Dunner, L.L.P., for services as an expert witness on behalf of American Home Products, Wyeth-Ayerst, ESI-Lederle, Geneva, Novartis, Teva, Zenith Goldline, Mylan, and IMPAX Laboratories.

Dr. Benet has also lectured for Bayer, Glaxo, Genentech, the American Society of Transplantation, Merck, several universities, and the FDA.

Finally, Dr. Benet is the founder and chairman of the board of AvMax, Inc., president of Avalon, Inc., and is co-founder of Oxon. He also serves as a member of the corporate boards for Alteon, IMPAX Labs, InforMedix, Institute for One World Health, Josman Labs, Molecular Delivery Corp., Main Therapeutics, Inc., Roche Biosciences, UMD Inc., Silico Insights, Agouron, Allergan, Alza, Amgen, Ares-Serono International, Axys Pharmaceuticals, Biochem Pharma, Boehringer-Ingelheim, CV Therapeutics, DuPont Pharmaceuticals, Fujisawa Health Care, Genentech, Basis Therapeutic Corp., Pharmacia, Procter & Gamble, R.W. Johnson, McNeil, Ortho, and Wyeth-Ayerst.

With the exception of One World Health and Roche Biosciences, Dr. Benet has vested and unvested stock options in these firms.

We would also like to note for the record that Leon Shargel, Ph.D., Eon Labs; Efraim Shek, Ph.D., Abbott Laboratories; Nevine Zariffa, Ph.D., GlaxoSmithKline; and Avi Yacobi, Ph.D., Taro Pharmaceuticals, are participating in this meeting as industry representatives, acting on behalf of regulated industry. As such they have not been screened for any conflict of interest.

In the event that the discussions involve any other products or firms not already on the agenda for which FDA participants have a financial interest, the participants are aware of the need to exclude themselves from such involvement and their exclusion will be noted for the record.

With respect to all of the participants, we ask in the interest of fairness that they address any current or previous financial involvements with any firm whose product they may wish to comment upon.

DR. LEE: Thank you very much, Nancy.

We're going to have a busy day today. We have two important issues. This morning it will be on dermatopharmacokinetics. This afternoon it is going to be on individual bioequivalence. We have before lunch an hour for open hearing and I understand that we have six presenters.

Before I turn the floor over to Dale Conner, let me alert all the speakers to stay on time because I do have an electronic gavel, which I did not use yesterday, and I hope that I do not need to use it today because the committee does need the full 30 minutes to discuss three very important issues. Thank you.

Dale?

DR. CONNER: One question. Does that gavel give shocks to the speakers?

DR. LEE: Do you want to find out?

(Laughter.)

DR. CONNER: Sure. I'm willing to be a guinea pig.

My task today is to lead off this very exciting discussion, hopefully not too exciting, and to introduce you to the topic for those of you who are new committee members or perhaps not quite so familiar with the very long and illustrious and kind of controversial history of this particular technique, or proposed technique.

I'm going to start off with a little discussion of bioequivalence in general because it's been my observation certainly that in some of the past discussions, both committee members, the observers, as well as unfortunately some of the FDA people didn't really seem to quite understand the object of what we're trying to accomplish with bioequivalence. We at the agency use bioequivalence, obviously, to approve generic drug forms of innovator or reference products, but also the innovators use these same techniques to test or to gain approval for changes in their existing formulations. When you explain it, it doesn't seem so very complicated, but it can be very confusing.

I'll start out with a little bioequivalence 101, or at least my version of it, and then I'm going to go into very brief, and hopefully simple, explanations of this technique, which will then be expanded upon by the later speakers, and perhaps show you somewhat of a history of what has gone on in this topic. It's quite a long and checkered history, I guess you could say.

To start off, I'll give a personal definition of bioequivalence. You can think of it as we practice it, certainly in generic drugs, as pharmaceutical equivalents whose rate and extent of absorption are not statistically different when administered to patients or subjects at the same molar dose under similar experimental or clinical conditions. It's important to remember that when we're talking about an ANDA or perhaps a change in an existing product for an NDA, that we're talking about pharmaceutical equivalents. That's the first point of confusion that many people in the outside world have.

When we talk about pharmaceutical equivalents, we talk about the exact same drug substance. So, for comparing two products in an equivalence, say for an ANDA, the starting understanding is that they have the exact same drug substance. But there are other things that need to be the same to be called pharmaceutically equivalent. They're the same dosage form. So, if we're looking at tablets, we're not comparing that to a capsule or a solution. If we're looking at an ointment for topical administration, we're not looking at a cream or a topical solution. So, the dosage form is the same, and the intended use and generally the labeling is the same as well.

That's the first thing that people need to understand. We're not talking about therapeutic substitution, where one substitutes or studies a totally different drug substance or a totally different type of product. It's very, very similar products containing the exact same drug substance. That's point number one.

The purpose of doing this, at the end, is to establish therapeutic equivalence of these products. What a clinician wants to know is that, if my patient is switched from an existing product to the other product, I'm going to see the same therapeutic effect, and therapeutic effect in this case encompasses both the desirable and undesirable characteristics. The therapeutic or efficacy part, as well as the toxicity profile, shouldn't be different either.

It's the FDA's position that a generic, or a new dosage form of an existing approved NDA product, can be substituted for the reference product without any adjustment in dose or additional therapeutic monitoring beyond what would ordinarily be done in the normal course of managing that patient.

I put in a last statement, which is very true for oral products: the most efficient method of assuring therapeutic equivalence is to assure that the formulations themselves perform in an equivalent manner. We have a number of ways, and proposed ways, to try to do that.

First off, this is my simple scheme. In past discussions of DPK in this committee we often tried to draw correlations, or draw understanding, from what's happening, or what's alleged to be happening, with DPK by analogy to the oral route. There have been statements that it's very similar, and there have been statements that it's not very similar at all. So I thought I'd start out by taking the simple case and discussing bioequivalence by the oral route. My next slide or two will have a very similar depiction of what may happen when you administer topical products for the skin, but let's take the simple case first.

It's important to realize that when you give an oral product, or perhaps any pharmaceutical product, it comes in what you might call a package or dosage form: a tablet, capsule, cream or ointment. And a critical event to be able to get therapeutic results from this product is the drug substance or active drug component has to leave the formulation and go into the patient at some point. That's really a very, very critical step. In bioequivalence that's really what we're trying to measure, the characteristics of this dosage form that allows the drug to leave the dosage form and be available to the patient.

When you really think about it and you look at all the steps here, with an oral drug the drug is in solid form most of the time; it's released from the dosage form, usually still in solid form, and only then goes into solution, usually in the GI tract. It eventually goes through the gut wall into the blood, is carried to the site of activity, and leads to a therapeutic effect, either desirable or undesirable.

There are obviously many more boxes that could be added to this, metabolism, routes and so forth. This is a very ultra-simplified view just to illustrate the course of events.

Fortunately for oral products, we have a more or less nice chain of events, in which we have the blood, where we can easily measure concentrations. We can extrapolate back and tell how this particular dosage form is performing, and by performance I mean how it is releasing the drug to the patient.

Also, clinicians are kind of happy with this because the blood is also related to the drug appearing at the site of activity, so you can get some information from the blood about the therapeutic effects as well, if you'd like to look at that directly.

In this particular systemic availability of an oral product, we have a very nice part where we can sample, we can do essentially a single test, and we can really answer most if not all of our questions about how the comparable dosage units or products are performing in relation to each other.

The blood is very nice because most of the time it's linear in response and it's not very sensitive to the dose you study it at for most of the products; whereas therapeutic effects, if you assess them for this purpose, say, by doing a comparable clinical trial, are not quite so linear, and I'll discuss that a little later. So, that's the simple oral case.

Now this, as you'll see, is very, very simplified, and I have two versions of this. The first version, for those of you who might be in favor of DPK, this is the version you would want. I have another one which kind of expands on this for those of you who are not in favor of DPK. So, I don't want to act like I'm just presenting a one-sided view of this.

My ultra-simplified scheme here has changed somewhat. I still start out with a dosage form, from which drug has to be released and made available to the patient or the subject in our studies. Then we have that drug, released from the dosage form, going into tissue, and I've lumped all the tissue of the skin together in one place. Again, as anyone will know, that's really kind of a leap in concept. That's about where we sample DPK. Obviously, we're sampling the stratum corneum, which isn't all the tissue or all the routes that the drug uses to get into the skin.

After achieving this step, it's distributed to its site of activity, which is not too far removed from where the drug is released to the tissue. It results in therapeutic effects, and at some later time possibly drug appears in the blood.

I don't want to really say this is an after-thought, but it happens after the part that most of us are interested in, which is site of activity therapeutic effects. Eventually that can lead in some products to some systemic effects as well.

So, my order of things from my first slide is changed around a bit. You'll see that the very nice way of doing things, taking a simple test from the blood, is not exactly so straightforward. Some people have proposed that you can still use blood to kind of back-extrapolate here, but that's generally not a widely accepted view.

Most of the time what we do right now, which I'll go into a little bit more in subsequent slides, is assess this box right here. We look at application to the skin and we look at, say, two pharmaceutically equivalent products under similar conditions, and we see if they give us the same therapeutic effects. That's basically how we approve most of the topical drugs now.

The proposed method that you're looking at today is something that samples here, very close to the event we're interested in, but it samples one route, one type of tissue that that drug is entering into.

Now, here's the variation on that if you don't like DPK. What I've done here is perhaps one step more realistic as far as my model goes. I've separated what I called tissue in the previous slide into separate boxes. We have up here drug in the stratum corneum; we have drug moving through follicles -- it is brought up constantly in this discussion that stratum corneum sampling does not represent drug in the follicles -- and drug moving through other routes, however many you want to list: sweat glands and various other ways to bypass the stratum corneum.

The first slide, the DPK lovers' slide, would say that I'm measuring only the stratum corneum, but the stratum corneum tells me enough about the other things that I can make inferences about all of these together, and this is really all I need to do. I guess people who don't believe in DPK say no, these boxes aren't the same; sampling the stratum corneum does not tell me anything useful about this route or the other routes.

So, stratum corneum is really at best a partial picture, or if this is a major route to the site of activity, then it may not tell me much useful at all. That's basically one of the negative beliefs about DPK. You really have to believe that DPK and sampling the stratum corneum tells you a lot about the whole picture. Anyone who knows the skin knows that these may or may not be the same. Probably are not.

Again, the rest of my scheme is the same from the previous slide.

As I mentioned, what we do now for these products, with a few exceptions, is a bioequivalence study with clinical endpoints: we take real patients with the particular disease state, we apply the products, often in a parallel type of study, and we compare the clinical responses between the two products. If it's an ANDA, one product is usually the reference listed drug in the Orange Book -- quite often the brand name product approved under an NDA -- and the other is the ANDA product.

The critique of this approach, although we certainly have approved many products on this basis, is that it is somewhat extensive: it often requires a sizeable number of patients, the variability is quite high, and there is some belief that it may be insensitive to differences in formulation performance. I'll tell you the theoretical basis of that statement.

For certain of the products, such as corticosteroids, we have alternative BE methods that depend on pharmacodynamic effects. For instance, the glucocorticoids cause a blanching or lightening of the skin on a temporary basis. You can relate the potency of the steroid and the release of the drug from the product into the skin to how much of this blanching response you get. Over many years that's been developed into a technique for looking at comparative release of drug from these products. So, for topical corticosteroids very often we can use this pharmacodynamic endpoint type of study, and there is a guidance out on that.

There is also some data on in vitro drug release, although that's seldom if ever used as the primary study to get a topical product approved. Generally it's some type of in vivo study, usually of the type at the top, the clinical endpoint study.

Just a brief comment, and this can be generalized, I think, to any kind of clinical response type of study. As I said, blood levels usually are very nice and linear. If you study them at a slightly higher or slightly lower dose, it doesn't change the response you get within variability.

If you remember from your pharmacology training, responses, both clinical and pharmacodynamic, generally fit into some kind of sigmoidal relationship, and it's sigmoidal because I'm displaying it on a log dose scale. That's important to point out. The important parts of this response versus dose curve are that we have a part where we give a dose that's so low that we're not getting any discernible response, and then at some point, as we're increasing the dose -- and I've drawn it very steeply here; it can be somewhat flat -- the response increases with small increases in dose until it finally gets to a maximum. If you go beyond that and give more dose, you don't really get much more response.

Eventually in clinical practice, if you keep going out in many doses, you'll get other responses like undesirable toxic responses, but this particular continuum, if you're looking at receptors, you may have occupied all the receptors with drug and you can't occupy or stimulate more than 100 percent of the receptors, so you might get a plateau at the top.

When you look at comparing two products, two very, very similar products which are designed to be nearly identical in their performance, you might see from this graph that it's very important that you study them at the right dosage level. If you study up here, where you're at the top of the dose-response curve, you can have a very large difference between the products in the relative dose they deliver and simply get little if any difference in the response you're measuring, whatever that clinical response is.

If you study it in this dosage range, where you're on the steep part, you actually get quite good sensitivity, or potentially good sensitivity, to the difference between the two products. So, that's a really critical aspect to consider when you're doing clinical equivalence studies, because you could end up doing a very large, very expensive, somewhat complex study and come out with virtually no sensitivity to tell the difference between products. Something to consider when you do these types of clinical endpoint studies.
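As an illustrative aside, and not part of the meeting record, here is a minimal sketch of the sensitivity argument being made, using a hypothetical Hill (sigmoidal Emax) dose-response model with assumed parameters (Emax, ED50, Hill coefficient are invented for illustration). It shows how a test product delivering only 80 percent of the reference dose produces a large response difference on the steep part of the curve but almost none near the plateau.

```python
# Minimal sketch (not from the meeting): a hypothetical Hill/Emax model
# illustrating why clinical-endpoint BE studies lose sensitivity when the
# study dose sits on the plateau of the dose-response curve.

def hill_response(dose, emax=100.0, ed50=10.0, n=2.0):
    """Response for a sigmoidal (Hill) dose-response model (assumed parameters)."""
    return emax * dose**n / (ed50**n + dose**n)

def response_difference(reference_dose, relative_bioavailability):
    """Percent difference in response when the test product delivers
    `relative_bioavailability` times the reference dose."""
    ref = hill_response(reference_dose)
    test = hill_response(reference_dose * relative_bioavailability)
    return 100.0 * (test - ref) / ref

# A test product delivering only 80% of the reference dose:
for dose in (5.0, 10.0, 50.0):  # steep region, ED50, plateau
    print(dose, round(response_difference(dose, 0.80), 1))
# Near the plateau (dose = 50) the response difference nearly vanishes,
# even though the products differ by 20% in delivered dose.
```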

Just briefly -- and this will be expanded upon by the other speakers -- the theory, as I said, is that DPK, or dermatopharmacokinetics, is a pharmacokinetic approach applied to drug concentrations in stratum corneum. So, you're literally sampling the stratum corneum after drug administration and looking at the appearance and perhaps disappearance of drug from that particular tissue.

The method, very briefly, is that tape stripping is used. If you apply tape to the skin, it strips off a layer of the stratum corneum, and by doing that successively you can kind of drill down or sample down into the stratum corneum until there's no more left, simply by taking successive tape strips, and then the investigators simply take those strips and assay them for drug. So, the uptake and elimination from the stratum corneum are determined.

And, it's alleged, the differences in formulation performance are determined. The advantage is that you can do this at the same time in the same individual. You can apply both products to the same individual at the same time, which is an advantage over even our oral products, where, even though you cross the treatments over and use the same subject, they still have to be studied at different times. So, it's one of the nice features of doing some of these dermatology studies.
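As another illustrative aside, here is a minimal sketch of how DPK metrics might be reduced from tape-strip data; the sampling times and drug amounts below are hypothetical and do not come from any study discussed at the meeting. The amount of drug recovered from the stratum corneum at each sampling time is summarized as a Cmax and a linear trapezoidal AUC, analogous to noncompartmental analysis of plasma concentration data.

```python
# Minimal sketch (hypothetical numbers, not study data) of deriving DPK
# metrics: drug recovered from the stratum corneum at each sampling time
# is reduced to Cmax and a trapezoidal AUC.

def trapezoidal_auc(times, amounts):
    """Linear trapezoidal AUC over the observed sampling times."""
    return sum(
        0.5 * (amounts[i] + amounts[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

# Hypothetical uptake/elimination profile: hours vs. ng of drug per cm^2
# recovered in the tape strips at each site.
times = [0.25, 0.5, 1.0, 1.5, 4.5, 7.5, 10.5, 13.5]
amounts = [12.0, 25.0, 41.0, 46.0, 30.0, 18.0, 9.0, 4.0]

cmax = max(amounts)
auc = trapezoidal_auc(times, amounts)
print(f"Cmax = {cmax} ng/cm^2, AUC(0-last) = {auc:.1f} ng*h/cm^2")
```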

A brief history, and I won't dwell on this, but I have a couple of slides on the very long history of this particular topic. As you'll see, even I was amazed that we started back in the late 1980s, '89, and I'll just flip through these. I don't want to dwell on all of the facts. We started off with some workshops back in '89. This was a constantly discussed topic. We had a couple of advisory committee meetings; that was the Generic Drugs Advisory Committee, which I believe is what this committee used to be called. Some international meetings, some more workshops, more international meetings, some trade association meetings, and more advisory committee meetings of course, and some expert panel meetings. Until finally we get to an AAPS symposium and the last joint advisory committee meeting, and that was a meeting that combined this committee with the Dermatology Advisory Committee in a joint meeting. For those of you who were here, you remember that was kind of an exciting meeting as well.

So, finally we get to the issues that we considered or thought about today. I listed three here. The first is really kind of a fundamental one that underlies perhaps all of the discussion. Is the DPK method an appropriate approach for establishing bioequivalence of topical drug products? It's something that, in all those years that we've been talking about it, still hasn't been resolved. The information that will be presented today is really more information, some scientific studies and so forth that are interesting and will, I think, cause a lot of thought and discussion.

The second one is, are results and conclusions derived from the DPK method consistent within and between laboratories? Because a regulatory method, even if the first question is true and accepted, really isn't a very good method if you can't reproduce it between labs. If it's very lab-dependent and one lab will get a passing grade on comparison of certain products, and two other labs that do this technique get a very different answer -- and that happens quite a lot -- it really isn't a suitable method if you can't reproduce it from lab to lab or from time to time within the same lab.

The last issue is that even if one and two are true, if only one or two laboratories can do this, and it's so difficult or expensive or arduous to set up that no one else in the world can possibly do it, it's probably not a good regulatory method either. So, even if you accept one and two, the method needs to be not too difficult for experienced investigators in other fields to set up and get running, and it shouldn't take millions of dollars or years to do so.

You'll see that we've done some work in our FDA labs, which started out having no experience in this method, and how long it took them to get up and running and to obtain data they were happy with. It's important that a regulatory method be reasonably easy to set up and give good results without an extraordinary effort or time.

That's the end of my introductory talk.

DR. LEE: Thank you, Dale.

Are there any questions from around the table for Dale?

(No response.)

DR. LEE: Thank you.

We have three presentations by invited guests, and I'd like the committee members to listen very carefully because we need to have answers to those three questions. First I'd like to invite Lynn Pershing to the podium.

DR. PERSHING: Good morning. Today I'm going to present some work about bioequivalence assessment of three 0.025 percent tretinoin gel products, and compare the dermatopharmacokinetic method with the clinical trial efficacy method.

I want to emphasize, before we start, that there are many players and many groups of people who influence whether a topical product gets to the market and actually is used in patients -- the consumers, the ultimate individuals we're all supposed to be focused on. Despite our individual missions, we have physicians, the innovator industry, the health care and insurance people who really control the drug products we use, clinical research organizations, scientists, the generic industry, and also ultimately the FDA, which decides whether that product ever actually gets used by the consumer. The important point here is that we all have a common goal, and that common goal is to provide the best therapeutic drug products that we can to that consumer. So, let's keep that in mind as we go forward.

The hypothesis we're testing today is that the dermatopharmacokinetic method will assess bioequivalence among three tretinoin gel products similarly to a clinical trial efficacy method.

So, the issue in bioequivalence really is, how much different can two products be and still be bioequivalent in an individual. Very important to the issue is understanding what it means to say bioequivalence in a topical drug product. First of all, they should have the same concentration of active, which is the drug in this case, and the second very important issue when you're trying to deliver a drug into the skin and looking at bioequivalence, is that they are Q1 and Q2 similar.

Q1 is qualitatively similar in the vehicle composition: they have the same vehicle components. Q2 is that quantitatively they're similar: the concentrations of those vehicle components are as similar as they can be. In contrast, bioavailability is when they are Q1-Q2 different: the vehicle may be composed of different components, or there may be Q2 differences where there's a different concentration of the active. Here we'd be looking at, for example, three different concentrations of a particular drug in a similar vehicle, or at vehicles whose components are actually different.

In the case that we're going to talk about today, the three 0.025 percent tretinoin gel products, three products were chosen for study: the innovator and two test products. One test product was Q1-Q2 similar to the reference product. The other was different, so it would be termed a Q1-Q2 different product. We're going to see, then, whether DPK can differentiate these three products based on whether they're Q1-Q2 similar.

I want to show you the clinical results first because that's our reference point at this stage. We're trying to see whether DPK actually predicts the clinical results. We know that it was an acne trial; the information is actually available from the FDA web site. Two parameters were compared: efficacy and safety. The products that were Q1-Q2 similar were bioequivalent for both parameters. The products that were Q1-Q2 different were not bioequivalent in efficacy, and they were not bioequivalent in safety. In fact, the test product was two times safer than the innovator product, but it was less effective.

Important in drug delivery issues for the skin is that when you apply a topical product to the skin, that drug, as Dr. Conner has discussed, has to leave the vehicle and partition into this 10- to 20-micrometer thick skin layer that controls all drug uptake into the skin. This stratum corneum layer is easily accessible. It's nonviable. It's exfoliated at one layer every day, and it's easily removed with adhesive discs.

Important to remember from Fick's second law of diffusion is that when you apply an external chemical, a drug in this case, to the stratum corneum, you set up a concentration gradient through the skin. The highest concentration will be in the stratum corneum, and you have a concentration gradient even through the stratum corneum. If you don't get drug into that stratum corneum, you rarely get a therapeutic effect.

We can collect that stratum corneum. We can harvest that stratum corneum, using adhesive discs. In our study we've used D-Squame adhesive discs. They're commercially available. They come on a polymer backing, 10 individual discs that can be bought in different sizes, either a 1.37 centimeter or 2.2 centimeter diameter. One of these panels of 10 discs is used for each skin site and analyzed in an individual.

Using this product we've noticed that the first 10 skin strippings at a particular site removed about 325 to 350 micrograms of stratum corneum, and this is done in 12 people, four sites in each person. And the variability reflects the differences between people. If we then took another series of 10 discs and tape-stripped that site, we'd see that half of the stratum corneum was removed.

That could be an issue when you're quantitating how much drug is in those skin strippings: how many skin strippings should you collect? What I want to show you here is that, if you've adequately removed residual drug and you're quantitating how much drug is actually in the stratum corneum, and we know that drug forms a concentration gradient through the stratum corneum, you should see more drug in the first set of 10 skin strippings than in the second set of 10 skin strippings -- this is percent of dose applied for total retinoids -- and indeed we do. But that could be influenced by the amount of stratum corneum that you actually remove.

But you note that even if you correct the percent dose applied for the amount of skin removed, you still see a concentration gradient through the stratum corneum. That's a very important validation step in this work.

To be able to adequately analyze the drug in the skin strippings that you collect, you need to have a validated bioanalytical assay. What I'll share with you is that we developed an assay for tretinoin and its isomer analog, isotretinoin, with recovery from these adhesive discs of greater than 87 percent for both, with no interference from the matrix or the stratum corneum with the analytes of interest or the internal standard, with an appropriate linear regression, accuracy of greater than 85 percent, precision of less than 11 percent, and a limit of quantification of four nanograms per ml. The analytes were also stable in pre- and post-extraction stability testing.

The other thing you have to do is develop a reproducible method to actually dose the drug to the skin site. For this we use a 250 microliter Hamilton syringe that's aliquoted at 5-microliter intervals, and we validated that among the three products -- less than 10 percent coefficient of variation -- they reproducibly deliver a 5 microliter dose.

The experimental design in this study was such that we wanted to capture both the uptake of tretinoin and isotretinoin, as well as total retinoids, into the skin, and the elimination. To do that, we performed a pilot study to determine the appropriate time points to capture that descriptive profile. We washed the ventral forearms of the subjects enrolled in the study an hour before the study. Seven minutes prior to application, we collected skin strippings from untreated control sites, and at time 0 applied the drug for either 15 minutes, 30 minutes, 1 hour, or 1.5 hours to the subjects; then, for the elimination phase, after one-half hour, removed the drug and looked at 3, 6, 9, and 12 hours after removal. These are the time points that were found in our pilot study to best describe the innovator product's uptake and elimination profile in the stratum corneum.

The product application randomization schedule was as follows. We used both right and left forearms. We blocked all uptake time points to the right arm and all elimination time points to the left arm. The different doses in this case were randomized to four regions, 1 through 4 for each subject. The elimination time points were randomized on the left arm per subject. Product A, B, and C were randomized to either site 1, 2, or 3 in each subject, and that randomization schedule was held at all regional points.

The demographics of the study included 49 subjects, about equal numbers of males and females, with an average age of 30.7 years, representing 41 Caucasians, 6 Asians, and 2 Hispanics, consistent with the ethnic distribution in the state of Utah. And there was a hand preference: 45 right-handers versus 4 left-handers.

It's important when you're doing DPK to consider the surface area of the treated site versus the size of the adhesive used to collect the stratum corneum. In our study we used a skin treatment site of 1.2 centimeter diameter. The adhesive disc was 1.3 centimeter diameter, so when you overlay the adhesive on the skin site, there is a slight overlap beyond the surface area of the treatment site.

So, data. Data is what we live for, right? This is the three products of tretinoin gel, 0.025 percent, the three different products. What I want to draw your attention to is that the products that are Q1 and Q2 similar produce an identical tretinoin uptake and elimination profile. The product that is Q1-Q2 different from the innovator produced a profile that was 60 percent of the innovator.

When we analyzed the data with biostatistics, we see that, using the 90 percent confidence interval, which for acceptance of bioequivalence must fall within 80 to 125, the product that was Q1-Q2 different from the innovator failed bioequivalence for both Cmax and the AUC from time 0 to the last detectable time point. The products that were Q1 and Q2 similar, however, showed bioequivalence for both parameters, Cmax and AUC. Therefore, based on tretinoin alone, this product would fail bioequivalence and that product would pass bioequivalence.
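As an illustrative aside, the following is a minimal sketch of the average bioequivalence calculation referred to here: a 90 percent confidence interval for the test/reference geometric mean ratio of a log-normally distributed metric such as Cmax or AUC, which must fall entirely within 80 to 125 percent. The subject values below are invented for illustration, and the sketch assumes a simple paired (within-subject) analysis and that scipy is available for the t quantile; it is not the statistical model actually used in the study.

```python
# Minimal sketch (illustrative data, not the study's) of an average
# bioequivalence test: 90% CI for the test/reference geometric mean ratio,
# which must lie entirely within 80-125% to conclude bioequivalence.

import math
from statistics import mean, stdev
from scipy.stats import t  # assumed available; any t-quantile source works

def be_90ci(test, reference):
    """Paired (within-subject) 90% CI for the geometric mean ratio."""
    diffs = [math.log(x) - math.log(y) for x, y in zip(test, reference)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    tcrit = t.ppf(0.95, n - 1)  # two one-sided tests at the 5% level
    lo = math.exp(mean(diffs) - tcrit * se)
    hi = math.exp(mean(diffs) + tcrit * se)
    return 100 * lo, 100 * hi

# Hypothetical AUC values for a handful of subjects:
test_auc = [105.0, 98.0, 110.0, 95.0, 102.0, 99.0]
ref_auc = [100.0, 101.0, 104.0, 97.0, 100.0, 96.0]
lo, hi = be_90ci(test_auc, ref_auc)
print(f"90% CI: {lo:.1f}% - {hi:.1f}%  -> BE if entirely within 80-125%")
```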

Tretinoin has a natural isomer, isotretinoin, and even in the product you have about 5 percent isotretinoin, so it's important with products like this that you also measure the isomer.

In the same skin sample, you follow the isotretinoin, and again you see that the products that are Q1-Q2 similar produce identical profiles of isotretinoin. The product that was Q1-Q2 different from the innovator produced a profile of about 60 percent of the innovator. Biostatistically, the product that is Q1-Q2 different fails bioequivalence for Cmax and AUC. The product that was Q1-Q2 similar passed the bioequivalence criteria.

Another way to look at a product that has isomers involved is to measure the total concentration of all the retinoids, and so we also analyzed total retinoids. This was done again in 49 people, and you'll see the same result. The products that are Q1-Q2 similar produce identical profiles, and the product that was different produced a different profile. And again, biostatistically, the products that are the same pass bioequivalence and the products that are Q1-Q2 different fail bioequivalence.

So, if we compare the three methods, DPK, clinical efficacy and clinical safety, we see that the products that are Q1-Q2 similar pass bioequivalence in all three methods. The product that was Q1-Q2 different failed DPK bioequivalence, failed clinical efficacy bioequivalence, and failed clinical safety.

In summary, DPK is a good method for bioequivalence assessment of topical drug products. It's objective. It's sensitive. It's discriminating. It's precise, accurate. Most importantly, it's scientifically and clinically relevant. And it's comparable to pharmacokinetic methods used for oral solid dosage forms.

In conclusion, then, DPK results predict the clinical efficacy and safety results. DPK is a sensitive, reproducible, and valid method for bioequivalence assessment of topical drug products.

Thank you.

DR. LEE: Thank you, Lynn.

Any questions for Lynn? John?

DR. DOULL: You said you looked at males and females and ethnicity. Did they have any effect at all on the results?

DR. PERSHING: We saw no statistical difference between males and females, and no statistical difference in the ethnic groups that we evaluated.

DR. DOULL: Were these all similar age people?

DR. PERSHING: The average age is 30.7, plus or minus -- I think it's in the handout. But they were, yes.

Obviously important here, and this is a nice aspect, as Dr. Conner has brought up, is that all the drugs are evaluated in the same person at the same time. If we had individual groups of young, middle, and old people, maybe we could see a difference. But in this case the demographics that are in your handout describe the people that we used.

DR. LEE: Dr. Conner?

DR. CONNER: I'd like to make two statements or clarifications from the FDA standpoint.

First off, Dr. Pershing's definitions of BA and BE are not the regulatory or FDA definitions, so it's important to point that out. Especially the way that she has defined BA is not our definition, and I actually don't even agree with it.

DR. PERSHING: I'll just say that's quoted directly from the draft guidance.

DR. CONNER: The other thing is, it's important to point out the approval criteria for these three products. The Q1 and Q2 status obviously is important, but it's important to note that the two products which Dr. Pershing was referring to as Q1 and Q2 similar are approved as equivalent products. One is an NDA. The other is approved under the ANDA process and should be considered equivalent and switchable. Officially, in the Orange Book, those are substitutable products.

The third product, which does not show bioequivalence under DPK or even under the clinical method, is approved under the NDA process, so it's not considered officially switchable. It's a stand-alone NDA product, and it's important to know that when you're looking at this.

And obviously, since it is a separate NDA, it has totally different labeling. It has its own labeling, and it has its own package of information on which it was approved, of which these clinical studies were just a small portion. It has efficacy data and all of the things that are needed to approve an NDA.

DR. LEE: Two more questions.

DR. KIBBE: Could we also conclude from these data that we don't need to do any bioequivalence testing if the products are Q1-Q2 equivalent when we first look at them? That it predicts the outcome, so why do the studies?

For a long time with oral products we've tried to find the mystery method that would allow us to test in a laboratory, and we don't have to go into humans. I was wondering if we're at that point here.

DR. PERSHING: You can still be Q1 and Q2 similar and, because of manufacturing processing, not be equivalent by DPK, because there are a lot of things that go into drug delivery of topical drug products for the skin. Particle size, for instance, is still important in topical drug delivery just as it is in a solid oral dosage form. So, all the physical parameters that influence bioavailability for oral products also pertain very similarly to topicals.

If you only looked at physical parameters, you wouldn't know how it performed. So, this really is a performance evaluation.

DR. LEE: We'll take two more questions and then move on. Gloria?

DR. ANDERSON: What evidence do you have that the uptake mechanism and perhaps rate between the two that are the same and the one that's different are the same?

DR. PERSHING: As I understand the FDA issues, it's that rate and extent is important. You'll notice that the rate may not have been so different, but the extent was. That's why the Q1-Q2 different product didn't obtain the same DPK profile. The Cmax was lower and the AUCs were therefore lower. Both rate and extent are important and you have to have both of those.

Now, was the rate different? I didn't calculate the rate, but you can see because they overlap on the uptake part of the curve, they're quite similar, but the extent was different, so the Cmax was different.

DR. ANDERSON: Actually my question really is, did that affect the outcome, the results that you obtained?

DR. PERSHING: I see what you're saying. Actually, what you're probably referring to is, in your experimental design, what if the Q1-Q2 different product simply attained its Cmax at a different time -- the Tmax was different? And that's a valid point. When you're doing bioequivalence testing, you have to meet the reference product profile.

DR. LEE: Kathleen?

DR. LAMBORN: Your comment that you could be Q1-Q2 the same, and yet be bio-inequivalent raises a question about the extrapolation of the results that you presented here, where you knew right off the bat that you were not Q1 and Q2 equivalent. Would you expect that the results, if they were Q1 and Q2 equivalent, would be more subtle and therefore you might, in fact, not be able to pick up the differences? In other words, you've used this where you had Q1-Q2 not similar as a justification for saying, see, we're sensitive and specific. But you don't have the Q1 and Q2 the same, so maybe you're working with a bigger difference than you would really be wanting to try to discern. If that makes sense.

DR. PERSHING: We've analyzed about 20 different topical drug products from five different drug classes, and frankly we noticed a lot of interesting things. Usually I don't even know what the vehicle composition is, and I, frankly, didn't even know the vehicle composition when I did this study. I did know they were Q1 and Q2 different. I did know that they had been evaluated in a clinical study, and that's it, when I did this work.

We have looked at products that were supposed to be Q1-Q2 similar, and a lot of times they don't pass DPK. And if you further investigate the mechanistic basis of that, sometimes you can trace it back to physical parameters.

But what I'm trying to say is that you need a performance test. You can't just look at vehicle composition because sometimes they can be Q1 and Q2 different, and they might produce a similar profile because all the parts of the product can influence drug delivery.

I'm just saying that if you're Q1-Q2 similar up front, you have a much better chance of passing DPK. We can take Q1-Q2 different products, and every time I have, I see a difference in DPK. Experience just shows that.

DR. LAMBORN: But you have found some products which are Q1-Q2 the same which have consistently demonstrated differences using this method.

DR. PERSHING: I just presented some work at AAPS showing that five different lots of a particular innovator product are not always bioequivalent to one another. So, you do need a performance test to evaluate for these kinds of differences. Just because we manufacture a product doesn't mean it's always perfect.

DR. LEE: Very well. Thank you very much.

Bill?

DR. JUSKO: When we study oral drug products, we allow the full natural time course of absorption and disposition to be followed. In this technique, uptake is followed only out to 1.5 hours, so my question is, if you had allowed the full natural time course to be examined, how likely would you have been to see differences between the products, and possibly to have a different interpretation of the entire set of results?

DR. PERSHING: Excellent question.

DR. LEE: Is it going to be a brief answer?

DR. PERSHING: It's going to be a brief answer.

Each product, each drug, each concentration may have its own unique profile, and that's why you do a pilot study to determine what the appropriate time points are. For a gel product, it could be a very different time course than for a semi-solid cream or ointment. In fact, most of the studies are done over a 24-hour time course.

But what we found in our pilot study is that in Utah, in my subject population base, that if I applied and left it on for 4 hours, the drug was already eliminating. It starts to eliminate even before 1.5 hours. That's why those time points have been chosen because we had done a pilot study. We found what Tmax is and the half-life, and we have experimentally designed the time points of collection for a pivotal study that are pertinent to those two parameters.

DR. LEE: Thank you. I think it's time to move on. Dr. Franz, are you ready?

DR. FRANZ: Yes.

This study that I'm going to present, or the work I'm going to present, was conducted at DermTech under the sponsorship of Spear Pharmaceuticals, and was really done at the sort of general request that the FDA made at many of these prior meetings for other people to get involved, the industry, academics, so that there would be a wealth of data from different labs that could be used to evaluate the suitability of DPK. So, it was in that spirit that Spear Pharmaceuticals sponsored this study.

The work is similar to that presented by Dr. Pershing in that we examined two of the three products that she examined. We looked at the Avita product, which is reported not to be as clinically effective, and that was compared to Retin-A, the innovator product. A lot of the details of the studies are the same, but there are a few differences.

The first study that I'm going to report was a study in 36 subjects: 14 females, 22 males; 15 Caucasians, 15 Hispanics -- so the demographics are a bit different from the prior work that was presented -- 4 African-Americans, and 2 Asians, with a mean age of 32.2 years.

In our study, this one and the next one I'll present, the situation is this. Two-by-two centimeter square sites are demarcated on the ventral forearms, and there are basically two rows on each forearm. Randomization of the two products is between paired sites, so that, for instance, if the lateral site gets the test product, then the medial site will get the reference. So, they will always be applied as pairs, but they will be randomized medial to lateral, and they will also be randomized proximal to distal.

Like Lynn, we use one arm for the absorption phase, one arm for the elimination phase, and again, this is randomized. We are using 4 square centimeter site areas. We applied 20 microliters, so this is the same dose that was used in the prior work, basically 5 microliters per square centimeter. It's applied with a positive displacement pipetter and then evenly spread over the area with a smooth glass rod. We cover the forearms with a non-occlusive aluminum screen that sits up above the forearm so that there's no possibility of touching the dose sites. Then because this is a light-sensitive compound that we're working with, they're covered with a cotton sleeve to minimize light exposure.

Based on the pilot work that we had done, we came to a similar conclusion that Lynn came to, that basically peak absorption seemed to be reached at 1.5 to 2 hours. But we tend to want to leave the drug on a little bit longer than that, so our absorption phase actually goes to 4 hours with these points being the sampling points, half, 1, 2, and 4 hours. And then the elimination phase goes much longer: 8, 12, 24, and 48 hours.

At each of these times in the absorption phase, the sites are blotted three times with Kimwipes, wiped once with a dry cotton-tipped swab, and then stripped 22 times with Transpore tape. Basically, the Kimwipes and the dry cotton swab are there to pick up any liquid or anything that's not quite dried that might prevent good adhesion when the tape is applied. So, that is what's done with the sites on the absorption arm.

The elimination phase is really initiated on the other arm at four hours, the end of the absorption phase, and at that point every site on the elimination arm will be blotted with the Kimwipes and again dried with the cotton-tipped applicator, and then stripped twice to remove unabsorbed drug. That's done on all sites. Then of course later at 8, 12, and 24 hours, paired sites will be stripped, this time only 20 times because we've already taken the first two strips to remove unabsorbed drug on the surface.

Strips 1 and 2 in all cases are discarded, as are the Kimwipes and the cotton-tipped swab. We pool the next 10 strips. They're extracted in acetonitrile and analyzed by HPLC. Then likewise the last 10 strips are also pooled separately, extracted, and analyzed. We have a validated HPLC assay for tretinoin and its isomer, the isotretinoin.

All the data I will be presenting are the sums of the two isomers. I won't be presenting individual data.

And, of course, all these procedures take place under dim yellow light to prevent as much isomerization as we can.

Here is the data presented first by strip sets, so I'm presenting what the first pool of 10 strips look like, and then the data deeper in the stratum corneum, the second 10 strips. You probably can't read this up here, but in fact red represents the test product, the Avita product, showing higher stratum corneum levels, both in the first 10 strips and in the second 10 strips. Then the Retin-A product, which is shown in the dark line. So, that is the first 10 strips versus the second 10 strips.

If we just sum all these data and present the total of what's found in the strips, that's presented here. So, there is good separation between the two products. Clearly they're not behaving in the same manner, but in contrast to what Lynn was finding, we're finding just the reverse: we're finding higher stratum corneum drug levels with the Avita product than with the Retin-A product.

Now, we were concerned when we got these results, and we thought obviously we had switched the tubes or mislabeled something, so we wanted to do a quick repeat. Just a week ago we finished a second study. It's kind of a half study: it was done in just 18 subjects, and we looked only at the absorption phase, because the differences that we had seen in the first study really took place in the absorption phase. Everything was basically conducted the same way, with the exception that we didn't do an elimination phase. We're still collecting at half, 1, 2, and 4 hours.

The other thing that was done differently: we wondered about a different response of the tapes. We used Transpore in the first study. So, in this study, in lieu of an elimination phase, we did one arm with D-Squame and the other arm with Transpore to see what differences the two tape types might produce. Otherwise, the procedures were essentially as I reported in study number one.

In essence, we duplicated the work of the first study. We still found higher stratum corneum contents for the two tretinoin isomers for the Avita product as compared to the Retin-A. This is using D-Squame here and Transpore here. So, what's presented are the data from the first 10 tape strips for D-Squame and for Transpore. So, differences between products are found with both tapes, but we seem to get much greater recoveries of drug with D-Squame.

If we look at the second 10 strips, we see a similar behavior, much greater recovery with D-Squame than Transpore. Clear separation, actually broader separation in the second 10 strips between the two test and reference products. I should say that in terms of statistical analysis we did find that the test and reference products were different. So, tape stripping here has clearly been able to show that there's a difference between the two products.

I'm a little puzzled about why they're in different directions than what Dr. Pershing presented, but I've been assured by Dale Conner that we may have a good hypothesis coming shortly, so there may be a good explanation for this.

I wanted to use the rest of the time just to present work that has been alluded to, and actually I think partially presented previously at one of the prior meetings, to suggest that as we look at techniques for proving bioequivalence of topical drug products, we should be aware of some of the other techniques that are and have been used. I'll mention a couple of them specifically here.

There are a number of well-accepted techniques available to confirm the bioequivalence of topical products, and the first of the two I'm going to talk about is the human cadaver skin assay, the assay that's actually used very frequently by the majority of pharmaceutical companies to develop topical products. In the pre-clinical phase of development, the screening of different formulations often involves the use of cadaver skin, so it's a well-ingrained model that is widely used and has a long history going back well over 30 years. Of course, as you would expect, when it comes to transdermal devices, this is a critical factor in their development. So, I'll just show some data on the cadaver skin assay, and then a specific assay for retinoids.

I'm calling the second one the transepidermal water loss assay. It's just a variation of another widely used test in the industry, the 21-day cumulative irritation assay. In probably most, if not all, NDA submissions for topical products, irritation data are submitted. It may be animal data, but in many cases human data are also submitted, and the test most frequently used is what's called the 21-day cumulative irritation assay, a subchronic assay. The variation here is that we're not just looking at irritation, redness, and scaling, but we're also monitoring another endpoint, which is transepidermal water loss through the skin.

Now, the reason I'm presenting these data is that this work was actually done for Spear Pharmaceuticals, and it is to show bioequivalence by these surrogate tests for two products that have now been shown by clinical assay to be bioequivalent. So, what we're examining here are the two generic products from Spear Pharmaceuticals, the .01 and the .025 percent gel products, compared to the same strengths of the innovator gel products, and our objective is to show concordance with the clinical results, which in fact did show bioequivalence.

The human cadaver skin assay is well known; it basically uses dermatomed skin, and in this case we're looking at skin obtained from 8 different donors. The outer portion of the skin is exposed to ambient conditions, just like those in this room, and dosing is at the level of 6.25 microliters per square centimeter of each of the four active gel products. We are sampling the dermal receptor solution at times ranging from 4 to 48 hours.

In this case, because the amount penetrating is so low and below detection limits by HPLC, we actually spiked the products with radioactive tretinoin. But we proved that the tracer is a good tracer for the parent drug itself by doing a rate-of-release test on the products to show the concordance between the rate of release of the isotope and the cold drug, that is, that the specific activity measured at the beginning of the release is the same as the specific activity measured at the end of release. So, this is a tracer truly behaving as a tracer, and not one of the tracers with problems caused by tritium exchange.

At the end of the 48-hour sampling, we wash the surface of the skin with isopropanol to remove unabsorbed drug. We also separate the skin into epidermis and dermis and digest and analyze for radioactive content, so it's basically a mass balance study. The primary endpoints are AUC and maximum flux. We also look at time of maximum flux, and we have some secondary endpoints too that are based on dermal-epidermal content and mass balance.
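As an illustration of how the in vitro endpoints named here, maximum flux and AUC, could be derived from receptor-solution samples, the following is a minimal Python sketch; the diffusion area, sampling times, and amounts are invented for illustration and are not the study's values or its actual analysis.

```python
# Minimal, hypothetical sketch (not the study's analysis or data): deriving
# interval flux, maximum flux, and an AUC from cumulative receptor-solution
# amounts in an in vitro cadaver-skin diffusion experiment.
DIFFUSION_AREA_CM2 = 1.0  # assumed exposed skin area per diffusion cell

def flux_profile(times_h, cumulative_ng):
    """Flux (ng/cm^2/h) over each sampling interval, reported at the interval midpoint."""
    midpoints, fluxes = [], []
    for t1, t2, a1, a2 in zip(times_h, times_h[1:], cumulative_ng, cumulative_ng[1:]):
        midpoints.append((t1 + t2) / 2.0)
        fluxes.append(((a2 - a1) / DIFFUSION_AREA_CM2) / (t2 - t1))
    return midpoints, fluxes

def auc_trapezoid(x, y):
    """Area under the curve by the linear trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0 for x1, x2, y1, y2 in zip(x, x[1:], y, y[1:]))

times = [0, 4, 8, 12, 24, 36, 48]        # sampling times, h
cumulative = [0, 2, 7, 14, 35, 52, 64]   # hypothetical cumulative ng in the receptor

mid, flux = flux_profile(times, cumulative)
jmax = max(flux)
print(f"max flux ~{jmax:.2f} ng/cm^2/h near {mid[flux.index(jmax)]:.0f} h")
print(f"AUC of the flux-time curve ~{auc_trapezoid(mid, flux):.1f} ng/cm^2")
```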

The transepidermal water loss is a much different study. The first one is in vitro; the second one is in vivo, using normal subjects, again using the forearms. What happens is that small amounts, really the same dose that was used in the tape stripping studies, are applied to demarcated sites on the ventral forearms. A daily application is made for 20 days, so every day the subjects come back, the sites are evaluated for redness and scaling and a measurement taken of transepidermal water loss with an instrument called an evaporimeter, and then after all those readings are done, the drug is reapplied.

In order to make the forearm skin a little bit more like the face, which is the normal site of application for retinoids, we actually apply Saran Wrap for 5 hours to enhance the absorption of these retinoids. As I mentioned, at each study visit, prior to the next dose, we're measuring transepidermal water loss and then we're grading the skins for erythema, but mostly for peeling, which has turned out to be the best endpoint.

So, again, we have two primary variables upon which to do our statistical analysis. One is the maximum value for transepidermal water loss that's achieved, and the second is the days to full peel. Because this was a placebo-controlled trial that only went for 21 days, we basically had to assign a value of 25 days to any sites that didn't peel by day 21. So, these are the two tests, basically, that we are going to be looking at.
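The 25-day convention described above can be illustrated with a minimal sketch; the observations below, and the None marker used for sites that never peeled, are invented for illustration only.

```python
# Minimal, hypothetical sketch of the convention described above: sites that have
# not reached full peel by the end of the 21-day study are assigned 25 days before
# summary statistics are computed. The observations below are invented.
STUDY_LENGTH_DAYS = 21
NO_PEEL_VALUE_DAYS = 25

def days_to_full_peel(observed):
    """Replace 'did not peel by day 21' (recorded here as None) with the 25-day value."""
    return [d if d is not None and d <= STUDY_LENGTH_DAYS else NO_PEEL_VALUE_DAYS
            for d in observed]

observed = [17, 19, None, 21, None, 18]   # None = site never reached full peel
values = days_to_full_peel(observed)
print(values, "mean =", round(sum(values) / len(values), 1))
```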

I should say that although the second test is really a subchronic irritation test, it's based on the fact that retinoids alter the differentiation of the skin, and when they do that, they change the barrier properties of the skin. The primary function of the barrier is to keep water in, so as one alters the barrier to water loss, we see an increase in water loss through the skin. So, that's sort of the physiology behind this pharmacodynamic assay.

Let's look first at transepidermal water loss. What happens is that during the first week of this subchronic application, one sees basically no change in the skin, one measures no change in transepidermal water loss, but as one gets into the second week and the third week, then one begins to see changes both in how the skin behaves, the peeling, and one also begins to see changes in transepidermal water loss.

So, if you look here, for instance, at the .025 percent gel, comparing the Spear and the Retin-A product: the value for transepidermal water loss normally would be around 4, but because we subtracted the placebo response, it really starts at 0, so we're going from 0 up here to 12. The Spear product is showing 12.3, with the usual large standard deviation we see with skin studies, and the Retin-A is showing 12.1, again with the large standard deviation that we normally see. But there is good agreement in terms of maximal transepidermal water loss.

Likewise, when you go to the lower strength product, you see good agreement: 7.8 versus 8.2. What you also see is dose response, that these values here are different from these values here. So, the low-strength product is producing less of an effect than the high-strength product.

Likewise, when you go to days to full peel, the high-strength gels are taking on average about 18 days. The standard deviation is less in this case. With the low-strength product -- the study only goes for 21 days -- what you're seeing is that very few sites on the low-strength gel are actually going all the way to peeling, so we have a lot of 25-day values being added in here to give us a value that's greater than 21. But we also see good agreement between the two products at that concentration, and again we see dose response, a differentiation between the low strength and the high strength.

In terms of statistics, if we look at the low strength and look at the confidence interval for the ratio of the log-transformed data, we see that they fall very nicely within the 80 to 125 limits, both for transepidermal water loss and for days to full peel. Likewise, for the high-strength product, the confidence intervals fall between 80 and 125 for both of those primary parameters. So, as another test to consider for bioequivalence testing of retinoids, this is certainly one that has held up rather nicely in the one case where we have clinical data to compare it to.
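For readers unfamiliar with the 80-to-125 criterion, the following is a minimal sketch of a 90 percent confidence interval on the test/reference ratio of log-transformed data; the paired-design analysis and all numbers are assumptions for illustration, not the study's actual statistics.

```python
# Minimal, hypothetical sketch: a 90% confidence interval for the test/reference
# ratio of a log-transformed parameter, checked against the 80-125 percent limits.
# The paired design and all numbers are assumptions for illustration only.
import math
from statistics import mean, stdev
from scipy import stats

def ratio_ci_90(test, ref):
    """90% CI (as percent of reference) for the geometric mean ratio, paired data."""
    diffs = [math.log(t) - math.log(r) for t, r in zip(test, ref)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    t_crit = stats.t.ppf(0.95, df=n - 1)   # corresponds to two one-sided 5% tests
    lo = math.exp(mean(diffs) - t_crit * se) * 100
    hi = math.exp(mean(diffs) + t_crit * se) * 100
    return lo, hi

test = [11.8, 13.0, 12.6, 10.9, 12.4, 13.5]  # hypothetical maximum TEWL, test sites
ref  = [12.0, 12.5, 13.1, 11.2, 12.0, 13.0]  # hypothetical maximum TEWL, reference sites
lo, hi = ratio_ci_90(test, ref)
verdict = "within" if lo >= 80 and hi <= 125 else "outside"
print(f"90% CI for the T/R ratio: {lo:.1f}% to {hi:.1f}% ({verdict} the 80-125 limits)")
```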

When we looked at the cadaver skin data, we also found good agreement with the clinical results, and we were able to show bioequivalence. Here are the rate data for the low-strength product and the high-strength product, which I can't even read myself. This, I believe, is the test and this the reference product for the low strength, and what we're looking at is rate of absorption as a function of time, with the study going out over 48 hours. There is even better agreement with the high-strength product between test and reference; it's really difficult to tell the difference. These here represent the standard error bars.

We'll just look at the numbers for the confidence intervals. Looking at the low-strength product, the two primary parameters were AUC and here maximum flux. And again, if we look at the confidence intervals, 97 to 107, 92 to 115, well within the 80-125. If we look at all the secondary parameters with the exception of the dermal content, we also find that they fall within the 80-125. Only the dermal content for this strength and the next strength, if we could get the next one up, are the ones that don't fall within the 80-125.

But if you look at the two primary parameters, AUC and maximum flux, you see 95 to 110, 95 to 127, close enough for me. I'm sure Dale would agree with that. FDA is very flexible in this regard.

(Laughter.)

DR. FRANZ: And again, only the dermal numbers not falling within the 80-125.

So, there are other tests in addition to the tape stripping DPK method, and I think what's nice about the skin is that we do have a lot of tests available to us for consideration. As has been pointed out many times, one nice thing here is that the test and reference products are compared side by side, at the same time, on the same subject, so there are tremendous advantages over what the people doing oral bioequivalence have to do. And with that I will stop.

DR. LEE: Thank you, Tom. I think that we are behind a little bit. Are there any burning questions, just one or two?

DR. VENITZ: I just wanted to make sure that I can compare the two studies. You're comparing Avita to Retin-A. How does that compare to Dr. Pershing's A, B, and C? Because you mentioned that you got a discrepant result.

DR. FRANZ: Well, the same products were compared in both studies. We just used two of the three that were in her study. The Avita and Retin-A were in her study as they were in ours. Her data found the Avita stratum corneum content to be lower than Retin-A's; we found the Avita stratum corneum content to be greater, so there are obviously some methodological differences. We both found the two products to be statistically different from each other.

DR. LEE: We'll come back to that at the discussion period. Marv, did you want to say something?

DR. MEYER: Yes. Tom, you did two separate studies actually. One was smaller than the other one.

DR. FRANZ: Yes.

DR. MEYER: Did you compare, say, the area under the curve from the smaller study to the larger study over the first 4 hours?

DR. FRANZ: I don't have those data in my head, but looking at the y axis, in terms of what the stratum corneum drug content was, it agreed very well. The two studies agreed very well, so we were getting the same amount out when we looked at the Transpore tape. The AUCs, therefore, should be the same, but I don't recall what the data were.

DR. LEE: All right. I would like to invite Dr. Mamata Gokhale from the agency to present her data.

DR. GOKHALE: Good morning, everybody. I'm going to talk about the internal DPK study which we conducted in collaboration with the Division of Product Quality Research. I will start with a short recap.

Currently there are three tretinoin gel products on the market, and of these, Avita and Retin-A were approved as new drugs. The formulation of Avita is different from Retin-A, while the third tretinoin gel product is a generic one by Spear, and it is qualitatively and quantitatively similar to the Retin-A. The Orange Book lists Retin-A as the reference listed drug among these three products.

The earlier two speakers have shown that the DPK approach can be used to determine bioequivalence or bio-inequivalence of these products, and their results correlated with the clinical studies.

So, the question for us is, can the DPK approach be used as a regulatory method?

To address this question, we focused on three issues: first, is the skin-stripping technique easily transferable? Second, are the results reproducible? And third, are the required time and effort reasonable?

With this in mind, our objective was to determine the feasibility of conducting a DPK study in a new laboratory. I want to emphasize that determination of bioequivalence was not our objective. Our agency had already sponsored a DPK study at the University of Utah, and that was successfully completed. Therefore, we decided to use Utah protocols in our study, and throughout my presentation you will see that the Utah study is used as a reference point.

So, we divided our study into three phases. Phase 1 was conducted with Retin-A, and the purpose was to practice the skin-stripping technique in our laboratory. Phase 2 was also conducted using Retin-A, and the purpose was to determine stratum corneum profiles over time. And phase 3 was conducted using all three products, and the purpose was to compare the three gels simultaneously.

So, I'm going to walk you through our study now. I'll address the pre-dosing part of phase 1 first. In this phase we focused on two variables: the first was the stratum corneum weights, and the second was the weight of the Retin-A dose that was going to be applied to the skin sites.

I will describe the skin-stripping procedure, which is basically the same as that used in the Utah study. Forearms of the subject were washed and dried, and circular areas corresponding to the adhesive tape discs were marked on the skin sites using a template. A stack of 10 adhesive discs was weighed and used to remove 10 successive layers of stratum corneum. Then it was weighed again, and the difference gave us the weight of the stratum corneum layers that were removed.
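The weight-by-difference step just described amounts to a simple subtraction; a minimal sketch with invented masses follows.

```python
# Minimal, hypothetical sketch of the weight-by-difference step described above:
# a stack of 10 adhesive discs is weighed before and after stripping, and the
# gain in mass is taken as the stratum corneum removed. Numbers are invented.
def stratum_corneum_mass_mg(stack_before_mg, stack_after_mg):
    """Mass (mg) of stratum corneum harvested by one 10-disc stack."""
    return stack_after_mg - stack_before_mg

print(round(stratum_corneum_mass_mg(152.40, 153.13), 2))  # -> 0.73 mg removed
```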

When we compared the results of the right arm with the left arm, you can see that there were differences between the two arms in both the studies.

Intra-arm variability of stratum corneum weights was comparable in both the studies.

After we had some idea about the stratum corneum we were removing, we moved on to monitoring the weight of the dose that we were going to use in our study. We dispensed the gel using a Hamilton syringe, and the dose was 5 microliters, which weighed around 4 milligrams in our hands, comparable with the Utah study.

So, with this data in hand we were ready for dosing. A validated HPLC method was used to quantitate tretinoin and isotretinoin, which were together expressed as total retinoids. And I want to also mention that we used internal standard in our assay as recommended by the agency's guidance for bioanalytical method development.

Now, here the drug was applied and left there for 2 hours. At the end of 2 hours, residual drug was removed using cotton swabs and stratum corneum layers were harvested.

You are looking at drug disposition across the stratum corneum in terms of percent of applied dose. The reason we did this is that there is disagreement among investigators as to how deep down the skin stripping should be continued. So, we harvested the stratum corneum layers in three sets: strip number 1, strips 2 to 10, and strips 11 to 20.

As you can see, the first strip contains a large excess compared to the other layers, and this told us that strip number 1 contains the residual drug product, so it made sense to discard it.

Now, if you look at the total retinoid concentrations in strips 2 to 10 versus strips 11 to 20, there is a gradient. And this told us that it was sufficient to harvest strips number 2 to 10 during our profile studies, and going further down really didn't have any particular advantage. I want to point out that this data correlated well with the Utah study.

At this stage, we were ready to look at the stratum corneum profiles of total retinoids over time.

I should also point out something about our recoveries. You can see that if you compare the recovery of total retinoids from the right arm versus the left arm, there isn't much difference between the two arms, and that trend was seen in the Utah study as well. However, I want to point out that our stratum corneum concentrations were higher; they are twice as high as those obtained in the Utah study.

Intra-arm variabilities were high in both the studies, and this seems to be the inherent nature of this technique.

I'm going to talk about the stratum corneum profiles. This is a time course study. Retin-A was applied at 0 hour. The skin stripping procedure was the same as I described. We discarded the first strip, harvested only strips 2 to 10. The residual drug was removed at 1.5 hours, and sampling was continued further, up to 10.5 hours.

You are looking at this top curve shown in red. That shows our initial effort. You can see that the total retinoid levels were higher than those obtained in the Utah study. Also, we did not obtain a good elimination phase.

However, with practice we got better and after using about 5 subjects in our study, we were able to obtain stratum corneum profiles which were superimposable with the Utah study.

What I want to point out is that the progression from red to dark blue shows our learning curve. We got better with practice and with more experience. We also got more comfortable with the skin stripping technique.

So, at this stage we were ready to move on to phase 3, which involved comparison of the three different products simultaneously. How did we do?

Again, each drug was applied at 0 hour. Sampling was started. Residual drug was removed at 1.5 hours and sampling was continued up to 10.5 hours for each drug. Now, note that here we had to triple our effort because we needed skin site corresponding to each of these products at each sampling point, which meant accommodating about 27 sites on both arms.

So, we had only 2 subjects in our study. In our hands concentrations of Avita were lower compared to Retin-A and Spear products, and the question is, how did we compare with Utah?

This is the Utah study which you saw earlier. They had 49 subjects in their study. Their profiles were well defined. Differences or similarities were clear, and concentrations of Avita were lower than Retin-A and Spear products.

Now, I want to emphasize again the purpose of Utah study was determination of bioequivalence or inequivalence, while the purpose of our study was to determine the feasibility of conducting a DPK study in a new laboratory. I think even with two subjects we were able to achieve that goal.

So, when we compared the DPK parameters, you can see that with respect to Cmax there were some differences for each product between the two studies. Tmax for Avita was delayed compared to the other two products in our study. And the AUC for Avita was lower than for Retin-A and Spear in our study, which compared very well with the results seen in the Utah study.

Can we draw any inference with this limited number of subjects? Yes. We did see a trend in our study that was similar to the Utah study. You can see that at 1 hour the stratum corneum concentrations of the Retin-A and Spear products were higher than those of Avita, and the same was the case for the AUCs, which are higher for Retin-A and Spear compared to Avita. This again correlated well with the Utah study. So, you can see here that differences or similarities in the formulations can be related to DPK parameters such as AUC.
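As an illustration of how Cmax, Tmax, and AUC could be computed from such stratum corneum profiles, the following is a minimal sketch; the sampling times and amounts are invented and do not reproduce either study's data.

```python
# Minimal, hypothetical sketch: deriving the DPK parameters discussed here
# (Cmax, Tmax, and AUC by the linear trapezoidal rule) from a stratum corneum
# amount-time profile. All numbers are invented and only illustrate the idea.
def dpk_parameters(times_h, amounts):
    """Return (Cmax, Tmax, AUC) from paired time (h) and amount lists."""
    cmax = max(amounts)
    tmax = times_h[amounts.index(cmax)]
    auc = sum((t2 - t1) * (a1 + a2) / 2.0
              for t1, t2, a1, a2 in zip(times_h, times_h[1:], amounts, amounts[1:]))
    return cmax, tmax, auc

times = [0.5, 1.0, 1.5, 3.0, 4.5, 6.5, 10.5]      # sampling times, h
profiles = {
    "Retin-A": [60, 110, 95, 70, 50, 35, 20],     # hypothetical ng/cm^2 in strips 2-10
    "Spear":   [55, 105, 90, 68, 48, 33, 18],
    "Avita":   [30,  55, 60, 50, 40, 28, 15],
}
for name, amounts in profiles.items():
    cmax, tmax, auc = dpk_parameters(times, amounts)
    print(f"{name}: Cmax={cmax} ng/cm^2, Tmax={tmax} h, AUC={auc:.1f} ng*h/cm^2")
```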

I also want to touch upon the time line. It took us about 1 month to go from phase 1 to phase 2, and another month to go from phase 2 to phase 3. I want to point out that at phase 1 we were totally inexperienced. We were uncomfortable with the skin stripping technique. However, with practice we gained more experience, and at this point we were comfortable with the skin stripping technique. We were confident about our results.

So, to conclude, the DPK approach has the potential to detect formulation differences in topical products. The skin stripping methodology can be easily transferred, and results are reproducible, and finally, the time and resources needed for the transfer of skin stripping methodology and the DPK follow-up are reasonable.

I want to end with acknowledgments. The team from the Office of Testing and Research. Robbe Lyon was responsible for the protocols. Everett Jefferson and Bob Hunt were responsible for the HPLC analysis. Everett also helped me with the skin stripping, and he was pretty good at it. Tapash Ghosh from OCPB also helped us in skin stripping. And I want to acknowledge Dale Conner and Barbara Davit for their support while I was working on this project. Last but not the least, I also want to thank Dr. Pershing for useful discussions, and finally, thank you all for your attention.

DR. LEE: Thank you, Mamata. We have time maybe for a couple of questions.

DR. MEYER: I noticed on your phase 1 post-dose data, was this early on while you were just learning? Because your variability is tremendous, certainly compared to Utah. You have CVs of 100 percent, 50 percent, whereas Utah is down below 10 percent for strips 2 through 10, for example, and you're burying all that variability in a single number that comes out as a subject value. So, you hide a lot of the variability when you take the sum of all nine strips. Was this an early-on study?

DR. GOKHALE: Yes. In fact, the very reason I put it up was to show how we did, even in the beginning.

DR. MEYER: How did you do at the end in terms of variability?

DR. GOKHALE: How did I do at the end? Okay.

DR. MEYER: I mean, in terms of variability, if you compare it to Utah.

DR. GOKHALE: Well, our variability was reduced. It was manageable. I didn't have enough time to put up all the numbers, but I can say that we had better control over the variability.

DR. MEYER: But as good as Utah?

DR. GOKHALE: Yes. Within 20 percent.

DR. LEE: Are you satisfied, Marv?

DR. MEYER: No.

(Laughter.)

DR. MEYER: But I received the answer I wanted.

DR. LEE: Mamata, I just want to get a sense for the timing of these experiments. Did you do yours after the results of the two studies were made available to you?

DR. GOKHALE: No. Actually we started after Dr. Pershing's study was completed, and I think Dr. Franz's study was ongoing at the time.

DR. LEE: Thank you. Okay, thank you very much.

Dale?

We have two committee members who are supposed to be calling in but they are not here yet.

DR. CONNER: I'm just going to show a few slides to start off the discussion period.

As you've seen from the previous couple of studies, we've had two very experienced investigators and laboratories come in and present data, as well as our own attempts to get one version of this method up in our lab. When I saw the results from at least the two expert laboratories, I was, to say the least, a little perplexed. Perhaps even alarmed would be a closer description, because although the results of both studies showed what we expected -- that the two NDA products, which we knew from previous clinical work were not clinically equivalent, were indeed different -- the results went in different directions. The fact that they showed a difference didn't comfort me, given that the differences went in such different directions.

Dr. Pershing's Utah data seemed to go in the direction we expected from the clinical results, and Dr. Franz's didn't; they went in the opposite direction. So, I was a little concerned. Of course, the group at the FDA was scratching our heads as to what could have possibly caused this. We've had some discussions with Dr. Franz and Dr. Pershing about what could possibly have accounted for such dramatic differences.

It's important to note or to go over what has already been pointed out, and this is our hypothesis. And we did a little quick and dirty experiment that may illustrate this.

We think that a very, very important factor in explaining this is the actual stripping technique that the two investigators used, because, as they both described, it's somewhat different. The Utah study tends to strip pretty much the exact area of application; they don't go outside that area to any great degree. Dr. Franz, on the other hand, has chosen to apply to a somewhat larger area, but also to strip a considerably larger area around the application area. How could this possibly affect the results?

Now, first off, before I show this kind of quick and dirty experiment, whenever I name specific products and seem to be relating their characteristics, I always get angry calls from manufacturers saying I've somehow maligned their product or something, and I want to add a disclaimer that the following results or illustration is in no way a comment on the quality or appropriateness of any product that we happen to mention. This is simply an illustration of different properties, and I'm not saying whether those are good or bad properties in clinical sense. So, I'm not trying to malign anyone's product.

The laboratory, the OTR folks, did a little experiment that goes as follows. They took filter paper and they put down 10 microliters of each of these three products: Avita, Retin-A, and the Spear tretinoin gel.

And I'd like to point out that Avita and Retin-A are separate NDA-approved products. From our previous work, and from what you've heard today, you realize that they are not considered bioequivalent on clinical grounds, and each of them was approved under its own NDA with its own labeling. The third, the tretinoin gel from Spear, is an ANDA-approved product that was approved on the basis of equivalence to the Retin-A reference product. So, that's important to point out.

Our expectation going into this, from all the data we have as far as clinical data, was that the Retin-A and the Spear tretinoin would come out equivalent, and we did not expect the Avita to come out equivalent.

But how could differences in these products actually be affected by differences in stripping methodology? The hypothesis is that, as has been stated, the Retin-A and Spear are Q1 and Q2 the same, whereas the Avita is not. Perhaps they have some different properties that would account for why you got more drug with Dr. Franz's stripping technique and not in the Utah study.

This is a very, very crude illustration of different properties. As I said, it's not supposed to malign anyone's product or say one is superior or inferior.

But the lab in OTR at FDA just put out 10 microliters on filter paper. As you can see, these are the two circles of the Spear and Retin-A products, and this is the Avita. Now, first off, the time points. They took pictures of these with a scale down below -- and this was drawn by the person who took the pictures -- to try and ask, do these have different physical properties that would account for differences that would be picked up by the two methods. As you see, the comment of the person who did this was that the air in the lab at the time was very dry, and of course this is filter paper, which isn't really skin. So, that's why this is very crude.

But already the Retin-A and Spear, which are the equivalent products and Q1 and Q2, seem to be a little dried out. They seem to be staying in the same place, whereas the Avita is starting to spread around the edges.

We look at it at 15 minutes. These look pretty darned dry to me, and still within their original application area. They haven't spread out. Yet the Avita is still spreading; if you pay attention to the scale, the circle is now a little bit bigger than it was at 2 minutes. If we go to 30 minutes, again it gets a little fuzzy around the edges, but it's larger yet in its circumference. These are still the same, pretty thoroughly dried by now. And finally, again even harder to see because it isn't picked up well by the camera, the spot has spread a little bit further still.

This is obviously filter paper. On the skin it may spread more or less, but it shows a difference in the physical properties of these Q1/Q2 and non-Q1/Q2 products. Consider, if you will, the hypothesis that because the Avita spreads out into the surrounding area beyond the application point, Dr. Franz's stripping method of taking extra area around it may actually pick up more drug than the method performed at Utah, which simply strips the actual application area. So, that may account for why the exposure of drug, and the eventual absorption into the stratum corneum that's harvested, is different for the two methods. It's just something to think about: when you're trying to figure out, as we did, why two investigators with somewhat different techniques got different results, you may want to take this into account.

DR. LEE: I don't know whether it would be appropriate to ask you this question. What is the action item?

DR. CONNER: This is an update on available data. We've had previous advisory committee meetings on this. Trying to get some kind of read by the committee now with the updated data about where we should go with this technique. When I look at the various results from different very expert laboratories, I'm inclined to say that there may be some doubts in committee members' minds about whether we should continue with this. We have a draft guidance out. Should we continue that draft guidance or should we pull back and perhaps reassess where we're going with this technique, and perhaps look at some of the other techniques, of which Dr. Franz just mentioned a few, perhaps in place of this.

The committees in the past -- and you saw from my slides we've had a number of committee meetings on this -- have come up with some feeling based on existing data about whether we should continue, whether we should pull back or whether we should just totally abandon this and try to develop some other techniques to potentially replace it, or to perhaps do better than we are doing with just clinical equivalence trials. Or perhaps to say the clinical equivalence trials are probably the best that we could do.

DR. LEE: It seems we are kind of doing a skin stripping experiment in this committee. Never mind.

(Laughter.)

DR. LEE: Anyway, there are three issues before the committee. Yes?

DR. WILKIN: Just to amplify Dr. Conner's comments from the agency's perspective: at the last joint meeting of the two committees -- the Pharmaceutical Sciences Advisory Committee and the other committee that is very much interested in this, the Dermatologic and Ophthalmic Drugs Advisory Committee -- Dr. Hussain mentioned, and Helen Winkle confirmed in a discussion while the meeting was going on, that ultimately, before actually moving ahead with DPK and confirming it as the method, we would want to have a joint committee meeting with all of the data. I know that the standard of all of the data means that you would have more than you had this time, which was just a couple of abstracts.

I think really the intent here is to get a read on whether this really is where we need to go with it. I think Dr. Franz very eloquently pointed out that we may have been overlooking some opportunity costs in not really considering alternative methodologies. Dr. Conner pointed out that we have been with this since 1989. I didn't realize it was 1989; it's painful to hear that it was even earlier than what I thought.

DR. CONNER: Makes you feel kind of old, doesn't it?

DR. WILKIN: It makes me definitely feel old.

You know, I think it's really whether we want to devote the resources to having the really full, complete joint committee meeting that has all of the data, and we go through this, or whether you think this is the time to pull the plug and move on to other methodologies.

DR. HUSSAIN: I agree with what Jonathan just mentioned. At the joint committee we had -- and many of you were there -- a very negative reaction to this method from the Derm Committee members, and the phrase "You're beating a dead horse" was used, and it's in the minutes.

One of the objectives here is to have this committee discussion and see whether it's worth even going to the next step of a joint committee meeting. I don't want to keep going to that committee and getting those type of comments back.

DR. LEE: That's why I said that we have the experience of a skin stripping experiment. We're down to the 10th stripping.

DR. WILKIN: Maybe a clarification. I think the message that came out of that last joint advisory committee was a very negative message, but I don't think it came solely from the dermatologists. I recall Dr. Venitz -- and I actually looked it up rapidly here -- indicating that, in the final analysis, he was probably going to agree with those clinicians. I think there were other members of the PSAC that were seeing the same sort of thing. I don't really think this is a dermatologic clinical viewpoint versus a more data-driven clinical pharmacology kind of view.

I don't feel any schizophrenia working in both dermatology and clinical pharmacology, and I have to say that I think this all fits together. I think I hear both committees essentially saying the same sort of thing.

DR. LEE: Very well. Now the stage is set for the discussion. We have until 10:45 to come to some kind of a solution. I think the agency does look to this committee for some guidance, and there are three issues. I invited Dr. Doull to, more or less, lead the discussion.

DR. DOULL: Mr. Chairman, let me start off by asking Dr. Pershing. You had in one of your figures a biphasic curve, entitled topical drug delivery, where you were talking about the absorption through the stratum corneum and down through the other layers. That first thin layer there, for this particular drug, how many strips do you think that would be?

DR. PERSHING: You always get a concentration gradient through the skin with any drug, in any vehicle. It's just the steepness of that concentration gradient. One of the important aspects in DPK is to make sure you've adequately removed residual drugs so you're actually capturing that concentration gradient.

What was the other question about that?

DR. DOULL: As I read through this guidance, it seems to me there are a number of questions which are difficult for me to answer. One of them is the predictive value of this whole procedure in predicting the clinical effect, say, of an antifungal or whatever. They're not really dealt with a lot in the guidance. You need some kind of assurance in there that this procedure, if you do it, will in fact give you the right answer.

I looked back at what we said in the July meeting and I don't find a lot of assurance there either. I guess what concerns me is how do you really know, when you go from drug to drug and person to person, and if you did this on the back or if you did it in a child -- and we talked last night about diseased skin. How do you really know that this system is really giving you the right answer?

DR. PERSHING: Over the years, the different advisory committees have consistently challenged the FDA on the guidance: take specific examples where clinical efficacy trials have been done and do DPK; show us a case where products fail a clinical study and show us what DPK does with those products. I think this came from Dr. Wilkin actually, who was one of the proponents of this even within the FDA. And take a case where the clinical studies pass but DPK fails, and keep going back and forth.

We have done those, and at every successive advisory committee we've shown those data. We've shown it in five different drug classes, and now this is the final coup de grace, so to speak, because the FDA had a very specific example of similar and dissimilar products, approved by ANDA versus NDA routes, where DPK could be evaluated. In line with Dr. Lamborn's comment, I think it's very appropriate that it should be evaluated with products that have been evaluated in a clinical study, and we've done that.

I've also done it in diseased skin. I've also done it in a psoriasis study, where if you compare one elbow to the other with a generic versus a reference product and you follow the clinical efficacy, would it agree? And the answer is yes. I've done it in tinea pedis. Does it work? Yes, even in diseased skin.

DR. DOULL: If you did it on the upper forearm or on the back?

DR. PERSHING: Well, in tinea pedis, of course, we used the plantar surface of the foot to collect DPK from the diseased site. In psoriasis, we used the elbows. So, it does work, in diseased skin, in healthy skin.

I would caution you about diseased skin. Diseased skin is not an effective barrier, and it will diminish differences between drug products where healthy skin will not. When you have an active, ongoing disease, you do not have a discriminating barrier.

Second point. There are many drug classes for which there are no pharmacodynamic surrogate markers. Antivirals are a case in point; the clinicians can't even agree on what the clinical endpoints should be. What do you do in those cases? You have a self-resolving disease, herpes simplex virus disease; cold sores are self-resolving in two weeks. What are you going to do there? They're difficult, difficult studies to perform.

Do you want to do a 21-day study? If you think 1 day has variability, imagine 21. Those are issues that have to be considered, I think, and it's a cost/benefit ratio.

DR. DOULL: The other thing. In your studies you compared area under the curve and Cmax and Tmax and so on. I would think for an antifungal, for example, if it's a threshold phenomenon and you don't get above the threshold, you're not going to kill those organisms. It's not going to work, no matter how big the AUC is.

I was looking at your data as whether Cmax or area under the curve or Tmax, whatever is most predictive in this system, and I'm not sure I can sort that out from your data.

DR. PERSHING: Both turn out to be important, which is why the FDA still uses both parameters to make their assessment. That's what I've concluded. Extent is very important, and extent will dictate the shape of the curve.

DR. LEE: I'd like to remind us that we are addressing these three issues on the screen.

I understand there's a member on the phone. Who's that? Somebody called in but is not hearing us.

Bill?

DR. BARR: It seems to me, as we were evaluating these data, that this is very much like what we do in any bioequivalence study, in the sense that we're looking at blood levels, and whether the area under the curve has some clinical relevance may depend upon the drug. What we're really looking at is some measure of how the drug is getting to the biophase, in some way that we can evaluate comparatively.

What I saw today, I thought, was two sets of data that were remarkably reproducible within each study site, which is as good as we get from most blood level studies, I think. We're looking for some method by which we at least can compare things in a reproducible way.

My question is that the FDA did a very quick study with a very limited number of subjects in order to evaluate the other two studies. It seems to me that the real key in this is just how good you are at this. How long did it take you in order to be able to get reproducible data? Is that a major factor in whether we ought to use this as a means between different laboratories?

It's a little bit like when we, for example, do Caco-2 cells. We find that whenever you look at one laboratory and compare it to another, the results can be three- or four-fold different. But if you standardize it within a laboratory, then it becomes reproducible. Is that what we're dealing with here?

DR. PERSHING: I've taught people to perform stratum corneum harvesting or skin stripping in two days.

The important thing here is that you have an immediate feedback system that helps you. In other words, I make them weigh the skin strippings on a sensitive balance, and you'd be amazed at when they're doing this how they learn to do it reproducibly when they weigh it and find out what the immediate result is.

The weight in and of itself is a great way to learn how to collect the stratum corneum harvest in a very consistent manner. It doesn't require any special tools, other than that the same person collects every single time point from the same subject in the study. That's very critical. If you're going to have multiple investigators collecting the stratum corneum with these adhesive discs, they have to be validated to be reproducible between themselves as well as within themselves.

I kind of fashion that after collecting blood samples. When phlebotomists first start, they don't have very good quality blood samples either. So, practice always makes perfect. You have to be trained and you have to validate and document that you can do it well.

DR. BARR: It's also a little bit like the variability we had with in vitro dissolution, which is about as simple as you can get, but still it was necessary to have some kind of external comparator that we could use to compare laboratories and get results. It seems to me that maybe that's the way we have to go in something like this: find some kind of comparator that you can use between labs, and maybe within the lab with different investigators.

DR. LEE: Yes, I want to give the microphone to Steve, and then I want to go around the committee and see what each member thinks.

DR. BYRN: Dr. Pershing, do you have -- and maybe this is for everybody, and I know we're kind of out of order -- do you have the slides from Dr. Spear, a set of slides from him? Because he's stating things that are quite a bit -- if you look on page four of those. And I know Dr. Spear is going to speak later, I guess. But what you're saying seems quite a bit different from what he's saying. Maybe you could just explain to us the difference.

For example, he's saying that two top DPK research sites got contradictory results, that we can't comment on other classes of derm drugs, and that skin stripping is not rugged.

Then in the next slide he's saying that comparative clinical trials are difficult to perform, highly variable, and insensitive; that Spear Pharm has performed four 400-patient clinical trials; and that skin stripping is as hard as clinical trials. And then he says clinical trials are the only confirmatory studies for drugs that act below the stratum corneum.

So, I don't know whether you can comment on all those, but there seems to be quite a bit of difference between what you're saying and what he's saying.

DR. PERSHING: First and foremost, I think it's important that both study sites found that the Bertek and the Ortho product were bio-inequivalent. It's very important.

Secondly, about the 400-patient clinical trial. In patient clinical trials we never know what the intrasubject variability is; you can't document precision or accuracy. With an objective method like DPK you can. You know what your intrasubject variability is, and that dictates how many people you have to evaluate to achieve statistical significance. That's a great benefit.

DR. BYRN: Well, what about the contradictory part? Do you agree that it is contradictory?

DR. PERSHING: I think the key there, to be quite honest, is that it would have been most beneficial in comparing these two studies if all three products had been evaluated. I think what was important is that when your Q1 and Q2 are different, we both saw in these products that they were bio-inequivalent. It would have been very interesting to know how the Spear product performed in Dr. Franz's lab in comparison to the other two products as well.

DR. BYRN: So, you're thinking that we need to do more work, some additional series of studies to answer whether they're truly contradictory or not.

DR. PERSHING: I think that in any study that's done, it would be very beneficial to have two products that have been shown to be similar in a clinical trial and look at DPK. I think it's also interesting to have the opposite view, where they failed a clinical trial and they showed DPK results in the same trend.

I think that was a very important step that Dr. Wilkin encouraged us to do. And I think it proves the point that DPK can discriminate between products that are physically, chemically, or clinically bioequivalent, or aren't equivalent. That's the true test.

How many studies do you need to do that? We've done an antifungal study, an antiviral study, an antibacterial study, and now a retinoid study. What they generally show is that you can pass a clinical study and you might fail DPK. But in this case DPK and clinical trials agree.

Why is that so? It's because clinical endpoints are not always good indicators of bioequivalence. What we have found in our own research is that with topical drugs we generally deliver lots more drug than we actually need to get the effect we desire, and that by delivering too much drug you're on that plateau of response. So, clinical and biological markers often don't differentiate between products.

DR. LEE: I see that Dr. Franz wants to make a point.

DR. FRANZ: Yes. Just one comment. I think it would be important to look at the clinical data on the comparison of Avita to Retin-A because, as I read the summary basis of approval, the statistical section was a nightmare. The results depended on which studies were thrown out. There was a problem: it was a multicenter study, but there was one investigator common to two multicenter studies, and for regulatory reasons, not, as I understand it, scientific reasons, that study was thrown out. Whether the two drugs gave the same answer or different answers depended on which was thrown out.

I think we need to look closely at this gold standard we're using for comparison with DPK. I was hoping perhaps someone here would review the basis upon which we're saying that the Avita is inequivalent.

DR. LEE: I would like to ask the committee members if there are any questions for these three speakers. Also we have accessible to us Dr. King and Dr. Wilkin to provide some advice so we can address these issues in the next 15 minutes. Bill?

DR. JUSKO: When one is doing traditional bioequivalence studies measuring plasma or blood concentrations, the sample that is taken is homogeneous and reasonably representative of all the blood that's circulating. My big concern with what I've seen this morning is the fact that the sampling is so susceptible to not artifacts per se, but all sorts of variation.

I would find more credibility in a sampling technique that takes a larger portion of the tissue. It seems that if you're only focusing on a smaller piece of the tissue and not recovering as much of the drug that was administered, the results won't be as representative of what is really being absorbed.

I think the jury is out on this whole technique. More evaluation needs to be done in regard to sampling issues.

DR. LEE: Art?

DR. KIBBE: I'd like to go back to what Steve first went after, and what I still can't let go of, and that's when we do an evaluation of two things. If I say that Manute Bol is taller than Muggsy Bogues, and you turn around and tell me that Muggsy is taller than Manute, and we both agree they're a different height, I still don't like that outcome. I think there ought to be at least rank order. I would have even been happier if, of the two studies, one had said they're different and the other had said they're the same but in the same rank order; then I would say, okay, what kind of sensitivity problems have we got.

But when the techniques can be used in two different labs doing quite reputable, representative work and come up with absolutely opposite responses, I agree with Bill. The jury is still out in my mind. I don't think we've got a robust test.

I don't necessarily think that because our clinical endpoints are not robust we should go to something else that is also not robust. When we're looking for a product that cures a condition and that's really the ultimate goal, we have to stick with that until we've got something that's very predictive of the product's ability to work in a clinic.

DR. LEE: Kathleen?

DR. LAMBORN: I'm not sure I disagree with the conclusion, but I do think, in making the decision, you need to remember that we're just trying to talk about whether they are equivalent or not equivalent. If you measure two different things, you may get results that disagree, with one going higher and one going lower. In other words, if the results that were shown with that filter paper turn out to be the rationale for this difference, and someone tells me that one product spreads out and ultimately gets just as much to the site as the other one does, then they may be equivalent in efficacy, but by the technical definition of bioequivalence they aren't equivalent, because they aren't behaving in exactly the same fashion. So, the fact that the small measure tells me it's less and the bigger measure tells me it's more, I don't think necessarily disqualifies the technique.

Then if we get to some of these other issues, like, well, what if the primary site of action is different, and what's happening here doesn't represent what's happening some place else, then I get a little bit less comfortable.

I think one of the things I was very interested in was the comment that using normal skin would be more sensitive in detecting differences than diseased skin, because that's certainly been one of the issues that's come up many times in the past. I'd have to defer to others as to whether they're comfortable that that is in fact a robust statement.

I guess the other thing is that I'd like a clarification. The draft guidance, if it were to be proceeded with, would mean that this would be one of the methods available, stated as a preferred method. Why is it that we haven't been looking at some of these other methods that were mentioned this morning? I'd like to pose that as a question.

DR. LEE: Are you soliciting an answer?

DR. LAMBORN: I think from the agency.

DR. CONNER: The guidance is out as a draft and has been out as a draft for quite a while. Perhaps one of the things that we would look for from the committee is what should be the future of that guidance.

What I've heard a couple of times today, which I at least in part agree with, is that when you look at the results of these two labs, I'm not comforted at all by the fact that the overall answer was that they were not equivalent. I too was disturbed by the fact that they went in different directions. I think we have a hypothesis why, and I suppose you could take that hypothesis and confirm it and say, okay, well, either Dr. Franz or Dr. Pershing should really alter the way they do things to get more realistic or true data. But then again the question has been brought up, what is the gold standard that we're comparing it to.

I guess one of the things the committee should decide, and I heard some doubts about this, is whether DPK, this technique, is there yet; what I'm interpreting is that it is not. But the question is, do you see any potential for it ever to be there, ever to be acceptable? Or is it something where we should just take a step back and look at other things that might be much more suitable for achieving our goals? Is this truly, as a previous committee said, beating a dead horse, such that even if we put a lot more work into this, we probably never would get much further than we are now?

Should we, say for example, withdraw the guidance? Should we go back into research mode, look at this and everything else, and try to bring something forward and develop something that might be a bit more suitable, both from a cost-effectiveness standpoint and in being equal to or superior to what we're doing now?

DR. LEE: Ajaz?

DR. HUSSAIN: I just wanted to give some information for Dr. Lamborn's question in terms of will this be the only method or is this one of several methods.

Traditionally in the bioequivalence world, we tend to define one preferred method and stick to that. There are many reasons for that. That has been the traditional approach. At the joint committee meeting, I had proposed the possibility of alternate methods, but in the bioequivalence world, for legal reasons we prefer to have just one method.

DR. LEE: Dr. Wilkin?

DR. WILKIN: I really hadn't heard the proposed explanation for the difference between the Franz and Pershing methodologies until now, or the filter paper study.

One attractive hypothesis is the spreadability, but an alternative hypothesis would be that the two that we saw on the right had more volatile components, that those volatiles went off fairly rapidly, and that the active ingredient went out of solution. We have to remember that the active has to be in solution before it's going to be in that thermodynamic gradient that is going to move it across the barrier. I think actually if the volatiles are the key, then maybe the Franz method is showing what really happens.

If I could just follow up with a very brief thing. I think validating a new method actually has three stages, and the first stage is, can you get reproducibility within one laboratory. I think we've seen that in the two laboratories they can reproduce their own method.

The second level of validation is, can someone else in another laboratory, maybe talking with another investigator, follow a recipe and come up with essentially the same kind of output. I think potentially that's what the FDA has shown, that using the exact same protocol that was used in Utah, that you can come up with very similar kinds of results. It's a small n, and there's some question about the variation in the outcome, but I think potentially it's achievable.

The third stage is the part that several of the members of the committee I think have been calling into question, that really hasn't been focused on, and that is, what in the end does it mean? I think in the end you can get through those first two levels of validation, and that only gives you a controlled artifact. You've got something that's kind of reproducible. Everyone can kind of get the same set of numbers. I think that they can get to that stage. The question is, what does the answer mean in the end.

Dr. Conner gave sort of the pro and con. In the past he's been kind of on the supportive side, I think, for DPK. It's nice to see there's enough evolution toward neutrality that he can give both the pro and con sides of this.

I think it was Dr. Jusko who raised the point about how well the stratum corneum conforms to our notion of a compartment. One aspect of the oral model, looking at blood AUCs, that makes it an incredibly powerful predictive model is that it is a well-mixed compartment. The stratum corneum is not mixed at all. The second piece, which I didn't hear you say but I think you were alluding to, is that the blood concentration is in equilibrium with the target organ. That model has enormous predictive potential.

In this particular circumstance, we really don't know that there is an equilibrium with what is found in the stratum corneum, which, as Dr. Pershing pointed out, doesn't even really exist in most of these disease states in diseased skin.

So, I think it's that third step. In the end you can get a number. It's probably going to be reproducible from lab A to lab B, but at the end of the day the real question is, does this conform to the grand analogy? Does it really conform to the solid, oral dosage products and the AUC in blood? I think that's still a key unanswered piece.

DR. LEE: Thank you.

Marv?

DR. MEYER: It's very difficult, I think, at least for me, to make a judgment here because I hear different results but I hear different techniques. I think in order for this thing to work we'll have to have three labs perhaps. We have three now: Tom's, Lynn's, and the FDA lab. And the FDA lab would have to do more than two subjects. I'm not convinced that they've shown comparison to Lynn's or Tom's data.

And I think we need more drugs. Lynn says she's done additional drugs but I haven't seen the data personally. I'm sure she's published it and I was too lazy to look it up.

DR. PERSHING: And presented here.

DR. LEE: Anyway, the guidance, as I read it, says it applies to antifungal, antiviral, antiacne, antibiotic, corticosteroid, and vaginally applied drugs. All I've heard about is one drug. So, I think it's difficult to say whether this guidance will work.

Maybe one of you could answer. Has someone tried this technique with comparing a 20 percent lower dose? Can you detect a 20 percent difference?

DR. FRANZ: Yes.

DR. LEE: You can. Is it bioequivalent if you have just a --

DR. FRANZ: I think the data that I showed on transepidermal water loss had the two strengths of Retin-A, 0.01 and 0.025 percent. We've done similar work with tape stripping. It's easy to differentiate doses.

DR. LEE: 20 percent, though.

DR. FRANZ: I've not gone that low, no.

DR. PERSHING: I've done plus or minus 25 percent. In fact, the guidance requests that you demonstrate dose responsiveness of the products you're interested in evaluating, specific to the drug, and actually we do plus or minus 25 percent. Depending upon the drug and depending upon the vehicle, you can either achieve plus or minus 25 percent, or plus or minus 50. It depends on the drug and the vehicle.

DR. MEYER: Does that mean you can tell the difference between plus 25 and minus 25?

DR. PERSHING: Yes.

DR. MEYER: But not 100 percent and 80 percent.

DR. PERSHING: Yes. It depends on the drug and the vehicle. And I'll say some drugs you can tell a difference of plus or minus 25 percent of the marketed formulation. It's dose-responsive.

Some other drugs, however, you can only detect differences between plus or minus 50 percent, and that's because the marketed concentration is already pretty maxed out for what will go on the skin.

In general, DPK is dose-responsive, and I just published an article on triamcinolone acetonide, a corticosteroid, that showed .025, .1, and .5 percent, that DPK is dose-responsive and actually so is the vasoconstriction response.

DR. LEE: Last question.

DR. MEYER: One quick question for the chair. We have a couple of presentations on this topic at 11:00. Are we going to vote before we hear?

DR. LEE: Yes, we are. We're going to express an opinion.

Dale?

DR. CONNER: Just one point. In a way we were fortunate with tretinoin in that we had two NDA products where we actually had some comparative clinical data between them. For most of the products, even when there are multiple NDAs, it's not a question of whether they are or aren't equivalent; it's a question of whether anyone has ever actually studied that. So, one of the reasons why we went to tretinoin and these particular products was because we already had data. Even though it's been criticized here, we actually had some data that set up this clinical gold standard.

A lot of the other topical products, even if they exist as different NDAs, we just don't have the actual data to connect them, either as equivalent or non-equivalent. So, in a way we kind of lucked out with this one in that we had the data.

DR. LEE: Okay. From my perspective the draft guidance was drafted in 1995. So, the context upon which this has been drafted has evolved. I just ask the committee whether or not you are ready to address these three issues. I think that we have to provide some guidance to the agency on what to do with this. John?

DR. DOULL: Well, I think we're a lot closer to being able to answer these three questions than we were in July. I thought we might have a tentative yes for the last question. Even though it's only two people, it's in the right direction.

Clearly this is not sufficiently solid in order to move ahead, I think. Dr. Conner has said we're going to look at an alternate methodology and so on. I guess that means it's an ongoing project and that as we get more information about alternative methods and we develop more of the parameters that have to do with these methods, it will be closer.

So, I would say we're still in the holding situation, although we're better off than we were in July.

DR. LEE: Marv, you were about to offer some direction.

DR. MEYER: Okay. One of the questions really is in terms of getting some additional data, and I think Dale pointed this out quite correctly. To me, I would be very convinced if I had five drugs, each with two bioequivalent formulations and one bio-inequivalent formulation, studied in three labs. That would make the vote easy. Unfortunately, we will have to wait quite a while to get that data, I imagine, if we'll ever get it.

So, I think we have a choice of leaving this guidance that is of questionable, broad application on the books or withdrawing it. And at some point in time, with further research, sponsored by whoever, bring it back. There's nothing wrong with taking something back and then getting more data and bringing it forth again, is there? I mean, that's certainly a viable approach.

I just don't hear right now a convincing set of data to allow this thing to continue to linger out there and generate debate.

DR. LEE: Ajaz?

DR. HUSSAIN: Marv, on your suggestion about showing a difference, we probably would not have the clinical studies to back up the decisions. But if studies are done that show a difference -- that DPK is able to pick up a 20 percent difference, or whatever difference, and we already have some data -- but do that across therapy categories and formulations, would that be helpful to you?

DR. MEYER: It certainly would be helpful. There are other issues, like is the stratum corneum an appropriate sampling compartment, and I don't know that, but I hear experts question that as a place to sample. That would need to be resolved also.

DR. LEE: Dr. Wilkin?

DR. WILKIN: I think it's nice to know that you can pick up different concentrations of an active from the same vehicle. I just remind the group that really the key question is, can you detect differences when the active is at the same concentration but you've got different vehicles. So, it's helpful to be able to see different concentrations in the same vehicle. I would say that that's probably necessary information, but probably not sufficient. I think that's another way of saying what you just said, but I would agree with that.

I think at the end of the day we need to know more about differences in vehicles, some of which are Q1-Q2, and others which are not Q1-Q2.

DR. LEE: Yes, Art?

DR. KIBBE: A couple of things. The data we saw today was on a gel, which of course is probably the most homogeneous semi-solid we ever use, and it's as close to a solution as we get in a semi-solid. So, if there was going to be a neat system to work on, this was it.

Looking at the three items up there, I think issue one, I would have to change viable to possible. I'm not ready to say it's viable.

Issue two, I still think that the two labs got two different answers, even though the labs say they got the same answer. So, I don't think we've gotten to two, or at least we haven't demonstrated two.

And then I think Marv is right. We need some more studies to get to issue three.

DR. LEE: Kathleen?

DR. LAMBORN: I would suggest that the answer is that it certainly is not a sufficiently demonstrated method at this point, and that one of the things that should be considered, one of the things I keep hearing around the table, is the variety of different types of topical products that the current guidance applies to. And I would suggest that if it is to be re-thought, perhaps withdrawn and at a future date brought back, perhaps a more focused guidance that would apply to an area where this technique is felt to be most applicable might be a way to move this back into a procedure. I'm very uncomfortable with this being termed the method for this full range of products. It might be that an incremental approach would help some.

DR. LEE: Jurgen?

DR. VENITZ: Given the history of this, going back over 12 years, and I had the fortune or misfortune of attending the previous meeting a year ago jointly with the Dermatology Committee, my answer to issue one hasn't changed. I don't think this is a viable method. I don't think we can go back and collect the data that we would really need to assess that because that data doesn't just depend on showing bioequivalence or inequivalence in the DPK scenario, but also linking it to the clinical studies. From what I hear you say, and I think the same issue came up last year, for most products we don't really have that endpoint.

So, additional data to assess the technical side of what you're doing right now I don't think would satisfy my issue because my issue is that I can't link what you're testing to clinical endpoints, which presumably we are trying to predict in terms of therapeutic substitution.

DR. LEE: Lemuel.

DR. MOYE: While I don't think the disparate results between the two external labs is the last nail in the coffin of this procedure, I do think it's an important setback. I think that if a procedure is to be viable, it certainly has to be reproducible using the same drug.

Now, the experimental methodology apparently is very complex, perhaps more complex than was initially envisioned. Those inter-experimental methodologic differences are going to have to be worked out, I think, first before we expand the examination and go to different drugs. So, I would say to number one that it is not viable now, and I don't think that we can really address issues two or three until we get the inter-experimental methodology differences worked out.

DR. LEE: Bill?

DR. BARR: I would agree. It seems to me that if we looked at these two studies and found that they agreed, we would all agree that we ought to move ahead with this method. This method has the advantage that it does give us a means of looking at the time course of transport of the drug, which is of course what we do in bioequivalence, and we never usually worry about whether or not at that specific point we can relate that to the clinical efficacy. We usually separate those two and try to state, first of all, we need to know whether or not the transport to some potentially active site would be the same.

What we see is something that's quite different, and we have one reconciling study which has two people that have been used in it, and perhaps a hypothesis of what these differences are. It seems very clear to me that that next step has to be done to reconcile that before we can go on, and I would suggest that the FDA put some resources into that to perhaps look into that in a little bit more detail, to try to at least resolve that issue before we try to make a judgment.

DR. LEE: I would like to invite the committee members who have not yet spoken to express an opinion if they so choose. Judy?

DR. BOEHLERT: I would agree with the comments that have been made. It's not a viable approach at this time. I'm troubled by the discrepancies in results between Dr. Franz's lab and Dr. Pershing's lab. In my mind that lends itself to the development of a test that will give you the results that you want, that you can manipulate the test to get the results that you want, and that's not what we want for a regulatory guidance. The area of stripping apparently was important here. So, I can design the area of stripping to get me the result that I want, and that's not appropriate for a regulatory guidance. So, we need to do some more work on how the test is conducted.

I'm also troubled by the fact that we may not have tested all the different types of dosage forms that are out there, creams and ointments and different delivery systems, some of which the drug is in solution, some of which it's not. And would we see the same results if we looked at all of those diverse systems.

DR. LEE: Gloria, you would like to comment?

DR. ANDERSON: I guess I really don't have anything different from what has been said, other than the fact that early on in your presentation you mentioned that it is not known whether or not the uptake at that site -- and I'm a chemist so I won't try to use these biology words -- is the only method of uptake. And given the fact that in these studies, both of these studies I think, the methodology involved wiping off the excess of the cream or the ointment or the gel or whatever it was and throwing it away without doing, I guess, a weight balance. Weight balance was mentioned by someone, but it appears to me that that was not an accurate weight balance, because if you wipe the excess off with a Kimwipe and throw it away, then you don't know how much is on there. It seems to me that that might give some idea of whether or not the uptake is equivalent to the loss from the patch, or whatever it's called.

DR. LEE: Okay. So what I've heard this morning is that the situation is far more complex than we envisioned, and it seems to me that the committee is not comfortable agreeing with issue number one as stated. So, it is perhaps a plausible method but not a viable method.

And issue number two is, does the DPK approach show an appropriate level of between-lab consistency? Based on what we saw this morning, then the answer is no.

And issue number three is -- it's too long. It should not be so difficult or complex? Well, I think the answer is obvious.

Is the committee comfortable with that summation? If so, thank you very much.

That concludes the first session, and let's say that we propose to have a 5-minute break and come back at about 10 after 11:00. Thank you.

(Recess.)

DR. LEE: In the open public hearing, here are the ground rules. Each presenter is going to have 5 minutes to make a presentation, 1 minute to answer questions, if any. The first two address the issue about the derm guidance, and the last few pertain to the IBE for this afternoon.

So, Dr. Spear is at the podium. And, Dr. Spear, are you ready?

DR. SPEAR: Yes, and I'll keep it to 5 minutes.

DR. LEE: Thank you.

DR. SPEAR: Spear Pharmaceuticals has, as you know, supported the study of Franz and Lehman. The FDA sponsored the skin stripping study of Dr. Pershing, and this was a critical step forward in accepting the draft guidance for all dermatologic drugs. Realizing its importance, I commissioned Dr. Franz to perform a similar study at another site so that we could really look at this scientifically.

There's no financial connection between DermTech International and Spear Pharma, and the product was sent blinded to Dr. Franz.

The big issue was, is this test rugged? Will the two top places in the country that perform skin stripping report the same results? And we've already discussed this.

Derm products have various sites of action in skin. Skin stripping is really stratum corneum stripping. We really shouldn't be calling it skin stripping.

Antivirals and antifungals act very superficially. Skin stripping theoretically may be the right test, but there's still no available data today -- and I think that's what the committee is wrestling with -- to confirm that skin stripping is predictive of action below the stratum corneum, for example, for anti-acne drugs and corticosteroids. I'm going to keep my comments today to tretinoin.

There's still no data on the effect of diseased skin. In dermatology we're dealing with diseased skin states like acne, psoriasis, or eczema where the normal stratum corneum is disturbed. It is a leap of faith to say that how skin stripping behaves on the inner arm of normal skin predicts the effect of drugs in diseased skin, and that's one of the big rubs.

Now, let's look at the two sites here, and we've done this today, so I'm going to go very quickly. For a test to be rugged, slight differences in materials or techniques should not really affect the comparative results. Whether the stripped area is a little bit bigger or a little bit smaller should not really affect whether this test is rugged. They really followed the same draft guidance. And we sent Dr. Franz's study to the FDA to review, and they actually changed it so the same amount of drug was applied.

Now, comparing Avita gel to Retin-A gel, Dr. Pershing shows lower AUC and lower Cmax, indicating that it absorbs less. Dr. Franz shows it absorbs more. So, my conclusion is that if the two top DPK research sites in the country get contradictory results, the skin stripping methodology is really not adequately developed. The draft guidance seeks to apply the skin stripping test to all dermatologic drugs. We cannot comment on other classes of derm drugs, but in this example, for anti-acne tretinoin, skin stripping is not rugged.

The draft guidance says, "comparative clinical trials are difficult to perform, highly variable and insensitive." We performed four 400-patient clinical trials on tretinoin products showing bioequivalence to the brand. A skin stripping study today is certainly as difficult to perform, and it seems as highly variable and insensitive.

Some claim the draft guidance must be accepted because generics cannot be proved in any other way. Clinical trials can be done and remain the only confirmatory studies for drugs that act below the stratum corneum.

My conclusion is the FDA seeks to lump all dermatologic drugs into one test. However, there is a movement, and I'm listening today, that they're re-evaluating this position.

Remember, the skin is complex and has multiple sites of action, and we believe that one test does not fit all. We suggest the draft guidance be amended to include, maybe at this point a compromise, only stratum corneum drugs.

Other DPK tests should be investigated for the deeper action drugs, like the cadaver skin test, with the data that Dr. Franz showed, and not just close your mind and put all your eggs in one basket with skin stripping.

Now, we've also talked here today about clinical relevancy. I'm going to point out two very important points. First, how does Avita penetrate? Avita is promoted as less irritating, so the nice neat little package was to say that clinically it's less effective. But here is a study in the Journal of Pharmaceutical Sciences, performed by Penederm, the company that brought out Avita, with this polymer that they said reduces tretinoin penetration while enhancing epidermal deposition compared to Retin-A. Enhancing epidermal deposition. So, if you're stripping the epidermis, the stratum corneum should have more. Therefore, with skin stripping you should really have more Avita gel, with a higher AUC, consistent with Dr. Franz's results.

Let me also point out that in the Journal of the American Academy of Dermatology, the published clinical results of Dr. Lucky in 215 patients showed no difference in total lesion counts at 12 weeks for Avita gel versus Retin-A gel. That's your clinical relevancy there.

Another point that I'd like to make is that I went back to the medical officer review and looked at why the FDA has it on there that Avita is less effective than Retin-A. What happened was, there were two multi-site studies, and one of the NDA rules is that you must have two independent studies. One of their investigators, Dr. Jarrett, was dropped from both studies, and he had contributed 57 percent of the patients in one study and 49 percent in the other. When his data was included, the products were equivalent. When his data was dropped out, it showed that Avita was less effective. Dr. Jarrett was included in Dr. Lucky's publication. So, actually the clinical results show that Avita and Retin-A are the same at 12 weeks.

That concludes my comments. Thank you very much.

DR. LEE: Thank you very much. Any questions?

(No response.)

DR. LEE: Thank you.

The next one is Dr. Chris Hendy.

DR. HENDY: Good morning. Thank you for giving me some time to give you a very short presentation.

As Dr. Conner explained, DPK has been under discussion for some time, and at the last advisory committee meeting there were some suggestions from the committee that maybe alternative methods might be considered. At Novum, we always do what the FDA tells us to do. We did decide to take a look at some alternative methods, and I would very briefly like to present some of those to the committee today.

21 C.F.R. 320.24 states clearly: "The following in vivo and in vitro approaches in descending order of accuracy, sensitivity and reproducibility are acceptable for determining the bioavailability or bioequivalence of a drug product." It goes on to list a hierarchy, the first of which is "an in-vivo test in humans in which the concentration of the active ingredient or active moiety, and, when appropriate its active metabolites, in whole blood, plasma, serum or other appropriate biological fluid is measured as a function of time."

Using a pharmacodynamic method like the vasoconstrictor assay for the corticosteroids is the third in the hierarchy, and in fact the last acceptable method, fourth, is using comparative clinical endpoint studies.

Many topical products are also available in oral formulations with the same indication as the topical formulation. They also cover a wide range of indications. I've listed some examples there. I'm sure the dermatologists amongst you will be able to give me many more, but that is a whole wide list of indications and different types.

The fact that many topical products are also available as oral formulations means that circulating blood levels must be relevant to the safety and efficacy of the product. Many times the site of action is right next to the blood level, as Dr. Conner pointed out this morning. Bioequivalence of the oral formulation would be evaluated by measuring blood concentrations.

The current draft guidance for industry, the one we've been talking about today, confirms the hierarchical requirement of the Code of Federal Regulations with the following rider. "For topical dermatological drug products, PK measurements in blood, plasma, and urine are usually not feasible to document BE because topical dermatological products generally do not produce measurable concentrations in extracutaneous biological fluids."

Since the development of this guidance and the development of new and more sensitive analytical assays, this statement no longer holds true. The following data are examples of a variety of different topical products where the time course of absorption and elimination of the active moiety can be accurately characterized. In all the examples I'm about to show, the amount and method of drug application is consistent with the product labeling. So, we haven't significantly overdosed to get levels. It's consistent. We haven't left it on for longer than the product labeling would recommend.

Unfortunately I can't give the drug names because the data have been given to me by some of our sponsors. They don't want me to reveal who they are or the actual drug because it is proprietary information. I can tell you it's an antibacterial. This is a two-way crossover study comparing a test and reference formulation. As you can see, the two curves are quite close to each other. However, this product would not pass the 80 to 125 percent bioequivalence confidence intervals.

This is another product, another antibacterial. Again you can see we can easily measure the concentration in blood. This is usually a twice-a-day formulation, and you can see this is following a single application, left on for 12 hours.

This product actually is from a full bioequivalency study. This product does meet confidence intervals according to current FDA guidelines. For anyone who had any doubts about the skin acting as a reservoir, this product was actually removed from the surface of the skin at 4 hours, consistent with the product labeling, and as you can see, the Tmax is not until 8 hours.

This is another product, just comparing a small pilot study, again. This is a different route of formulation, but does qualify as topical, and again you can see we get a nice PK profile.

Many topical products are absorbed to such an extent that the measurement of the active moieties in biological fluids is feasible, as I've demonstrated here with four different products. Many topical products have a site of action such that circulating blood levels are relevant to their efficacy as they're also available in oral or other formulations.

Some topical products are absorbed to such an extent that circulating blood levels could pose potential safety issues. There are several topical products on the market that do have a statement similar to that in their product labeling.

Most topical products have very poor dose-response relationships for clinical efficacy, and I think that's well known.

Clinical efficacy studies are the least sensitive method of determining bio-inequivalent formulations, and our goal must be to make sure that we are not putting bio-inequivalent formulations onto the market.

Evaluating systemic absorption raises the bar for the generic formulation, as it is more sensitive in detecting a bio-inequivalent product than any of the other current methodologies.

And I would suggest that using a pharmacokinetic approach, as demonstrated here, is consistent with the requirements of 21 C.F.R. 320.24.

DR. LEE: Thank you.

Any questions? Dr. Wilkin?

DR. WILKIN: Well, we had quotes from the C.F.R. but we didn't actually have all of the quotes in the section that I think adds some context to this. The part about the in vivo test in humans, in which "the concentration of the active ingredient or active moiety, and, when appropriate its active metabolites in whole blood, plasma, and serum" -- and that's the part that you quoted. But the sentence that follows that I think is also important to the understanding. It says: "This approach is particularly applicable to dosage forms intended to deliver the active moiety to the bloodstream for systemic distribution within the body."

Then if you go on further down in the same section, it speaks to the clinical studies and it says: "This approach may be considered sufficiently accurate for determining the bioavailability or bioequivalence of dosage forms intended to deliver the active moiety locally." Topical preparations applied to the skin are among the examples it gives.

Just to clarify that the C.F.R. I think has a somewhat slightly different view on that.

On the other hand, I would say this is an exciting thing to think about, that the limits of detection with newer technologies have gotten to the level where now we could look at blood AUCs.

I think we have to remember, though, that it's how the drug is distributed to the active site in the skin, which is a very heterogeneous organ. There are a lot of different active sites, and will the vehicle send the active down the follicle, or will it go through the epidermis between the follicles? So, it actually could end up in the blood at the same rate in the end, but it might get there by different pathways. On the one hand, it might bypass the critical place in the skin where we're really ultimately interested in rate and extent, but nonetheless it's certainly an interesting thought.

DR. LEE: One minute to answer.

DR. HENDY: I absolutely agree with Dr. Wilkin's comment. But obviously a number of the products we've put up here do not act in the stratum corneum. We know that from their pharmacology, and they are going to be working a lot closer to circulating blood levels, and one would assume that there is some kind of homogeneous area there. But I'm not a dermatologist so I really can't go on further than that. But I do think it's a methodology that maybe we should be looking at as an alternative to some of those that are being suggested.

DR. LEE: Thank you very much.

Okay, we now move into the IBE positions. Dr. Sondhi?

DR. SONDHI: This paper has been submitted for publication, so that's why I think you didn't get copies of these in the handout.

What we are trying to show is that you can get a probability distribution of the bioequivalence metric, and I just wanted to show that.

The metric was defined by Hyslop as follows. You see that on the viewgraph there. Phi equals mu T minus mu R, squared, et cetera, where the mu's are the means of the pharmacokinetic parameter for the test and reference products, the sigma's are the within-subject variances for the test and reference products, and sigma i squared is the variance of the difference in the means.
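
For reference, the aggregate metric being described is conventionally written as follows. This is a sketch of the standard reference-scaled form attributed to Hyslop, Hsuan, and Holder; the notation is assumed rather than taken from the viewgraph:

\[
\theta \;=\; \frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2}{\sigma_{WR}^2},
\]

where \(\sigma_D^2\) is the subject-by-formulation interaction variance and \(\sigma_{WT}^2\), \(\sigma_{WR}^2\) are the within-subject variances for test and reference. Individual bioequivalence is concluded when the upper 95 percent bound on the estimated metric falls below the regulatory limit, approximately 2.5 in the reference-scaled case of the guidance.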

For sample values of the test and reference means and variances, and the X's and S's that I've shown there, for those sample values you can get an estimate of phi, which is now a random variable.

Now, the problem is, of course, to find the 95th percentile of the probability distribution of the phi hat and accept bioequivalence if the value is below the FDA-specified value.

Hyslop, et al. found the upper 95 percent confidence interval of a linearized version of this metric. Instead, what we are proposing is that it's also possible, in fact, to get the entire probability distribution function, or an estimate of that probability distribution function, whose integral then gives us the cumulative distribution. If the 95 percent point of the cumulative distribution is below the FDA-defined value, we accept bioequivalence.

Now, the probability distribution of phi hat can be determined if the joint distribution of all those variables that I've shown there is known, but of course, in general, it would be a very formidable task. However, under the usual assumptions of statistical independence of these variables, the computation is quite feasible. And that's the purpose of this paper, to show that it's quite feasible.

I might say, of course, you get an approximation to the probability distribution because we substitute sample values of the means and variances, since the actual values, of course, are not known.

Just to make the notation simpler, I just gave names to these parameters. Xt minus Xr is Y, and Si squared is Z and so on. So, if you write it in this way, all we can say on the bottom is that the metric phi is just this ratio of G over V minus 1.5.

I obviously won't give you the derivation of this probability distribution, but just tell you the steps involved. If you assume Xt and Xr to be independent, then you need a formula to find the distribution of the square of the difference Xt minus Xr. And that's a known formula which one can use.

Then you can compute the PDF of W, which is the sum of two random variables, independent random variables, and that we know how to do.

Then the distribution of G we do the same way because it's the sum of two other independent variables.

Then the ratio G over V, we need a formula for finding the distribution of the ratio of two independent variables, and that's fairly well known.

With these few steps, one can then get the probability distribution of the metric phi.
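
The analytic convolution steps just described are not reproduced here, but the same idea can be illustrated with a minimal Monte Carlo sketch: plug in the sample estimates, draw the component estimators from their usual sampling distributions under independence, form the scaled metric for each draw, and read off the 95th percentile. All numerical values below are hypothetical placeholders, not data from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample estimates on the log scale, with degrees of freedom.
d_hat = 0.05              # estimated mean difference, test minus reference
se_d = 0.03               # standard error of that difference
s2_D, df_D = 0.01, 22     # subject-by-formulation variance estimate
s2_WT, df_WT = 0.04, 22   # within-subject variance estimate, test
s2_WR, df_WR = 0.06, 22   # within-subject variance estimate, reference
theta_limit = 2.4948      # reference-scaled regulatory limit (guidance value)

draws = 100_000
# Under the usual normal-theory assumptions the estimators are independent:
# the mean difference is normal, each variance estimate is a scaled chi-square.
D = rng.normal(d_hat, se_d, draws)
SD2 = s2_D * rng.chisquare(df_D, draws) / df_D
WT = s2_WT * rng.chisquare(df_WT, draws) / df_WT
WR = s2_WR * rng.chisquare(df_WR, draws) / df_WR

phi = (D**2 + SD2 + WT - WR) / WR       # reference-scaled aggregate metric
p95 = np.percentile(phi, 95)            # 95th percentile of its distribution
print(f"95th percentile of phi = {p95:.3f} -> "
      f"{'accept' if p95 < theta_limit else 'reject'} IBE")
```

The analytic approach in the presentation computes this distribution exactly by convolution; the sketch above only approximates it by simulation.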

I've written a program for this, and once the program is written it, of course, runs in a few seconds, so I'll just give you two graphs showing the examples of using this method.

Here is a comparison of results by the two methods, Hyslop's and the one that we are proposing here. I'm showing just two cases of situations where the pass-fail is right at the boundary. In other words, they're very sensitive measurements. So, you can see that in all of these cases the pass-fail was exactly the same for Hyslop's and with us. Very rarely do we find any difference in the decision.

This is not a very good graph because the action is taking place only on the first inch or so of it, but this is a plot of the entire cumulative distribution for a particular set of parameters.

DR. LEE: Questions for Dr. Sondhi?

(No response.)

DR. LEE: If not, thank you very much.

Professor Endrenyi?

DR. ENDRENYI: First, I'm grateful for the opportunity to be able to be here. CDER has known that I haven't always agreed, and I think it is very gracious. But I still don't agree.

The first suggestion is that individual bioequivalence in practice has unfavorable properties, and I underline "in practice." The acceptance or rejection of individual bioequivalence can be due to random chance alone.

To demonstrate, in the model of individual bioequivalence, as you are going to see this afternoon, an important term is the difference of within-subject variance of the test and reference products.

Another term is the difference between means, and the question is how these two terms play against each other. That's called mean variance tradeoff.

Generally speaking, there is a reward if the within-subject variation of the test formulation is smaller than the within-subject variation of the reference formulation; in that case the test formulation is better. In contrast, there would be a penalty if the test formulation has a higher variation than the reference formulation. This is what the model says.

But in practice, when it comes to estimated variations, if the true variations of the two formulations are identical, then the estimated variation of the test formulation can be higher or lower than that of the reference formulation, and these conditions can occur with equal probability.

This is what this slide demonstrates for actual data which the FDA collected by '99. By now they have additional data. You see here that the reward condition and penalty condition occur with about equal frequency. Furthermore, large rewards and large penalties also can occur with fairly high frequency and usually with equal frequency, or similar frequency.

This follows, then, theoretical considerations, and the consequence is that the acceptance and rejection of individual bioequivalence can be due to random chances alone.

Turning to highly variable drugs, which is one of the two drug classes for which replicate designs are recommended, the guidance proposes that reference-scaled individual bioequivalence be used for the analysis of these trials, but we contend that scaled average bioequivalence is much more effective for the purpose.
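
For context, the two approaches being contrasted can be written as follows; this is a sketch in standard notation, with \(\sigma_{W0}\) denoting the regulatory standardizing variability, and it is not taken from the speaker's slides. Scaled average bioequivalence constrains only the means,

\[
\frac{|\mu_T - \mu_R|}{\sigma_{WR}} \;\le\; \frac{\ln 1.25}{\sigma_{W0}},
\]

whereas scaled individual bioequivalence constrains the aggregate of the mean difference, the subject-by-formulation variance, and the difference in within-subject variances, all scaled by \(\sigma_{WR}^2\).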

Now, the next slide would demonstrate this, but I shall turn to it only if there is time.

Again, the following slide demonstrates the next statement, which has to do with the proposed constraint on the ratio of geometric means, the GMR. In the guidance, a GMR constraint of 1.25 is recommended for individual bioequivalence, and in the demonstrations we show that for scaled average bioequivalence this constraint is workable. It doesn't change the character of the test much.

On the other hand, for scaled individual bioequivalence, the constraint dramatically changes the individual bioequivalence test. It simply becomes not an IBE test but a GMR test, a test of the geometric mean ratio.

The same condition can be expected if one constrains the ABE test down to 1.15 or 1.10. Probably -- because we haven't done these studies -- but probably the consequence is that we would have a GMR test rather than an average bioequivalence test.

So, we think that the imposition of a very narrow constraint would change the character of the test and would probably be counterproductive. Therefore, we conclude that the acceptance or rejection of individual bioequivalence can be due to random chance alone, and therefore it's not really a good procedure, in our opinion. Scaled average bioequivalence is much more effective than scaled individual bioequivalence for assessing highly variable drugs. And a moderate constraint could be workable for scaled average bioequivalence, not for scaled individual bioequivalence. A strong constraint would probably be counterproductive.

I have additional slides, but no time. Thank you.

DR. LEE: Thank you very much. Any questions for Professor Endrenyi?

DR. MEYER: Laszlo, could you amplify point three a little more for me? Why 1.15 would be probably counterproductive? Because that speaks, it seems to me, to one of the issues of confidence in the FDA's decision and 25 percent difference is larger than we're used to.

DR. ENDRENYI: In this case, consider the scaled individual bioequivalence curve, which is this curve. And consider the geometric mean ratio limit alone, which is this. What you have is the acceptance of the tests as you vary the true ratio of the geometric means toward further separation.

First of all, you notice that the individual bioequivalence curve is very permissive. It permits large deviations.

The general rule is that when two criteria are applied jointly -- in this case, the individual bioequivalence and the GMR criteria -- then the joint criterion has acceptances which are lower than either separate criterion. That makes sense.

Now, in this case, when the GMR criterion is so much tighter than the IBE criterion, then the joint criterion actually draws close to the GMR criterion. So, it becomes a GMR criterion rather than an IBE criterion.

The same thing would happen with ABE, average bioequivalence. When the GMR criterion moves to the left because of a tighter constraint, it's well to the left of the ABE criterion, and the GMR criterion would dominate.

DR. MEYER: If you had scaled average bioequivalence, and you had the 1.15 GMR, wouldn't you still have potentially the larger confidence intervals that would pass?

DR. ENDRENYI: Without the GMR, yes. With the GMR you would have the GMR criterion.

DR. MEYER: So, you couldn't have confidence limits that were beyond 1.25.

DR. ENDRENYI: That's right. Essentially you would have the Canadian Cmax criterion.

DR. LEE: Okay, on that note, thank you.

Mr. Charles Bon?

MR. BON: Actually I'm going to address in part some of what Laszlo said. I want to thank you for the opportunity to address the committee.

I wanted to start with just a brief discussion of what the individual bioequivalence criterion is based on. It starts with the ratio of the expected squared difference when changing a patient from the reference product to a generic test product, to the expected squared difference when that same patient takes the reference product on two different occasions, and then we place some limits on that.

In the development of the criterion, it's an aggregate criterion. It was elegantly developed through mathematical and statistical considerations. In the criterion, you've seen, there's a difference in means, a difference in variances, and a subject-by-formulation term, which really talks about the consistency of the test-to-reference response in the subjects studied. In the case of highly variable drugs, it's scaled by the reference variance.
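
Written out, the ratio described here compares the expected squared difference on switching a subject from reference to test with the expected squared difference on repeating the reference. A sketch in conventional notation, assumed rather than taken from the slides, is

\[
\frac{E\big[(Y_T - Y_R)^2\big]}{E\big[(Y_R - Y_{R'})^2\big]}
\;=\;
\frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 + \sigma_{WR}^2}{2\,\sigma_{WR}^2},
\]

where the Y's are log-scale responses in the same subject. The aggregate criterion in the guidance is an algebraic rearrangement: subtracting the reference-to-reference term from the numerator and dividing by \(\sigma_{WR}^2\) (or by a fixed regulatory variance for low-variability drugs) gives \(\big[(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2\big]/\sigma_{WR}^2\), which is then bounded by the regulatory limit.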

However, what happened to the criterion was just what Marvin had said. There were things that we weren't used to. One of the things was that it would allow the test-to-reference ratio of means to go outside of .80 to 1.25, so a constraint was added, really without justification. I see that what you're going to be asked to look at this afternoon is to further restrict this, as well as to put a restriction on the subject-by-formulation term.

The proposed restriction on these individual terms in the aggregate criterion is not supported by the mathematics and the development of the theory. It's not supported by any clinical or good scientific considerations, and it has very undesirable consequences.

I'm going to show you the results of a small pilot crossover study on an immediate-release coated oral tablet that the FDA had approved in the early 1980s. This was just a single one-tablet fasted dose in generic versus brand.

In this we found that the log AUC's were comparable in terms of the observed ratio, and they actually met the .8 to 1.25 confidence interval. Log Cmax was certainly well outside, both on the confidence intervals and on the individual ratio, and yet the Tmax's were very similar.

I'll show you a couple of examples here of some of what I call the well-behaved subjects. I'll just show you a couple of these subjects, where the test is in red and the other color is the brand. But we had two subjects out of the 10 that gave very low profiles on the brand, even though their profiles on the generic product were quite consistent with those of the other subjects. In fact, we saw profiles on the brand that didn't really look like it was an immediate-release product.

In going back and looking at the formulation, I'm told by the formulators that there's this coating on the tablet in the reference product which is old technology and is a very poor coating, and in dissolution there were problems with certain units of the reference product.

Now, here are actually the test-to-reference ratios, and I've highlighted two problems. Here are 3.9 and 3.2 for test-to-reference ratios on Cmax. We actually had a couple of other high ratios which may be partial problems with the reference.

But I'm going to use this example with some assumptions. I did a simulation of 100,000 replicated trials, assuming that in 80 percent of the brand tablets this particular generic product would give an acceptable test-to-reference ratio of 1.05, but that 20 percent of the brand tablets would not release in an immediate-release fashion and would give you this lower Cmax, resulting in an expected test-to-reference ratio of 3.5.

Consistent with what we saw, the generic product was actually on an inter-subject basis less variable than the reference, but under the assumptions for the simulation, just to illustrate my point, I assumed a 20 percent within-subject within-product CV for the generic, 30 percent for the brand, and I did a replicated study in 30 subjects.

Less than 30 percent of the time, the test-to-reference ratio fell within .8 to 1.25. This is just the geometric mean ratio, which means that immediately, regardless of what else was happening with the aggregate criterion, this product would be deemed bio-inequivalent by individual bioequivalence.
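
The simulation described above can be re-created only approximately, since the exact model is not given; the following sketch makes explicit assumptions (a TRTR-style replicate design, log-normal within-subject error, and a per-tablet 20 percent chance that a reference unit releases poorly) and checks how often the geometric mean ratio lands inside 0.80 to 1.25.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed re-creation of the scenario: generic CV 20%, brand CV 30%,
# 30 subjects, replicate design, 80% "good" brand tablets (T/R = 1.05)
# and 20% poorly releasing brand tablets (T/R = 3.5).
n_subj, n_trials = 30, 20_000
p_bad = 0.20
log_ratio_good, log_ratio_bad = np.log(1.05), np.log(3.5)
sd_wt = np.sqrt(np.log(1 + 0.20**2))   # within-subject SD, generic
sd_wr = np.sqrt(np.log(1 + 0.30**2))   # within-subject SD, brand

passed = 0
for _ in range(n_trials):
    # Each subject receives test twice and reference twice; between-subject
    # effects cancel in the within-subject contrast, so they are omitted.
    log_t = rng.normal(0.0, sd_wt, (n_subj, 2))
    bad = rng.random((n_subj, 2)) < p_bad
    log_r = np.where(bad, -log_ratio_bad, -log_ratio_good)
    log_r = log_r + rng.normal(0.0, sd_wr, (n_subj, 2))
    gmr = np.exp(log_t.mean() - log_r.mean())
    passed += 0.80 <= gmr <= 1.25

print(f"GMR point estimate inside 0.80-1.25 in {passed / n_trials:.1%} of trials")
```

Under these assumptions the pass rate comes out in the same general range as the figure quoted above, on the order of a quarter of the trials or less, though the exact number depends on the modeling choices.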

The only recourse that the generic company has is to actually make a bad generic product, a bad generic product that falls somewhere in between the good ratio of 1.05 and this ratio of 3.5, so that they can overcome the rather arbitrary constraints placed on the difference in means or on the geometric mean ratio. This is one of the side effects of placing constraints on something that is really a good aggregate criterion.

DR. LEE: Thank you very much. Questions?

(No response.)

DR. LEE: And the next one is Mario Tanguay.

DR. TANGUAY: First I would like to thank the committee for this opportunity to present on behalf of the GPhA and MDS Pharma Services some comments on the individual bioequivalence approach.

I would also like to thank the GPhA Science Committee and the CRO Biopharmaceutic Committee for their collaboration, as well as my colleagues from MDS Pharma Services.

The IBE approach offers some advantages compared with the average bioequivalence approach. When we administer the same formulation twice in the bioequivalence study, it allows one to better differentiate the variability associated with each formulation. Contrary to the average bioequivalence approach, the IBE approach takes advantage of the fully replicate design.

The IBE approach may also be advantageous from an ethical point of view, since a smaller number of subjects is required for highly variable drugs. Due to the internal scaling component of the current IBE approach, the widening of the goalposts will depend on the variability of the reference product.

However, there might also be some disadvantages associated with the IBE approach. The main one is that there is some uncertainty regarding the switchability assessment or, in other words, the subject-by-formulation interaction.

Bioequivalence studies are designed to compare the relative rate and extent of bioavailability of two formulations of the same active ingredient based on Cmax and AUC calculations. These studies are not primarily designed to rapidly assess switchability.

If a subject-by-formulation interaction is seen in a study, there is no way to determine clearly the reason for this observation. It is not clear if this could be due to the presence of an outlier, for example, or to a subset of subjects, or if this could be due only to chance. It is also possible that different results with regard to switchability would have been observed if the drug products were administered more often.

In addition, when the IBE approach is used, it is highly recommended that subjects from a heterogeneous population be enrolled, meaning that people of different ages, genders, races and so on should be enrolled. However, this may not be helpful, as these studies are again not designed up front to evaluate differences in pharmacokinetics based on demographic factors. Therefore, even if a subject-by-formulation interaction is observed, it will need to be proven further by a properly designed study, which would raise other questions.

In the clinical research area, it has been shown many times that conclusions from post hoc analyses were proven to be wrong when verified in properly designed prospective studies. There are many examples of this situation in cardiovascular pharmacology or infectious disease, for example, and these lessons should apply to switchability assessment in bioequivalence studies.

In conclusion, the bioequivalence of two formulations of highly variable drug will be better assessed by giving the same formulation more than once to the same subjects. The IBE approach can then be useful for highly variable drug products. However, there is some uncertainty regarding conclusions that could be drawn from a switchability assessment.

Thank you.

DR. LEE: Thank you very much. Any questions?

DR. SHARGEL: Just a quick comment on the point that you said, ethics. It is a reduction of blood samples in IBE, but you do have more exposure to the drug by the individual subject. You have twice the exposure, so I think that can be considered as well.

DR. LEE: Thank you, Leon.

Dr. Midha?

DR. MIDHA: I'm grateful to the committee for the opportunity to speak to you today. I'm here on behalf of PharmaLytics, which is a nonprofit institute of the University of Saskatchewan.

You have already heard some very good comments. The important consideration is that highly variable drugs or drug products are safe drugs with flat dose-response curves or shallow dose-response curves. That means otherwise they wouldn't have gotten on the market. So, you're dealing with drugs which are safe.

A drug with an ANOVA-CV of 30 percent in an average bioequivalence study has been defined as highly variable. If the drug product has that ANOVA-CV in a two-treatment crossover design, it is then considered to be a highly variable drug product, so differentiate between drug and drug product.

The problem with highly variable drug products is that you need a very large number of subjects in order to meet the preset average bioequivalence criteria, which have confidence limits of 80 to 125 percent. You need a very large number of subjects, and people have calculated anywhere from 60 to over 100 subjects.
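
A standard back-of-the-envelope approximation, not from the presentation, illustrates where numbers of that size come from. For a two-period crossover analyzed by the two one-sided tests procedure with a true GMR near 1, the total number of subjects is roughly

\[
n \;\approx\; \frac{2\,(z_{1-\alpha} + z_{1-\beta/2})^2\,\sigma_W^2}{(\ln 1.25)^2},
\qquad
\sigma_W^2 = \ln\!\big(1 + \mathrm{CV}^2\big).
\]

With \(\alpha = 0.05\), 80 percent power, and a within-subject CV of 43 percent (the Cmax value in the chlorpromazine example that follows), this gives on the order of 60 subjects, and any true deviation of the GMR from 1 pushes the requirement well beyond 100.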

I'm just going to show you an example. Chlorpromazine is an example of a highly variable drug. We had shown 10 or 11 years ago that it had an ANOVA-CV in average bioequivalence of 34 percent for AUC and 43 percent for Cmax. The test product in the study was given once and the reference product from the same lot was replicated.

The next slide shows the results from that study, which have been published. You're looking at ANOVA-CV's of 34 percent and 43 percent. What you observe here is that when test is compared to reference, it meets the criteria for AUC. When reference is compared to the same lot, it meets the criteria. But when you look at Cmax, with an ANOVA-CV of 43 percent, test compared to reference does not meet the criteria because the upper bound is 1.26, above 1.25.

But look at reference to reference. Here the criterion is violated because the ANOVA-CV is so large that, with a geometric mean ratio of 115 percent -- and that speaks to Marv's question about what we are trying to fix -- the upper bound is now 136 percent. But this product has been on the market and has been utilized for over 40 years. So, it's clear that two samples from the same lot of the reference product were not found to be bioequivalent with each other because the ANOVA-CV was large and the point estimate for reference to reference was 115 percent, a comment Professor Endrenyi made earlier.

These data and the data which we have done research on demonstrated that the reference formulation was a good quality product, but it was the drug which was highly variable, and the drug has been on the market for 40 years.

Under the new recommendation of the October 2000 guidance, when stated a priori, after due consideration with the agency, scaled IBE based on a replicate design may be allowed for a highly variable drug and drug product. This in our opinion is a reasonable approach. We were one of the first research groups to recommend going to a replicate design and doing scaled average bioequivalence, a case which Dr. Endrenyi made again.

But in absence of that, at present the guidance has a very reasonable proposal, and I believe for the trial period it ought to be maintained based on the fact that these drugs are safe and we do not wish to do undue human experimentation when it is not needed.

The use of scaling for a highly variable drug, because you are scaling to the reference variability -- the type of variability which already exists in the marketplace -- permits the assessment of bioequivalence to be performed with a reasonable number of subjects. Yes, there are replicated measures, but with a reasonable number of subjects, without compromising either the consumer risk or the producer risk.

An additional advantage, which you heard from the previous speaker, is that IBE in a replicate design, or average bioequivalence based on a replicate design, would allow you to look at the pharmaceutical quality of the product. With all the advancement made in the pharmaceutical sciences, you would like to see generic formulations continue to come forward with reduced variability.

The constraint that the GMR must fall within 80 to 125 percent for scaled IBE is reasonable and should be maintained.

Tightening the constraint on the GMR in IBE to less than 125 percent actually defeats the very purpose. We are going back a decade.

Our plea is that we should not modify the recommendation in the guidance until scientific evaluation of the scaled metrics is completed.

And I thank you for your attention.

DR. LEE: Thank you very much, Kam.

Any questions? Yes.

DR. MEYER: Kam, you're kind of pleading for more data and more evaluation of existing data. In February of 1998, you and Jerry Skelly and Laszlo and Gordy Amidon published a paper entitled "IBE: Attractive in Principle, Difficult in Practice." You were pleading for more data. It's been 2.5 years now. Surely we have more data. Can't we either decide one way or the other?

DR. MIDHA: No, I'm not asking for highly variable drugs or drug products to have more data. That paper was written in light of the fact that there was a strong move afoot to apply IBE all across, and that's why the plea was made, and the plea continues to be made.

I would also go, Marv, and make a plea that we ought to also investigate in replicated design studies average scaled bioequivalence because if we are not prepared to widen the bioequivalence limits based on the class of drugs, I think that may be another reasonable approach. But in view of the fact that we don't have average bioequivalence scaled from replicate design in the guidance, the only step forward is the step which we have taken in the October 2000 guidance.

DR. LEE: Dr. Lesko?

DR. LESKO: Thanks.

Kam, on the framework for your remarks today, you indicated that the problem was highly variable drugs or drug products. My understanding of the framework for this problem is that we have a generic product with a GMR of approximately 1, but in order to meet the 80 to 125, we need a large number of subjects. So, it seems to me reasonable to scale in the context of a ratio approximating 1, but what you're basically concluding is that it's okay to have a 25 percent increase in bioavailability of the generic product and then on top of that go ahead and scale.

I guess I'm wondering why you think constraining the mean to 15 percent, which is approximating what would be allowable under the current standard for average bioequivalence, would interfere with the ability to scale bioequivalence limits to allow for lesser subjects for a highly variable drug.

DR. MIDHA: Larry, I probably have not followed your question, and I think I'm going to spend some time discussing it with you. But if I have followed it correctly, the reason is that when you take the constraint down to 115 percent, essentially that constraint becomes the determining step, not the limits. So, the result, as Laszlo was showing, is that the GMR becomes the determinant in declaring bioequivalence, rather than the average bioequivalence limits or, in the case of IBE, the IBE limit. So, unless you want to take that determination -- and I don't want to take too much time of the committee, but clearly that is the crucial issue which we are dealing with.

In the case of highly variable drugs, you know, and we have had many discussions on it, they are safe drugs; otherwise they wouldn't get on the market. And the fact is, we are trying to force the GMR, asking that 90 percent of the time the values fall within that range, when reference-to-reference comparisons from the same lot can show you that kind of variability. It has already been in utilization for over 40 years. So, that's my plea to you and to the people who are going to consider it.

DR. LEE: Okay, I realize that there are quite a few questions to be posed, but in the interest of time, I'm going to close this morning's session. We have an afternoon devoted to individual bioequivalence, and I saw that 4:30 is the time of adjournment. In order to keep to that I'd like to suggest that we come back here at about 1 o'clock.

Thank you.

(Whereupon, at 12:06 p.m., the committee was recessed, to reconvene at 1:00 p.m., this same day.)

AFTERNOON SESSION

(1:03 p.m.)

DR. LEE: We have a number of guests at the table. I think that we know both of them, but for the record, would you please introduce yourselves?

DR. ENDRENYI: Laszlo Endrenyi, University of Toronto.

DR. YACOBI: I'm Avi Yacobi from Taro Pharmaceuticals.

DR. LEE: And we have Professor Benet, allegedly in his office.

DR. BENET: I am here.

DR. LEE: Okay. Thank you, Les. Les said that he could see us but we could not see him.

The agenda for this afternoon's session is on individual bioequivalence, and we have plans for about 90 minutes on background information. We'll go for a break, and the committee is going to deliberate on the issues at 2:45. And I'd like to draw the attention of the committee to the five topics. Marv Meyer will be leading the discussion and he's going to tell us exactly what to do.

(Laughter.)

DR. LEE: Dr. Lesko, are you ready?

DR. LESKO: Yes.

DR. LEE: Please.

DR. LESKO: Good afternoon, everybody. The purpose of my being up here at the moment is to introduce the topics for this afternoon. I'll provide a little bit of a background to the discussion topics and some of the rationale for bringing these topics to the committee.

Average bioequivalence represents the current and traditional standard for the approval of generic drug products, and for marketed products after certain changes in their manufacturing.

It's been used by the FDA to analyze clinical trials for the marketing of thousands of generic drugs. The agency recognizes that in some cases there is a need for other standards or alternative standards and for a few drugs, such as those defined by class I of our biopharmaceutic classification system, in vivo studies are waived and market access is granted on the basis of in vitro studies.

There's a large amount of empirical evidence that suggests that generic drugs are used regularly without serious problems of safety and efficacy, and the agency feels confident in the therapeutic equivalence of these products.

Individual bioequivalence represents an improved standard in the agency's mind, and it was proposed by FDA as an improvement on the study design, the informativeness, and the method of analysis of BE studies. You heard a little bit this morning about the differences between average and individual bioequivalence. This approach takes into account within-subject variability for both the test and reference product. It detects signals that may represent a subject-by-formulation interaction, and it allows for scaling of the bioequivalence limits.

It's been a controversial topic with many debates and public discussions, to say the least. Through these public discussions and debates we have resolved many of the issues associated with this approach, but as of today it has not been universally accepted in the scientific community, or by other regulatory agencies.

The last thorough discussion of this topic in front of this group was back in 1999. At that time the recommendation of the committee was that they had concerns with the new criterion, and they recommended use of the ABE criterion for market access unless there is a compelling reason not to. This reflected, I believe, insufficient data at the time to replace the old standard of average bioequivalence with the new one, where there might be some risks that were either unknown or unappreciated at the time.

We subsequently about a year ago came out with a general BA/BE guidance, and the focus of this afternoon will be on one section of that guidance. The section that is in focus is the one that deals with the comparison of BA measures in BE studies. It's section IV of the guidance. And the key words in that section are the ones I've italicized and bolded. It says, however, sponsors have the option to explain why they would use another criterion other than ABE. One of the examples might be highly variable drugs and the use of replicate design studies.

However, what this language allows for is an opening, in a sense, for using individual bioequivalence for allowing market access of a generic drug.

A few sponsors have actually requested a priori in their bioequivalence study protocols that the agency use IBE to allow scaling and to allow access to the marketplace.

So, the agency has a dilemma in a sense in making the decision on market access based on the scientific evidence presented by these replicate design studies, and we'd like to bring some of this data to the committee for their evaluation today.

That leads me to the first discussion topic, and it has to do with, is it reasonable and appropriate for FDA to use ABE for market access unless there is compelling reason not to during an interim period for another year from today until we make the final decision to use IBE for market access.

We're about one year post the guidance, and we've acquired about 20 replicate data sets since that time in ANDAs and NDAs, which you'll hear about. We'll be presenting that data to the committee today, and we'd like to know, from your look at that data, whether it provides any new insights into the use of IBE. We feel that this discussion topic in a sense confirms the current situation and doesn't necessarily represent anything new.

The reason we prefer to stay at our status quo is that we still have some concerns about using IBE for market access and about some unintended consequences, perhaps, of this criterion. Of the data sets we've accrued in the past year, most of them pass both average bioequivalence and individual bioequivalence, and as a result are not very informative. We focused on those data sets where one of the criteria passes and one fails. That's where we want to try to discern the differences in the behavior of these criteria.

One of the things that presents a dilemma for us is a situation when ABE fails our current standard but IBE passes, and we have this example in two cases. In the NDA data that we've sent you, drug number 6 represented this phenomenon, and in the ANDA data set, drug number 2 represented this phenomenon.

When we have this situation, it raises some questions. It raises questions, for example, in the ANDA drug number 2. Is this product switchable, and does IBE assure that? In this example the mean test-to-reference ratio was 88.5 percent. We estimate that up to a 15 percent difference in the test-to-reference ratio can pass ABE, so there's nothing remarkable there. This drug in fact may have passed average bioequivalence had it been powered with more subjects.

The within-subject variability was pretty much similar; the test had a modestly higher variability. But the subject-by-formulation interaction exceeded what we considered important in our guidance, which is when the value of that SxF exceeds 0.15.

So, some of the concern with the criterion is that it's designed to identify signals of a subject-by-formulation interaction. Unless we have some other evidence to the contrary in this study, one might assume that this is a real signal of a subject-by-formulation interaction. Yet, in the face of that, while we succeeded in detecting it, the IBE criterion says to pass the product.

Furthermore, like most of the studies that are in the new data we sent you, the subject population has been healthy. All male volunteers. We do recommend in the guidance a heterogeneous population, and as a result we feel that the all-male volunteer population may tend to reduce the frequency of the subject-by-formulation interaction.

The guidance states that the mean test-to-reference ratio should fall between 80 to 125, and in this example there was no problem with that.

Discussion number 2: the advisory committee is asked to comment on a proposal that if we were to use IBE for market access -- and this is an important part of this discussion topic -- when there is a compelling reason not to use ABE, during the interim period, which I've defined as the next year, we're proposing that some conditions would apply.

The first is a new condition. The current guidance has 20 percent allowable difference in the test-to-reference ratio, the GMR as it was referred to. We're proposing that we change that to 15 percent.

We admit there's not a lot of data since last year, or a lot of scientific evidence to recommend that change. However, because of some of the behavior we've seen with this criterion in the data sets, we feel that if we're going to allow something into the marketplace with the IBE criterion, we'd like to have a better constraint on the mean-variance tradeoff that it currently allows.

We're also suggesting as another constraint the subject-by-formulation interaction should be nonexistent if we're going to approve a product under IBE. If it's less than 0.15, we would conclude no significant interaction. Our dilemma is when that appears to be greater than 0.15. We have a question in our mind, is it real, is it due to the test product, or is it occurring by chance alone, and we have no way of determining that and we have some reservations about approving a product with a significant subject-by-formulation interaction.

Furthermore, we'd like to suggest that sponsors follow the recommendation that subjects should be heterogeneous, taking into account age, sex, and race factors, as appropriate, in conducting the studies in which they would like to gain market access using the IBE criterion. We feel that's necessary, or it defeats the purpose of IBE, that is, asking the question about variances and about subject-by-formulation interactions.

Discussion topic number 3 is a somewhat status quo question. You can see what it is on the slide. It's basically getting to the continuation of our recommendation to conduct replicate design studies for modified-release products and for highly variable drugs. We have no reason to suggest a change in this recommendation. About half of the products that we sent to you as new data were modified-release products. However, the subject population in almost all those studies was a homogeneous population and not a heterogeneous one.

We feel that it's important to continue with this approach in the absence of data arguing against going forward with it. With regard to replicate design studies, they provide us empirical evidence of any problems with ABE, if they exist. They continue to allow us to explore, as we have done, a systematic analysis of the subject-by-formulation interaction to resolve whether its frequency is enough to be of concern. And they allow us, on a case-by-case basis, to assess the clinical significance of differences in variance.

As I said, in order to do all this, we need a heterogeneous population to maximize the information that we'll get from these studies in order to make any conclusions or extrapolations from them. It's difficult to do that using all males who happen to be young.

We'd like to get to a final destination with this individual bioequivalence and make a final decision to use it, not use it, when to use it, to allow market access. It's a significant scientific and public health issue. We want to be sure that we have the rationale to make the right decision.

So, we feel we need a larger database, recognizing that even one year won't provide us the entire database we need to make the decision, but we need more actual examples, and we hope using a heterogeneous population, coupled with some simulation and other exercises that would allow us to come to a final resolution of this issue of its use.

Finally, our last discussion topic. We provided you with a research plan. We ask for a comment on it. The research plan is fairly comprehensive. We're not sure we have the manpower to accomplish it all. It's important, we feel, to have some priorities in this research plan. It hasn't changed substantially since 1999, but any comments the committee would have on priorities within that research plan would be beneficial to us.

So, that brings us to the agenda, and my introduction as to why we're here. What we'll hear at this point from the FDA speakers will be a presentation of the replicate design studies in the NDAs. This will be primarily new data that hasn't been presented before. We'll follow it up with a presentation of replicate design studies from the ANDA database, and finally we'll hear a presentation on the research plan.

Thanks very much.

DR. LEE: Thank you, Larry.

Does the committee understand what the marching order is?

We have two new guests joining us. Would you please identify yourselves.

DR. BOLTON: I'm Sanford Bolton.

DR. ZARIFFA: And I'm Nevine Zariffa from GlaxoSmithKline Pharmaceuticals.

DR. LEE: Thank you.

Mei-Ling, are you going to make the two presentations separately?

DR. CHEN: I will do that together. Well, good afternoon, everyone. As indicated, there are two parts in my talk, and for the benefit of the new members on the advisory committee, I will briefly provide an overview of the background and the concepts of individual bioequivalence. In the second part of my talk, I will then discuss the results of our statistical analysis of replicate design bioequivalence studies, with the focus on NDAs in the FDA database.

As most of you know, the current regulatory approach for evaluation of bioequivalence has been based on the comparison of population means between products, and this is the so-called average bioequivalence approach. The agency has been interested in individual bioequivalence because the new approach appears to offer several advantages over the use of average bioequivalence. Individual bioequivalence compares not only the population means but also the variances between products. This approach considers the subject-by-formulation interaction, which is believed to be an important factor in the assessment of switchability between products.

With an appropriate criterion, the individual bioequivalence can establish goalposts based on the reference variability, and this is particularly useful for highly variable drug products. The new approach also creates incentive for both innovators and generic sponsors to manufacture less variable products. Because of the emphasis on the assessment of subject-by-formulation interaction, this approach also encourages the use of a heterogeneous population in bioequivalence studies.

An important principle for individual bioequivalence assessment is based on the distance concept. The principle is to compare the distance between the test and the reference product with the distance between the reference and the reference formulations, that is, the reference against itself. So, for individual bioequivalence the test and the reference formulation have to be administered to the same individual. If we call this comparison an individual difference ratio, then the goal of bioequivalence demonstration would be to show that the IDR is not substantially greater than 1.
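
For readers following without the slide, the distance comparison just described can be written as an individual difference ratio. This is a sketch of the usual form, with the notation assumed rather than taken from the slide: Y_T and Y_R are log-scale measures for the test and reference in the same subject, and Y_R' is a repeat administration of the reference.

```latex
\mathrm{IDR} \;=\; \left[ \frac{E\!\left[(Y_T - Y_R)^2\right]}{E\!\left[(Y_R - Y_{R'})^2\right]} \right]^{1/2}
```

The demonstration goal stated above corresponds to showing that this ratio is not substantially greater than 1.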

So, based on the concept of distance ratio, the agency has developed the individual bioequivalence criterion with a general form like this. It combines the average bioequivalence criterion with the variance terms, which is then normalized by the variance of the reference product.

So, the variance terms are the subject-by-formulation interaction, sigma D squared, and the difference in within-subject variance between the test and the reference product. Those are sigma WT squared and sigma WR squared. And theta I, on the right-hand side of the equation, is a bioequivalence limit specified by the regulatory agency.
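
Assembling the terms the speaker names, the reference-scaled form of the criterion would read roughly as below. This is a sketch based on her description; the symbols, and the placement of sigma WR squared in the denominator, are drawn from that description rather than reproduced from the slide.

```latex
\frac{(\mu_T - \mu_R)^2 \;+\; \sigma_D^2 \;+\; \left(\sigma_{WT}^2 - \sigma_{WR}^2\right)}{\sigma_{WR}^2} \;\le\; \theta_I
```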

Now, what is the subject-by-formulation interaction? In simple language, it is a measure that tells us how similarly or dissimilarly each individual responds to the test and the reference product. On this slide sigma D squared is the subject-by-formulation interaction variance component, and it's the variance of the individual mean differences between the test and the reference products. So, sigma BT and sigma BR are the between-subject standard deviations for the test and the reference product, respectively. Rho is the correlation coefficient between the individual means for the test and the reference products.

So, as you can see from this equation, there are two sources for the subject-by-formulation interaction. It may come from differences in between-subject variability between the test and the reference formulation, and it may be due to a lack of correlation, or congruence, in individual means between the test and the reference formulation. Sigma D is zero only if sigma BR equals sigma BT and rho equals one. So, based on this equation I would like to point out that sigma D is independent of the within-subject variability of the drug products.
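
Written out, the relationship just described (a sketch using the terms as defined above) is:

```latex
\sigma_D^2 \;=\; \sigma_{BT}^2 \;+\; \sigma_{BR}^2 \;-\; 2\,\rho\,\sigma_{BT}\,\sigma_{BR}
```

which is zero only when sigma BT equals sigma BR and rho equals one, and which involves no within-subject variance term.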

Our experience so far has indicated that subject-by-formulation interaction does exist. In some cases we could identify the factors that contribute to the interaction, but in other cases we couldn't identify the factors or subgroups that caused the interaction.

This is an example that illustrates a subject-by-formulation interaction due to an age difference in the population. Two generic products were compared with a brand name drug, and as indicated on this slide, the test-to-reference ratios for generic 1 are consistently higher in the elderly than in young people, and the phenomenon doesn't occur with generic 2. It's an age-based subject-by-formulation interaction, and the authors of this paper suspected that the higher serum levels of generic 1 might be due to a faster dissolution rate or absorption rate, which in turn saturated the hepatic enzymes in the elderly.

The second example came from the studies on a calcium channel blocking agent. The mean T/R ratios of Cmax and Tmax in male subjects are significantly different from those in female subjects. This is a gender-based subject-by-formulation interaction, and the mechanism of this interaction has been postulated, which is related to a different release rate of the two formulations, and possible gender differences in metabolism and transport along the GI tract.

In the agency, we have seen other examples of gender-based subject-by-formulation interaction, but due to time constraints, I wouldn't be able to present them here.

How do we interpret the subject-by-formulation interaction? There are two approaches. One approach is to estimate the percentage of individuals whose average T/R ratios are outside a range of 80 to 125 percent. Another approach applies to the cases where the subject-by-formulation interaction arises due to the presence of subgroups that have different test-to-reference ratios from the rest of the population. I will explain this further in the next two slides.

This is a graphical representation of approach 1. The x axis is the sigma D value, and the y axis represents the percent of individuals with mean T/R ratios outside 80 to 125 percent. So, for example, if sigma D is .15, you see approximately 15 percent of individuals in the population having their T/R ratios outside 80 to 125 percent. If sigma D is .3, then approximately 46 percent of subjects would have their T/R ratios outside 80 to 125 percent. In this context, if we consider 15 percent a large proportion, then a sigma D value of .15 may be considered as a cutoff for a large subject-by-formulation interaction.

Bear in mind that this figure is constructed with the assumption that the test-to-reference mean ratio is 1. So, if the T/R ratio deviates from 1, then the same sigma D value may imply a larger proportion of individuals having their T/R ratios outside 80 to 125 percent.
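
A minimal numerical sketch of approach 1, assuming the individual log-scale mean differences are normally distributed with mean zero (that is, a T/R ratio of 1, as cautioned above) and standard deviation sigma D. The function name is hypothetical; the calculation reproduces the approximately 15 percent and 46 percent figures quoted for sigma D values of .15 and .3.

```python
from math import erf, log, sqrt

def pct_outside(sigma_d, lower=0.80, upper=1.25):
    """Percent of individuals whose mean T/R ratio falls outside [lower, upper],
    assuming individual log mean differences ~ Normal(0, sigma_d**2)."""
    def norm_cdf(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    p = (1.0 - norm_cdf(log(upper) / sigma_d)) + norm_cdf(log(lower) / sigma_d)
    return 100.0 * p

print(round(pct_outside(0.15)))  # roughly 14-15 percent outside 80-125
print(round(pct_outside(0.30)))  # roughly 46 percent outside 80-125
```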

The second approach relates to the interaction where the formulations differ in a subgroup but not in the remaining subjects of the population. The x axis is the proportion of subjects in the subgroup, and the y axis reflects the sigma D value. Each curve represents a fixed mean T/R ratio for the subgroup, ranging from 1.2 to 2. The larger the mean T/R ratio, the higher the curve. As such, you can see the sigma D value is a function of two factors: one, the proportion of subjects in the subgroup; and two, the mean T/R ratio in that subgroup.

So, for example if I have 5 percent of the population having the T/R ratio of 2, you see the corresponding sigma D is .15. Similarly, if I have 25 percent of the population having a T/R ratio of 1.4, a sigma D is also .15. But interestingly, if you look at the horizontal line for sigma D .15, this line across the board, then you see this line only intersects with those curves having T/R ratios greater than or equal to 1.4. In other words, if I have 50 percent of the population with a T/R ratio of 1.3, then the sigma D plateaus at .13 and it never reaches .15.

So, in this regard, using .15 as the cutoff for sigma D is not really strict when we have subgroups in the population, and it becomes important to choose the appropriate definition when we interpret sigma D.
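
A companion sketch for approach 2, assuming a two-point mixture in which a fraction p of the population has a common mean T/R ratio r and the remaining subjects have a ratio of 1; sigma D is then the standard deviation of the individual log mean differences. The function name is hypothetical, and the printed values reproduce the examples above: about .15 for 5 percent of the population at a ratio of 2, about .15 for 25 percent at 1.4, and a plateau near .13 for 50 percent at 1.3.

```python
from math import log, sqrt

def sigma_d_subgroup(p, ratio):
    """sigma_D when a fraction p of subjects has mean T/R = ratio and the
    rest have mean T/R = 1, working with log-scale mean differences."""
    d = log(ratio)
    mean = p * d                # population mean of the log differences
    var = p * d**2 - mean**2    # equals p * (1 - p) * d**2
    return sqrt(var)

print(round(sigma_d_subgroup(0.05, 2.0), 2))  # ~0.15
print(round(sigma_d_subgroup(0.25, 1.4), 2))  # ~0.15
print(round(sigma_d_subgroup(0.50, 1.3), 2))  # ~0.13
```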

Derived from the distance ratio, the individual bioequivalence equation ends up to have sigma WR in the denominator. This is interesting in that it actually represents a scaling approach where the bioequivalence criterion can be adjusted based on the variability of the reference product, and the reference scaling takes us away from the one-size-fits-all approach and offers flexible criteria for different classes of drugs.

One of the advantages of reference scaling is to widen the bioequivalence limit for highly variable drug products. It reduces the regulatory burden. In addition, the fact that sigma WR in the denominator is directly derived from the distance concept makes it sensible to have reference scaling using this criterion, rather than the average bioequivalence criterion.

The down side of this reference scaling approach is that we may unnecessarily tighten the bioequivalence limit for the drugs with low variability beyond a reasonable public health need. So, to correct this problem, the current guidance has recommended a mixed scaling approach. In other words, we set a regulatory limit for the within-subject variability, and that is called sigma W0. When the reference variability, sigma WR, is greater than sigma W0, we scale to the reference variance. When sigma WR is less than or equal to sigma W0, we scale to the constant variance.
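
In symbols, the mixed scaling rule just described amounts to replacing the reference variance in the denominator by whichever is larger, the reference variance or the regulatory constant sigma W0 squared. This is a sketch under the same assumed notation as before; the numerical value of sigma W0 is not given at this point in the talk, though a within-subject standard deviation cutoff of .2 is mentioned later in the session.

```latex
\frac{(\mu_T - \mu_R)^2 \;+\; \sigma_D^2 \;+\; \left(\sigma_{WT}^2 - \sigma_{WR}^2\right)}{\max\!\left(\sigma_{WR}^2,\; \sigma_{W0}^2\right)} \;\le\; \theta_I
```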

As you can see from this equation, if the test variance is smaller than the reference variance, then it will be easier for the test product to pass the criterion. This provides an incentive for drug sponsors to manufacture less variable formulations.

In the meantime, it is possible to have a tradeoff between the mean and the variance, since both are in one equation. There was a concern in the past that, with that tradeoff possible, reference scaling may allow a test product with a large average difference to enter the marketplace. To avoid this situation, the current guidance has recommended a further constraint that the point estimate of the geometric test-to-reference mean ratio be within 80 to 125 percent.

Turning to the second part of my talk --

DR. MOYE: Is it inappropriate to ask a question about the first part at this time? Do you really want to wait till the end of the second one?

DR. LEE: Is it a clarification?

DR. MOYE: I think it is.

DR. LEE: Please go ahead.

DR. MOYE: Perhaps you're using the word "interaction" differently than I'm used to. When I think of a subject-by-formulation interaction, I'm thinking that there is a dependent variable upon which the formulation can have an impact and the subject can have an impact. To my way of thinking, a subject formulation interaction is a modified effect of the formulation by subject. That is to say, the effect of the formulation differs from subject to subject. Is that what you mean?

DR. CHEN: Correct.

DR. MOYE: So, when you talk about a gender-modified subject-by-formulation, you're saying that the way the subject modifies the formulation's effect depends on gender.

DR. CHEN: No, that's not what I meant. It's actually, like you say, an interaction between the characteristics of the formulation and the individuals recruited in the study. So, the interactions actually should be due to both factors: subjects and the formulations. But here what I illustrated is only on the subject side. In a way I could identify the factors based on the subjects. But I haven't really talked too much about the factors from formulations.

DR. MOYE: Well, I don't want to take too much time, but I did have that question.

DR. BENET: Vince, I'd like to ask a question?

DR. LEE: Sure, Les.

DR. BENET: Mei-Ling, I know that the reference product with the gender-based interaction is on the market. But are there generic products also on the market of that reference?

DR. CHEN: Which one? The gender-based?

DR. BENET: The gender-based product.

DR. CHEN: The gender-based product. My understanding from the Generic Office was that that was a study presented by --

DR. BENET: No, no. That's not the question. The question is, are there generics on the market for that product which you have shown that the reference has a gender effect?

DR. CHEN: I think I don't know at this point.

DR. LESKO: I think I can answer that question, Les. The product Mei-Ling is talking about was never approved for the market. However, there are generic diltiazem products approved in the marketplace. Calcium channel blockers. Sorry.

DR. BENET: What Larry has said is that the reference product is the innovator diltiazem product, and that there are generics on the market of diltiazem. That's what my question was. Is that the correct answer? Is that what Larry said?

DR. LESKO: Yes, it is. That's what I said.

DR. BENET: Thank you.

DR. LEE: Any other questions, since the floor is open?

(No response.)

DR. LEE: I assume not. Mei-Ling, please go on.

DR. CHEN: Now, turning to the second part of my talk, I will show you some of the real data from replicate design bioequivalence studies.

For drug submissions, FDA previously collected 27 data sets. In addition, there were 28 data sets analyzed by the industry. After the publication of our final guidance, we have received 9 more studies from new drug applications and 13 more from ANDAs. So, in total there are 77 data sets with the replicate design studies.

Unfortunately, most of these data sets were conducted in healthy, young male subjects, with a few exceptions of having females in the studies. Moreover, most studies in the FDA database have been performed on immediate-release dosage forms.

So, this slide gives you a snapshot of the old database. For the 27 FDA data sets, the frequencies of having a subject-by-formulation interaction of greater than .15 are approximately 20 percent for AUC, and 33 percent for Cmax. Because of the small sample size, some of these interactions did not show statistical significance. However, the confidence intervals with these interactions are wide, and so we couldn't really rule out the possibility of important subject-by-formulation interactions.

If we compare the within-subject standard deviations between the test and the reference product, using a T/R ratio of 1.2 as a cutoff, then the frequency of the test product having a higher within-subject variability is 33 percent for AUC, and 30 percent for Cmax.

It appears that similar results were obtained by the industry. However, their frequency of subject-by-formulation interactions greater than .15 is a bit higher for Cmax; it's around 40 percent. These data have been previously discussed and presented at several meetings, so we will not discuss them here.

Our focus this afternoon will be on the new data sets collected this year. As shown on this slide, of the 9 studies from NDAs there are three modified-release, six immediate-release, and six highly variable drugs. Of the 13 studies from ANDAs, we have five modified-release, five immediate-release, and three slow-release products, and three highly variable drugs. All the studies were conducted in healthy volunteers, and the sample size ranged from 17 to 93 subjects.

This slide summarizes the results of data analysis for three modified-release products. Bear in mind that all the analyses were conducted on the log transformed data, so the within-subject standard deviation on the log scale approximates the within-subject CV on the original scale.

So, for the three modified-release products, average bioequivalence and individual bioequivalence are in agreement with respect to the conclusion of bioequivalence. That means when the study passed ABE, it also passed IBE. When the study failed ABE, it also failed IBE. This is because there is no substantial difference in the within-subject variability between the test and the reference formulations, and there is no subject-by-formulation interaction in most cases, with the exception of data set number 3.

Data set number 3 is a study of an enteric coated dosage form, and the Cmax of this study failed ABE, average bioequivalence, because of the big difference in the T/R means. I believe it also failed IBE because of the combination of the large mean difference and the subject-by-formulation interaction. A further analysis of the individual data revealed that three subjects have their mean T/R ratios greater than 1.5, which I didn't present here.

This slide shows the immediate-release products. We actually have six IR products, and bioequivalence outcomes are also similar, using either IBE or ABE, with the exception of two AUCs in data set number two and AUC-infinity in data set number three.

As shown on this slide, data set number three has a sigma D of .3, and it's a highly variable drug product, with a reduction in the within-subject variability from 40 percent for the reference to 35 percent for the test. With reference scaling, this study passed individual bioequivalence.

I have to talk about data set number 2. Data set number 2 has a big sigma D, subject-by-formulation interaction, for both AUC parameters, and therefore this study passed average bioequivalence but failed individual bioequivalence. After further examination of the individual data, we found a subject with extremely low AUC and Cmax values on both replications of the reference product. Some people may have a concern that the individual bioequivalence criterion is too sensitive to outliers. However, because of the use of replicate designs, we can actually check whether the abnormal values come from outliers. So, in this case the retest character of the replicate design tells us that it's unlikely that this is due to outliers, because both values on the reference product are on the lower side. The question, then, is whether this subject represents a subgroup in the population who responds to the test and the reference differently.

I would like to switch gears to talk about the FDA contract studies. There are three studies --

DR. LEE: Mei-Ling, would you give us a quick summary?

DR. CHEN: Am I out of time?

DR. LEE: Yes, you're almost out of time, because of the questions Les asked, I think.

(Laughter.)

DR. CHEN: Okay. I guess I have to summarize our contract studies: ranitidine, metoprolol, and methylphenidate. I will discuss ranitidine and metoprolol together because these two studies were performed in parallel to investigate the effect of excipients on the bioavailability of drugs. Both studies compare the bioavailability of the candidate drugs in sorbitol versus sucrose solution.

From the literature we know that ranitidine has low permeability, while metoprolol has high permeability. Regarding the two excipients, we know sorbitol has low solubility and permeability. It can increase the osmotic pressure in the gut and reduce the GI transit time. On the other hand, sucrose has high permeability.

The hypothesis was that the bioavailability of a low permeability drug such as ranitidine is more likely to be affected by an excipient such as sorbitol that reduces the GI transit time. And the subject-by-formulation interaction may occur when two syrup formulations contain different sweetening agents.

This is the result of the ranitidine studies. You see the sorbitol solutions produced lower bioavailability than the sucrose solution, while in the metoprolol study the excipient had much less influence on the metoprolol levels.

Interestingly, we also found a subject-by-formulation interaction in the sorbitol ranitidine studies. In a way a reduction of between-subject variability from sucrose to sorbitol resulted in a subject-by-formulation interaction, and sigma D is about .15. So, the point is that an excipient could also produce a subject-by-formulation interaction.

The last contract study is on methylphenidate. The study was conducted in the 1990s, and the test product was suspected to have poor quality and behave erratically in the clinics. It's a replicate design study, so we analyzed the data recently, using the individual bioequivalence approach.

The table shows the test product not only has a higher T/R ratio for Cmax but also has a higher within-subject variability. It also has a marginal subject-by-formulation interaction. With average bioequivalence, we passed the study, but with individual bioequivalence, we may have rejected the study.

Thank you very much.

DR. LEE: Thank you very much. Sorry to cut you off.

Quick question, Marv?

DR. MEYER: In your old database, page 9, you have 33 percent with a Cmax SxF greater than .15, and some other numbers. Does that imply you would reject or not pass 33 percent of the studies in the old data using IBE?

And then the second question is, under data set 2, page 11, where you have subject 9, to me that just looks like variability because you have, let's say for AUC you have a 727 and a 3680 for the reference given twice, and for the test there's close agreement. So, you have one high, one low. To me that doesn't look like a replicated subject-by-formulation interaction. That just looks like variability in the reference in that subject.

DR. CHEN: Let me answer the first question first. You're saying that if the subject-by-formulation interaction sigma D value is greater than .15, will we reject the study? Is that the question?

DR. MEYER: Yes.

DR. CHEN: No, not really, because the current criterion is a composite equation, and sigma D, the subject-by-formulation interaction, is just one of the terms in that equation. So, we don't have a separate requirement to say sigma D needs to be less than .15 in order to pass the criterion. The current proposal in the guidance is to treat the whole criterion as a --

DR. MEYER: As a companion, you also have your T/R within-subject standard deviation ratio; 30 percent of the database also had a value greater than 1.2. So, it seems like two of the components in your IBE are bad, so to speak.

Would you fail a number of those studies? Maybe not all 30 percent, but some percentage? Should they have failed using IBE? Would they fail?

DR. CHEN: This is just to analyze all the data that we have at that point and to give us some appreciation of the performance for the test and the reference products in all the bioequivalence studies. We didn't use the IBE criterion for acceptance or rejection of those studies. Did I answer your question?

DR. LESKO: I'd like to try to answer that question because we have to be careful about an estimated value of sigma D being over .15, and as Mei-Ling showed, the estimated value of sigma D was over .15 in about 30-some percent of the studies. That does not necessarily mean that 30 percent of the studies had a subject-by-formulation interaction. Many of these studies are underpowered to accurately detect sigma D, and there's a possibility that many of those could be due to chance alone because of the low subject numbers in the studies.

So, one of our dilemmas is when we see these high values, and when we start to look at all of these cases, sometimes we can't find any mechanistic reason for subject-by-formulation interaction to have occurred. So, we have no way of sorting out, when the value is large, whether it's real, or whether it's occurred by chance alone because of an underpowered study.

DR. LEE: Kathleen?

DR. LAMBORN: I just want a clarification because if I interpreted what you're proposing to do, which is I think what the question is, the addition of the requirement that the subject-by-formulation interaction be less than .15 -- that's the estimated subject-by-formulation interaction. Right?

DR. LESKO: That's right.

DR. LAMBORN: And so under that criteria, these would have failed. Is that correct?

DR. LESKO: No, they wouldn't have failed because what we're proposing is if one wants to use the IBE criterion, it would have to meet this standard.

DR. LAMBORN: So, in other words, if they had used the IBE criterion for these studies, then these would have failed given the criteria of requiring less than or equal to 15 percent.

DR. LESKO: If the company had come in and said, I want to use a priori IBE for market access, then under those conditions, yes, that would be the case. But certainly there's another route to approval of those products.

DR. LAMBORN: No. I realize that. But we're just trying to understand how this data would have fallen if they had used IBE.

DR. LESKO: That's right.

DR. BENET: Vince, I have a question.

DR. LEE: Les, very briefly, please.

DR. BENET: Mei-Ling, in your analysis of the data set that you call data set 2 but in our tables are data set 3, where you showed that one subject had very high levels of AUC and Cmax, the implication is that that was the reason that this failed IBE.

Have you tested, if you delete that subject, whether the study would have passed IBE? My guess is it will not. Independent of that subject. Have you tested it?

DR. CHEN: I think I tested it, and if we were to delete the subject, this study would have passed IBE.

DR. BENET: Thank you.

DR. LEE: One final question.

DR. ENDRENYI: As suggested two years ago by the expert committee and also before this advisory committee, could various data sets be published on the Internet in detail?

DR. LEE: Who can answer that question?

DR. LESKO: The lawyers, I guess.

(Laughter.)

DR. LESKO: I'll have to check. I don't know. If it's an approved product, maybe. If it's not approved, maybe not.

DR. ENDRENYI: An earlier data set was published.

DR. LESKO: Yes, and that was an old data set, whereas this is a new data set for products that may be under review at the current time.

Vince, if I can make one more clarification for the committee, I think it's important to realize that when a subject-by-formulation interaction appears large, it isn't necessarily the test product that's producing it. It could be the reference product. We have to be careful to not assume that every time you see a large subject-by-formulation interaction, the test product is bad. In fact, in one of the data sets that Mei-Ling showed, which was number 3 on the NDA chart, that one with that large subject-by-formulation interaction, that was the reference product.

DR. LEE: Very well. I think that we should move on to hear about the ANDA situation from Dr. Patnaik. Les, are you available until 3:00?

DR. BENET: I'm available until 4:00.

DR. LEE: Great. Thank you.

DR. PATNAIK: Good afternoon. I am going to present some data from the ANDA side, and as you know, in your handouts there were 11 data sets. We have added two more because we received two additional data sets, so we included those. So, I'll be presenting results not from 11 but from 13 data sets.

As Dr. Chen has already explained, I will just put it down in simple words. With average bioequivalence, you evaluate the difference between the test and reference means, and you require that it be within certain regulatory limits. So, here we are only looking at the difference between the two means.

In contrast to ABE, IBE looks at the difference in the means, the magnitude of the subject-by-formulation interaction, and the difference in the within-subject variances. Then you normalize with the reference within-subject variance or the regulatory within-subject variance, whichever applies given the reference variability.

So, if the reference within-subject standard deviation is more than .2, you reference-scale. If it's less than .2, you normalize with the regulatory constant within-subject variance. This left-hand side must be less than or equal to a regulatory bioequivalence limit. So, it has got three components, as opposed to the single component in average bioequivalence.
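
A minimal sketch of the decision rule just described, applied to estimated components on the log scale. The function and variable names are illustrative, and the value used for the regulatory limit theta_i is an assumption for the example, not a figure quoted in this presentation; only the .2 scaling cutoff comes from the talk.

```python
from math import log

def ibe_passes(delta, sigma_d, sigma_wt, sigma_wr, sigma_w0=0.2, theta_i=2.49):
    """Aggregate IBE check: squared mean difference (log scale) plus the
    subject-by-formulation term plus the difference in within-subject variances,
    normalized by the reference variance when sigma_wr exceeds the cutoff
    sigma_w0, otherwise by the constant variance. theta_i is an assumed
    illustrative regulatory limit."""
    numerator = delta**2 + sigma_d**2 + (sigma_wt**2 - sigma_wr**2)
    denominator = max(sigma_wr, sigma_w0) ** 2
    return numerator / denominator <= theta_i

# Illustrative call: a 12 percent mean difference, no interaction, and
# comparable within-subject variability of about 35 percent -- reference scaled.
print(ibe_passes(delta=log(0.88), sigma_d=0.0, sigma_wt=0.36, sigma_wr=0.35))
```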

Now, I'll just give you a summary of the studies. These studies were submitted for the approval of generic drugs. There are 13 studies which we will be discussing and the study designs more or less are two-treatment, two-sequence, and four-period crossover designs. These are the two designs, which have been used for these 13 studies.

The number of subjects ranged from a low of 16 to about 60, and they are usually a controlled population, mostly young, healthy male subjects.

And there are several types of dosage form which have been studied, immediate-release and modified-release. They are mostly solid oral dosage forms, plus one suspension and one suppository. What I will talk about is only the parent drug. We will not include metabolites.

These are the three bioequivalence measures: the two AUCs and the Cmax.

This is like a global result. Because we're talking about average BE and individual BE, I just gave a very global view of how many of these data sets pass or fail individual and average bioequivalence. In this column average bioequivalence passes, and in this column average bioequivalence fails. In this column individual bioequivalence passes, and in this column individual bioequivalence fails.

As you can see, 11 out of 13 data sets and 12 out of 13 data sets passed both average and individual. Only 2 out of 13 and 1 out of 13 for the AUCs fail IBE while passing average BE.

On the other hand, one data set out of 13 only failed Cmax. None of them failed AUC. None failed both of them, either average or individual. So, all of them passed. None of them failed both criteria.

The lower part shows the range of the numbers in the data which have been received and analyzed, and the results show that. The mean ratio runs from 11 percent lower for the test to over 4 percent higher for the test compared to the reference. The within-subject standard deviation varies from 6 percent CV to over 40 percent CV for AUC, with similar values for AUC-infinity. But for Cmax it never went below 11 percent; the within-subject variability for Cmax runs from 11 percent to about 45 percent.

In terms of the ratio of the variances, it varies from the test variability being 12 percent lower than the reference to about 56 percent higher, and the same thing here, a very large range, from about 23 percent lower than the reference to about 55 percent higher than the reference.

Here as you can see, the ratios are all over the place, from about 70 percent lower than the reference to about 35 percent higher than the reference. So, it's a very broad range of ratios.

And the subject-by-formulation interaction ranges from no subject-by-formulation interaction to a maximum of .2, and that only in a very few data sets. So, this is the global picture of the whole 13 data sets.

The next slide. I have just put everything on a bar graph so it's very easy to understand, and maybe it will give a clearer picture. This is the within-subject standard deviation of the reference product. The upper panel is for AUC, the lower panel is for Cmax.

The y axis is the variance term, the within-subject standard deviation, and these are the data set numbers on the x axis, and this shows what kind of products they are. I classified them into immediate-release products, one suspension, two slow-releasing products, five extended-release products, and one suppository.

Here you can see that, as we said earlier, anything 20 percent or higher in within-subject variability of the reference, the criterion asked for reference scaling. When the standard deviation is lower than 20 percent, we have to do the constant scaling. So, in this case we have one, two, three, four, five, six, seven, eight. Eight data sets will be reference scaled, and five data sets will be constant scaled.

As you can see, we're talking about highly variable drugs. We call them highly variable when the within-subject standard deviation or CV is more than 30 percent, as Dr. Midha said. So, we have only three products -- two IRs and one suppository -- that can be considered highly variable drug products. This is for AUC.

But for Cmax, applying the same rule, the same eight data sets will require reference scaling, and the other five data sets will be constant scaled. In this case, we have only two data sets which can be considered highly variable for Cmax.

Now, this looks complicated, but it's pretty simple. What I've done is here in the three panels, I have put the test-reference geometric mean ratio in the top panel, the test reference within-subject standard deviation ratio in the middle panel, and the subject-by-formulation in the lowest panel.

Now, these are the drug numbers, data set number. The left-hand side y axis is the log transformed test-reference ratio, and the right-hand side is the linear, showing 1.04, so the ratio is 1.04.

The reason I did that is that when the ratio is below 1, you see a negative number. So, wherever the bar is below 1.0, or below 0 on the log scale, the test is lower than the reference, and in the upper part the test is higher than the reference.

So, one can see here that data set number 2, or drug number 2, and drug number 6 have got a ratio of around .88. The test-to-reference ratio is .88, about 12 or 13 percent lower AUC than the reference.

Correspondingly, one can see for drug number 2 that the test-to-reference ratio for the variances is about 11 to 12 percent higher; the test is about 12 percent higher than the reference in terms of variability.

There are only two. Number 6 has got a very high ratio; it's about 50 percent higher than the reference.

Plus these two slow-releasing products, number 5 and number 10; the arrows are showing ratios of more than 55 percent above the reference. So, there are large differences in the within-subject variability for these two drug products, and also for number 6.

Now, the yellow and red show failures of the IBE criterion. So, there are two drugs, number 4 and number 6, that failed the IBE criteria. Number 6 failed here, as you can see, because of the large difference in the geometric mean ratio and the large difference in the within-subject variance, but with an absence of subject-by-formulation interaction. So, those two contributed to the failure of IBE.

In the case where number 4 is failing, it is because, although there is only a 4 percent difference in the means, it has got about a 22 or 23 percent higher within-subject variance, and it has got a large sigma D, the subject-by-formulation interaction. So, those two contributed to the failure of number 4.

So, this is just a comprehensive picture of what is happening between the three components for the same drug product.

This one is for Cmax and you can see there are no red bars here, so everything passes IBE for Cmax, and here also there is quite a difference. About 16 percent higher you see in data set number 7 in the test-reference ratio for the mean. And you have one drug product, drug number 2, which shows large subject-by-formulation interaction, but it doesn't flunk IBE because the ratio is not that much. Test-reference variance ratio is not that much. And the test-reference mean ratio is also not very large. So, none of them fail IBE.

To come to the specific examples very quickly, we talked about that one drug which fails average BE. Drug number 2 is an IR product which failed for Cmax, with N equal to 55. The difference is about 12 percent in the means, and it fails just marginally; if they had taken more subjects, probably it would have crossed the 80 percent mark. It passes IBE in spite of this 12 or 13 percent difference, and even though it has got a .2 subject-by-formulation interaction. It is a highly variable drug, but the ratio of the variabilities is very comparable, not very large. So, here the reference scaling really helped it pass IBE. It would also have passed ABE with a couple more subjects. So, this is why it fails ABE but passes IBE, mostly because of reference scaling.

The second example compares one that passes IBE, drug number 1, an IR product with a sample size of 29, with one that fails IBE, drug number 4, an immediate-release product with N equal to about 59. It's for AUC 0 to T.

These two are just comparable; I put them down to show a comparable observation. They have almost the same point estimates, within 2 to 3 percent in the means. In one case, drug number 1, there is no subject-by-formulation interaction, but for drug number 4 the subject-by-formulation interaction is .2. Like drug number 1, it has got a very low and similar within-subject variance.

In one case you have got about a 36 percent difference in the within-subject standard deviation, and here it is 23 percent. So, what is happening is that this 23 percent higher within-subject variance, and the presence of this subject-by-formulation interaction, even though there is only about a 4 percent difference in the means, allows it to fail the IBE criteria.

So, this shows the behavior and performance of two almost similar sets of data, with one passing and the other one failing, particularly because of this large subject-by-formulation interaction.

Number 3, which is important, passes IBE. It's a suppository with a sample size of 57; drug number 3 is the last bar in the graph. Failing IBE is drug number 6, an extended-release product with a sample size of 27. Here the point estimate is right on the dot, which passes. There is no subject-by-formulation interaction in either case. This is a highly variable drug, but here you see the reference formulation has got much lower variability than the test formulation, and that is why the ratio is about 50 percent higher. That makes it fail.

So, there are two different performances as compared to 2 and 3.

When I looked at these two high subject-by-formulation interactions, that is, drug number 2 and drug number 4, which have got a subject-by-formulation interaction, I just wanted to look at each of those data sets. I'll go very, very quickly.

There are three subjects which stand out as abnormal data. Subject number 13, subject number 53, and subject number 38. In one case for the reference, this is the sequence of administration in two different periods. The reference is very consistently low and the test is very consistently high. And in this case also it's also dissimilar.

It doesn't fall in a big pattern in the sense that in one case the test is higher than reference. In this case the reference is higher than the test, and in this case the test is higher than the reference.

Now, once we look at those things, and if you want to look at which one is responsible for this, to a certain extent these affect it very marginally, but this subject affects the subject-by-formulation interaction very dramatically.

The other example is for the Cmax. As I said to you, it's the same thing. You have the two test administrations showing higher than the reference. But in this case, one administration of the test and one administration of the reference are showing abnormal values. Now, which one is the outlier? Is this value an outlier with respect to this one, or is this one an outlier with respect to this? It's very difficult to say. But this one is pretty consistent.

And here also I found that removing this one has a marginal effect on sigma D, but removing this one has a very dramatic effect on sigma D.

So, my concluding remark is this: we have to think about the IBE criterion as an aggregate criterion. We cannot separate the components out, at least in evaluating the performance of the criterion. So, the combination of those three parameters determines the outcome.

Scaling approaches, as I've shown you, are particularly helpful for highly variable drugs with very large within-subject variance.

Analysis of the data showed that important subject-by-formulation interaction occurred due to very, very few subjects. At least, it's a very limited number of data sets. But the reliability and the possible cause of such observed interactions need to be carefully investigated. Why it occurs, I don't know. It's very difficult for me to say.

Finally, the studies we've received thus far during this period have utilized controlled populations. We have talked about this. The frequency of occurrence of important subject-by-formulation interactions, and the utility of this approach, will, I'm pretty sure, be better understood and evaluated as more BE studies using heterogeneous general populations become available to the agency.

Thank you very much.

DR. LEE: Thank you. I'm going to hold the questions and go right to Dr. Benet, who has been asked to speak on behalf of the scientific community. Les, are you there?

DR. BENET: I'm here.

DR. LEE: Please go ahead.

DR. BENET: My slides will always come delayed, so I'll just assume they're up there.

Thank you for the invitation to talk, and I'm sorry that I can't be there in person today, but I want you to know that you all look very good on television and I'm enjoying looking at you.

(Laughter.)

DR. BENET: I was asked to make this talk and to select a title prior to seeing the data and information that was provided in the book. So, I had to select a title not knowing what I was going to look at, so I selected this title, and I will talk about that briefly.

I believe that this is the opinion of the scientific community, but it's a group that would be generally favorable to IBE, and that group would say that individual bioequivalence is a promising, clinically relevant method that should theoretically provide further confidence to clinicians and patients that generic drug products are indeed equivalent in an individual patient. I think that's a lofty goal and it would be nice if it was true.

On the next slide, I believe that this is the opinion of everybody. Even today, considering the studies summarized and analyzed by the FDA, the data is inadequate to validate the theoretical approach and provide confidence to the scientific community that the methodology required and the expense entailed are justified. I certainly think that we heard that during the open session.

The next slide I believe would be the opinion of the majority of the scientific community, and that would be that at this time individual bioequivalence still remains a theoretical solution to solve a theoretical clinical problem. We have no evidence that we have a clinical problem, either a safety or an efficacy issue, and we have no evidence that if we have the problem, that individual bioequivalence will solve the problem.

So, that meets the criteria of my title, selected prior to seeing the data.

On my next slide, I have a new title, and that's the title now that I've seen the data. That title is, "Opinions and Recommendations of the Former Chair of the FDA Expert Panel on Individual Bioequivalence."

My overall position is, we don't have a problem with bioequivalence at present, and there is no issue that has been raised that creates a problem that should be of concern to the scientific community in terms of safety and efficacy.

I have maintained for many years that the present plus 25 percent/minus 20 percent average bioequivalence criteria are extremely tight and that in fact these criteria have served us sufficiently to make sure that we don't have bioequivalence problems for approved drugs.
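
As background, the "plus 25 percent/minus 20 percent" limits Dr. Benet refers to are the familiar 80 to 125 percent acceptance range on the test/reference geometric mean ratio, and average bioequivalence is commonly stated as a confidence-interval test on the log scale. A sketch of the usual formulation, not a quotation from the guidance:

    0.80 \;\le\; \exp\!\left[(\hat{\mu}_T - \hat{\mu}_R) \pm t_{0.95,\,\nu}\,\widehat{\mathrm{SE}}\right] \;\le\; 1.25

That is, the entire 90 percent confidence interval for \exp(\mu_T - \mu_R), where \mu_T and \mu_R are the means of log-transformed AUC or Cmax, must lie within 0.80 and 1.25.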

Now, one way that we look at problems for approved drugs is to see phase IV reports, and I think there is a lack of problems based on this issue. But in reality there is much more data that we have never seen, because this is a huge financial issue and the innovators have spent tremendous amounts of money and time attempting to show that approved generic products are not equivalent and that they have potential for safety and efficacy issues.

So, these prospective studies, usually carried out in special population subsets, have been carried out to attempt to demonstrate lack of equivalence for approved generics, and of course to demonstrate efficacy and safety issues, but you never see any of these results because none of the studies come out the way that the sponsor would like them to be.

Now, I'm aware of these studies because I've run a bunch of them, and others are aware of them. And what we know is that we have tested prospectively the present criteria numerous times, and there's no issue.

So, on the next slide, I think it's important for us to look at what we are trying to solve, and at least two of these issues have been covered, but the third has not.

The first is the issue that for wide therapeutic index, highly variable drugs, we should not have to study an excessive number of patients to prove that two equivalent products meet preset, one-size-fits-all statistical criteria. And this is part of the driving force of the agency in looking for new approaches that would allow us to approve drugs without studying them in an unreasonable number of subjects where that is required by the present criteria.

The second issue we are trying to solve came about as we were attempting to look at this, but it was also focused very strongly by the narrow therapeutic index drug issue raised by the brand name industry. For all drugs, but particularly for NTI drugs, a practitioner should be able to transfer a patient from one drug product to another and be assured of comparable safety and efficacy; that is, switchability. So, this is another one of our goals.

And we have a third goal that has not been discussed at all in this advisory committee and that is to give patients and clinicians confidence that a generic equivalent, approved by the regulatory authorities, will yield the same outcome as the innovator product. Not to prove that it does, but to give them confidence that it does. And this is one of our major problems.

Now, I get invited to many conferences that are clinically based, and I am the representative individual who says that the generic product works just as well. Oftentimes I go when the FDA has refused to go, and the FDA refuses to go because most of these clinical conferences are sponsored in large part by the brand name industry, and the FDA takes the position that this is potentially a setup or a conflict of interest. So, Les Benet gets invited. So, I go to those meetings and I hear all of the clinicians and their very strong concerns about the present criteria and future criteria that we are discussing.

So, in my mind, one of the most important things that we have to do is not only scientifically and statistically prove that these products are equivalent, but we have to have assurance of the clinical community that then is translated to patients that in fact drugs will work the same when they are a generic.

Let's go to the next slide which is discussion of the subject-by-formulation interaction term.

My position is that switchability is not a problem for approved generics at the present time under the average bioequivalence criteria. This is based on the statement I made earlier that our present criteria are sufficiently strict in terms of approval, and in fact, we have no problem. We do have anecdotal reports, and maybe those anecdotal reports are related to a particular switchability issue, but prospectively those kinds of issues have never been able to be quantitated and demonstrated by the brand name industry. I don't think we have a problem. I think what we have done is create a problem for ourselves by suggesting that we have products on the market that aren't switchable.

Now, I'd like to take two examples from the data that Mei-Ling presented. One is the one I asked the question about. This is diltiazem. The gender effect is on the innovator product, and that gender effect is real and it's a 30 percent difference. There are generics on the market. They probably don't have that gender effect. That was the question that I asked Mei-Ling and Larry. And why? Because we know very well that it is extremely hard to show any difference related to a 30 percent change in plasma concentration that's going to translate into any relevant pharmacodynamic response, both safety and efficacy, and especially for a drug like this that is not a narrow therapeutic index drug.

Now, I'll go on to the third point, and that is, I noted in my meeting yesterday, in a telephone conference with the group at the FDA, that when we looked at the new NDA data, the high subject-by-formulation interaction terms occur when the within-subject standard deviation for the reference is greater than the within-subject standard deviation for the test. And I particularly asked Mei-Ling if she took out the one subject, would they pass, and my bet is they wouldn't. Mei-Ling says yes, but I would like to see that.

Now, I am very concerned that we have a criterion that basically will fail a generic product because an innovator has high variability, and that's what we have. We have a situation where a product can fail on subject-by-formulation interaction because the test product has less variability than the reference product.

Now, theoretically we've solved that problem by putting into the equation the difference between the within-subject test minus the within-subject reference. So, there is supposed to be an advantage for the test product to have less variability than the reference. But it is my contention that in fact this turns out to be a negative in terms of sigma D.

I do not agree with Mei-Ling's suggestion that the equation for sigma D has nothing to do with within-subject variability. The equation does not include any terms for within-subject variability. But if we have a reference product that is highly variable and a test product that is not highly variable, it is hard for me to see how you will not have a sigma D that is influenced by this difference.

And I think this is one of the major problems of the present approach, that in fact you can have a sigma D that is very large because you've got a better generic product, and I think this was demonstrated by some of the other individuals. I was able to hear some of the presentations during the open forum, and I was able to see Professor Endrenyi's presentation. My view is that this is a problem that is not an advantage as it's supposed to be. It's a disadvantage.
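
For orientation, the aggregate individual bioequivalence criterion being debated here is usually written in roughly the following reference-scaled form; the constants are quoted from memory of the guidance and should be treated as approximate:

    \frac{(\mu_T - \mu_R)^2 + \sigma_D^2 + \sigma_{WT}^2 - \sigma_{WR}^2}{\max(\sigma_{WR}^2,\ \sigma_{W0}^2)} \;\le\; \theta_I,
    \qquad
    \theta_I = \frac{(\ln 1.25)^2 + \varepsilon_I}{\sigma_{W0}^2}

Here \sigma_D^2 is the subject-by-formulation interaction variance, \sigma_{WT}^2 and \sigma_{WR}^2 are the within-subject variances of the test and reference products, and \sigma_{W0} (about 0.2) and \varepsilon_I (about 0.05) are regulatory constants. The \sigma_{WT}^2 - \sigma_{WR}^2 term and the scaling by \sigma_{WR}^2 are exactly the mean/variance tradeoff and the scaling behavior Dr. Benet is questioning.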

DR. LEE: Excuse me, Les.

DR. BENET: Now, my fourth point is that it is not reasonable -- and it's not going to happen -- to expect that sponsors will use subjects representative of the general population in their IBE studies. And I don't think we can legislate it appropriately. So, I think we're always going to see some kind of excuses, and if I'm a sponsor, I'm going to do the best I can to have the population conform to a standard as much as possible.

My view is that the subject-by-formulation interaction term is a red herring. There's nothing valuable about it and we ought to get rid of it because it doesn't solve any of the problems related to the statistics. It creates problems.

When I look down the list of issues where there was a sigma D failure under IBE, in general those were situations where the test within-subject variance was less than the reference within-subject variance. Now, I'm talking about when there's a difference between passing ABE and passing IBE. You pass ABE, but you fail IBE.

My conclusion is I see nothing to suggest that we have anything useful by including the subject-by-formulation interaction term. I think there's no good data to suggest it's useful and I think we ought to get rid of it.

DR. LEE: Les?

DR. BENET: Next slide.

DR. LEE: Les, I think that we have to sum up.

DR. BENET: No, no. Mei-Ling went forever.

(Laughter.)

DR. BENET: IBE, ignoring subject-by-formulation, should allow sponsors to gain approval for highly variable, wide therapeutic index drugs without using an excessive number of test subjects. This is the reason we should be doing IBE, for this purpose only. And as Mei-Ling and her colleagues have shown in the paper in 2000, it really only becomes useful if you've got a CV greater than 50 percent.

So, my preliminary recommendation is, on the next slide, that sponsors may seek bioequivalence approval using either ABE or IBE, with the subject-by-formulation interaction term deleted from the equation. If an IBE study is carried out and the test product fails, the data or a subset of the data may not be reanalyzed by ABE for approval.

Now, we have a perception problem that goes to the third issue, as seen on the next slide. One of those perceptions is with IBE, that we could possibly allow approval of test products where mean bioavailability may fall outside of 80 to 125 percent of the reference.

But we also have a perception problem with ABE, because we now have a situation where, if the products have reasonable coefficients of variation and they really do differ, even by between 10 and 20 percent, sponsors can get those products approved by just adding more and more subjects. I don't think that is a useful approach.

Now, on the next slide, in March of 1998, I formally proposed this point estimate criterion. And I believe that we need a point estimate criterion. It has nothing to do with statistics. It has to do with the credibility of the process. I do not believe that we can go to clinicians and say, these two products on average differed by 30 percent, but they passed our criteria; therefore, you should prescribe them and you should have confidence. I don't believe they're going to have that confidence, and that was the reason I suggested initially that we need a point estimate criterion.

I now, in my final recommendation, believe that we should have a point estimate criterion both for ABE and for IBE and that it should be plus or minus 15 percent, as the agency is proposing, for AUC, but higher for Cmax, and that consideration should be given to a narrower point estimate criterion for NTI drugs, because this is the perception problem.
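
A minimal sketch of how such a combined rule might look, reading "plus or minus 15 percent" literally as a 0.85 to 1.15 window on the observed geometric mean ratio; the bounds, including the Cmax limit left open in the discussion, are placeholders rather than agency values:

    # Sketch: ABE confidence-interval test plus an added point-estimate constraint.
    # All limits below are illustrative placeholders, not regulatory values.

    ABE_LOWER, ABE_UPPER = 0.80, 1.25   # usual limits on the 90% CI of the T/R ratio
    PE_LOWER, PE_UPPER = 0.85, 1.15     # hypothetical +/-15% point-estimate window

    def passes_abe(ci_lower, ci_upper):
        """Average bioequivalence: the whole 90% confidence interval of the
        test/reference geometric mean ratio must sit inside the limits."""
        return ABE_LOWER <= ci_lower and ci_upper <= ABE_UPPER

    def passes_abe_with_point_estimate(ci_lower, ci_upper, gmr):
        """ABE plus the point-estimate constraint discussed here: the observed
        geometric mean ratio itself must also stay within a tighter window."""
        return passes_abe(ci_lower, ci_upper) and PE_LOWER <= gmr <= PE_UPPER

    # Example: a study whose interval fits the limits but whose observed ratio is 0.83
    print(passes_abe(0.801, 0.861))                            # True
    print(passes_abe_with_point_estimate(0.801, 0.861, 0.83))  # False

The second rule is what changes the perception problem: a product can no longer pass on interval width alone when the observed means themselves differ by more than 15 percent.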

In my view -- and we have the data that show it -- these are not problems. All the products pass these kinds of things. There are one or two exceptions that don't pass, and so I think it is important. I disagree with Laszlo. I disagree with Kamal. I do believe from a perception point of view that it is important to give the clinical and the patient community confidence that these products do not differ in the means, which is what they understand. They will not understand the statistics.

Thank you.

DR. LEE: Thank you very much, Les. I appreciate your insight, and I think that maybe the committee is ready to vote.

(Laughter.)

DR. LEE: Larry, you have a question.

DR. LESKO: I was going to make a couple of comments, if I can.

DR. LEE: Please.

DR. LESKO: I think Les put a lot of stuff on the table. I can't possibly sort through all of the things he suggested, some of which would involve some of the new methodology.

However, I just wanted to make a statement that the goal of an approval of a generic drug is to approve a product that's similar to a reference product. Similar means it's not going to be better or it's not going to be worse. A patient being switched from a reference product to a test product should expect to have the same safety and efficacy. So, that's just a general statement.

The other thing is, putting aside the subject-by-formulation interaction value of .15 for the moment, we do have data, and Mei-Ling presented some of this with our calcium channel blocker. But we have some other data. Is it overwhelming? No, but we have other examples where there are some subgroup differences in the bioequivalence between the test and reference product when we look at it from a male subject and a female subject standpoint.

For example, one might have a test product that is 35 percent higher than a reference product, and when you begin to look at that, you see that much of that increase in bioavailability is due to the contribution from the male subjects as opposed to the female, or something like that. So, the differences in the bioavailability of the products sometimes will differ with identifiable characteristics of the subjects.
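
A small sketch of the kind of retrospective subgroup look Dr. Lesko describes, assuming only that we have per-subject log-transformed AUC values for test and reference from a non-replicated two-way crossover plus a sex label; the data below are simulated for illustration, not the study he mentions:

    import numpy as np

    # Hypothetical log-AUC values, one test and one reference observation per
    # subject, with a built-in larger test/reference shift for the male subjects.
    rng = np.random.default_rng(0)
    n = 24
    sex = np.array(["M"] * 12 + ["F"] * 12)
    log_ref = rng.normal(loc=4.0, scale=0.4, size=n)
    shift = np.where(sex == "M", np.log(1.35), np.log(1.05))
    log_test = log_ref + shift + rng.normal(scale=0.2, size=n)

    def gmr(mask):
        """Geometric mean ratio test/reference for the selected subjects."""
        return float(np.exp(np.mean(log_test[mask] - log_ref[mask])))

    print("overall GMR:", round(gmr(np.ones(n, bool)), 2))
    print("male GMR:   ", round(gmr(sex == "M"), 2))
    print("female GMR: ", round(gmr(sex == "F"), 2))

As Dr. Lesko notes, nothing about this requires a replicate design; the subgroup split can be done on an ordinary two-way crossover.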

I guess the question that does raise, however, is, are those subgroup differences that we see maybe not necessarily important or unimportant, but are they best addressed through the individual bioequivalence paradigm, in other words, a subgroup difference? And the one Mei-Ling had was identified through a nonreplicated two-way crossover study. So, these things can be identified in alternative ways, but I think they do exist and I think we should pay attention to them and try to get more information on them.

DR. LEE: Thank you.

Let us go to the final presentation before the break, and we do have lots of things to think about. Dr. Machado is going to show us the plan of the FDA.

DR. MACHADO: Good afternoon, everybody. My task is to briefly describe the research plan for bioequivalence criteria, and you have a copy in your packet.

In terms of pertinent background, you know about the guidances that were issued in October of 2000 and in January of 2001. At the Advisory Committee for Pharmaceutical Science, in September of 1999 we discussed the FDA plans for further research and projects associated with the use of ABE and IBE criteria.

The advisory committee endorsed plans for furthering mechanistic understanding of using the IBE criteria, endorsed plans for conducting clinical pharmacology studies, and looking at the influence of outliers on the subject-by-formulation interaction.

At the same time, the committee requested creation of a research document to guide activities during an interim period, and a draft document was sent shortly afterwards for review by the expert panel that was led by Dr. Benet at the time. The draft research document was modified by our Population and Individual Bioequivalence Working Group, and this draft became ready in April of 2000. That is in fact the version that you have in the packet just with the date changed.

Now, the overview of the research program. The overall goal is to provide information to support final regulatory decisions regarding criteria for comparing bioavailability and bioequivalence studies. The research plan has three components. First of all, to further investigate the criteria for bioequivalence comparisons. Second, to study issues related to data analysis and the statistical methodology. And third, to gain greater mechanistic understanding of any mean and variance differences that might be found between the test and reference products and also subject-by-formulation interactions.

Now, replicate design studies conducted by drug sponsors will be the major source of data for our evaluations. We're beginning to be ready to do some computer simulations to beef up our working data set, but right now the major source is from sponsors.

The general guidance recommends replicate designs for highly variable drugs and modified-release dosage forms. And that's been much discussed this afternoon.

Now, as far as criteria for bioequivalence comparisons, our plan is to determine which criteria are appropriate for particular types of regulatory submissions: INDs, NDAs, ANDAs. For the moment, we are using the ABE criterion for regulatory purposes. As you've seen, we are analyzing replicate design data sets as we receive them -- 25 so far -- and interpreting these in light of the recommendations in the guidance. The analyses we're doing will add to our knowledge base for evaluating the performance of the statistical and other approaches and provide support for future decision making.

Also in the plan was that we would identify and evaluate clinically important test-to-reference differences in within- and between-subject variances and evaluate subject-by-formulation interactions. We will assess the importance and impact of the mean/variance tradeoff that's been commented on and look at other outcomes based on selected disaggregate criteria. We'll also study the discontinuity aspect of the individual bioequivalence method and possible resolution.

As far as the data analysis and the statistical methodology project, our main task is to evaluate the methods as laid out in the guidances, and we believe, after many years of work, that they're valid and reasonable. I should say we are open to new approaches, but we see our main task really to evaluate the characteristics of what we have and we've not seen anything so far that would make us abandon those methodologies.

Now, our objectives are to assess the estimation methods in the presence of missing data. That's an important topic that hasn't been touched on. To further assess the statistical properties of estimates of parameters that we're most interested in, and to assess the impact of apparent outlier data on the properties of the aggregate individual bioequivalence criterion.

Other issues that we intend to work on are to monitor and assess possible carryover effects using data from replicate designs, but that depends on the actual drug being studied. And a fairly important objective is to assess the proper numbers of subjects and good study designs for heterogeneous populations that include both genders, possibly different ethnic groups and different age ranges, and consider what information we can draw from these studies.

Now, the third project is the mechanistic understanding. If we do find differences in means and within-subject variances, what might this arise from? And this would be done for the highly variable and modified-release drug products. Also, the subject-by-formulation interaction needs to be well studied in terms of mechanism.

So, our focus for the immediate future is to continue evaluation of the data from the replicate design studies as we're receiving them. In addition to the database that's accumulating, the interim period has about another year to go, and we've received 25 studies. Possibly there will be another 25 coming in over the next year, and that isn't a huge database.

Now, what we seriously will consider is addressing the design issues, numbers of subjects, behavior characteristics of the various statistics. We can do this by computer simulation studies, and this will be based on the information in the databases to get realistic sets of parameters.

We will be evaluating the impact of possibly changing the constraint on the mean difference or imposing a constraint on the subject-by-formulation interaction to study the performance of the individual bioequivalence approach.

And last, but definitely not least, is we shall respond to the recommendations of the advisory committee.

So, finally, to summarize where we are, we see ourselves in the phase of evaluation using these data sets and simulations to understand the performance of the estimation methods for the remainder of the interim period. Just as a note, some of the issues that were laid out in the research plan, which was not changed since April of 2000, were in fact thought about, worked on, and addressed in the guidance.

Thank you.

DR. LEE: Thank you very much, Stella.

There is time in the discussion period to provide you with input on the plan you proposed. So, I'm going to suggest we hold the questions and go to a short break. Please, we will reconvene at about 3 o'clock.

In the meantime, what I'd like to do is to ask to have the four issues shown on the screen, and I think that the first issue is quite straightforward. We'd like to spend lots of time on the second issue, the third, and the fourth.

So, with that thought in mind, please take a break and come back at 3 o'clock.

(Recess.)

DR. LEE: I'd like to reconvene the meeting. I think this is where the fun begins.

Larry has posed four issues to the advisory committee, and those will be shown on the screen momentarily.

I would like to inform the group that we have the benefit of participation from several guests at the top of the table, and I invite them to contribute as they see fit.

Dr. Marv Meyer has kindly agreed to state some positions for the committee to react to, and I would just like to begin by introducing discussion topic number 1. Les, are you there?

DR. BENET: I'm here.

DR. LEE: Good. So, Les, when you wish to say something, just give me a sign.

(Laughter.)

DR. LEE: Discussion topic number 1. Is it reasonable and appropriate for the FDA to use ABE for market access, unless there is a compelling reason not to, for an interim period of another year until a final decision is made to use IBE for market access?

Marv?

DR. MEYER: Vince asked me to kind of give my opinion, and then everyone can shoot at that so that the FDA group won't take all the heat.

I personally have a lot of concerns about IBE. I think some of the data that's been analyzed so far -- for example, Rabi's work showed I guess five studies that they've analyzed, and IBE and ABE disagreed in 40 percent of those. That's a cause for concern for me, and I don't see enough difference in those products to explain why one criterion should have passed them and the other should not have. So, I do have some concerns, and that's basically where I'm coming from.

My opinion, relative to this discussion topic 1, is that I guess I would object slightly to having another year, in that we've had a number of years already, although the data seem to be coming in slowly. So, I would be kind of neutral on another year, but I think that we should definitely continue to use average bioequivalence for market access unless a company wants to come in and make a case for IBE. A highly variable drug, in my view, is the only reason to use IBE at the present time.

DR. LEE: So, Art is ready to have a counterpoint.

DR. KIBBE: I don't know whether I'm counterpointing, but I think we could take the first topic and put a period near the end of the second line. "Unless there is a compelling reason not to," period, and cross out the rest.

I'm not excited about the thought of converting all future submissions to IBE. I don't think that there's justification for that. I think there might be justification for allowing some submissions to follow an IBE methodology.

There are a couple of things that come to mind that we haven't talked about yet, and I want to just put them on the table. If the agency goes forward and says that ABE is no longer acceptable and IBE is the method, is it in fact saying that the vast history of products we've put on the market using ABE is not acceptable, and how much retrofitting are we going to have to do? If you remember all of the committee work to get pre-'36 drugs and all the OTCs reviewed, I don't know whether we really need to go back and do any of that. I think we are implying that we might need to if we go 100 percent for IBE.

DR. LEE: Jurgen, do you have an opinion?

DR. VENITZ: I guess I'm one of those in the scientific community that Dr. Benet quoted as thinking of this as a solution to a theoretical problem. So, I have no problem in saying that the current system, ABE as it is, works.

I'm very much like Marv. I'm neutral about collecting additional data. I'm not sure that additional data would help us to make a better decision next year or two years from now than we can make now.

DR. LEE: Thank you.

Dr. Barr has his hand up first.

DR. BARR: Yes. I'd like to take a different opinion, I think. First of all, I take issue with the idea that we don't have a problem in terms of subject-treatment interaction. I think that we are just beginning to appreciate the extent of the problem, and we don't know at this point in time how best to study it. Whether the aggregate approach is the best approach or whether an alternative approach is best, I don't know.

I'm also concerned about the aggregate approach looking at too many things all at the same time, to the point that you're not sure what the result is when you get it. We went to the aggregate approach because it penalizes a company to have to pass three studies, for example, or three criteria rather than one, and that was the reason, ultimately, I think, that the committee went to the aggregate approach. But we're finding that collapsing all of that information into one number may not be the appropriate way to go.

But on the other hand, we should not throw out the baby with the bath water again, like we did with the 75/75 rule a long time ago, where we had a method to look at individual bioequivalence, but it wasn't statistically sound, so we threw it out completely. And we now have no way of looking for subsets. To go back and make that same mistake again for statistical reasons doesn't make sense to me.

DR. LEE: So, you said it's premature to throw it out.

DR. BARR: Oh, I think it's premature to throw it out and not look at ways to look at the subject-formulation interaction or the subset problem.

The problem of highly variable drugs we've already addressed in at least three meetings that I'm aware of in the past, and we always came to the conclusion that we ought to treat highly variable drugs differently than we do normal drugs that aren't as variable, extending the goalposts and allowing those to get through. So, that solution is already there. We don't have to have IBE in order to do that. We do need to address it. But I think that the real issue is how best to look for real subsets.

There are drugs that have been recently withdrawn, for example, a cyclosporine, in which people who eat had different bioequivalence for one product than they did with another. That would be a subset. If people want to look to phase IV kinds of withdrawals, they are out there.

People say that we don't know whether there are any subject-treatment interactions. I recently did a study that wasn't intended to find a subject-treatment interaction, and it found that there was a significant treatment interaction for levothyroxine products. I went back and found other studies that found the same thing but ignored it by looking at an alternative way of evaluating it, simply because they didn't want to see that. And I think that these things have not been seen because they haven't been looked for.

We certainly wouldn't see the gender effect because most of the studies in the past have been done only in males.

So, I think we ought to be sure not to make the same mistake of throwing that out again and not looking at it carefully.

DR. MOYE: But to pick up on that last point, it seems to me this is a highly unusual way to look for a demographic subgroup effect. There are established stat methodologies which allow you to specifically look for subgroup-treatment interactions, and they don't use this approach. It seems to me this approach is a new novel way to work out an effect that perhaps is not of the greatest interest after all. If we're really looking for a demographic, be it ethnic or be it gender or be it age, treatment interaction, then there are other ways to go with more established methodology with clearer track records than this.

So, I'm all for the development of stat methodology, but I suppose I'm just not clear on what problem, what question this particular stat methodology is trying to address. If it's trying to address an interaction which is a subgroup interaction, then I am in favor of rejecting this for the more traditional, standard approaches for looking at interactions.

DR. LEE: Larry, would you like to respond to that?

DR. LESKO: With regard to Bill's comments about cyclosporine, I think we had a situation there where the problem was with the formulation in a physical environment. That is to say, there was not necessarily an interaction between a subject's physiology and the absorption of the drug as much as there was a problem between the formulation of the product and admixing it with a food environment, represented by juices basically. So, I'm not sure that's by definition a subject-by-formulation interaction as much as it's a food effect on bioavailability issue.

With Lemuel's comment, I think we sort of moved from an individual subject-by-formulation interaction idea where the methodology looks for a fraction of people in the test population that might demonstrate some unusual behavior with regard to either a test or a reference formulation. We sort of moved from that, which was the original concept of the IBE criterion, to the subgroup effect. And I think we did that because it's very easy to identify the subgroup in these studies where there's a retrospective analysis.

So, it isn't the intent of the approach to look for subgroup differences because I tend to agree with you, there are better ways to do that. In fact, the differences that Mei-Ling showed with the calcium channel blockers and with the verapamil came from non-replicated studies. And one could do that under the current standard of average bioequivalence. But those are the known identifiers that might identify a population who would interact differently with the test and reference formulation.

What this approach was intended to do was to look for other factors that may be related to the range of physiological variables within a subject's gastrointestinal tract that somehow might distinguish between a test product and a reference product in a way we don't understand, although we can hypothesize on it, but we haven't really explored. That was sort of the difference between the subgroup and the individual.

DR. LEE: Kathleen?

DR. LAMBORN: I had sort of two thoughts. One is on the subject-by-formulation interaction and that criterion that was proposed of the 15 percent. My concern is, on the one hand, Les I think is quite right, if we allow things either in terms of the estimated ratio or the estimated subject-by-formulation interaction to be too large, even if they could be due to chance, we're going to have a perception problem which needs to be addressed.

On the other hand, consider putting in a criterion which says we're going to estimate this and it must be less than 15 percent. When you then look at products that we think are really equivalent, because we know statistically that there's a great deal of variability in those estimates, given the sample sizes that we're talking about using, we're going to fail an awful lot of cases. And if we assume that in most cases they are equivalent, then your false positive or false negative, depending on which way you phrase it, is just going to be too large. It becomes an unacceptable situation. So, that was one comment.
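
Dr. Lamborn's point about the variability of the estimated interaction can be illustrated with a small simulation. This is only a sketch under simplifying assumptions (a balanced four-period replicate design, no period or sequence effects, and a crude method-of-moments estimator rather than the mixed-model estimator in the guidance):

    import numpy as np

    rng = np.random.default_rng(1)

    def estimate_sigma_d(n_subj=24, sigma_d=0.0, sigma_wt=0.3, sigma_wr=0.3, sigma_b=0.5):
        """Simulate one TRTR/RTRT-style replicate study on the log scale and return
        a crude method-of-moments estimate of the subject-by-formulation SD."""
        subj = rng.normal(0.0, sigma_b, n_subj)               # subject effects
        d_i = rng.normal(0.0, sigma_d, n_subj)                # true individual T-R shifts
        t = subj[:, None] + d_i[:, None] + rng.normal(0.0, sigma_wt, (n_subj, 2))
        r = subj[:, None] + rng.normal(0.0, sigma_wr, (n_subj, 2))
        s2_wt = np.var(t[:, 0] - t[:, 1], ddof=1) / 2.0       # within-subject variances
        s2_wr = np.var(r[:, 0] - r[:, 1], ddof=1) / 2.0
        s2_d = np.var(t.mean(1) - r.mean(1), ddof=1) - s2_wt / 2.0 - s2_wr / 2.0
        return np.sqrt(max(s2_d, 0.0))

    # Even with a true sigma D of zero, a noticeable fraction of simulated
    # 24-subject studies yields an estimate above the 0.15 threshold.
    est = np.array([estimate_sigma_d() for _ in range(2000)])
    print("fraction with estimated sigma D > 0.15:", round(float(np.mean(est > 0.15)), 2))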

The other is I think we have to realize that we're in the situation where we've got small sample sizes and with the individual bioequivalence with replicate design you're talking about further decreasing the sample sizes. So, any thought that we're going to reliably pick up interactions, unless it's just sort of luck of the draw, I think becomes a real question.

So, finally, with regard to the discussion topic 1, I would suggest that the period either be where it was suggested or we simply add, "unless there is a compelling reason not to, for an interim period of another year." But I certainly don't think we're in a position to say that "until a final decision is made to use bioequivalence." I think it would be as to whether or not to use it and, if so, in which situations.

DR. LEE: Thank you.

Dr. Bolton and then Professor Endrenyi.

DR. BOLTON: When we first made these recommendations a year ago, I personally expected to see more data than we're seeing, and we were supposed to do that so we could look at the data and decide what's going on. Well, it's pretty clear that we still don't know very much what's going on.

So, there are a couple of recommendations I would make. One is that we continue to do this the way we've done it for the next year, just to see if we can get something more, until we can make a decision one way or the other.

One thing that I was very interested in -- and I know that FDA, with Larry, has taken that topic up -- is looking for mechanisms, because these interactions are very fuzzy. I mean, other people have said that too. The .15 is sort of very arbitrary. It depends on sample size. It depends on the assumptions of normality. If you have a lack of normality, you can induce some of these things. So, it's very hard to take them seriously unless we can find a reason why they've happened. And I know that's what the FDA is trying to do, and you did it in a couple of cases I saw in the handout.

But I'd like to see an interaction statistically and then tell me why that happened. That should not be difficult to do. If you have a strong interaction, by looking at the formulation and knowing the physiology, one should be able to find that with some degree of reliability. I'd like to see more of that.

And I'd like to see the committee or somebody make maybe new recommendations for this next year based on what we've seen now on what to do about the things that are popping up here.

DR. HUSSAIN: The studies you saw, which Mei-Ling presented, are the work we did trying to understand the mechanism of subject-by-formulation interaction.

But before I talk about that, let me share with you a formulator's perspective on this in the sense, yes, we're talking about subject-by-formulation interaction, and if you've identified something, we can correct for that.

With that in mind, we started very simple experiments. We created formulations with three components: water, drug, and sucrose, or water, drug, and sorbitol. Two different excipients, two different attributes. We did the work at the University of Tennessee. We had a hypothesis of what might happen with respect to GI physiology. But when you go through that analysis, even with that simple formulation, it's not easy to identify what the root cause is. In fact, I think the mechanism is probably very different from what we had anticipated.

The point I'm trying to make here is this. You can't get much simpler than that formulation, and if we anticipate or we expect we have a mechanistic understanding of the basis for this interaction for complex formulations, I think that's not really feasible at this time.

DR. BOLTON: To answer that, I understand the dilemma you're in, but I think this is an exercise in futility -- the whole thing -- because interactions are going to pop up and we're never going to know whether they are real. We don't have big sample sizes. One person may have caused this. It's going to be very frustrating.

DR. LEE: Laszlo?

DR. ENDRENYI: I would like to follow up on Dr. Barr's consideration about aggregate criterion. At the Montreal meeting, several statisticians -- and they did not include me on the roster -- argued against the aggregate criterion. They suggested that even if IBE is to be studied, it could be done much better by a disaggregate procedure. But to study an issue such as subject-by-formulation interaction, IBE is not needed at all. So, I really question this aggregation of the two issues.

Secondly, I obviously do have the reservation whether subject-by-formulation interaction can be studied from these small sample sizes.

Thank you.

DR. LEE: Thank you.

Bill, you've been pretty quiet.

DR. JUSKO: I think it's pretty clear that the FDA should continue using average bioequivalence. I have a concern that the IBE criterion has a number of artifacts within it, and concerns that are separate matters from the underlying science that we want to unravel. I think, as Bill Barr indicated, that there are many opportunities we should take to try to understand reasons for variability and keep that foremost, but perhaps not throw out the baby with the bath water. More needs to be investigated in this area about variability, but perhaps this criterion has too many faults within it to be used in the manner proposed.

DR. LEE: Are you proposing to hold off throwing out the IBE?

DR. JUSKO: No. It seems like alternatives need to be investigated that allow one to characterize reasons for inter-subject variability in the context of replicate design BE studies.

DR. LEE: So, IBE is not suitable.

DR. JUSKO: That's my impression.

DR. LEE: Avi.

DR. YACOBI: I think we have heard great presentations this morning and this afternoon. I know that many of us think that IBE definitely has a use, and that use, as has been discussed since the early 1990s, is how to do bioequivalence for highly variable drugs and highly variable drug products.

But we have also had a concern, going back maybe to the mid-1990s, that there is subject-by-formulation interaction. Many of us thought that this was really a theoretical concern, and there were proposals to come up with data in order to prove that this factor, subject-by-formulation interaction, is for real.

It's very nice to see new data, and my feeling is that we are hearing, even from the agency, that it is taking a fresh look at this subject-by-formulation interaction. While it is there and we are recommending a factor of .15 or greater, that should not always be a criterion to reject an IBE study.

My point is, if subject-by-formulation interaction is not for real and we have not been able to substantiate it, then there is really no need for an individual bioequivalence study. The individual bioequivalence study has been proposed from the practical standpoint in order to test highly variable drugs not in a large number of subjects -- 50, 60, 70, 80 or 100 -- but rather, as I recall, in 12 or 16 or 24 subjects, in a four-way crossover study -- a two-sequence, four-period replicate design -- in order to come up with data and simplify matters.

So, I hope that we are going to get to the situation where we recommend that people do IBE for highly variable drugs in a smaller group of subjects, but implement the true IBE analysis, because doing a replicate design study without the IBE benefit doesn't make sense to me.

We wanted to do a highly variable drug. I wanted to do an IBE study. People came to me and said you need somewhere between 54 and about 68 subjects in a four-way crossover study. So, I asked the question, if I want to do just the average bioequivalence study, how many subjects do I need, and they said about 70, maybe a few more. So, I said, what's the logic of doing the replicate design analysis when I can do it for less with an average bioequivalence study? A replicate design study is also going to introduce additional variability into the study, and I feel it is not needed. In some of the studies we have seen here, we are seeing 50 to 60 subjects in the replicate design. So, from the practical standpoint, I think we have to think about it and we have to put some common sense into what we are doing and how we are going to approach this subject.

DR. LEE: Thank you.

Leon?

DR. SHARGEL: Yes. I'd like to address the first topic about whether it's reasonable to use average bioequivalence. I certainly agree with most of Les Benet's comments.

One thing. Generics have been on the market for over 20 years using average bioequivalence, since Waxman-Hatch in 1984 formalized the ANDA approach.

Being in the academic as well as the generic arena, I am very much aware that our innovators have looked at differences between the generic and the branded products. They have not published it and they have not pushed it out that widely because they haven't found much, and they have spent a lot of energy on the products coming off patent right now. Obviously, they're looking at a lot of differences: formulation effects, drug substance effects, clinical effects, and everything else. And we've had these arguments with NTIs as well.

We've also had the arguments going back 20 years, and one thing about being older is that we did originally use normal, healthy males, usually nonsmokers. We were worried about enzyme induction and things of this sort. So, many of these older products were based on the fact that we were really looking at differences between drug performance in terms of bioavailability between the two products, not so much as clinically. The argument was there that we didn't use old people, we didn't use women, we didn't use people who had the disease itself. For example, if we use Claritin, we don't go to people with allergies. We just look at loratadine and generic loratadine in those products.

So, I know that I've always been told, well, just because there are no dead bodies out there, we can sort of find them if we look hard enough.

Subject-by-formulation interaction I think is very interesting. As an academician, I think I'd love to give it to a Ph.D. student and have him look at it or her look at it. And it's something we should know about our products and formulations. I'm in accord with that.

But I'd like to say that the use of average bioequivalence has been very successful, but there is that perception. And I agree with Les Benet because my own mother says -- you know, I'm the pharmacist of the family -- are generics any good? We all know it's cheaper, but that's the issue I guess, that we really want to know here, that we have a quality product that performs appropriately when we switch between generic/generic, generic/brand, or brand/brand.

So, I'd like to add my support for continuing with ABE, average bioequivalence, and only considering individual bioequivalence for those cases where we think it would be more appropriate, such as a highly variable drug.

DR. LEE: Let me take three more questions. Then I would like to sum up what I heard. I think Marv Meyer had his hand up, and then Sandy and Laszlo.

DR. MEYER: This is quick. It seems to me maybe we have a nomenclature problem with it that's raised expectations. We talk about individual bioequivalence and we talk about subject-by-formulation interaction, and I didn't hear a single presentation that really identified subject X or individual Y and said this really means for sure that there's an interaction or I know anything about them. I think we would like to think that we're going to somehow identify that my grandmother is going to be different than your 12-year-old son in these studies, but it ain't going to happen. Until we figure out a way to utilize IBE better or to study that phenomenon better, it's not going to be very useful.

DR. LEE: Sandy?

DR. BOLTON: I just have a comment about sample size. Number one, some of the reasons why one passed and the other failed, using IBE and average, might be just a function of sample size.

The other thing is sample size for variable drugs -- I want to expand on what Avi said. I agree with him 100 percent. First of all, you're limited to very, very highly variable drugs, which is a small subset of drugs, and even then I am not sure that we do better with individual bioequivalence. I wish somebody would look into that a little further to see if we really have an advantage with variable drugs using individual bioequivalence and where that cutoff point is. At one point we were told the advantage started at 30 percent. Then it was changed to 45 or 50. My sense is it's even bigger than that.
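
One way to make the "where is the cutoff" question concrete is to look at how the subject numbers for an ordinary average bioequivalence study grow with within-subject variability. The sketch below uses a standard normal-approximation sample size formula for the two one-sided tests procedure in a two-period crossover; the formula and the assumed true ratio of 0.95 are illustrative assumptions, not figures presented at this meeting:

    import numpy as np
    from scipy.stats import norm

    def n_total_abe(cv_within, gmr=0.95, alpha=0.05, power=0.80):
        """Approximate total subjects for a 2x2 crossover to pass ABE (TOST, 80-125%)
        given the within-subject CV and an assumed true geometric mean ratio."""
        sigma_w = np.sqrt(np.log(cv_within ** 2 + 1.0))   # log-scale within-subject SD
        margin = np.log(1.25) - abs(np.log(gmr))          # distance to the nearer limit
        z_a = norm.ppf(1.0 - alpha)
        z_b = norm.ppf((1.0 + power) / 2.0) if gmr == 1.0 else norm.ppf(power)
        n = 2.0 * sigma_w ** 2 * (z_a + z_b) ** 2 / margin ** 2
        return int(np.ceil(n))

    for cv in (0.20, 0.30, 0.45, 0.60):
        print(f"within-subject CV {cv:.0%}: roughly {n_total_abe(cv)} subjects")

The required numbers climb steeply as the coefficient of variation moves past roughly 40 to 50 percent, which is where the argument for some form of scaling starts to bite.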

Finally, I'd like to say one thing about Les' final comment about reducing the limits for a public relation point of view. I'm against that because I think that the generic industry, if they made an effort, could make a campaign to explain in lay words to the doctors and the public that, indeed, these generics are not 50 percent different than the brand name, which many doctors think they are. So, that could be done without having to change the limits.

DR. LEE: Thank you.

Laszlo?

DR. ENDRENYI: Just to clarify on this point for Avi, Leon, and now Sandy: for highly variable drugs, what is necessary and what does the job is scaling -- reference scaling. It's not individual bioequivalence. It's scaling. And scaled average bioequivalence does a much better job at that. So, I don't see the role of individual bioequivalence in this, for highly variable drugs.
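
For context, reference-scaled average bioequivalence keeps only the means in the criterion but lets the acceptance limit widen with the within-subject variability of the reference product. One common way of writing the idea (a sketch; the scaling constant and any cutoff for when scaling is allowed differ across proposals) is:

    (\mu_T - \mu_R)^2 \;\le\; \theta_s^2\,\sigma_{WR}^2
    \qquad\text{equivalently}\qquad
    |\mu_T - \mu_R| \;\le\; \theta_s\,\sigma_{WR}

so the allowed mean difference grows with \sigma_{WR} instead of staying fixed at \ln 1.25, without bringing the subject-by-formulation term or the test-product variance into the decision.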

DR. LEE: Well, it seems to me that there's a consensus to continue using the ABE.

Larry, you would like to make a comment?

DR. LESKO: Yes, I'd like to make a comment.

DR. LEE: Just very briefly.

DR. LESKO: Just briefly comment? All right. That will be harder.

DR. LEE: One minute.

DR. LESKO: I wanted to talk about the current situation, and the current situation as the agency has to make a decision when given an application to review.

We have in our current guidance that sponsors have the option to explain why they would use another criterion other than average bioequivalence. The most logical extension of that is the sponsor that requests to use IBE for a highly variable drug.

We've heard today, and some of the data we presented showed, that the aggregate criterion under IBE gets you to a win under that scenario with many different combinations of numbers representing the mean differences, the variance differences, and the subject-by-formulation term. And you can mix those all up and come up with a win with different combinations.

Some of the combinations create concern in our mind, where we trade off the mean difference against an increase in variability in the test product, and maybe even a subject-by-formulation interaction, and it says pass. That doesn't seem acceptable. So, some of the combinations of numbers don't seem to make sense intuitively for approving a product using IBE.
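
The tradeoffs Dr. Lesko describes can be seen by evaluating the aggregate metric over a small grid of parameter combinations. This sketch uses the reference-scaled form written out earlier, with the constants sigma_W0 = 0.2 and epsilon = 0.05 quoted from memory and treated as assumptions; the parameter combinations are invented for illustration:

    import numpy as np

    SIGMA_W0 = 0.2   # assumed regulatory standard deviation constant
    EPSILON = 0.05   # assumed variance offset in the IBE limit
    THETA_I = (np.log(1.25) ** 2 + EPSILON) / SIGMA_W0 ** 2

    def ibe_metric(gmr, sigma_d, sigma_wt, sigma_wr):
        """Aggregate IBE metric computed from population parameters (no estimation error)."""
        delta2 = np.log(gmr) ** 2
        scale = max(sigma_wr ** 2, SIGMA_W0 ** 2)
        return (delta2 + sigma_d ** 2 + sigma_wt ** 2 - sigma_wr ** 2) / scale

    cases = [
        ("10% mean diff, equal variances",           1.10, 0.00, 0.30, 0.30),
        ("20% mean diff, test less variable",        1.20, 0.00, 0.25, 0.45),
        ("15% mean diff, sigma_D 0.2, variable ref", 1.15, 0.20, 0.30, 0.50),
        ("5% mean diff, test much more variable",    1.05, 0.00, 0.40, 0.20),
    ]
    for label, gmr, sd, swt, swr in cases:
        val = ibe_metric(gmr, sd, swt, swr)
        print(f"{label}: metric {val:.2f} vs limit {THETA_I:.2f} ->",
              "pass" if val <= THETA_I else "fail")

The middle rows pass despite sizable mean differences or interaction because the less variable test product buys back room in the aggregate, while the last row fails with a nearly identical mean simply because the test product is more variable than a low-variability reference -- the two patterns Dr. Lesko flags.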

So, under the current situation, if the sponsor were to come in without any constraints getting to this discussion topic number 2, we would then have a situation where we can approve a product that may differ from a reference product having up to 125 percent of the bioavailability or as little as 80 percent of the bioavailability if there's an appropriate reduction in the variance of the test product.

We can also have a product which we might reject that would have 90 percent of the bioavailability, but we would reject it because the within-subject variance for the test product is a little bit higher than the reference.

So, it gets kind of confusing. But the point is, without constraints, I'm concerned that we'll be in a position to make a decision on a product that has a different bioavailability than the reference and may even be exhibiting a subject-by-formulation interaction when they scale it and it'll pass. That's why we put in the constraints.

And there's something illogical about that. We created a method where we all agreed, at least in 1999, to look for subject-by-formulation interactions. Now we have a method that's picking them up, and we're saying let's pass the product.

So, I think we need a constraint under the current situation if we're going to implement individual bioequivalence in our current guidance. If we're going to retain the scaling benefits of that equation -- which we can do with the constraint on sigma D; it will make it a bit harder, but you can still retain the scaling benefits -- then I think it makes sense to put that constraint in there.

I think we also want to bring the differences in the test-to-reference mean ratios down to 15 percent, and there is a sort of quasi-scientific reason to do that. It's to bring the mean differences that we would approve under IBE in line with what we would approve under the average bioequivalence scenario, so that at least in the short term, we don't make any decisions we might regret in the long term when we have more data and make a final decision on using IBE for the marketplace.

So, I think that's why the constraints are important, and because when we leave today we have to make a decision on that guidance in the face of these replicate design studies, I think we have to come to some resolution of that because if you say don't put any constraints on there, then we're going to be faced with a difficult decision of making that decision for the marketplace.

Now, if you think a bit further, if we let this occur with generic product number 1, using scaling and allowing these larger mean differences, what do we say about two generics in the marketplace? Are they going to be more inequivalent than they might possibly be under an average bioequivalence scenario? Well, I don't think we want that. But this criterion without constraints, I think, will create the possibility that two products on the generic side could be more different than they might be under the average bioequivalence scenario.

DR. LEE: Discussion topic number 2.

DR. LAMBORN: Could I ask a clarification question?

DR. LEE: Yes.

DR. LAMBORN: The comment was made that we can do scaling using average bioequivalence. In today's environment with the existing guidance, is there a scalability criterion outside of the individual bioequivalence situation?

DR. LESKO: I don't believe we've explored that. We'd have to explore that as a possibility.

DR. LAMBORN: So that the statement that that could be done, it is not currently part of the guidance.

DR. LESKO: It is not currently part of the guidance, and our working group has not spent a lot of time looking at that.

DR. LEE: So, I was going to say that there is consensus that we continue to use ABE. What I heard around the table is that there is a lack of consensus about what IBE is all about. Dr. Moye suggested there are other ways to look for that, and there's some suggestion we should throw it out entirely. There's some sentiment that maybe that's premature.

Are we ready to propose to consider the option until we understand under what conditions IBE would be appropriate for market access?

DR. BARR: Excuse me. Are you asking whether or not we think that IBE ought to be allowed as an alternative criterion, or whether or not it ought to be allowed to continue to be studied? What is the question, Vince, that you're asking us?

DR. LAMBORN: Is this question 2, discussion item 2 that you're on now?

DR. LEE: No. Question number 1 is that we need to come to some decision, provide some advice to the agency about how they should proceed. The proposal is should ABE continue to be used for another year until a final decision is made to use IBE for market access.

DR. LAMBORN: Could I suggest that in order to answer question 1, perhaps we need to discuss discussion item 2, because I think what's being expressed is a concern about -- there's an implication in 1 that we would continue to allow IBE to be used for the exceptions. And if we say that we would continue to study with replicate designs, implying that sponsors could use IBE, I think we need to address Larry's concerns that he just raised. So, I would propose that we need to address discussion item 2 and then come back to the vote.

DR. LEE: Okay.

DR. KIBBE: Larry, just getting back to the concerns you raised, I only spoke to item 1, and my issue basically is I think one year from now I'm not going to be comfortable converting everything over to IBE.

DR. LESKO: We're not proposing that. No. We haven't proposed that we're going to convert everything to IBE. We recommend replicate design for two classes of drug products.

DR. KIBBE: The statement, if I read it, reads that we will do it for another year until a final decision is made to use IBE for market access. My point is the statement ought to read that we're going to use ABE for market access unless there's a compelling reason to use a different system.

And then my question to you is -- and I'm following up on what Kathleen has said about topic 2 -- are the criteria that are currently listed in topic 2 good enough, or do we need better ones, in your opinion, in order to make IBE a viable alternative to ABE?

DR. LESKO: To clarify the first point, because I think it's important, the context for discussion topic number 1, is the current guidance in which we recommend replicate design for two classes of drug products, modified-release, and highly variable. I think in 1999 and then when we subsequently put out the guidance, we made the decision that we would not recommend replicate design for the other classes of drug products, and hence IBE would not be the way to market access. So, that's discussion topic number 1.

On discussion topic number 2, we think those constraints on the IBE criterion would make us comfortable to approve a product on IBE, which would include a measure of scaling, but it would exclude approving a product that deviated in its mean ratio to a degree greater than we currently allow under average bioequivalence. It would also signal to us that if we had a high value for sigma D, which could indicate a true subject-by-formulation interaction or perhaps a group-by-formulation interaction, that would not be then an IBE criterion for market access. One would go back and use average bioequivalence if it passed under the criteria.

DR. LEE: Well, I guess the discomfort is that there's a perception on Art's part that the IBE would eventually be replacing ABE.

DR. LESKO: It's looked at as an alternative for a sponsor to make a choice a priori whether they want to use average or IBE. We don't envision it as a replacement for average bioequivalence, at least not at the present time. In each case, whether one picks the average or the IBE, there's going to be both a producer risk of success and failure and a patient risk of success and failure.

DR. KIBBE: Your concern about criteria was that you thought you heard us saying that we were going to change the criteria as listed in 2?

DR. LESKO: No. That wasn't my concern. The criteria listed in number 2 is what the working group is recommending for consideration as the prerequisites to utilize IBE for market access.

DR. LAMBORN: To clarify, I think some of the items in 2 would be changes from the existing guidance. Is that correct?

DR. LESKO: That's true. The existing guidance, for example --

DR. LAMBORN: So, the issue is are we prepared to support the proposed changes in the existing guidance.

DR. LESKO: That's correct. The main changes on the test-to-reference ratio is constrained to 15 percent rather than 20 percent. The current guidance does not have any constraint on the value of sigma D, subject-by-formulation interaction. All of the other things on there, the other four bullets, if you will, are in the current guidance. That's nothing new. So, there are two new bullets on there compared to the current guidance.

DR. LEE: Laszlo, are you going to help us out of this dilemma?

DR. ENDRENYI: On discussion topic 1, if it would state, as already suggested, that is it reasonable and appropriate for FDA to use average bioequivalence for market access unless there is a compelling reason not to, period, end, I think that would still permit the investigation of IBE under discussion topics 2, 3, and 4.

DR. LESKO: That's logical to me. It's removing a time frame.

DR. LEE: Is the committee comfortable with that?

DR. KIBBE: Yes.

DR. LEE: So, we just put a period where?

DR. KIBBE: After "to."

DR. LEE: "A compelling reason not to."

DR. KIBBE: Period.

DR. LEE: And then period. That would still allow us to go and discuss item number 2.

Discussion topic number 2. Yes, Laszlo?

DR. BOLTON: I just have a question to clarify. Are you saying that you have an option here? If it doesn't pass these, you can use average bioequivalence. If that passes, then you're stuck with this.

DR. LESKO: No. We're not saying do the study and play a winner.

DR. BOLTON: Yes. If you choose this, you must pass.

DR. LESKO: The guidance is very specific in saying that the sponsor should choose a priori in their study protocol which methodology they're going to use.

DR. BOLTON: And they will use these criteria as new criteria.

DR. LESKO: That's correct.

DR. BOLTON: Okay.

DR. LEE: Now we're on discussion topic number 2 on the criteria.

DR. ENDRENYI: Could I take a rain check because the gentlemen handling the slides just went out?

DR. LEE: All right.

Please.

DR. ZARIFFA: I'm looking at discussion topic number 2, and I'm framing it in my mind as how do we collect more replicate design data sets while disallowing concerning patterns under IBE? So, there are two points that follow from that. The first is, how much more will we gain from an additional 10, 20, 50, X number of replicate design data sets? And two, do the additional constraints that we're putting on to disallow concerning patterns actually make sense?

So, there are two pieces that follow on from the question. The first piece has to do with what is the value of the additional data, and Marv asked this earlier. Don't we know enough? Don't we have enough? Haven't we simulated enough? And that comes to discussion topic 4. So, I'll leave that to one side.

The question of whether or not the additional constraints make sense in the short term -- we're talking about possibly just a year -- is something that we should keep in mind. Personally I was swayed by the arguments that Laszlo and Kam put forward regarding the geometric mean ratio, and I would hate to see this community set back several decades by going to what essentially comes down to looking at means in small data sets. I don't like that.

And the question about the constraint on sigma D being .15, it's been demonstrated over and over again that that is not valid under a number of different assumptions which arise quite naturally in practice.

So, those would be the two points, and the rest I'll table for discussion topic 4.

DR. LEE: Okay. Let me take the chair's prerogative and put the microphone back to Marvin Meyer for us to hear his opinion.

DR. MEYER: From what I understand, we have something like six bullets under topic number 2. I don't think there's any debate that a study done under IBE should pass the IBE criterion, although I'm not real clear what criterion we're going to use; but whatever it must be, then we will use it.

24 subjects is fine.

I think there's debate whether there should be no significant subject-by-formulation interaction. That shouldn't be a reason to dump a study, I wouldn't think, if it's above .15. Rabi showed some data that suggested that didn't mean a heck of a lot.

A constraint? Personally I believe we ought to have one. Laszlo, I think it was, presented some data. Les recommended, I think, a 20 percent constraint for Cmax. Some constraint. Now, whether it's 15 percent or 20 percent, I don't think we want to go above 20 percent, and maybe not above 15 percent, because of the perceived differences. Now, we're going to have to set ourselves back perhaps, but at the same time, we don't have to worry that the agency has approved some products that shouldn't have been approved because we had too lax an approval process. I think we can expand that, make it up to 20, 25 percent, if necessary.

I object a bit to the heterogeneous population. If you think about it, what does that really mean? That means blacks/whites, males/females, old/young. That's eight combinations. With 24 subjects you could have 3 in each of those subgroups, and I don't know what that will tell you. So, I don't know that we're going to achieve that objective. I wouldn't think we should allow all young, healthy males. We should have a little more diverse population, but to mandate some prescribed heterogeneous population I don't think will work.

So, those are my comments.

DR. LEE: Laszlo, you want to make a comment?

DR. ENDRENYI: First of all, I would like to repeat Kam's plea. That was the most important one. Do not introduce a new regulation until you've studied fully the science, please. So, thinking of new criteria before they have been studied I think would be deadly, disastrous.

Slide 9.

DR. LEE: And this slide would address topic number 2?

DR. ENDRENYI: Yes.

DR. LEE: Okay.

DR. ENDRENYI: As already indicated, I'm very strongly against the 85-117 percent limitations. As Kam says, that takes us back. Furthermore, I believe, as far as I can make out without additional studies, it will not be an individual bioequivalence criterion but a GMR criterion, like Canada's for Cmax.

The sigma D criterion: the .15 is not appropriate. It's true that in the model sigma D and sigma W -- that's the within-subject variation -- are independent. But when they are estimated, the estimated interaction and the estimated variance are not independent. They are directly related, linearly related in fact. So, a simple fixed criterion is not appropriate. It will do an absolute injustice to highly variable drugs.

Furthermore, there are some other problems, like what we already talked about: having the sensitivity to be able to detect an interaction in small groups. That is a problem, but there is a basic problem with the sigma D criterion of .15.

We haven't talked about modified-release formulations, and I think there are some basic points here. Modified-release has subgroups: delayed-release, with a lag time and usual kinetics; and extended-release, with usual kinetics and slow absorption. For these, there is no reason whatsoever to require replicate design studies. For sustained-release, controlled-release, there may be, for investigational purposes. But why?

DR. LEE: I think we got your point.

DR. ENDRENYI: Actually there was one other point.

Replicate design. And I think we talked about individual bioequivalence, but there is also a point about the replicate design study. Why do we want it? My sense is that we want it apparently for the sake of data collection. Question: For regulatory purposes, is this a need to know for regulatory approval or is it nice to know to get data? It would be useful to clarify this point.

Thank you.

DR. LEE: Thank you.

May I have the committee express the opinion first?

DR. BENET: Vince, can I make a comment?

DR. LEE: Yes, Les.

DR. BENET: I want to come back to both what Nevine said and what Laszlo said about the GMR and Kam's position on the point estimate, the GMR. Basically we are not asking for any new criteria. This is not an untested criterion. Nightingale and Morrison in 1987 looked at 224 products, one of which was out of plus or minus 15 percent. Gene Haney summarized, a couple of years ago, the products since then; none of them were out of plus or minus 15 percent. So, we're not adding any new criteria, because the present criteria have always maintained it within that area.

The reason I want plus or minus 15 percent on the IBE is that, exactly in opposition to what Nevine and Laszlo and Kam said, this is new. We are doing something new with IBE. We are not doing something old. So, it is not that we're doing something different from the past; it is that we have a new way that we're going to approve drugs.

And I think it's important, as a number of other people have said, to make sure that the clinical community and the patient community -- I know Nevine, as a statistician, says that's not important, but I can tell you it is important and it's important for the people in the United States that they believe this.

And Sandy, you're crazy if you think that the generics can get these clinicians and make them believe because the generics don't put the money into the pocket of the clinicians. So, you've got to deal with reality.

And I do not believe that this is something new. I believe it's exactly what we've been doing in the past.

Thank you.

DR. LEE: Thank you, Les.

I think that we do need to move along, and I would like to ask the committee to express their opinion about topic number 2. Of all the criteria proposed, which one might need some more discussion?

DR. LESKO: Vince, could I clarify something?

DR. LEE: Sure.

DR. LESKO: I'll give it to my colleague.

DR. HUSSAIN: Well, I think the constraint Professor Benet talked about was based essentially on the historical data that we have looked at. Mean differences for approved generic products and so forth are very tight. I think that's what he was referring to.

DR. BOLTON: Can I just say one quick thing? If you start adding these restrictions -- and I'm against adding those restrictions -- then the whole properties of this metric are changed. So, now we have to reevaluate what that metric really means with these new conditions. I don't think it's fair to just do it arbitrarily, to just throw it on there and say, well, that's good. You're making up numbers. That metric came from a scientific basis, whether we like it or not, and now we're making it a completely different thing. It's not the same anymore. So, why not come up with a different criterion that makes more sense to everybody?

DR. LESKO: Vince, I think it's important to clarify one other thing, if I could, on this debate about the equation. I've heard it a couple of times, but I still don't understand how we're going back 10 years.

But that aside, if you think about the IBE equation, what we're saying is we're putting a constraint on what's in the parentheses comparing the mean of the test to the mean of the reference. We're not changing the right-hand side of the equation. The right-hand side of the equation stays as it is, natural log of 1.25 or whatever.

Laszlo made the comment in his presentation that putting a constraint on that is going to eliminate some of the width that would be allowable for scaling, but you did say you're not sure whether it would become a GMR criterion or remain a true scaling.

Now, if we converted that to a linear scale, and we would have to do that, I don't know what the tradeoff would be in putting a constraint on that one part of the IBE. I mean, we do it now. We have a 20 percent constraint in our guidance on that parenthetical term, and what we're saying is let's make that difference in the parentheses 15 percent, without changing any other part of the equation. It may change the properties of the equation. We can explore that, but I don't think it changes them significantly.
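
[Editorial note for reference: the aggregate criterion being discussed has, in the agency's draft guidance on individual bioequivalence, roughly the reference-scaled form below; the constants shown (sigma_W0 = 0.20 and the ln 1.25-based bound) are the commonly cited ones and are an assumption of this sketch rather than a quotation from the meeting.]

\[
\frac{(\mu_T-\mu_R)^2+\sigma_D^2+\left(\sigma_{WT}^2-\sigma_{WR}^2\right)}{\max\left(\sigma_{WR}^2,\ \sigma_{W0}^2\right)}\;\le\;\theta_I ,
\qquad
\theta_I=\frac{(\ln 1.25)^2+\varepsilon_I}{\sigma_{W0}^2}.
\]

[The "parentheses" Dr. Lesko refers to is the mean-difference term (mu_T - mu_R); the proposal would additionally cap |mu_T - mu_R|, the log-scale mean difference, at roughly ln(1.15) while leaving the right-hand side, built on ln 1.25, unchanged.]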

DR. BOLTON: [Off microphone] how that changes. Do a little study and then say, listen, doing this doesn't change things very much. It might be more appealing, but we don't know that.

DR. LEE: May I consult with the statisticians on the committee? Yes, Kathleen?

DR. LAMBORN: What is the statistical question you were going to ask? I was going to comment on something a little different.

DR. LEE: Whether or not this is statistically sound.

DR. LAMBORN: I'd like to sort of split this thing into two parts. I think if the statement is that, in moving from average bioequivalence to individual bioequivalence, we don't want to allow products to pass that are further from a ratio of 1 than we allowed before, then I would say that's just a comfort level with regard to what we're doing. Clearly, by adding an extra constraint, it will reduce the likelihood that something is going to pass. From the sounds of things, it shouldn't create a case where something that would have passed under the old rules would not pass now, because under average bioequivalence they're passing anyhow.

I guess the thing that I'm coming down to is that the agency is seeing, now that they've had a year of experience with individual bioequivalence, that they're not comfortable with the guidance as it stands. It's almost like we've got a choice. We either say withdraw the option of using IBE until it's been studied more fully, or put some constraints on it so that there's a comfort level until you've had a chance to do the additional study to see what the impact is.

But I think clearly if the people who are seeing the data coming through are not comfortable with what they're seeing and feel that it could potentially be allowing something unsafe through, something has to change. So, that's partly a statistical answer and partly just my own personal opinion.

DR. LEE: Does your colleague next to you have a comment?

DR. MOYE: If I understand the statistical question, I would say that this new methodology is unsound for identifying what it has claimed to identify, that is to say, for identifying a subgroup-formulation interaction.

I would say that it is sound methodology to identify something that so far, to my knowledge, hasn't been detectable, and that is this notion of a subject-by-formulation interaction. So, if we're looking for demographic and subgroup interactions, then I think this methodology should not be used.

What it has been specifically designed to evaluate is an effect that I understand has not yet been identified and that is this ephemeral subject-by-formulation interaction that is exclusive of, separate and apart from ethnic or gender formulation interaction.

DR. LESKO: I'd like to respond to what Kathleen said. Without trying to rephrase it, I think she put it in perspective. It's exactly what we're worried about and it's exactly why we want to put the constraint as we've suggested it.

I also just did a quick look down the table of new data that we presented to the committee, under the ABE column, which shows the ratio of test to reference means. They're all very tight. We're not even close to 15 percent on any of them. So, there is a lot of worry about this, but the reality may be that we never invoke it, because the data to date in these 21 or 22 data sets don't necessarily create a problem in this respect. But I think going forward, we want to assure that we don't have to face that problem if it were to come up.

On the second point Dr. Moye made, I think again these studies are powered to demonstrate bioequivalence and not to demonstrate differences in the subgroups. So, what you said is true, but again, going back to the primary hypothesis, it's to show equivalence between two products, and then the subgroup analysis is kind of a secondary feature of the methodology, and maybe a hypothesis-generating feature of it.

DR. LEE: Jurgen?

DR. VENITZ: Yes. I'm looking at the criteria. Personally I don't have a problem with putting additional constraints on the ratio. As you just did, looking at the data that you included with the handout, none of the products would have failed on that criterion. So, it might not be something that would place an undue burden.

As far as the interaction term is concerned, I still think we should get rid of it, meaning not even include it in the model, let alone putting some constraint on it, given the fact that we don't really know what it means and that, statistically, it's very difficult to estimate. So, putting a constraint on something when we don't know what it is and don't know how precisely we can estimate it makes no sense to me. So, I would just rather get rid of it.

DR. BARR: Can I respond to that?

DR. LEE: Bill?

DR. BARR: I want to respond to the idea of getting rid of the SF term entirely. If you go back to the whole concept of individual bioequivalence, it really came about from the concept of switchability, the idea that on average these drugs are, in fact, bioequivalent but may, in fact, not be bioequivalent for a given individual.

Then the question comes up, how many individuals does that mean? Is it 1 out of a million? Is it 1 out of 1,000, 1 out of 10? We need to find out if, indeed, there is, say, 10 percent of the population for whom the two drug products are not interchangeable when they take them, and whether that is relevant. Is that something that the clinician or the professions ought to be concerned with? And I think most people felt that it was. What number you come up with, whether it's 20 percent or 1 percent, or whatever, we really hadn't decided on, but in terms of that being something worthwhile to investigate, it was considered to be important.

Now, what happened is that we never had any examples of that until somebody decided that we ought to look for it.

Now, what has happened, as I understand it from the people at the FDA, is that they have looked for this. They are seeing examples of it. The only way you can find that information is to do the replicate design and find out that when you give one individual one drug product, and you come back and give them that same drug product again, the result is consistently higher than when you give them the other product. And the only way you're going to find that out is from a replicate design. I don't know how you'd find that out unless you have the subsets already identified. Until we identify those subsets, we're going to have to look for it somehow.

Now that we've looked, my understanding is we have identified some subsets. We have found out that women are different from men for some drugs. We have found out that the elderly are different from the young for some drugs. We've seen an example with some of the drugs that we've looked at that we think is due to transit time. In fact, the ranitidine example in my mind was probably a transit-time-dependent drug, where people who have short transit times will not absorb certain kinds of drugs and certain kinds of formulations as well as they will other formulations.

So, I think to throw that out and to not continue to investigate this would be scientifically and I think clinically incorrect.

DR. BOLTON: I'd like to pursue that too. If the purpose is to collect data on these replicate designs so we can see what's going on, and we put all of these constraints on there -- I think you mentioned that -- people are not going to want to do the study. I mean, if somebody came to me and said, do a replicate study because you have a variable drug, and I see all these constraints, I'm going to say, you know, it's not worth it. So, I think it's going to reduce the amount of data you're going to get in, and you're basically not approving it on that point.

One of the selling points of individual bioequivalence from the very beginning was this tradeoff: if you make a very good product and it's less variable, you'll have a better chance of passing, et cetera. We're just putting that aside for our convenience in a certain way, and I can see your point about why you want to do it, but I think we're barking up the wrong tree. We're taking something that we produced for a certain reason, good or bad, and now throwing it around to make it convenient for us. It bothers me somehow.

DR. LESKO: Again, we didn't pull it out of the air. There was some logic to it. Again, when I go down the table of ANDAs and NDAs -- if we look at the table of ANDAs, I think you have the same one on the committee that I have -- and I add up all of the metrics in all of the studies that we provided you, there were only 2 out of 39 cases where sigma D was over .15. So, I don't think it's an onerous standard to have.

If you look at the NDAs, only 4 of 26 of the metrics had a sigma D over .15. So, it's not like we're going to use it in a way that would disqualify a large number of submissions that had requested this methodology. So, again, in light of the data we have, it would seem to make sense, given the frequency of something over .15 to look at.

DR. ZARIFFA: Larry, I think you're forgetting all the old data, where Mei-Ling herself showed 33 percent and 39 percent in the old FDA data, and the old industry data actually had data sets which exceeded a sigma D of .15. So, what you have now is actually considerably less data upon which to make that assessment.

DR. LEE: Kathleen?

DR. LAMBORN: Could I go back? You were asking for statistical comments. Whereas I'm comfortable with the constraint on the mean test-reference ratio, because that's retaining something that you would have expected them to pass under average bioequivalence, I am uncomfortable with the constraint on the subject-by-formulation interaction. Recognizing the variability in the estimates, I would concur that there is definitely an indication there would be too many instances where something that historically would have passed would not pass. I think statistically that's just too poor an estimate to be basing much on.
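
[Editorial sketch: Dr. Lamborn's point about the precision of the subject-by-formulation estimate can be illustrated with a minimal simulation in Python. It assumes a fully replicated design (two administrations of test and of reference per subject), a true sigma_D of zero, and a simple method-of-moments estimator; the 24 subjects, the 0.30 within-subject SD, and the 0.15 cutoff are the figures under discussion, and the code is illustrative rather than the agency's estimation procedure.]

import numpy as np

rng = np.random.default_rng(0)

def estimate_sigma_D(n_subj=24, sigma_WT=0.30, sigma_WR=0.30,
                     sigma_B=0.40, true_sigma_D=0.0):
    """Simulate one replicate-design trial on the log scale and return a
    method-of-moments estimate of sigma_D (subject-by-formulation SD)."""
    # Subject-specific formulation means: a shared between-subject effect
    # plus subject-by-formulation deviations whose difference has SD sigma_D.
    common = rng.normal(0.0, sigma_B, n_subj)
    mu_T_i = common + rng.normal(0.0, true_sigma_D / np.sqrt(2), n_subj)
    mu_R_i = common + rng.normal(0.0, true_sigma_D / np.sqrt(2), n_subj)

    # Two replicate administrations of each formulation per subject.
    T = mu_T_i[:, None] + rng.normal(0.0, sigma_WT, (n_subj, 2))
    R = mu_R_i[:, None] + rng.normal(0.0, sigma_WR, (n_subj, 2))

    # Within-subject variances estimated from replicate differences.
    s2_WT = np.mean((T[:, 0] - T[:, 1]) ** 2) / 2.0
    s2_WR = np.mean((R[:, 0] - R[:, 1]) ** 2) / 2.0

    # Per-subject test-reference contrast; its variance equals
    # sigma_D^2 + (sigma_WT^2 + sigma_WR^2) / 2.
    d = T.mean(axis=1) - R.mean(axis=1)
    s2_D = np.var(d, ddof=1) - (s2_WT + s2_WR) / 2.0
    return np.sqrt(max(s2_D, 0.0))

estimates = np.array([estimate_sigma_D() for _ in range(5000)])
print("true sigma_D = 0, n = 24, within-subject SD = 0.30")
print("fraction of trials with estimated sigma_D > 0.15:",
      round(float(np.mean(estimates > 0.15)), 2))

[Even with no true interaction, a nontrivial fraction of simulated studies of this size and variability produce an estimated sigma_D above 0.15, which is the sampling-noise concern raised here.]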

DR. LEE: So, you agree with Les Benet's proposal to throw it out?

DR. LAMBORN: I would certainly throw out the constraint. I'm not as convinced that we should throw out the parameter in the model itself. I think that the whole concept that's been put together of the model -- I think that we shouldn't be sort of taking bits and pieces and throwing them out. So, I would propose that we leave the model as it's been studied. I would propose that we do not put a constraint on that subject-by-formulation interaction if the overall model meets the criteria.

DR. LEE: Yes, Marv?

DR. MEYER: I agree with Kathleen totally. I would look at the ANDA data, study 4 I believe. This was 59 subjects, and it looks like the point estimate was 1.04, confidence intervals -- these are ABE data -- 97 to 110; within-subject variation, .27 for test, .22 for reference; between-subject, 1.13 and 1.20. To me, all those numbers sound just about the same. Yet, there's an SxF of .20. So, that study would fail if we had a constraint.

And if we look on the next page of Rabi's handout, he goes into it commendably. I think what we need to do is to look subject by subject. And he shows subject 13 where both references are low and the two tests are high but quite variable.

Yet, he also shows a case where the two references, subject 38, are quite high, and the two tests are more reproducible and quite low.

So, we have two that go one way and one that goes the other way, and a .2. And yet, everything else that we talk about looks okay. And to put a constraint of .15 and not pass that study I think would be a tragedy for the sponsor, as well as the American public.

DR. LEE: Well, is that a position that this committee is comfortable recommending? Yes, Laszlo?

DR. ENDRENYI: It seems to me that the sense of the discussion topics 2 and 3 is whether the committee encourages FDA to study IBE on the one hand and to apply it for market access during an interim period, and for discussion topic 3, whether the committee encourages FDA to conduct replicate design studies for various purposes. I think that is a proper role of the committee.

I think it would be a disaster if the committee would set regulatory conditions, regulatory criteria, details, based on rather slender evidence, what we have got now. So, I think the committee shouldn't go into the detailed points.

DR. LEE: We're on topic number 2. I would like to come to closure.

DR. YACOBI: One little comment, because I think we ought to be consistent and also concerned about what we are going to tell the public. On the one hand, we tell them that there are no issues, no problems with the present bioequivalence testing, and then we say we want to introduce the new method, which is IBE, and that we are going to constrain the ratio to plus/minus 15 percent. My concern is, what are we going to tell the public? If we say, okay, plus/minus 15 percent is better, why don't we apply it to the rest of the bioequivalence testing? I think we are going to send the wrong message to the public, and the public is just going to be more concerned about generic products.

DR. LESKO: I guess I don't see an inconsistency, because what it amounts to is the ability to get into the marketplace using either an average or an individual bioequivalence approach. With a constraint on the IBE, and what you said about the public, I guess my answer would be that for the other drug products one could not get approved with a difference of 15 percent. There may be an exception, for example, with an extremely low variability drug and a highly overpowered study. You might have 15.1 percent and get approved.

So, I don't see an inconsistency in explaining this to the public. I think in either scenario, whether it's average or IBE -- in fact, I think this is the benefit of putting a constraint on here -- the differences between a test and reference product are virtually the same on the mean ratio.

DR. YACOBI: Larry, I accept your point based on the science. I accept it fully, but I think the average consumer is not going to be sufficiently knowledgeable in statistics to figure that out. They're going to remember two numbers. One set of products are being approved based on plus/minus 15 percent; the others are going to be based on plus/minus 20 percent. And I don't know how we are going to help them out to figure this one out.

DR. LESKO: I think we're not approving in the IBE case on a plus or minus 15 percent. That's a constraint on the part of the equation. We're approving on the aggregate equation. I guess in some ways, depending on the numbers, one could argue it might even be better to tell a consumer that we know more about this product approved in this way than we do in the current way. So, it could be argued it might be an improvement in the information that that study would provide us on variance and maybe the absence of a subject-by-formulation interaction.

DR. LEE: Bill, do you want to make the last comment?

DR. BARR: Just one comment. We're not looking at a point estimate when we're saying plus or minus 20 percent. That was a confidence interval. So, those are different. In fact, in most cases, as Les pointed out, the actual ratio was in fact less than that. The point estimate was less than that. So, I don't think that we're doing anything inconsistent.

I think the plus or minus 15 percent is good for the confidence, because if you tell people you have plus or minus 20 percent on a point estimate, then you can go through the exercise and show that there could be some drugs, when you go from A to B to C, where A and C may differ by as much as 50 percent when they're both being compared to B. And I think that that kind of information is very detrimental, and this prevents it.
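
[Editorial illustration of the arithmetic behind the 50 percent figure, assuming point estimates sitting at the extremes of a plus-or-minus 20 percent allowance against the same reference B:]

\[
\frac{A}{C}=\frac{A/B}{C/B}=\frac{1.20}{0.80}=1.50 ,
\]

[that is, A and C could differ by 50 percent even though each is within 20 percent of B; with a 15 percent cap the corresponding worst case is 1.15/0.85, about 1.35.]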

DR. LEE: Larry, let me ask one question to clarify. IBE is an option for the sponsor.

DR. LESKO: Yes. It's an option for them to select it.

DR. LEE: So, were the sponsor to choose the IBE as the criterion for market access, these are the conditions that you would like the study to be --

DR. LESKO: That's correct.

DR. LEE: So, is the committee ready to provide some guidance to the agency? Kathleen, you seem to come up with a modified version, and I would like you to repeat it.

DR. VENITZ: Vince, can I ask you something? Do we have to come up with a unanimous recommendation, or are the comments that we've generated so far going to be helpful to the agency? Because I do agree with Laszlo's comment that I don't think we should be in the business of setting regulatory limits or specifications. But I think you received multiple comments, some of them in agreement, some of them not, and you take it from there -- you meaning the FDA people.

DR. LESKO: I think what I've heard, if I can sort of summarize what I've heard, is that on discussion topic number 2, there was more of a consensus to constrain the mean and there was not a consensus to constrain the subject-by-formulation interaction.

DR. LEE: What I would hope to do is to get a sense from the committee. I think Marv mentioned the heterogeneous population. I'd like to have a little discussion about that. It seems to me that this is something I know from my own experience with clinical studies through NIH. When we write proposals now involving clinical components, we've been asked to make sure that we consider that particular issue. What do you have in mind, Larry, when you said heterogeneous population?

DR. LESKO: I think in many ways it's separate and apart from how we assess the bioequivalence of a product. I think we've made the recommendation, whether one uses average bioequivalence or individual bioequivalence as the way to get to the market, it's desirable to have a heterogeneous population. I think it better assures the extrapolation of the bioequivalence from the test situation to the general population, and I would think it would be an improvement on the science. So, it's a recommendation in the guidance. It's not a requirement. We strongly recommend it for either average bioequivalence or individual bioequivalence.

DR. LEE: So, we will talk about the next two points. Should we just move on? About the 24 subjects, what is the committee's --

DR. LESKO: That currently is in our guidance right now.

DR. LEE: So, are we ready to talk about this topic number 3? Good. What is topic number 3?

Are there scientific, technical or other reasons not to continue with the recommendations in the General BA/BE Guidance, A, to conduct replicate design studies for modified-release dosage forms and for highly variable drugs, and B, to use a heterogeneous study population, at least 40 percent male and female, and/or young and elderly subjects?

Marv, what is your opinion?

DR. MEYER: Two opinions. First of all, I'm not sure, if you have 40 percent male and female, what the other 60 percent are.

(Laughter.)

DR. MEYER: But anyway, that's English I guess.

My personal belief is that replicate designs for modified-release will generate more data than if you exclude them, but it's not really necessary. It ought to be applied only to those cases, such as highly variable drugs, where it will do some good. For example, theophylline: that's very reproducible, clean kinetics. It would be kind of a shame to have to do a four-way crossover when you could do a two-way just as easily on a modified-release product.

DR. LEE: Jurgen?

DR. VENITZ: I'm trying to understand topic 3, and I guess those are recommendations. That means the sponsor can choose not to follow.

DR. LESKO: Generally, that's what all guidances say.

(Laughter.)

DR. VENITZ: So, with the word recommendation here, it really means that it's a mandate.

DR. LESKO: No. That's a recommendation. Well, let's put it this way. A study would not be rejected if an attempt was made to include a heterogeneous population and it wasn't heterogeneous, but the reviewer wouldn't smile.

DR. VENITZ: Currently the recommendation then would be that for any modified-release dosage form, a replicate design should be used.

DR. LESKO: Again, discussion topic 3 is consistent with our current guidance. We're not recommending any change. We're asking is this a sound recommendation that we've carried forward from 1999 to our current guidance. Now we're going forward again. Is there anything in the new data? Is there any new information that you feel makes this recommendation or this discussion, which currently is in our current guidance, invalid or unrealistic?

DR. VENITZ: Then I would say it should be up to the sponsor to decide whether they want to use a replicate design for modified- and controlled-release dosage forms.

I feel strongly about the heterogeneous study population because I'm not sure how you're going to enforce that, unless you're rejecting a study if only 35 percent of the population is of one gender. Does that mean the study is invalid automatically? I'm not sure how you're going to enforce that. Or will you reject it offhand? Will you not even look at the results of the study if the gender or age distribution is not met?

DR. LESKO: The rationale for originally including this recommendation in the guidance was primarily to answer the question about the rate of subject-by-formulation interactions that we debated back in 1999. We made the hypothesis at that time that if these interactions were real and if they were going to occur, they would tend to occur with the more complicated dosage form which, in combination with a variety of subjects, would tend to send a signal out that something real is there. That was the rationale for it.

DR. LEE: Bill?

DR. BARR: That was actually the recommendation I think of the Individual Bioequivalence Committee. Having sat on that committee, that was the conclusion that we came to. That would give it the maximum opportunity to evaluate this scientifically, and I think for another year, it certainly wouldn't hurt.

DR. LESKO: It also went in conjunction with another recommendation at the -- I don't want to say Blue Ribbon Committee, because I'm not sure, but it was another recommendation that we reduce the multiple dose requirement for modified-release dosage forms. So, it was a bit of a decision that this was a more informative study and the modified-release multiple dose studies were not adding much to it. So, that was another rationale for this recommendation in the guidance.

DR. BENET: Can I follow up on that as the chairman of the committee? Larry is exactly right. We felt that that was a good tradeoff and that it would allow the agency to get additional information.

But I'd have to say I don't think today it's justified, and so I would still keep the recommendation for highly variable drugs, but I'm not sure that -- well, I no longer agree with the recommendation that modified-release dosage forms be studied with a replicate design.

DR. LEE: Les, why don't you think it's appropriate today?

DR. BENET: At this time?

DR. LEE: Yes.

DR. BENET: For the whole reason that I basically do not believe that we're gaining the kind of information that we -- I mean, my whole presentation before. I don't think we gain anything additional in these kinds of studies from modified-release dosage forms that makes them unique, and therefore I am not a supporter of it, because I don't think modified-release dosage forms are inherently highly variable. And where the IBE equation has its greatest advantage for the generic industry is for highly variable products. So, I do not believe that modified-release products are highly variable in general.

DR. LEE: Thank you.

DR. BOLTON: In the data that you showed us for the new ANDAs, were there heterogeneous populations in those studies? I thought somebody said they were all male subjects.

DR. LESKO: Many of the studies were in male subjects, but many of them also had a mixture of men and women in the studies, not necessarily 50/50.

DR. BOLTON: But, you know, when I looked at those studies, I didn't see any evidence of interaction that would bother me. There were a couple that were .2 out of the 13, and a bunch of 0's. Of course, those are biased kinds of estimates because you truncate the interaction at 0. I would not be disturbed at all by seeing data of that sort, and so I agree with Les. We haven't really seen anything there, and I'm not so sure how much more we're going to get from that.

DR. LESKO: Well, I think we get a lot from looking at data. I mean, if we go back to, say, 1999 when we didn't have these 20-some data sets, I think we've learned a lot by looking at them.

We don't have a lot of data on which to base a conclusion about the modified-release dosage forms. I think we have five, perhaps, data sets between the ANDAs and NDAs. On the other hand, if we're looking at actual examples where we've tried to look for subgroup effects with dosage forms, some of those have come from the modified-release category.

So, it's really a question. I agree if I look down the ANDA and the NDA table at how many 0's we have in there for subject-by-formulation interaction and look at how many of them are extended-release or modified-release dosage forms, nothing jumps out at this point, but it is only 20 data sets and it may or may not be consistent with the old data sets. I don't remember how many controlled-release were in that old data set. So, my feeling would be we need to again collect more data.

The other side of that coin is we did have, as Les mentioned, discussion of a tradeoff with the multiple dose by looking at the dosage form more intensely in a single-dose situation. I don't know, Les, what your feeling is. If you're saying that you are no longer in favor of the replicate design, does that mean you're in favor of a multiple-dose design?

DR. BENET: No. I think we can adequately do it with single doses. I don't see any reason why we can't approve products that are useful with a single-dose design for modified-release under our present criteria.

DR. LEE: Other points or opinions about the replicate design? Leon?

DR. SHARGEL: I'd like to address the heterogeneous population. Usually when we read something in a guidance, even though the agency says it's only a recommendation, it generally means a requirement in our industry. And to try to justify doing anything outside the box is awfully difficult.

When a heterogeneous population is required, it's awfully difficult to meet. You can try different parts of the country, and you'll find the Midwest will generally be young Caucasians. You may find another area of the country that may be more Hispanic or African-American. You may find more males. I think to have it as a suggestion, and to certainly try to get a diverse population, is fine, but from a practical recruitment standpoint it is often difficult. So, I wouldn't want to see that as a requirement, but I think in good faith the industry should try to get a diverse population. So, I'm concerned about a requirement versus a good faith effort in this.

The second is I agree with Les in terms of modified-release, and with the comment by Laszlo asking whether the recommendation for a replicate design on modified-release is more to gather more information, more data, or whether it is really needed for the approval process when, under average bioequivalence, we could do just a crossover study.

DR. LESKO: Just again, it's a recommendation we make. We believe it's important. The way we handle deviations from a guidance, if the sponsor doesn't follow what we've recommended in the guidance, is that we do ask them to explain why, but it's not a requirement. So, if Marv, for example, presented a case with a simple formulation, and the rationale for using a homogeneous population was that it was this simple formulation with excipients that one doesn't expect to cause a problem, there would be at least some scientific rationale to say why this was not done.

DR. MEYER: I think every state has about an equal number of males and females, so that could be a good place to start. It's hard to argue my state has no females in it.

(Laughter.)

DR. VENITZ: But, Larry, if that's the case, how come all the new data that we looked at were predominantly from healthy, young males? You're saying this guidance with this recommendation has been in effect for, what, two years now?

DR. LESKO: One year.

DR. VENITZ: One year. So, over the past year, you've got studies to review that used exclusively males despite the fact that you're recommending here that they should use a gender mix.

DR. LESKO: I think we're disappointed with the heterogeneity, but I don't know the breakdown in each and every study. As Mei-Ling was showing her data, I remember seeing n equals 10 females, n equals 10 females, and the total n may have been 30, 36. So, there's probably some heterogeneity based on gender in these studies, but I doubt if there was much heterogeneity based on age.

DR. LEE: Bill?

DR. BARR: Can I ask Mei-Ling to review with us again the percentage of studies that you found in which you did have females and males that you could look at, how often that was of significance?

DR. LESKO: Bill, you were saying would Mei-Ling?

DR. BARR: Yes.

DR. LESKO: I don't know the answer to that question, so I can't address it, but you might want to rephrase it for her.

DR. BARR: The question was when you did have the opportunity to look at studies which had both males and females, a significant number to be able to do some statistical evaluations, how often did you find that that made a difference in terms of the bioavailability, that there was a subject-formulation interaction in which females were different than males?

DR. CHEN: Well, honestly I haven't analyzed the data based on the gender. At this point, I only used the whole criterion and evaluated the outcome based on the new criterion.

DR. BARR: I guess the reason I'd ask that is that I recall at one time, whenever you first looked at the first data sets and first made a presentation, that there were, I think, eight data sets that you found in which there were replicate design. In two or three of those, it seemed to me, you identified that the female population was in fact the reason for the interaction. Am I mistaken? This was, I think, about two years ago in one of the first presentations that you made regarding the new evaluation of the data sets in which you had replicate design.

DR. CHEN: I can't recall which data set.

But we are planning to do more analysis for the data sets that we have received so far and look into the gender factors or other factors, but we haven't really analyzed those.

DR. BARR: [Off microphone.] Everyone knows it's much easier to do these studies with all males, and most people do it simply because it's easier to recruit males and easier to have a single population. And if it makes no difference, then that's fine. I think that we ought to know that. It would make it easier to do the studies. But somehow that seems to be almost a Taliban approach. We're now disenfranchising 50 percent of the population. If in fact it may make a difference and we don't know, it seems to me that we have to include females in there somehow in order to make that decision.

DR. CHEN: Right. That's what we are planning for the future research plan, if we have the time and the resources.

DR. LEE: Okay. I think that we are beginning to lose the committee because the cabs are here. Larry, have you heard enough to move forward?

DR. LESKO: Yes. I think we've heard some excellent comments and in some ways consensus on some of these points. So, we're happy with some of the input we've gotten here. And some of it we have to debate more.

DR. LEE: Okay. That's very clear.

Topic number 4, which is on the plan presented by Stella. Does anybody want to provide some advice on the proposed research plan?

DR. VENITZ: I'm not sure whether that's included in your proposal, Stella, but I would suggest that you look at disaggregate criteria as well as the current criterion, where you separate out the variance components, the test and reference variance components, and the mean component, separately, as opposed to in one big glob that we call the aggregate criterion.
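
[Editorial sketch of what a disaggregate set of criteria could look like, with each component tested separately rather than combined into one aggregate statistic; the limits c_1 and c_2 are placeholders, not values proposed at the meeting:]

\[
\left|\mu_T-\mu_R\right|\le\ln 1.25 ,\qquad
\sigma_D\le c_1 ,\qquad
\frac{\sigma_{WT}^2}{\sigma_{WR}^2}\le c_2 .
\]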

DR. LEE: Yes. It seems to me that, in light of the discussion this afternoon, perhaps this research plan should be modified in some way. How essential is it for the agency to hear the comments from this committee?

DR. LESKO: I think the comments can wait for the time being. I think the prior comments on the other three discussion points have given us a lot to focus on, and I also think we've had some discussions with Nevine and others about conducting some simulation studies that we think will answer some of the questions we've had about the interrelationship between the components of the IBE criterion.

DR. LEE: That's my sense.

Sandy, you want to suggest something?

DR. BOLTON: I just had a comment on doing human research for this interaction. Maybe if you found a study where you really saw a high interaction, you could somehow design your own study with those products and see if you could tease that out. I think that might be useful. Because so far, except for these few examples and concocted studies, we really haven't seen anything that's very, very disturbing yet. And if you could show it in a pointed direction, then we'd know where we're going. It's like trying to get bin Laden out, you know.

DR. LEE: You're on record.

(Laughter.)

DR. LEE: Is there anything else that this committee would like to --

DR. BENET: I'd like to see, from a statistical point of view, simulations addressing the case where there are large differences in the within-subject variation between the test and the reference, when one of them is low and the other is high, but the means are the same.

And I'd like to see simulations that address the passing or failing with and without the subject-by-formulation interaction, because my feeling is that, even though the equation says it isn't, the subject-by-formulation interaction is affected by the within-subject variations being very different between the test and the reference. So, I think that would be useful data to see. Maybe Stella or we could look at that.
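
[Editorial sketch of the kind of parameter grid Dr. Benet is asking for, in Python. It evaluates only the population value of the aggregate IBE criterion under assumed parameters, not the confidence-bound procedure used for actual decisions; the constants (sigma_W0 = 0.20, variance allowance 0.05) and the grid values are assumptions of this sketch.]

import numpy as np
from itertools import product

SIGMA_W0 = 0.20                                          # assumed regulatory scaling constant
THETA_I = (np.log(1.25) ** 2 + 0.05) / SIGMA_W0 ** 2     # aggregate limit with variance allowance

def ibe_aggregate(mu_diff, sigma_D, sigma_WT, sigma_WR):
    """Population value of the reference-scaled aggregate IBE criterion."""
    num = mu_diff ** 2 + sigma_D ** 2 + (sigma_WT ** 2 - sigma_WR ** 2)
    return num / max(sigma_WR ** 2, SIGMA_W0 ** 2)

# Equal means throughout; vary the within-subject SDs and sigma_D.
for s_WT, s_WR, s_D in product([0.15, 0.40], [0.15, 0.40], [0.0, 0.15, 0.30]):
    value = ibe_aggregate(0.0, s_D, s_WT, s_WR)
    verdict = "passes" if value <= THETA_I else "fails"
    print(f"sigma_WT={s_WT:.2f}  sigma_WR={s_WR:.2f}  sigma_D={s_D:.2f}  "
          f"criterion={value:6.2f}  ({verdict})")

[Running the grid shows, for example, that a test product much more variable than the reference is penalized while one much less variable gains room, which is the interaction between the within-subject terms and the rest of the criterion that these requested simulations would probe more formally.]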

DR. LEE: Since you're on the subject, I'll make a suggestion. I feel that the excipient effect is an important one to examine. I think that the initial study on the sucrose/sorbitol is moving in the right direction.

So, may I suggest that the agency take the discussion into account in the modified research plan, and then maybe you can determine whether to bring it back at the next meeting or at a meeting after that, in a future meeting.

DR. LESKO: We agree. When we have some data.

DR. LEE: Right. I think it's very clear that we need some more data, but at some point we need to say this is the time to call the question.

Is there anything else?

(No response.)

DR. LEE: If not, thank you very much for your participation and the meeting is adjourned.

(Whereupon, at 4:37 p.m., the committee was adjourned.)