










8:33 a.m.

Monday, September 10, 2001












Versailles Ballroom

Holiday Inn - Bethesda

8120 Wisconsin Avenue

Bethesda, Maryland




Associate Clinical Professor

Oncology Associates, P.C.

Helen & Harry Gray Cancer Center

Hartford Hospital

85 Retreat Avenue

Hartford, Connecticut 06106


Advisors and Consultants Staff, HFD-21

Food and Drug Administration

5600 Fishers Lane

Rockville, Maryland 20857


Professor of Medicine

Division of Hematology/Oncology

Loyola University Medical Center

Cancer Center, Room 109

2160 South First Avenue

Maywood, Illinois 60153


Wilshire Oncology Medical Group, Inc.

50 Bellefontaine Street, Suite 304

Pasadena, California 91105


Professor of Medicine

Division of Hematology and Oncology

University of Alabama at Birmingham

1530 3rd Avenue South

Birmingham, Alabama 35294-3280


Professor of Biostatistics

Department of Biostatistics and Bioinformatics

Box 3958

Hanes House, Room 219

Trent Drive at Erwin Road

Duke University Medical Center

Durham, North Carolina 27710

ATTENDEES (Continued)



Chief, Gastrointestinal Oncology Service

Memorial Sloan-Kettering Cancer Center

1275 York Avenue

New York, New York 10021


Professor of Medicine and Cancer Prevention

The University of Texas M.D. Anderson Cancer Center

Department of Clinical Cancer Prevention

1515 Holcombe Boulevard, HMB 11.192c, Box 236

Houston, Texas 77030

JODY L. PELUSI, F.N.P., PH.D., Consumer Representative

Phoenix Indian Medical Center

4212 North 16th Street

Phoenix, Arizona 85016


Associate Director

Stem Cell Transplant Program

Center for Cell and Gene Therapy

Baylor College of Medicine

6565 Fannin Street, M964

Houston, Texas 77030


Associate Professor of Internal Medicine

Division of Hematology/Oncology

University of Michigan Comprehensive Cancer Center

7216 Cancer Center

1500 East Medical Center Drive

Ann Arbor, Michigan 48109-0948



Departments of Medicine and Pathology

Indiana University School of Medicine

Indiana Cancer Pavilion

535 Barnhill Drive, Room 473

Indianapolis, Indiana 46202




Professor of Medicine

Medical Director, Palliative Care Services

Division of Clinical Oncology

University of Kansas Medical Center

3901 Rainbow Boulevard

Kansas City, Kansas 66160-7353




New York, New York




3 Heritage Hills Court

Skillman, New Jersey 08558-2340




Professor of Oncology - Experimental Therapeutics

School of Medicine

Johns Hopkins University

East Baltimore Medical Center

1650 Orleans Street

Baltimore, Maryland


School of Medicine

University of California at San Francisco

Breast Cancer Program

Box 1710

San Francisco, California 94143











Professor of Medicine

M.D. Anderson Cancer Center

Houston, Texas



Clinical Trial Design for First-line Hormonal

Treatment of Metastatic Breast Cancer



By Dr. Templeton-Somers


By Dr. Aman Buzdar


Clinical trial designs for first-line treatment

of metastatic breast cancer

By Dr. Susan Honig

Hormonal treatment of metastatic breast cancer:

approval overview

By Dr. Patricia Cortazar

Statistical considerations in clinical trial

designs for first-line treatment of

metastatic breast cancer

By Dr. Rajeshwari Sridhara




(8:33 a.m.)

DR. NERENSTONE: I would like to thank everyone for joining us this morning. As you can see from your agenda, we're going to be starting with the discussion about the clinical trial designs for first-line hormonal treatment of metastatic breast cancer.

If we could, could we please go around the table and introduce ourselves. Dr. Henderson, if you would like to start.

DR. HENDERSON: Craig Henderson, University of California, San Francisco.

DR. DAVIDSON: Nancy Davidson, Johns Hopkins.

DR. OHYE: George Ohye, nominee for industry representative.

DR. KELSON: Dave Kelson, Sloan Kettering.

DR. ALBAIN: Kathy Albain, Loyola University, Chicago.

MS. MAYER: Musa Mayer, patient representative.

DR. LIPPMAN: Scott Lippman, M.D. Anderson Cancer Center.

DR. CARPENTER: John Carpenter, University of Alabama at Birmingham.

DR. PRZEPIORKA: Donna Przepiorka, from Baylor Houston.

DR. NERENSTONE: Stacy Nerenstone, community oncologist, Hartford, Connecticut.

DR. TEMPLETON-SOMERS: Karen Somers, Executive Secretary to the committee, FDA.

DR. SLEDGE: George Sledge, Indiana University.

DR. PELUSI: Jody Pelusi, Phoenix Indian Medical Center and community representative.

DR. GEORGE: Stephen George, Duke University.

DR. REDMAN: Bruce Redman, University of Michigan.

DR. TAYLOR: Sarah Taylor, University of Kansas.

DR. BLAYNEY: Douglas Blayney, Wilshire Oncology Medical Group, Pasadena, California.

DR. SRIDHARA: Raje Sridhara, statistician, FDA.

DR. CORTAZAR: Patricia Cortazar, FDA.

DR. HONIG: Susan Honig, FDA.

DR. PAZDUR: Richard Pazdur, FDA.

DR. TEMPLE: Bob Temple, FDA.

DR. TEMPLETON-SOMERS: The following announcement addresses the issue of conflict of interest with respect to this meeting and is made a part of the record to preclude even the appearance of such at this meeting.

Based on the submitted agenda and information provided by the participants, the agency has determined that all reported interests in firms regulated by the Center for Drug Evaluation and Research present no potential for a conflict of interest at this meeting with the following exceptions.

In accordance with 18 U.S.C., section 208(b), full waivers have been granted to Dr. Douglas Blayney, Dr. John Carpenter, Dr. Scott Lippman, and Dr. George Sledge. Further, in accordance with 21 U.S.C. 355(n)(4), Doug Blayney, M.D., Bruce Redman, D.O., and Sarah Taylor, M.D., have been granted waivers that permit them to vote on matters related to this morning's discussions.

A copy of these waiver statements may be obtained by submitting a written request to the agency's Freedom of Information Office, room 12A-30 of the Parklawn Building.

In addition, Dr. Kathy Albain and her employer, the Loyola University Medical Center, have interests which do not constitute financial interests in a particular matter within the meaning of 18 U.S.C., section 208, but which could create the appearance of a conflict. The agency has determined, notwithstanding these interests, that the interest of the government in Dr. Albain's participation outweighs the concern that the integrity of the agency's programs and operations may be questioned. Therefore, Dr. Albain may participate fully in this morning's discussions and vote.

With respect to FDA's invited guests, Dr. Craig Henderson and Dr. Nancy Davidson have reported interests that we believe should be made public to allow the participants to objectively evaluate their comments. In 2000, Dr. Henderson received a consulting fee from AstraZeneca and he has received speaker fees from Bristol-Myers Squibb for lectures on paclitaxel. Dr. Davidson received unrestricted research support from AstraZeneca more than three years ago. She has also received consulting fees and speaker fees from AstraZeneca.

Lastly, we would like to note for the record that George Ohye is participating in this meeting as an industry representative, acting on behalf of the regulated industry. As such, he has not been screened for any conflicts of interest.

In the event that the discussions involve any other products or firms not already on the agenda for which FDA participants have a financial interest, the participants are aware of the need to exclude themselves from such involvement and their exclusion will be noted for the record.

With respect to all other participants, we ask in the interest of fairness that they address any current or previous financial involvement with any firm whose product they may wish to comment upon.

Thank you.

DR. NERENSTONE: We are going to do the open public hearing now. And, Dr. Buzdar, we need you to have a financial disclosure in terms of who is paying for your talking here.

DR. BUZDAR: Actually, I would like to make it public that I have research grants from AstraZeneca, from Bristol, from Lilly Pharmaceutical, and also from Roche. But I have no agreements or any other financial conflicts with any other pharmaceutical company except I have participated in talks with different pharmaceutical companies, but I have no agreements or any stocks or shares in any other pharmaceutical company.

I would like to take this moment to thank the committee for the opportunity to express my thoughts on this issue. It is an important issue, at least in my judgment, whose time has come to address at this point.

Since the availability of front-line data, at M.D. Anderson we have changed our thinking. And following the availability of data, the treatment scheme that we have adopted at our institution is shown over here. The sole place held by anti-estrogen as the initial therapy of postmenopausal women with hormonal receptor positive disease was revised, and AIs have been moved from the second-line therapy to the first-line therapy.

This slide shows the structure of three AIs which are available for clinical use in this country. Anastrozole and letrozole are nonsteroidal aromatase inhibitors, and exemestane is a steroidal aromatase inhibitor.

This slide summarizes the clinical pharmacology of these agents. The recommended doses of the three agents are different. Two are competitive inhibitors and exemestane is a suicide, or irreversible, inhibitor. All agents are very effective in suppressing estrogen. There is limited data that letrozole was superior to anastrozole in inhibition of estrone sulfate, but clinical significance of this remains to be defined. Estradiol suppressions were similar in this one small study.

The summary of the second-line randomized trial is shown over here, which illustrates that median survivals were longer with AIs than progestin in these trials. In two out of the four studies these differences were significant. Median time to progression was similar in all studies. The letrozole initial study findings demonstrated a dose-dependent antitumor activity and a higher response rate with 2.5 milligrams, which was not reproduced in the second independent study with a similar design. These data, in spite of these differences, I think have more similarities and similar patterns in time to progression, survival and response rates.

Let me say a few words now on the first-line therapy. The study design of two anastrozole trials is shown over here. All the front-line therapies essentially have a very similar design. In the North American study, which had almost 90 percent of the patients who were hormone receptor positive, anastrozole showed superior antitumor activity. A higher fraction of patients also got clinical benefit, and the duration of the control of the disease, or time to progression, as shown graphically, was also in favor of anastrozole.

In the second European study, both drugs had similar antitumor activity but only 45 percent of the patients in this trial were known to be receptor positive. Time to progression for both therapies was similar in the European trial, as graphically shown over here.

If one looks at the hormone receptor positive patients in both trials, the data looks similar to the North American trial. In known receptor positive patients, anastrozole treated patients had longer time to progression compared to the tamoxifen.

The letrozole study had a similar design as the anastrozole study, but there is one major difference in this, that there is a planned crossover to prospectively evaluate the efficacy of anti-estrogen following letrozole therapy. The letrozole study had also shown, from an efficacy point, that all the efficacy endpoints were superior, and the antitumor activity of this drug was superior compared to the tamoxifen in this study, as shown on this table. The time to progression was also superior in the letrozole arm compared to the tamoxifen.

One subgroup of patients, i.e., the patients who had prior tamoxifen exposure, the tamoxifen arm of the study had only 8 percent antitumor activity in the letrozole study. This subgroup represents a sizable fraction of the patient population in this study. A poor response rate was observed in spite of a long interval from discontinuation of the earlier tamoxifen therapy in this subgroup.

In the anastrozole trials, if we look at it, prior tamoxifen therapy had no adverse effect, contrary to the letrozole study in which you saw a much lower response rate in the tamoxifen arm.

Stratified analyses of time to progression for anastrozole and letrozole studies are shown over here. Another subgroup of patients in which the tamoxifen arm of the letrozole trial also had a very short median time to progression -- which was the rest of the world -- this represents another large segment of the trial population in which efficacy of tamoxifen was very low. Efficacy of AIs in this study was similar, which stands out that the only difference is that tamoxifen did somewhat poorer in the letrozole study compared to the other two drugs.

A side-by-side look at the data of anastrozole and letrozole with intent-to-treat analysis. These findings show more similarities in time to progression, fraction of patients who achieve clinical benefit from these therapies.

A similar side-by-side look at the data of both drugs in ER-positive patients shows similar findings to suggest the superiority of AIs over tamoxifen therapy.

Exemestane is a steroidal aromatase inhibitor. Phase II data which is available show similar findings in a small study. Phase III studies with this agent are ongoing in front-line therapy.

From this clinical efficacy point of view, I think aromatase inhibitors are better agents than anti-estrogen therapy and could be considered as an initial therapy new standard. To determine if one agent is better than the other, I think blinded prospective studies are needed to evaluate their efficacy and safety in this type of setting.

In the year 2000, also we should only do this type of study in ER known positive patients so that we don't have the problem which we have seen in the earlier studies, where we have half or more than half of the patients who are not receptor positive or unknown.

Just to illustrate this point, if we look at all the data which is available, studies with a higher fraction of patients with known receptor positive disease show higher clinical benefit in favor of the AI arm of the studies across the board, all the studies which are available in the literature.

Similarly, if you look at the review of earlier studies and look at the time to progression in favor of AIs, it is again related to the hormonal receptor status of the study population.

Future studies also need to take into account the fraction of patients who had prior tamoxifen or prior endocrine adjuvant therapy, as I tried to make the point, that it can modify the impact of one arm or the other.

I think it is high time to look at different clinical endpoints besides survival, as survival is a composite sum of all therapies offered to the patient population in the course of the disease or their illness.

Also, we need to keep in mind when we look at these endpoints that there are drug-drug interactions, like AIs that could affect the other pathways which could adversely affect the levels of these drugs which may be utilized in subsequent therapies.

Finally, the last two words I wanted to say about the safety profile of these agents, and there are subtle differences between the safety profile of these agents. And if we look at it just over here, AIs definitely have fewer thromboembolic complications. I just use one of the studies, but if we look at it across the AIs, that is a similar pattern which emerges.

Last but not the least, there are also differences in the selectivity of these agents. Some of the AIs I've shown over here can cause subtle changes in the other steroid synthesis pathways, which we must be aware of, as these effects may become important if patients are under acute stress or therapies are offered for a longer duration like adjuvant therapy.

I appreciate this opportunity to express my thoughts, and I believe we are all on the same page to find better therapies and safer therapies for our patients. Thank you very much.

DR. NERENSTONE: Thank you very much, Dr. Buzdar.

Is there anyone else who wanted to speak during the open public hearing?

(No response.)

DR. NERENSTONE: There being no one else, Dr. Honig is going to start the FDA presentation.

DR. HONIG: Good morning. This morning's session is entitled "Clinical Trial Designs for the First-Line Hormonal Treatment of Metastatic Breast Cancer." Dr. Cortazar, Dr. Sridhara and I are going to provide a regulatory history of approvals for this indication and then present some issues for discussion by the committee.

The three of us would first like to acknowledge everyone listed on this slide, all of whom made major contributions to this presentation this morning.

The purpose of this meeting is to discuss the rationale and the basis for the past approvals of hormonal therapy for metastatic breast cancer, and then to solicit input from the committee in order to improve and standardize their approach to approval of similar drugs in the future.

Traditionally, the division has distinguished the approval of hormonal drugs for this indication from cytotoxic drugs, predominantly on the basis of the different toxicity profile between the hormonal agents and traditional chemotherapy. The basis for approval of hormonal agents in the more modern era has been based on the original approvals for Megace and tamoxifen. And so I would like to take a minute just to review these particular applications.

Megace was first approved in 1976 for the palliative treatment of advanced carcinoma of the breast. Approval was based on response rate reported in several phase II studies, and a total of 116 patients were treated. No information was available about Megace's effect on time to progression or survival, and the response rate was interpreted in the context of that for historical controls.

Tamoxifen was first approved in 1977, and of course has been the basis of many supplements since that time. However, the original approval was based on response rate from 14 phase II clinical trials, as well as the response rate that was reported in the literature for 9 additional studies. A little over 1,100 patients were treated in these studies. And again, a point that you will hear echoed several times this morning, no information was available about tamoxifen's effect on time to progression or survival, and the response rate was interpreted in the context, again, of historical controls.

Well, clearly, in the modern era we have required randomized clinical trials for approval, but we have continued to use response rate as the primary endpoint for approval in this particular setting. It is a surrogate endpoint, but was considered to be acceptable for treatments with modest toxicities, like the hormonal-type treatments. Response can be attributed to drug effect, as cancer rarely shrinks without some form of treatment. And again, just to underscore this point again, in the first-line hormonal setting, it has been used as FDA's primary endpoint for traditional, or full, approval, not for accelerated approval under subpart H.

We have not required submission of survival data as a primary endpoint. We have looked at it. But as we've talked about, there is a lack of a demonstrated survival advantage for the control compared to no therapy. And so survival has been used as a safety rather than an efficacy endpoint. And similarly, time to progression data have been submitted and reviewed, but they have not been used as the sole basis of approval.

Using response rate as the primary endpoint, as you will hear in a minute, most of the drugs have been approved on the basis of non-inferiority. And the definition of non-inferiority that has been most frequently used is listed here, that the lower limit of the two-sided 95 percent confidence interval for the difference in response between the two drugs should correspond to a deficit of no more than 10 percent. So that, in other words, a new drug should have a response rate that's not more than 10 percent lower than that of the comparator.

We have required submission of time to progression and survival data and have asked for similarity and have also asked generally for a total database of approximately 1,000 patients.

Again, as you will hear in a minute, the comparator in the first-line settings to date has been tamoxifen. And overall, the response rate for tamoxifen in these studies has been 20 percent.

The difference in response rate that has been used as the definition for non-inferiority can be interpreted in several ways. The first is that we are ruling out inferiority of a new drug by an absolute difference of 10 percent. A simple subtraction of 20 percent minus 10 percent equals 10. But another way to interpret this difference is that we are ruling out a loss of half of tamoxifen's effect. In this particular case, we get the same answer either way. But, as you will see, if you use different comparators, the different interpretation could lead to different response rates that are desired and has an impact on trial design and sample size.
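The two interpretations of the margin described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 50 percent comparator rate is an arbitrary second example, and the "fraction retained" version assumes the whole observed response rate counts as drug effect.

```python
def floor_absolute(control_rate, margin=0.10):
    """Lowest acceptable response rate under a fixed absolute margin."""
    return control_rate - margin


def floor_fraction_retained(control_rate, fraction_retained=0.5):
    """Lowest acceptable response rate when a fixed fraction of the
    comparator's response rate must be preserved (assumes the whole
    response rate is attributable to the drug)."""
    return control_rate * fraction_retained


# 0.20 is the tamoxifen response rate quoted in the discussion;
# 0.50 is a hypothetical comparator with a higher response rate.
for control in (0.20, 0.50):
    print(
        f"control={control:.2f}  "
        f"absolute floor={floor_absolute(control):.2f}  "
        f"half-retained floor={floor_fraction_retained(control):.2f}"
    )
```

With a 20 percent comparator the two rules coincide; with a higher comparator rate they separate, which is why the choice of interpretation feeds into trial design and sample size.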

What I would like to do now is to stop and Dr. Cortazar is going to summarize the recent approvals.

DR. CORTAZAR: Thank you, Dr. Honig.

Good morning, Dr. Nerenstone, members of the advisory committee, colleagues, ladies and gentlemen. I am going to present a summary of the FDA approval of hormonal drugs for metastatic breast cancer. I am briefly going to comment on the hormonal drugs approved in second-line metastatic breast cancer, and then I will spend most of my talk on the first-line setting, which is our topic of interest for today.

This slide shows the hormonal drugs that the FDA has approved for second-line metastatic breast cancer. Megestrol acetate was approved in 1971. Dr. Honig already described the basis of approval for this drug. It was almost 25 years before additional hormonal drugs were approved for this use. But in the last six years, the FDA has approved three additional drugs: anastrozole, letrozole, and exemestane.

The study designs for these hormones have been very similar. We have generally required randomized trials in order to compare response rates. In these studies, the aromatase inhibitors were non-inferior or better than the comparator, megestrol acetate. Anastrozole and letrozole trials were designed for superiority. However, neither hormone achieved its protocol-specified primary endpoint of demonstration of superiority, and each was approved for similarity.

This slide shows the hormonal drugs that the FDA has approved for the initial treatment of metastatic breast cancer. Tamoxifen was approved in 1977. As Dr. Honig already mentioned, the basis of approval was a favorable effect on tumor response in nonrandomized phase II studies. Tamoxifen has never been shown to have a favorable effect on time to progression or survival in this setting.

There is a gap of 18 years between the approval of tamoxifen and the approval of additional hormonal drugs for this use. However, in the last five years the FDA has approved three additional drugs -- toremifene, anastrozole and letrozole -- for first-line treatment of metastatic breast cancer.

The FDA requirement for approval of new hormonal drugs for first-line treatment of metastatic breast cancer is non-inferiority or superiority to tamoxifen for tumor response rate in randomized controlled trials. This is conditional on the new hormone being not worse than tamoxifen for time to progression and survival. Usually, by the time the applications are submitted, the survival data is not mature. Therefore, FDA requires a phase IV commitment to submit follow-up survival.

Toremifene was approved for first-line metastatic breast cancer in October 1995. Registration trials were three randomized phase III studies, comparing toremifene with tamoxifen in postmenopausal women with metastatic breast cancer who were tamoxifen-naive. Over 1,500 patients were enrolled in the three trials. The U.S. trial was the largest, with 648 patients, while the Nordic and Eastern European trials had a similar number of patients, around 400 each.

In the Nordic and Eastern European trials, inoperable primaries were allowed. This was not specific to stage, and there might have been some bias in terms of whom the investigator considered inoperable.

Most patients were estrogen receptor positive in the U.S. trial, 60 to 66 percent, and the Nordic trial, over 50 percent, while most of the patients in the Eastern European trial were estrogen receptor unknown, 66 percent.

The primary endpoints of the three trials were response rate and time to progression. The trials were designed to show non-inferiority in response rate.

Non-inferiority was defined in the protocol in terms of the lower bounds of the confidence intervals for response rate and time to progression. For response rate, non-inferiority was to be met if the lower limit of the two-sided 95 percent confidence interval for the difference in response rates, toremifene minus tamoxifen, was not more than 10 percent worse than tamoxifen. For example, if tamoxifen has a response rate of 50 percent, a comparator might have a response as low as, but no lower than, 40 percent. If tamoxifen had a response rate of 20 percent, a comparator might have a response rate as low as, but no lower than 10 percent.
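A minimal sketch of this kind of response-rate check is given below. It assumes a simple normal-approximation (Wald) interval for the difference in proportions; the transcript does not say which interval method the protocols used, and the function name and patient counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def response_noninferior(resp_new, n_new, resp_ctrl, n_ctrl,
                         margin=0.10, confidence=0.95):
    """Return (lower, upper, passed) for the difference in response
    rates, new drug minus control, using a Wald interval (assumption).

    passed is True when the lower limit is no worse than -margin.
    """
    p_new = resp_new / n_new
    p_ctrl = resp_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, lower >= -margin


# Hypothetical counts: 63/300 responders on the new drug, 60/300 on control.
lower, upper, passed = response_noninferior(63, 300, 60, 300)
print(f"difference CI: ({lower:.3f}, {upper:.3f}), non-inferior: {passed}")
```

Note that the criterion depends on the interval, not the point estimate: a new drug whose observed rate is only 5 points lower can still fail if the trial is too small to keep the lower limit above the margin.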

For time to progression, non-inferiority was assessed in terms of the two-sided 95 percent confidence intervals for the hazard ratio of tamoxifen to toremifene. If the lower limit was above the fixed margin of 0.8, then it could be concluded that toremifene was at least non-inferior to tamoxifen.

I would like to clarify that 0.8 is a number that was chosen arbitrarily. This is not appropriate by today's standards. Now we base the margin on the control effect. Dr. Sridhara will discuss this issue later.

In addition, non-inferiority of a new hormonal agent to tamoxifen for time to progression is not adequate for approval because tamoxifen has never been shown to have a favorable effect on time to progression in this patient population.

This slide shows efficacy results for toremifene in first-line metastatic breast cancer. The response rates in the U.S. and Eastern European trials were non-inferior. Both have a lower limit of the confidence interval of less than 10 percent. The Nordic trial did not meet the protocol definition of non-inferiority. The lower confidence interval is greater than 10 percent. The reasons for the difference in the results are not clear, since there were no imbalances for prognostic factors compared to the other trials.

Time to progression was non-inferior by protocol definition in the U.S. and Eastern European trials. The two trials meet the lower limit of 0.8 in the confidence interval of the hazard ratio. However, we consider this result uninterpretable since the comparator has not been shown to have a favorable effect on time to progression.

The Nordic trial results did not meet the protocol-specified definition for non-inferiority to tamoxifen. In fact, there was a significantly worse time to progression in patients who received toremifene. The upper bound is less than 1.

In summary, the Nordic trial did not meet the protocol definition of non-inferiority. This trial has significantly worse time to progression with toremifene. There was a concern with the lack of explanation for the deviance of the results in the Nordic trial. Therefore, toremifene was approved because of non-inferiority in response rate in two of the three trials.

Anastrozole was approved for first-line metastatic breast cancer in September 2000. The registration trials were two double-blind, well-controlled clinical studies of similar design: 0030, a North American study, and 0027, a predominantly European study. The studies compared anastrozole 1 milligram to tamoxifen 20 milligrams once daily, in over 1,000 patients: 353 in the U.S. trial and almost double the number of patients in the European study. Most of the patients in the U.S. trial, 88 percent, had positive receptors, compared to less than half of the patients, 45 percent, in the European trial.

The primary endpoints of the trials were objective tumor response and time to progression. The trials were designed to show non-inferiority. Non-inferiority was defined in terms of the lower bounds of the 95 percent confidence interval. The margin for the response rate was defined in the protocol as 10 percent. The lower 95 percent confidence interval of the difference, anastrozole minus tamoxifen, should not be more than 10 percent worse than tamoxifen.

FDA does not have a general policy on how much of the tamoxifen response rate may be lost by the new hormonal drug and still consider it non-inferior to tamoxifen. This margin has been determined on a case-by-case basis.

The margin for time to progression was defined in the protocol as 20 percent. The lower 95 percent confidence interval of the hazard ratio, tamoxifen to anastrozole, should be greater than the fixed margin of 0.8. Again, this definition is not adequate since the margin is not based on the control effect. In addition, non-inferiority in time to progression is problematic because the comparator has not shown a favorable effect in time to progression.

This slide summarizes the efficacy results of anastrozole in first-line metastatic breast cancer. Arimidex and tamoxifen tumor response rates are statistically non-inferior in both studies. The lower limits of the confidence intervals are less than 10 percent.

Arimidex time to progression is statistically superior to tamoxifen in the U.S. study, with a p value of 0.006, and similar in the other study. The lack of difference in time to progression could be attributed to the increased number of patients with unknown receptor status -- 55 percent in the European study.

This slide summarizes the basis for approval of anastrozole in first-line metastatic breast cancer. In summary, anastrozole was approved because of non-inferiority in response rate in both trials, and superiority in time to progression in the U.S. trial.

Letrozole was approved for first-line metastatic breast cancer in December 2000. The registration trial consisted of one randomized, controlled double-blind multinational clinical study, comparing letrozole at 2.5 milligrams with tamoxifen 20 milligrams orally once daily in 916 women.

The design of the trial changed over time. Initially, there was a third combination letrozole-tamoxifen arm, but this arm was dropped after the results of a pharmacokinetic interaction study. There was also a crossover feature to the study at the time of progression in 43 percent of the patients, but the data was too premature.

Two-thirds of the patients were estrogen receptor positive, and one third had unknown receptor status. The primary endpoint of the trial was time to progression. The trial was designed to show superiority by demonstrating a 20 percent reduction in the risk of progression with 80 percent power.
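To give a rough sense of what such a design implies, the sketch below applies Schoenfeld's approximation for the number of progression events in a two-arm log-rank comparison. The two-sided 5 percent significance level and 1:1 randomization are assumptions made here for illustration; the transcript states only the 20 percent risk reduction and the 80 percent power, and the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(hazard_ratio, power=0.80, alpha=0.05):
    """Approximate number of events needed to detect the given hazard
    ratio with a two-sided log-rank test and 1:1 allocation
    (Schoenfeld's formula; alpha and allocation are assumptions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# A 20 percent reduction in risk corresponds to a hazard ratio of 0.80.
print(events_needed(0.80))
```

The required count is in events, not patients, so the number of patients enrolled depends additionally on accrual and follow-up time.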

This slide summarizes the efficacy of letrozole in first-line metastatic breast cancer. The median time to progression for letrozole was 9.4 months, versus 6 months for tamoxifen. This result is statistically significant, reducing the risk of progression by 30 percent; a hazard ratio of 0.70.

Response rate was significantly higher with letrozole treatment, 30 percent, compared to 20 percent for tamoxifen, with 71 percent higher odds of responding to letrozole than tamoxifen; a p value of 0.0006.
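The relationship between the quoted response rates and the quoted odds figure can be checked directly. The sketch below uses only the rounded rates given above (30 percent and 20 percent), so it is a consistency check on the arithmetic rather than a reanalysis; the function names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p_new, p_ctrl):
    """Odds of response on the new drug relative to the control."""
    return odds(p_new) / odds(p_ctrl)


ratio = odds_ratio(0.30, 0.20)
print(f"odds ratio: {ratio:.2f}, i.e. {100 * (ratio - 1):.0f} percent higher odds")
```

A 10-point absolute difference in response rate therefore reads as a much larger relative figure when expressed on the odds scale.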

So, in summary, letrozole was approved because of statistically significant superiority in time to progression and response rate. This is the first hormonal drug that has shown superiority to tamoxifen.

Issues to consider with primary endpoints in future trial designs will be discussed by Dr. Honig.

Thank you.

DR. HONIG: So, as we have just heard, anastrozole was approved on the basis of non-inferiority to tamoxifen, and letrozole was the first to demonstrate statistical significance in terms of superiority to tamoxifen.

However, there has never been a direct comparison of these two agents. And these data raise the question as to whether or not letrozole is uniquely superior to tamoxifen or whether in fact there is a class effect raised by the superior time to progression seen in one small study for anastrozole, in which a high percentage of the patients were known to be estrogen receptor positive, with a higher likelihood of responding.

These data raise some issues that we hope you will help discuss with us this morning. The first one concerns the choice of the endpoint. Should we, instead of using response rate, now use time to progression as the primary endpoint for future studies of the first-line hormonal treatment of metastatic breast cancer?

Before we discuss the pros and cons of this approach, I would like to just briefly review the information that would be needed to design such a trial. First, of course, there would need to be an estimate of the treatment effect of the comparator from historical data. Here are some ways in which this could be performed. You could choose the point estimate of the response rate. You could look at a 95 percent confidence interval boundary. You could choose a more conservative or more liberal boundary, which would of course affect sample size, as you will hear from Dr. Sridhara. And also influencing these outcomes would be a determination of what fraction of the effect should be retained.

Well, in favor of switching to time to progression would be some discussion from the committee that would suggest that time to progression is intrinsically more meaningful than response rate. Against using time to progression is the fact that neither of the aromatase inhibitors may be acceptable to design a non-inferiority comparative trial, since neither has reproducibly demonstrated a time to progression advantage. For anastrozole, it was seen in one study; for letrozole, although it was a large, well-controlled study with a convincing statistical outcome, it is a sole study. And as we have heard several times this morning, we don't have data available for time to progression for other comparators. And as Dr. Sridhara will review for you shortly, the sample size needed for a non-inferiority study using a time to progression endpoint may be large.

Should we instead continue to use response rate as we have? This would assume that response rate still sufficiently identifies efficacy in this setting.

If we continue to use response rate, then another issue for discussion is how we should design the trials. Is non-inferiority to tamoxifen or another approved first-line agent still an acceptable basis for approval?

In favor of this approach, again, is the fact that in most cases FDA does not have a comparative efficacy standard. Against it is the finding in this one study that letrozole's response rate was statistically significantly superior to tamoxifen. And some discussion from the committee about this finding would be helpful, as well.

Alternatively, should new drugs now be required to show superiority to tamoxifen? And this could be done in one of two ways, either by a direct comparison to tamoxifen in a superiority study or by a non-inferiority comparison to letrozole.

The issues to consider if we decided that we should use response rate as the primary endpoint and that we would need to demonstrate superiority to tamoxifen, either directly or indirectly, are listed here. We would still need to estimate the treatment effect size. For this example, I've just used the point estimate of the response rate for letrozole of 30 percent. What fraction of the effect should be retained?

You have heard this morning that our frequent definition has been to rule out an absolute 10 percent difference. This would mean that a new drug should not be 10 percent worse in its response rate; so we would be ruling out a response rate of less than 20 percent in this setting.

However, we mentioned before that that 10 percent absolute difference could be interpreted as retaining at least half of tamoxifen's effect. If we took that approach here, we would want to retain half of letrozole's response rate; and that would mean that a new drug should not have a response rate less than 15 percent.

However, the third approach is that we would want to retain some fraction of letrozole's advantage over tamoxifen. So that 30 percent minus the 20 percent response rate of tamoxifen in most of these trials is 10 percent and that we would want to retain some fraction of that -- say half, for example -- and that we would be then asking that a new drug have a response rate that is not less than 25 percent. So, again, you can see that you can get three different lower bounds for your response rate, and again, you would need to power and design your study accordingly. And Dr. Sridhara will give you some more concrete examples of what that effect is on sample size.
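The three lower bounds described above follow from simple arithmetic on the quoted response rates. The sketch below, in Python, restates them side by side; the function names are hypothetical labels for the three approaches, and the 30 and 20 percent inputs are the rounded figures from the presentation.

```python
LETROZOLE_RR = 30.0  # percent, point estimate quoted for letrozole
TAMOXIFEN_RR = 20.0  # percent, response rate quoted for tamoxifen


def fixed_margin_bound(control_rr: float, margin: float = 10.0) -> float:
    """Approach 1: rule out an absolute difference of `margin` points."""
    return control_rr - margin


def retain_fraction_of_rate(control_rr: float, fraction: float = 0.5) -> float:
    """Approach 2: retain a fraction of the control's whole response rate."""
    return control_rr * fraction


def retain_fraction_of_advantage(control_rr: float, reference_rr: float,
                                 fraction: float = 0.5) -> float:
    """Approach 3: retain a fraction of the control's advantage over the reference."""
    return reference_rr + fraction * (control_rr - reference_rr)


print(fixed_margin_bound(LETROZOLE_RR))                          # 20.0
print(retain_fraction_of_rate(LETROZOLE_RR))                     # 15.0
print(retain_fraction_of_advantage(LETROZOLE_RR, TAMOXIFEN_RR))  # 25.0
```

The tighter the bound sits to the 30 percent control rate, the smaller the difference the trial must exclude, which is why the third approach drives the largest sample size.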

Now, in addition to these specific concerns about endpoints and comparators we just wanted to mention some of the concerns that we always have about these studies. If we continue to use response rate, we exclude patients with bone-only disease. And clearly these are the patients that tend to get this drug in clinical practice and are likely to benefit from it.

If we instead use time to progression, I would refer to the ODAC discussion session in June of 1999. At that time, the topic under discussion was the use of time to progression for cytotoxic drug therapies. But there was a lot of discussion by the committee of the limitations of measuring time to progression, and those limitations or difficulties would still be applicable in the hormonal setting. Certainly it would be strengthened by the use of blinded trials in assessing time to progression.

An additional concern is about non-inferiority trial designs, and I think you have many times heard Dr. Temple say that sloppiness obscures differences. But the practical implications of that are that, again, independent substantiation of results are particularly important in the non-inferiority setting, and that we need to pay special attention to the study conduct. Dr. Buzdar mentioned some of these points in his presentation earlier.

Inclusion of patients with estrogen receptor unknown status can contribute to the lack of an observed difference between studies. If we include patients with bone-predominant disease, it may make response assessment somewhat more difficult. And again, we need to be willing to adapt inclusion criteria, as science moves forward, to be sure that we are selecting a patient population most likely to benefit from these treatments.

Some broader concerns that we have mentioned in the questions for your discussion later this morning also are: Would any of the discussion this morning impact ongoing trials of new hormonal agents under development? And, finally, what about the possibility that overall survival with another hormonal agent might be superior to that observed with tamoxifen? Would that influence or change our thinking on these topics?

Dr. Sridhara will now present some of the statistical considerations that go along with these questions.

DR. SRIDHARA: Thank you, Drs. Honig and Cortazar.

I am here today to lay out some of the statistical considerations that need to be examined in designing future clinical trials for first-line hormonal treatment of metastatic breast cancer.

The outline of my presentation is as follows. First, I will go through the active control comparators that are under consideration, then present clarification regarding the terminology used, move on to lay out the assumptions that are made in designing non-inferiority trials, then discuss the different designs under consideration, then present estimates of sample sizes that are required under each design. All designs considered here are planned with 80 percent power and a one-sided alpha of .025. I will then highlight points to be kept in perspective in conducting non-inferiority trials in this setting. With this background, I will then put forward the issues that need to be discussed for designing future trials.

The future drug product X could potentially be compared to the old standard tamoxifen; the debate here today is: Should we use letrozole as the active control comparator instead of tamoxifen, as it has shown convincing superiority over tamoxifen in one randomized trial?

Regarding the terminology that we commonly use, there is little confusion regarding superiority, where we mean that drug X is superior; that is, statistically significantly better in efficacy with respect to active control.

However, non-inferiority is a more recent and potentially misleading terminology. There are basically three types of non-superiority trials. They are: that X works; that X is not much less effective than the active control; and that X and the active control are equivalent. In the studies that we are considering here, by non-inferiority we mean that X is not much less effective than the active control.

In earlier approvals we have used terms like "was not different," or that "the two drugs were similar." These are incorrect terminology. However, it should be kept in mind that the earlier studies were not really designed as non-inferiority studies, and hence the terminology was not as rigorous.

For the purpose of design considerations of non-inferiority trials and illustration only, we will consider letrozole as the active control comparator in the remainder of this presentation.

The basic assumptions in designing non-inferiority trials are that: One, the active control -- in this case letrozole -- is effective compared to placebo. Secondly, we can reliably estimate this effect size.

If response rate is the endpoint under consideration, then all the effect can be attributed to the treatment. However, for the time to progression endpoint, we have a comparison of letrozole to tamoxifen and not to placebo. And we also do not have an estimate of the tamoxifen effect size with respect to time to progression.

Another important assumption, which is generally termed as constancy assumption, is that the active control effect in the historical studies is carried over to the future study.

Before designing any trial, including a non-inferiority trial, we need to be certain about the final outcome of interest. The two endpoints under consideration here are response rate and time to progression. Both of these are surrogate endpoints, and we have no data proving which is a better surrogate of the gold standard, final outcome survival, in the first-line metastatic breast cancer setting. Future studies of other agents and updated data on the letrozole study may shed some light on this aspect.

The next issue to be considered is the estimate of active control effect size, as we never know the true effect size. In the letrozole study, the point estimate of response rate was 30 percent, with a 95 percent confidence interval between 26 and 35 percent. The point estimate of hazard ratio of tamoxifen versus letrozole was 1.4. And the two-sided 95 percent confidence interval for the hazard ratio was 1.24 to 1.56. We have to make a decision on which of these estimates is to be used as the control effect size for computing sample sizes for the future studies.

We also need to know how much of the effect we are willing to give up or, putting it another way, what proportion of the active control effect should be preserved that is deemed clinically meaningful.

As a first step, we need to estimate the size of the active control effect. From a given study or studies, we generally describe the effect by a point estimate and a two-sided 95-percent confidence interval. That is, we can say with 95 percent confidence that the true effect is anywhere between these two limits.

Potentially, we can consider four methods to estimate the true control effect. If we choose the point estimate as the estimated active control effect, then this will inflate the type 1 error. On the other hand, if we choose the other extreme, the lower 95 percent confidence limit as the estimated control effect, then the type 1 error will be very small.

A compromise is to use a lower gamma percent limit as the estimated control effect, which will ensure type 1 error to be .025. Choosing a fixed margin approach, such as less than or equal to 10 percent, or any other fixed margin, is quite arbitrary. Whatever we choose as our estimate of the control effect, we have to then decide how much of that effect we are willing to give up or, in other words, how much of that effect we feel should be retained by the new drug.

In the next few slides, I will present estimated sample sizes under different design criteria, with response rate as the endpoint. In this slide, I'm using point estimate as the estimate of true letrozole effect. As I mentioned earlier, I'm using letrozole for illustration purposes only as the comparator here. That is, the point estimate of control effect size of letrozole is 30 percent. The first column gives the sample sizes required, retaining delta percent of the 30 percent.

For example, if 50 percent of the effect, or 15 percent response rate, should be retained, then a sample size of 300 is necessary. When simulations of studies designed using the point estimates are conducted, it can be shown that in fact the type 1 error alpha is always greater than .025. This can also be proved mathematically, and therefore this is a less than optimum design and not recommended to be used. The purpose of presenting this approach here is only to illustrate the concept, and it is not for use in the design of future trials.

Suppose we want to retain some fraction of letrozole advantage over tamoxifen. That is, we define active control effect as the difference in effect between letrozole and tamoxifen. And assume letrozole has a response rate of 30 percent and tamoxifen 20 percent. Then, for example, to retain 50 percent of the effect -- that is, the response rate of at least 25 percent with a new drug product X -- the total sample size required is 1,319.

If we consider the lower 95 percent confidence limit as the estimate of active control, then in the letrozole-tamoxifen study, the lower two-sided 95 percent confidence limit of response rate was 26 percent. In order to retain 50 percent of this effect, a total sample size of 360 patients is required.

However, using a fixed margin that we have used historically with tamoxifen as the active control -- that is, the lower limit of the 95 percent confidence interval for the difference in response between drug X and letrozole to be less than or equal to 10 percent -- the sample size required is a total of 660 patients.
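The presentation does not state the exact method behind these sample sizes. As a rough check, the sketch below uses a common textbook normal approximation for a non-inferiority comparison of two proportions, assuming equal allocation, both arms truly sharing the control response rate, 80 percent power and a one-sided alpha of .025. The function name and the choice of formula are assumptions, not the reviewers' documented procedure.

```python
import math
from statistics import NormalDist


def ni_total_sample_size(control_rate: float, margin: float,
                         alpha: float = 0.025, power: float = 0.80) -> int:
    """Total patients (two equal arms) to rule out a deficit of `margin`.

    Assumes the new drug's true response rate equals `control_rate` and
    uses the unpooled normal approximation for a difference of proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = 2 * control_rate * (1 - control_rate)
    per_arm = (z_alpha + z_beta) ** 2 * variance / margin ** 2
    return 2 * math.ceil(per_arm)


# Retain half of the 30% point estimate (margin of 15 points).
print(ni_total_sample_size(0.30, 0.15))  # 294
# Retain half of the 26% lower confidence limit (margin of 13 points).
print(ni_total_sample_size(0.26, 0.13))  # 358
# Fixed 10-point margin around a 30% control rate.
print(ni_total_sample_size(0.30, 0.10))  # 660
```

Under these assumptions the approximation lands close to the quoted totals of about 300, 360 and 660 patients. Because the margin enters as a square in the denominator, halving the margin roughly quadruples the required sample size.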

In the next few slides time to progression is considered as the endpoint of interest. In this slide, the effect size used is the point estimate of hazard ratio of tamoxifen to letrozole. Note that this is not the placebo versus letrozole effect size. The point estimate of the hazard ratio of tamoxifen to letrozole in the tamoxifen-letrozole study was 1.4. If, for example, we retain 50 percent of the letrozole effect over tamoxifen, then the total number of events, and not patients, required is 944.
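For a time-to-event endpoint the design is driven by the number of events rather than the number of patients. The sketch below uses a Schoenfeld-type approximation with 1:1 allocation, and it assumes the margin is set by retaining a fraction of the excess hazard on the hazard ratio scale (so retaining 50 percent of a hazard ratio of 1.4 gives a margin of 1.2). Both the formula and that definition of the margin are assumptions chosen for illustration; the transcript does not describe the calculation that was actually used.

```python
import math
from statistics import NormalDist


def ni_events(control_hr: float, fraction_retained: float,
              alpha: float = 0.025, power: float = 0.80) -> float:
    """Approximate number of events for a non-inferiority test on a hazard ratio.

    `control_hr` is the hazard ratio of the older drug to the active control
    (1.4 for tamoxifen versus letrozole in the presentation).
    """
    z_total = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    # Margin: the part of the excess hazard we are willing to give up.
    margin_hr = 1.0 + (1.0 - fraction_retained) * (control_hr - 1.0)
    return 4.0 * z_total ** 2 / math.log(margin_hr) ** 2


events = ni_events(control_hr=1.4, fraction_retained=0.5)
print(round(events))  # 944
```

With these assumptions the result agrees with the 944 events quoted for the point-estimate approach. A margin closer to 1.0 shrinks the log term in the denominator, which is why the more conservative estimates of the control effect require far more events.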

Incidentally, the point estimate of the hazard ratio of tamoxifen to anastrozole was also 1.4 when a meta-analysis of the two registration studies was conducted. Using the point estimate, again, is a suboptimal approach and not recommended, as it inflates type 1 error.

On the other hand, if we consider the lower 95 percent confidence limit of the hazard ratio of tamoxifen to letrozole as the estimate of the active control effect, with time to progression as the endpoint, then in order to retain, for example, 50 percent of letrozole effect over tamoxifen, a total of 3,542 events -- and again, not number of patients -- are required, as presented in the table under the title "Letrozole" here.

This is a conservative approach. Simulations of such designs show that the type 1 error is much less than .025. For example, the type 1 error can be as low as 0.007.

For purposes of illustration only, a meta-analysis of two registration studies of anastrozole was conducted. We do not recommend meta-analysis of these two studies. Just to recap Dr. Cortazar and Dr. Honig's remarks, there were two randomized trials -- one conducted in the U.S. and the other in Europe -- of anastrozole compared to tamoxifen. Both studies demonstrated non-inferiority with respect to response rate. And the smaller of the two studies, the U.S. study, demonstrated superiority in time to progression.

The patient population characteristics were different in the two studies, particularly with respect to number of ER-positive patients. Thus, we do not recommend conducting a meta-analysis of these two studies. And, once again, it is presented here for purposes of illustration only.

Using the results of this "meta-analysis," and using the lower 95 percent confidence limit of hazard ratio of tamoxifen to anastrozole as the estimate of the active control effect, sample size estimates are presented in the second table under the title "Anastrozole." The sample size estimates using anastrozole as the active control are much larger than the estimates using letrozole as the active control.

A less conservative approach but one that preserves type 1 error of .025 is currently under development and testing by the CDER Oncology Statistical Reviewers Team. In this approach, for example, in order to retain 50 percent of the letrozole effect over tamoxifen with respect to time to progression and preserve type 1 error of .025, the total number of events -- again, not number of patients -- required is 1,427. This translates to using a 55 percent lower confidence limit as an estimate of the control effect instead of the conservative lower 95 percent confidence limit. Because of the fact that in this approach type 1 error is fixed depending on the percent of effect retained, the percent confidence limit varies, as listed in this table.

Similarly, using the meta-analysis results of the two anastrozole studies for the purposes of illustration only, the sample size estimates are presented in the table under the title "Anastrozole." Again, we do not recommend this meta-analysis, and the sample size estimates using anastrozole as the active control are larger than the estimates using letrozole as the active control, as presented here.

In summary, with response rate as the endpoint, and say, for example, we decide that 50 percent of the active control effect should be retained, then the sample sizes, using the different estimates of control effect, are presented in this summary slide. The sample sizes range from about 300 to about 1,300. It should be kept in mind that, generally, using point estimates is not recommended, as they tend to inflate type 1 error.

This slide summarizes the sample sizes using different estimates of control effect, with time to progression as the endpoint, and assuming 50 percent of the active control effect should be retained. The sample sizes vary, approximately from about 1,000 to 3,500 events -- and not patients.

Again, the point estimate approach tends to increase type 1 error. The lower 95 percent confidence limit approach is a conservative approach. And also, retaining 50 percent of control effect is not set in stone, and it depends on specific disease setting and the control effect that we are willing to give up.

We should also seriously consider conducting superiority studies with tamoxifen or letrozole as the active control comparator. Here are sample size estimates when tamoxifen is used as the comparator, and assuming new drug product X will have an effect similar to letrozole. If response rate is used as the endpoint, then to demonstrate superiority, a total sample size of 586 patients is required. Whereas, if time to progression is used as the endpoint, then a total of 200 events are necessary to detect a significant difference. In both cases, a power of 80 percent and a one-sided alpha of .025 are assumed.
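The response rate figure for the superiority design can be approximated with a standard two-proportion formula. The sketch below assumes response rates of 30 percent for drug X and 20 percent for tamoxifen, equal allocation, a pooled variance under the null hypothesis, and rounding the per-arm size to the nearest patient. These choices are assumptions made for illustration; the transcript does not say which formula or rounding rule was used.

```python
import math
from statistics import NormalDist


def superiority_total_sample_size(p_new: float, p_control: float,
                                  alpha: float = 0.025,
                                  power: float = 0.80) -> int:
    """Total patients (two equal arms) to detect p_new versus p_control."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_new + p_control) / 2  # pooled rate under the null hypothesis
    null_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(p_new * (1 - p_new) + p_control * (1 - p_control))
    per_arm = (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p_new - p_control) ** 2
    # Rounded to the nearest patient (an assumption); rounding up gives 588.
    return 2 * round(per_arm)


print(superiority_total_sample_size(0.30, 0.20))  # 586
```

Under these assumptions the total matches the 586 patients quoted above. The same approach is not applied to the 200-event figure here, because the inputs behind that number are not given in the presentation.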

The important points to be kept in perspective in designing future first-line metastatic breast cancer trials are that the active control letrozole effect is estimated from a single, large, well-conducted randomized study, which has shown convincing evidence of superiority. However, we do not have information on between-study variability, and it is possible that the effect size is overestimated.

Secondly, the effect size with respect to time to progression is letrozole versus tamoxifen, and not the way we generally define active control effect, which is comparing control to placebo. In general, when the active control effect size is estimated from a single study, if a non-inferiority study is being considered, then it is advisable to use a conservative approach.

Furthermore, we do not have data to estimate tamoxifen effect versus placebo with respect to time to progression.

Also, especially when we are considering non-inferiority trials, replication of well-controlled randomized studies is mandatory. To prove non-inferiority compared to an active control, a large number of patients are necessary. And if time to progression is used as the endpoint, then even more patients will be necessary.

In conclusion, the issues that need to be discussed for designing future trials in first-line treatment of metastatic breast cancer are: Should we conduct studies where the new drug product X is superior to letrozole or drug X is superior to tamoxifen? Or should we conduct future studies as non-inferiority trials, comparing to letrozole, since letrozole has shown superiority over tamoxifen?

If in fact we are considering non-inferiority trials, should we preserve 75 percent of the active control effect or 50 percent of the active control effect? That is, how much of the active control effect are we willing to give up?

The other important issue is regarding selection of endpoint, given that both response rate and time to progression are surrogate endpoints. We are waiting for updated data on survival from the letrozole-tamoxifen study. If in fact letrozole demonstrates superiority over tamoxifen with respect to survival, then should we consider survival as the endpoint?

Finally, I have presented to you the approximate estimated sample sizes using different approaches and endpoints. Given this data, are non-inferiority studies feasible?

Thank you.

DR. HONIG: So, in summary, some of the things that we want you to think about and discuss with us this morning are the fact that, as you've seen, tamoxifen has been the comparator most frequently used in the first-line setting. Is letrozole superior? Are all aromatase inhibitors superior? Should we consider some change in the comparator standard? What about the endpoints? We've traditionally used response rate; should we continue to use it? Should we change to time to progression?

If we do, because of the data set that you've heard described several times this morning, it would require non-inferiority to letrozole or superiority to tamoxifen, because of available data, and probably a larger sample size.

Finally, what about the trial design? Can we continue to ask for non-inferiority to any first-line hormonal drug, or does superiority to tamoxifen need to be demonstrated, either directly or indirectly?

So, with that perspective, we would like to then turn the session back over to Dr. Nerenstone for some discussion and questions to the committee. Thank you.

DR. NERENSTONE: Why don't we start with actual questions from the committee to FDA about the presentation.

Dr. Blayney?

DR. BLAYNEY: I would like to make a statement. I congratulate the speakers on their archaeologic investigation.

But I think I was a little disappointed that the discussion was not framed more broadly because we're really, as I see it, focusing on anti-estrogen therapies. But this discussion could have served as a prototype for drugs and agents which are given with the intent of targeting a defined receptor which has a high affinity ligand, like the estrogen receptor in this particular instance. But I think, much more broadly, these agents are orally available and given chronically. So, this says something about how the endpoints are determined. Often these agents are naturally occurring or analogs of naturally occurring agents. They have minimal acute toxicities and can have substantial long-term or cumulative toxicities and difficult-to-measure endpoints.

And as I see this committee's work over the next two or three years, there are a lot of agents we're going to be asked to render advice on that could sort of fit that construct, and it goes much more broadly than just the anti-estrogens or the compounds that target the estrogen receptor. I'm wondering if you're using this as a prototype for those kinds of regulatory discussions or not?


DR. PAZDUR: No, we're not using it as a prototype for future agents. We really have a concrete example or a concrete discussion here on hormonal therapy of breast cancer, particularly first-line therapy.

Obviously, we don't develop drugs in a vacuum and interpret the results. So, could the results of this discussion potentially have effects on future clinical trials of agents that may be more in the cytostatic area? Possibly they could. But our real attention now is to focus on the hormonal therapies, and that's why we brought this to the committee.

DR. BLAYNEY: You could substitute herceptin for everything that we heard about this morning. And you chose to view that as a chemotherapy agent. Is that right?

DR. TEMPLE: Here, again, that's handled by CBER. So, you will have plenty of time to question them tomorrow on their approvals on the drugs that they regulate. However, we really want to focus on the hormonal areas here.

Again, I can't say that this will never impact what we do in other areas, but there are specific issues with which comparators we use, how do we look at sample sizes in hormonal therapies. And breast cancer and other diseases probably have to be taken in some type of perspective. The larger number of patients that have this disease would perhaps reflect on the sample size that one would be willing to commit to.

So, I think to just make broad statements regarding classes of drugs without any definition of disease has some limitations. And for this purpose we really would like to focus on the breast cancer issue and that's why we made this quite specific.


DR. TEMPLE: This is also something of a historical oddity. With the continuing advice and counsel of this committee for a wide variety of agents, whether cytotoxic or cytostatic, we've been told over and over again that meaningful clinical endpoints -- such as survival or symptomatic improvement -- are what is needed. There has been even skepticism about time to progression. When we have put that forth as a possible endpoint, we've mostly had our head handed to us, I would say.

But here is a longstanding practice of approving based entirely on response rates. Well, for the first time, one drug seems to have been shown to improve something that some people would say is more clinically meaningful. I'm sure you could have a debate about that, too. And so we are asking about what to do now that the ground may be shifting a little bit. How does the committee feel about what we used to do? And we are also trying to point out what the alternatives involve in terms of sample size assumptions and difficulties. Because, as you can see, the studies get up to pretty large numbers pretty quickly once you leave response rate.


DR. GEORGE: I have a question about the survival. There was a statement made somewhere that survival was being considered as a safety issue, not an efficacy issue. And I didn't understand that comment, so I need a little explanation of that. And just to be clear, what would happen if the letrozole results come in with inferior survival at this point?


DR. HONIG: Let me take the first part of your question first. When we said it's a safety endpoint, it's more that because we've considered response rate to be a sufficient endpoint in and of itself, we have not required the very large studies that would be needed to show that survival was not inferior in a strict statistical sense. When we said they have been submitted as a safety concern, it has been to make sure that they are approximately similar and, as you mentioned, that one drug is not clearly worse than the other.

If letrozole were inferior, I think we would be analyzing the data carefully and probably coming back to the committee. That is not impossible, I suppose, but would be a little bit different from what we have observed in most studies, where you're ahead on response rate and time to progression and it would be unusual to see a survival decrement.

DR. NERENSTONE: Dr. Przepiorka?

DR. PRZEPIORKA: Dr. Honig, just for the record, could you please let us know how the FDA deals with deaths when determining time to progression?

DR. HONIG: I'm not sure that we have a blanket standing on that. Survival is analyzed separately. And generally, what I think most of us have done for time to progression is we have tried to use -- and other people can chime in if they wish -- tried to use the date that patients were last evaluated. It's a little bit difficult if someone is lost to follow-up and then you get a death date that's significantly later and then use that as a progression date. You don't know what's happened to them in the meantime. If there has been careful follow-up, it doesn't appreciably affect your analysis of that outcome. But that's, I think, what we have generally tried to do. They're censored at the date of last evaluation.


DR. TEMPLE: They're censored at time of the last visit. They are not considered to have progressed. People could argue about that. They could say, well, at least consider them to have had the event when they are dead. But, as Susan said, if you don't have good follow-up, that may be giving them a little extra credit for time to event. So, there is a controversy about that, I would say. It's much better to follow everybody well.

DR. PRZEPIORKA: So, potentially, if someone could be dying of an effect of their cancer, despite the fact that they don't have an objective increase in tumor size, that would be censored, that would not be considered progression.

DR. TEMPLE: Yes, that's right. That's why there is an argument about it. That doesn't seem entirely satisfactory either.

DR. PRZEPIORKA: And historically, have you not considered failure-free survival rather than time to progression? And what are your objections to using failure-free survival, i.e., progression or death?

DR. HONIG: We have always considered them to be really two separate things. I mean, we're looking at survival/death versus time to progression. Time to treatment failure, which is a little bit different, we have tended not to look at so much because we think that it combines a number of different aspects that can be difficult to sort out. Did someone fail treatment because of an adverse event? Did they progress? Did they just not want to be in a clinical trial anymore? We have tended to limit it to time to progression and survival.

And again, time to treatment failure actually came up at our cytotoxic time to progression meeting where, again, it was discussed and felt to not be a particularly valid endpoint. I think Dr. Swain specifically mentioned that in her talk.

DR. TEMPLE: But progression-free survival, if you thought you had reasonable access to people and would know if they progressed, would be a very attractive endpoint. I don't think there is any doubt about that.
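The difference between the two endpoints in this exchange comes down to how a death without documented progression is coded. The sketch below, in Python, contrasts the convention described for time to progression (censor at the last evaluation) with a progression-free survival convention (count death as an event). The function names, argument names and day-count inputs are hypothetical, and real analyses involve further rules that are not covered here.

```python
from typing import Optional, Tuple


def time_to_progression(progression_day: Optional[int],
                        death_day: Optional[int],
                        last_eval_day: int) -> Tuple[int, bool]:
    """Return (time, event_observed); deaths without progression are censored."""
    if progression_day is not None:
        return progression_day, True
    # Death is ignored for this endpoint: censor at the last tumor evaluation.
    return last_eval_day, False


def progression_free_survival(progression_day: Optional[int],
                              death_day: Optional[int],
                              last_eval_day: int) -> Tuple[int, bool]:
    """Return (time, event_observed); progression or death counts as an event."""
    if progression_day is not None:
        return progression_day, True
    if death_day is not None:
        return death_day, True
    return last_eval_day, False


# A patient last evaluated on day 150 who died on day 200 without
# documented progression (illustrative numbers only).
print(time_to_progression(None, 200, 150))        # (150, False)
print(progression_free_survival(None, 200, 150))  # (200, True)
```

The example shows the concern raised in the discussion: with poor follow-up, the gap between the last evaluation and the death date is time in which progression status is simply unknown.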

DR. NERENSTONE: Other questions? Dr. Sledge?

DR. SLEDGE: Typically, response has been defined as CR plus PR. But in the breast cancer literature for most of the last decade, the literature refers to CR plus PR plus stable disease for, say, 6 months or longer as sort of a clinical benefit endpoint. Does FDA, in analyzing these studies, consider stable disease greater than 6 months as a meaningful endpoint?

DR. HONIG: We haven't to date.

DR. TEMPLE: Of course, time to progression endpoints capture some of that.

DR. NERENSTONE: I have a question. Just on the basis of the letrozole studies, which a lot of the committee has seen in some detail, the number of patients involved was quite large. In fact, almost as large as two independent studies with some of the other aromatase inhibitors. How strongly do you feel that what I thought was quite a significant and powerful improvement over the tamoxifen, in terms of time to progression and response rate, has to be repeated before the new drug can be used as the new comparator?

DR. PAZDUR: I think this is why we are bringing this to the committee, to get your opinion regarding the data that we presented or was presented previously. So, this is open for discussion.

DR. NERENSTONE: So, you don't have a preconceived notion? Because some of the presenters did say something about a single randomized trial is not enough, or the implication in terms of regulatory requirement.

DR. TEMPLE: This is an ongoing debate we have all the time. There are a lot of situations in which people no longer want to use placebos or no longer want to use therapies that are considered inferior. So, the question is, how can you use the available data to set a non-inferiority margin? Well, when you only have one study, that's a formidable task. You have to make a lot of assumptions about constancy and all kinds of things.

As you saw, though, there are more and less conservative ways to use the data you have. If you use the point estimate, with its variance, that does not take into account any change really. So that sometimes you resort to a relatively conservative measure like the 95-percent lower bound, which in this case isn't very far from the point estimate because the study was very large. And the response rate lower bound is 26 percent and not 30; that's not so far. That is a more conservative way to use a single study to set your non-inferiority margin. But there is very little experience with this in either the oncology or non-oncology world and it's an important current problem.
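
As a rough sketch of the more conservative approach described here, the following takes the lower 95 percent confidence bound of a single historical effect estimate and keeps part of it as the non-inferiority margin. The hazard ratio, standard error, and 50 percent retention fraction are hypothetical illustrative values, not figures from the letrozole study, and the function name is an assumption.

```python
import math


def conservative_margin(hr_old_vs_active, se_log_hr, retain=0.5, z=1.96):
    """Return (lower_bound_hr, margin_hr) from one historical trial.

    hr_old_vs_active: hazard ratio of the older drug versus the active
        control (values above 1 mean the active control is better).
    se_log_hr: standard error of the log hazard ratio.
    retain: fraction of the active control's effect the new drug must keep.
    """
    log_effect = math.log(hr_old_vs_active)
    # Conservative effect estimate: lower 95% bound, not the point estimate.
    log_lower = log_effect - z * se_log_hr
    if log_lower <= 0:
        raise ValueError("lower bound does not exclude no effect")
    # The new drug may lose at most (1 - retain) of that conservative effect.
    log_margin = (1.0 - retain) * log_lower
    return math.exp(log_lower), math.exp(log_margin)


# Hypothetical inputs for illustration only.
lower, margin = conservative_margin(hr_old_vs_active=1.40, se_log_hr=0.08)
print(f"lower bound HR = {lower:.3f}, margin HR = {margin:.3f}")
```

The larger the historical study, the smaller the standard error and the closer the lower bound sits to the point estimate, which matches the remark that the bound "isn't very far from the point estimate because the study was very large."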

DR. NERENSTONE: Dr. Sridhara?

DR. SRIDHARA: I was just going to complement what Dr. Temple was saying just now, that whenever we are designing non-inferiority trials and we have just one study, then we don't have between-study variation that we can get from several studies. So, the effect probably is there, but is it really as big as it is seen in this one study? That we can never tell. And so we will have to take the conservative approach, which will really blow the sample sizes quite high. But certainly I think there is effect, and the p values were pretty strong in that study.
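
To illustrate why a conservative margin drives sample size up, here is a minimal sketch using a standard normal-approximation formula for the number of events in a two-arm time-to-event trial with 1:1 allocation. The margins of 1.25 and 1.10, the one-sided alpha of 0.025, and the 80 percent power are hypothetical choices for illustration, not values presented at the meeting, and the function name is an assumption.

```python
import math
from statistics import NormalDist


def events_for_noninferiority(margin_hr, alpha=0.025, power=0.80):
    """Approximate events needed to show the hazard ratio is below margin_hr.

    Assumes 1:1 allocation and that the two drugs are truly equal
    (true hazard ratio of 1).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4.0 * (z_alpha + z_beta) ** 2 / math.log(margin_hr) ** 2)


# Hypothetical margins: a looser one and a tighter, more conservative one.
for m in (1.25, 1.10):
    print(f"margin {m:.2f}: about {events_for_noninferiority(m)} events")
```

Because the required number of events scales with the inverse square of the log margin, tightening the margin even modestly multiplies the size of the trial.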

DR. NERENSTONE: Yes. Dr. Henderson.

DR. HENDERSON: I have two questions. The first one, which maybe I should ask you before I ask the second one, is what you are proposing here is a real sea change based on one trial. And I think you have acknowledged that. But is there any other time that you can think of in regulating any of the cancer drugs where such a huge change has taken place on the basis of a single study? Have we done that before?

DR. TEMPLE: Well, you could say that some of the tamoxifen adjuvant therapy places a new burden on everybody to do something. Of course, there were multiple trials, even though we relied on one or two ourselves. But that sort of changed everything. You really had to be at least as good as tamoxifen from that point on. But, to be fair, that's more than one trial.

DR. HENDERSON: That's an interesting answer, because you have switched. In that particular answer, there are two things that are quite different, of course. One is that you are talking about survival in a population where many patients are going to have potentially very, very long survival, and that is really the only endpoint in that particular setting.

DR. TEMPLE: Well, oddly enough, not initially. Our initial approval was based on time to progression. It was only the meta-analyses that allowed one to conclude that you actually had a survival effect. So, I don't know how important you think that distinction is.

DR. HENDERSON: Good point. I think that is relevant.

The second thing, of course, is that in the metastatic disease setting, where you are dealing with patients, 98 to 100 percent of whom are going to die, the major issues become much more complex in terms of the way physicians go about making decisions. Which kind of leads into the second question that I wanted to ask. And that is, the implication of what you're saying is that drugs could be approved in one of two ways, either by equivalence to letrozole or superiority to tamoxifen. So, you're saying that if we have a new hormone therapy that is equal to tamoxifen, at this point it wouldn't be approved.

Now, if it were equivalent, and let's say you had very tight confidence intervals. Let's say you are losing only 3 or 4 percent of your control or 10 percent of your control, but really tight confidence intervals and a robust data set. You are saying, if that were equivalent to tamoxifen, even though tamoxifen is on the market, and even though anastrozole is on the market, that drug wouldn't be approved because it is equivalent to tamoxifen.

But under those circumstances, wouldn't the drug fulfill the fundamental requirement that I always come back to, that it is effective, it's as effective as tamoxifen, and it is safe? As a class, all of these hormone drugs -- as a cancer doctor -- are remarkably safe, compared to everything else we use. Would we want to be in a position to say that that drug could not be used by patients?


DR. HONIG: That is really what we are asking today: Is it still enough? Do you think that this finding in the letrozole study is so clinically convincing that you don't think it is appropriate to compare it to tamoxifen, that you really do need to be better? It also gets into some, I guess, ethical questions, really, but that is what we are asking you.

DR. HENDERSON: That's what you are proposing, either/or one of those two approaches; so there are many more options that we could get to?

DR. HONIG: That's right.


DR. TEMPLE: Can I just add something. If the only thing you had was the difference in response rate, we might become quite uncomfortable saying, oh, well, you have to achieve the somewhat better response rate, because you wouldn't really know how much that mattered. It is the improvement in time to progression that raises this issue most strikingly, because, in some ways, that's the point and seems more important. If you had an increased survival, it would be almost obvious that you would use the better drug, unless it had unacceptable toxicity. So, there is some graded response.

I mean, as an agency, we don't generally try to impose relative effectiveness requirements, but we make an exception when relative effectiveness has something to do with things like survival and other important endpoints. Whether time to progression is in that category is part of w[...]

[...] look at all AIs as being equivalent. So, I guess my answer is no.

MS. MAYER: I would vote no as well for the reasons articulated by Dr. Lippman.

DR. ALBAIN: I strongly vote yes, because I believe that the North American trial for anastrozole and the letrozole trial are giving you nearly identical results. You have, I said it earlier, a time to progression that's identical for the tamoxifen arms and an almost identical time to progression for the two AI arms. So, I vote yes.

DR. KELSEN: And I would also vote yes, because they are at least equivalent to, and may be superior to, tamoxifen and less toxic.

DR. NERENSTONE: Dr. Blayney?

DR. BLAYNEY: And I vote no.

DR. NERENSTONE: 4 yes, 9 no.

Dr. Temple?

DR. TEMPLE: Well, I have a practical question. What I hear from the people who thought any one of the members might be reasonable, or at least more than one might be reasonable -- does anybody want to just briefly comment?

My presumption is that if time to progression is going to be the endpoint -- that's a big question, of course -- then we'd have to do some sort of pooled analysis or meta-analysis and estimate a class effect based on the letrozole study and the favorable anastrozole study, presuming that must be the one that's true and the other one must be wrong, and devise a sort of combined rating. Because you have to have a margin to do a non-inferiority study. You have to. You can't do one without it. So, we have to develop one somewhere.

I just wondered if people had thoughts on what that would be and how to do it. But maybe you are presuming response rate should still be the response. So, I don't know that.


DR. ALBAIN: You would have to, though, if you are going to do a -- I hesitate to use the term "meta-analysis," because it's really not a proper use of the term when you have two trials. If you are going to do some sort of pooled analysis, though, you'd have to focus on the receptor-positive ER or PR in both of those trials, not just take it by trial. And it was based on that I voted yes, from those analyses of those trials.

DR. TEMPLE: Okay, but the thought would be that somehow we could -- and it's not clear whether we could but we would try -- to develop an effect size for time to progression that would then be the basis for a non-inferiority trial using at least more than one of the aromatase inhibitors.

DR. PAZDUR: But if we move ahead in future discussions with companies in limiting the inclusion of patients to only ER-positive patients, we are really going to have to look very carefully at what is the effect size, and go back in many of these trials to take a look at that population. I think that needs to be real clear here.

DR. NERENSTONE: Dr. Lippman?

DR. LIPPMAN: I would strongly urge against using pooled analyses to get these endpoints. Although I'm a big believer in molecular targeted therapy, no question about it, the more we know about different agents that work on the same target, the more we find out that they are not all the same. Not all COX-2 inhibitors are the same, not all things are the same. And I just think if we abandon the normal way that we develop drugs because we get kind of sucked into this, oh, they all work on the same target, we are going to get burned.

DR. NERENSTONE: Dr. George, do you have any comments about the statistical questions that were raised?

DR. GEORGE: Yes, I think the one that Rick raised is especially important with respect to the features or the eligibility requirements on the subsequent trials -- are they going to be different than past ones? -- that has to be definitely taken into account in how you do things. Receptor status is an obvious one, but there may be other ones, too. And so that means what you are doing is going back and not only pooling but doing various types of subgroup analyses.

It would be pretty tricky to support what the effect size is, which you do have to do to do a non-inferiority trial. It is going to be subject to even more uncertainty than what we have right now, I think. So, that's the tricky business.

DR. NERENSTONE: Dr. Sridhara?

DR. SRIDHARA: Just so that it's clear, in the letrozole study, 65 percent of them were ER positive; and in the ODAC presentation, they did show separately in the ER-positive patients what was the time to progression and what was the time to progression in the unknown category. It was almost identical. So, the differences were very similar in the two groups.

DR. NERENSTONE: Dr. Carpenter, you had a question?

DR. CARPENTER: Would it be helpful if we voted on an either/or, either superior to tamoxifen or non-inferior to an aromatase inhibitor?

DR. NERENSTONE: That's the next question.

Dr. Albain?

DR. ALBAIN: I may be totally out of order, but might we hear from our esteemed colleagues to the left, how they might have voted had they been voting members on these questions? Or is that out of order?

DR. NERENSTONE: In other words, you want to hear what persuasive arguments they would have?

DR. ALBAIN: Yes, I would like to hear how they came out on this, too.

DR. NERENSTONE: That's fine, sure.

DR. DAVIDSON: I've already changed to page 2. But going back to page 1, I would have voted no for both. I actually came in thinking I would vote yes to the notion of using the aromatase inhibitors as a class as a comparator, but I was persuaded by Dr. Lippman that we don't know as much as I think we might know.

DR. HENDERSON: I would have voted no for both of them. And probably the single most important factor is the point that I made, that is, that I distinguished regulatory from non-regulatory functions. And I think that the FDA serves a great purpose by forcing us to generate the kind of information that helps physicians and patients make decisions. So, anything that allows them to do that and makes them more powerful in that regard I'm for.

Things that restrict our ability to utilize drugs or to take that information, or that possibly might bury a future drug that would have promise -- because I think it's even after the initial approval that we really find out about most cancer drugs. It's 5 years, 10 years and even 30 years afterwards, when the patents have run out and so on that we still continue to learn. So, I would like to make sure there are as many drugs out there as possible, but that there is also a lot of data to support them.

DR. NERENSTONE: Okay, let's go to question number 2: If the committee believes that any first-line hormonal agent is an acceptable comparator, should new agents be required to demonstrate superior efficacy to tamoxifen, either by direct comparison to tamoxifen or by non-inferiority analysis compared to letrozole? What are acceptable comparators for non-inferiority designs?

DR. ALBAIN: I have a question about the question.

DR. NERENSTONE: Yes, Dr. Albain?

DR. ALBAIN: So, not anastrozole, only letrozole. Do you mean to say letrozole only there?

DR. HONIG: The problem again is that to use anastrozole for a non-inferiority comparison, it's going to be even more difficult to set the margin and find the effect size than it is for letrozole, where at least there's one large significant study. But then we asked, what are acceptable comparators for non-inferiority design? So, the first part would be voting and the second half of that would be some discussion of what's appropriate.

DR. NERENSTONE: And you want us to vote?

DR. HONIG: At least on the first part.

DR. NERENSTONE: On the first part. And maybe if you want to comment then on the second question at the same time, we can go around. Dr. Kelsen, would you like to start?

DR. KELSEN: Since we said no-no the first time around, then I think any hormonal agent would be acceptable. That would be tamoxifen or any of the aromatase inhibitors.

The implication I get from the second part of that, though, would be that you would no longer approve a drug like Arimidex because it would be not superior to tamoxifen. So, a new Arimidex would not be approved unless it showed superiority to tamoxifen. And you would only approve a new agent of this class if it was at least as good as letrozole. And I think that that's the way that is written. Am I right?

DR. HONIG: The first part is asking what do you think. Must everything be superior to tamoxifen, or would you still accept non-inferiority to tamoxifen?

DR. KELSEN: I think, from our answer to 1b, the answer could be not inferior to tamoxifen, as good as tamoxifen. That's the only way you can interpret the implication of that. Otherwise you are saying that it has to be superior to tamoxifen, but it only has to be as good as letrozole. The logic of that seems to me to be that if you're as good as letrozole, you get approved; if you are inferior to letrozole and only as good as tamoxifen, you don't get approved. That Arimidex would never pass this bar. Maybe I'm misinterpreting.


DR. TEMPLE: Again, although we didn't ask -- in retrospect, perhaps we should have switched the order of these. But there isn't any way to have a non-inferiority study to tamoxifen on time to progression. That's not meaningful. We don't know what the effect is.

DR. KELSEN: Because you don't have the data.

DR. TEMPLE: So, the only interpretable result there would in fact be superiority. Response rate you still could if that were the right endpoint.

DR. KELSEN: Wouldn't that mean that we would, for the purposes of discussion, be saying that because the only data for time to progression is from letrozole, that will be the new standard for time to progression; am I misinterpreting that, just for the statistical reasons?

DR. TEMPLE: No, you could still show superior time to progression -- maybe anastrozole could be used too; that's been something we've been discussing. But you could also be superior to tamoxifen. That would be informative about time to progression. That's an interpretable study. That doesn't mean necessarily you're as good as letrozole, but it probably actually does. But that would be an interpretable finding.

What's not interpretable is a time to progression endpoint using tamoxifen, because you have no good estimate of the effect of tamoxifen on that endpoint.

DR. KELSEN: How are you going to design the trial? You're going to have to make some estimate to have the superiority assessment.

DR. TEMPLE: Oh, you just make it up, the way people do for power calculations. You say a 20 percent improvement, then you run the trial.

DR. KELSEN: To what you actually see?

DR. TEMPLE: Well, the results give you the answer. The power estimates are how you choose your sample size, and we all know people just make those up.
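
The kind of power calculation referred to here can be sketched as follows. This is a minimal illustration, not the method used for any trial discussed at the meeting. It assumes that a "20 percent improvement" means a 20 percent longer median time to progression under exponential event times (a hazard ratio of 1/1.2), with a two-sided alpha of 0.05 and 80 percent power; these choices and the function name are assumptions.

```python
import math
from statistics import NormalDist


def events_for_superiority(assumed_hr, alpha_two_sided=0.05, power=0.80):
    """Approximate events needed to detect assumed_hr with 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_two_sided / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4.0 * (z_alpha + z_beta) ** 2 / math.log(assumed_hr) ** 2)


# Assumed effect: median time to progression 20 percent longer on the new drug.
assumed_hr = 1.0 / 1.2
print(events_for_superiority(assumed_hr))
```

The assumed effect only sets the size of the trial; whether the new drug is actually superior is decided by the observed results, which is the distinction being drawn in this exchange.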

DR. KELSEN: Don't we have really good time to progression data from the aromatase inhibitor trials for tamoxifen in that control arm? You showed us 6.4 months or something like that in two trials.


DR. TEMPLE: Yes, you know that, but you don't know whether that's better than nothing, because there is no placebo in these trials.

DR. NERENSTONE: Just a clarification. You haven't asked the question: Should tamoxifen non-inferiority still be acceptable? Isn't that the first question you need to get answered?

DR. TEMPLE: Well, in retrospect, yes.

DR. KELSEN: That's what I'm trying to figure out.

DR. NERENSTONE: Well, maybe we should give you a new question 2a.

DR. TEMPLE: It's really on what endpoint is acceptable now. That's really what determines that. If response rate is acceptable, then you could do a non-inferiority study to tamoxifen. It would be interpretable easily, just like all the past ones have been. If time to progression is needed, then really you can't do a non-inferiority study except to something where you know the effect size, which includes letrozole and, conceivably, anastrozole.

DR. NERENSTONE: Do you want to hear the answer to the first question in terms of a poll? Do you want to simplify the question?

DR. HONIG: Would you rather answer question 3 first, which asks the endpoint question?

DR. NERENSTONE: No. I think the first question is: Is tamoxifen still an appropriate standard? Before talking about what the endpoint is, maybe we should just talk about the drug, the specifics.

DR. KELSEN: Yes. My answer to that was yes, based on our saying no-no to the first one.

DR. NERENSTONE: Well, wait a minute. Let's ask the question. The question then is: Is tamoxifen non-inferiority still acceptable as an endpoint for new drugs?

DR. PAZDUR: Yes, that would be fine. And we're talking about response rates here, because obviously you can't look at time to progression.

DR. NERENSTONE: So, non-inferiority of tamoxifen?


DR. NERENSTONE: In first-line metastatic breast cancer, is the use of tamoxifen as a comparator in a non-inferiority study with response rate acceptable for new drug comparison?

DR. KELSEN: I'll try to explain what I'm going to say. Because our answer to 1a and 1b was no, then I think the answer to this has to be yes.

DR. NERENSTONE: It doesn't have to be yes, because you could say it has to be better. This is saying non-inferior to tamoxifen. It's not saying that it's tamoxifen or nothing; it's saying non-inferiority.

DR. KELSEN: Right. So, we're changing the way 2 is written. I voted yes for 1b, and I would vote again here that it has to be better than tamoxifen. We already have two drugs that are at least as good or better than tamoxifen.

DR. NERENSTONE: So, you're saying?

DR. KELSEN: It should be superior to tamoxifen.

DR. NERENSTONE: Superior. So, you would not accept non-inferior to tamoxifen?

DR. KELSEN: I would not accept not inferior to tamoxifen. Does that make sense?

DR. NERENSTONE: So, just to make sure that everybody is clear: Is tamoxifen as the comparator in a non-inferiority trial acceptable as the comparator?

We're not talking about the endpoint. Right now let's just talk about the drug. And the first vote is no, it's not.

Dr. Albain?

DR. GEORGE: No. But my answer is tied up with the endpoint.

DR. REDMAN: No. And I have difficulty with the endpoint.



DR. NERENSTONE: Do you want to explain that, Dr. Blayney?

DR. BLAYNEY: As I said, tamoxifen is a drug that we have used for 20 years, that is well accepted in clinical practice. We know what its side effects are. Those side effects often emerged 10 years into its use. And I would be reluctant to abandon it as a comparator even for a non-inferiority trial, given the amount of time it takes for long-term toxicities to emerge.

DR. NERENSTONE: So, the vote is 1 yes, 12 no.

Could we go to perhaps number 3 now? Does that make more sense? We've accepted in general that tamoxifen is still an appropriate comparator, and now this really addresses the endpoint that we are going to be looking at.

Just to read it: Tamoxifen has not been demonstrated to affect time to progression or survival in randomized controlled trials. Because approval of subsequent hormonal therapies was based on non-inferiority or similarity to tamoxifen, time to progression data are not available for these agents. Anastrozole demonstrated superior time to progression to tamoxifen, but in a single small study; no difference was observed in a second larger study. Letrozole demonstrated superior time to progression compared to tamoxifen; however, data from only a single trial are available to estimate the effect size. A change in primary endpoint in the regulatory setting from response rate to time to progression would require trial designs that demonstrate superiority to tamoxifen or non-inferiority to letrozole.

As we talked about, that's the only one that has data that people think is reliable.

For the first-line treatment of metastatic breast cancer with hormonal agents, should the traditional endpoint of response rate be replaced by time to progression?

Dr. Kelsen?

DR. KELSEN: So, my answer would be I would replace it with time to progression. And I do think that since we voted against tamoxifen non-inferiority, and said it has to be superior to tamoxifen -- and it sounds to me like you have good data for time to progression from the two previous studies -- you do have baseline numbers to draw your statistics from, since it must now be superior to trials in which tamoxifen was the control arm and you have at least two fairly large trials with time to progression in the 6- to 7-month range. And that will be your "got to be better than that."

DR. NERENSTONE: We can go around. That's a yes?

DR. KELSEN: That's a yes.


DR. ALBAIN: I would replace it also, a yes. Because I think that with this class of drugs, you can do so much with prolonged stable disease, into several years sometimes, with truly hormone-dependent cancers. And response rates do not capture that value for this class of drugs. And I think if you can design trials with placebos, that will add rigor to the time to progression endpoint.

MS. MAYER: I would vote yes.

DR. ALBAIN: I'm sorry. Strike "placebo." Double-blind is what I meant to say.

MS. MAYER: I would vote yes. I think it's crucially important that these trials look at response rates in women who have bone-only metastases. This is a very large group of women who, as Dr. Albain says, often respond for significant periods of time and cannot be measured in response rate studies.

DR. LIPPMAN: I'm assuming that when we talk about endpoints here, we are talking about the primary endpoint of the study.

I have mixed feelings. I'm leaning towards voting no because by making time to progression the primary endpoint, I think we've heard that the sample sizes become very large. And I think that clearly time to progression should be a prespecified major secondary endpoint.

I would be comfortable approving drugs as non-inferiority to letrozole based on a design of response rate. And as we always do, we look at these other endpoints to make sure that they're consistent with that, although maybe not reaching statistical significance. So, in that context, I would vote no.

DR. NERENSTONE: Scott, except I'm going to remind you that we lost. So, the comparator is not going to be letrozole; it's going to be tamoxifen. And under those circumstances --

DR. TEMPLE: It could be.

DR. NERENSTONE: It could be. And if it is tamoxifen, is response rate still appropriate?

DR. LIPPMAN: If it is tamoxifen, I may take the approach Dr. Blayney took earlier on this and think about it a little longer. I'm going to defer my vote.

DR. CARPENTER: I would vote yes, because I think while there are problems with sample size and there are problems in the precise way that you define non-inferiority, that time to progression comes closer to the benefit that we're trying to measure than response rate does.

DR. PRZEPIORKA: I think there are two questions here posed into one, and I can answer one of them but I can't answer the other. One of them is: Is time to progression a more clinically meaningful endpoint than response rate? To which I would answer yes. And so my answer to 3 would be yes.

And the other question which I am inferring is: Are there statistics available to do that sort of a study? And I don't know that all statistics have been provided to the FDA or that all data has been published. And so I don't know that the committee is in a position to actually address that particular question.

So, my answer is yes to the question of clinically relevant endpoint.

DR. NERENSTONE: My answer is yes. And I think it's a strength that these trials need to be larger. Because then we're going to have more acceptance of the results. I think it may change our requirement of two trials. But if you have one 1,600-patient trial, then I think you can believe the answer and feel that it is going to be applicable to a wide variety of patients. So, I'd say yes.

DR. SLEDGE: I vote no, because I think this is a false dichotomy. I mean, do we have to have either/or? As I mentioned earlier, clinical trialists in breast cancer over the past decade -- if you look at many of the trials published in the past decade -- look at CR plus PR plus prolonged stable disease as an endpoint. That strikes me as being an endpoint that is fairly similar to time to progression in many ways, in terms of what you are actually looking at. I would personally be comfortable with either of those endpoints, or both.

DR. PELUSI: I would vote yes.

DR. GEORGE: I would vote yes, but a couple of comments. The reason I'm voting yes is I don't particularly like response rate all that much as an endpoint.

But to take up what Dr. Sledge just mentioned, if I were looking at this, I would personally like to see non-inferiority -- if that's what it was -- in at least both of these endpoints, and maybe others. I still worry about survival, for example, that seems to get sloughed off as something that's not easily looked at or treated as a safety issue. I would like to see non-inferiority in all of these things. The effect of that on the sample size, I would hate to imagine.

The other thing did have to do with the sample size, as Dr. Sridhara mentioned earlier. I think that the things that were presented -- and she can correct me if I'm wrong -- are the result of some internal research that's going on, and there could be some refinement in some of those things. That's the only way I could put it now, but maybe the sample sizes are not quite as big as some of those presented.

DR. REDMAN: I believe time to progression is a valid clinical endpoint. The idea of superior to tamoxifen versus non-inferior I think is a difficult one, and we're probably not going to be able to answer that. But overall to the question, I will answer yes.

DR. TAYLOR: I would vote yes, but I don't think we should ignore the response rate. It should not be excluded.

DR. BLAYNEY: As the question is phrased, I vote no. If the question was phrased, "could TTP replace in a predefined study design," my answer would be yes. I think TTP is a valuable endpoint. So, it would be up to the sponsor to specify that. And if they specified TTP as their primary endpoint and it could meet the study number hurdle, as my colleague across the way pointed out, it would expand to bone-only disease and increase the number of women who might be available for such trials.

DR. NERENSTONE: Is that a yes or a no?

DR. BLAYNEY: As the question is phrased, it's a no.

DR. NERENSTONE: And, Dr. Lippman, do you want to weigh in?

DR. LIPPMAN: Again, I am leaning towards the response that Dr. Sledge gave, and for the exact reasons, and the point that you raised about the issue of one versus two trials. I am concerned, if we say that the trials have to be designed on a time to progression endpoint, we are requiring two pivotal trials. It could be up to 3,000 to 5,000 women.

So, I think either endpoint, although I believe that time to progression is very valid and I would be very concerned if we saw a barely significant response difference and time to progression was going the wrong way. And that's what we do when we review these. So, I think having time to progression prominently as a prespecified secondary endpoint, but with designs being on response rates would be acceptable.

So, I guess my answer is no.


DR. TEMPLE: Just one point. We would always look at and do always look at overall survival, but because of crossovers and other reasons, we don't expect an effect. And since we don't know that any of these therapies have an effect on survival, we can't design a non-inferiority study literally. All you can do is take a qualitative look at it and conclude, oh, well, that doesn't look bad. And that is what we do. If it was going the wrong way, we would surely be very nervous.

DR. NERENSTONE: Do you need us to take a vote on question 2 as it's written?

DR. ALBAIN: What was the outcome of the last vote?

DR. NERENSTONE: Oh, sorry, the outcome of the last vote. 10 yes, 3 no.

DR. PAZDUR: I have a question. The way we wrote the change -- or in our paragraph -- I wonder if there is a de facto discrepancy kind of between question number 1 in our vote and question number 3. And I want to bring that up. Because when we took a look at the designs of non-inferiority for TTP, it appeared to us -- and I don't know if Raje wants to comment on this -- that it would be very difficult to do this with Arimidex based on one trial, in a relatively small trial.

If we are demanding basically TTP be the endpoint, we are virtually saying that the new comparator here is letrozole if it's for a non-inferiority analysis. Could people comment on that?

DR. NERENSTONE: I'm entirely consistent, but I understand your point. That's right.

DR. PAZDUR: I just want to know if people know that. There is a discrepancy here. And accepting one, the TTP endpoint via clinical methodology problems -- and perhaps Raje wants to comment on this, the problems with Arimidex in a non-inferiority analysis.

DR. SRIDHARA: Yes. I think although some of the TTP sure has to do with looking at superiority to tamoxifen, I think the words have also come to look at superiority with respect to tamoxifen and TTP. So, in that sense, the trials are really not large when you are looking at superiority versus tamoxifen.

With respect to non-inferiority, yes, we have data from one trial with letrozole on TTP. That's the best data that we have.

DR. TEMPLE: So, as a practical matter and not because there is a theoretical belief that letrozole is better than anything else, if you were following the non-inferiority route to approval as opposed to beating tamoxifen, there really wouldn't be much choice in any way that we can figure out yet. Someone could make the case that Arimidex really has enough data or something, and then it could. But without that, it would be very difficult to write a design that had a credible non-inferiority margin.

DR. PAZDUR: That's the point I was trying to make. So, I want people to have that analytical -- we're making a de facto type of suggestion here as far as the comparator, which is somewhat in contrast to the previous vote, but this is just on the practicality of the information available to us.

DR. NERENSTONE: I think the committee's feeling is that it should not be required just because of lack of data. But if it is a de facto requirement because there is nothing else, then they obviously voted that that was okay. But in a purely scientific way, they felt uncomfortable saying that this guy was the winner.

Am I adequately reflecting the consensus of the committee? Dr. Henderson?

DR. HENDERSON: If I've heard things correctly, what you're saying is that -- let's just say that there is a company out there that has an anti-progestational agent. And I don't know that there is such a company, but let's just say that somebody has an anti-progestational agent. And they're now down to their last 50 patients of accrual. And they come in 14-15 months from now and they've shown that in fact this new anti-progestational agent is only -- the lower limits are maybe 1 or 2 percent lower than tamoxifen.

What you're saying is that that wouldn't be approvable now. Is that really what the committee is trying to -- I mean, as I read the votes, that is what you've just said, is that a company brings in this new anti-progestational agent, they have completed the study. These studies often take five to 10 years by the time you go through the development process and so on, and what we are talking about is fairly new. If you follow what you just said, you wouldn't approve that drug?

It's a hormone therapy. It would fall into this overall class, new treatments. It's kind of an interesting new area, anti-progestin. We don't have much of that, but we do know that that's a possibility. So, this is something that could in fact happen.

DR. PAZDUR: I think that's question 5, for ongoing trials. We will address this. It's coming up. We would like to hear your opinion.

DR. NERENSTONE: Do you need us to go back to number 2, or has that been discussed at length enough for you?

DR. PAZDUR: I think we can move on. There are several other issues that we need to cover.

DR. NERENSTONE: So, question number 4, questions about the definition of efficacy: In non-inferiority studies, it is important to know the size of the treatment effect of the comparator agent, and to decide on the amount of comparator effect that should be preserved when testing a new treatment. The estimate of the effect size, the amount of efficacy to be preserved, and the choice of endpoint influence the sample size of non-inferiority studies. Sample sizes may range from several hundred to many thousands of patients, depending on the combination of these factors.

Based on the committee's clinical expertise, what amount of the comparator effect should be preserved in a non-inferiority trial of a new hormonal agent in the first-line treatment of metastatic disease?

And this is open for discussion. And certainly our additional colleagues, please weigh in.

Dr. Henderson?

DR. HENDERSON: The problem with a question like this is that, with almost any drug, there are huge tradeoffs. So, let's just say, for example, that you had in either class, either the anti-estrogen class or the aromatase inhibitor class, a drug which, for some reason or another, had exactly the same outcome, or even smaller. Let's say you've lost 50 percent. But you have no hot flashes, for some reason.

That's a big tradeoff. I could tell you, lots of women out there would be willing to trade off 50 percent of the benefit to get rid of some of the negative effects, which we tend to discount when we are talking about tables but, I can tell you, when you're at the bedside, are not minor. So, if the patients didn't have these side effects, sure, they would prefer to have the drug that gives them the whole benefit guaranteed. But if they did have some of these benefits, they're quite willing to make a trade of that magnitude.

DR. NERENSTONE: Dr. Lippman?

DR. LIPPMAN: For that reason that was just raised, I believe that we should cast the net sort of broadly. It's hard to pick an exact number -- 52 percent, whatever -- but I think 50 percent seems to be reasonable, again, for the reasons mentioned. Because part of what we do is evaluate all the other effects, side effects and other things that come up. And you would hate to have a drug not make it because it lost 52 percent of its effects but was much more favorable in terms of toxicity profile.

So, I wouldn't do 25; I don't know if I would do 75; I think 50 percent seems to be reasonable.

DR. NERENSTONE: Dr. Carpenter?

DR. CARPENTER: I would support that. That keeps your sample size from being prohibitively large. And if we make the bar too high, then a small company with a new drug that is very interesting may simply not be able to mount a study of that magnitude, even though they have an agent that really might be quite interesting. The 50 percent range gives us reasonably large sample sizes and does allow for the play of other clinical effects of the drug which, as Dr. Henderson pointed out, may be very important.

DR. NERENSTONE: Dr. Blayney.

DR. BLAYNEY: I think 50 percent is a good number, but I'd want some lower limit also. At 50 percent of a 20 percent effect, which is what you're talking about in tamoxifen, you're down in the noise. A 10 percent effect is probably equivalent to a placebo effect. So, I think you would need some lower limits on what that 50 percent cut was.
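
Editorial illustration, not part of the transcript: the arithmetic behind Dr. Blayney's point can be sketched as follows. The 20 percent comparator effect is the figure he cites; the function names and the list of retention fractions are assumptions made only for this example.

```python
# Illustrative arithmetic only; names and retention fractions are hypothetical.

def retained_effect(comparator_effect, fraction_retained):
    """Portion of the comparator's effect the new agent must be shown to keep."""
    return comparator_effect * fraction_retained


def margin(comparator_effect, fraction_retained):
    """Portion of the comparator's effect the new agent is allowed to lose."""
    return comparator_effect * (1.0 - fraction_retained)


comparator_effect = 0.20  # the 20 percent effect cited in the discussion

for fraction in (0.25, 0.50, 0.75):
    kept = retained_effect(comparator_effect, fraction)
    lost = margin(comparator_effect, fraction)
    print(f"retain {fraction:.0%}: keep {kept:.1%}, may lose {lost:.1%}")
```

At 50 percent retention the amount kept and the amount that may be lost are both 10 percentage points, which is the level Dr. Blayney describes as being down in the noise.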

DR. NERENSTONE: Other comments?

DR. PAZDUR: Is the committee basing their decision on a clinical decision of a 50 percent retention or simply looking at the number of patients that would be entered on the trial? Let me just throw that question out.

DR. LIPPMAN: I think both.


DR. GEORGE: Well, I can't answer the clinical thing.

What I was going to get at in my comment was that these non-inferiority trials are so sensitive to things like the effect size and this effect retained, that it's very difficult to separate your clinical judgment from what effect that would have on the subsequent trials and ability to get anything approved. So, it's a really tricky business in that sense.

50 percent does have an advantage, but then it doesn't matter whether you're talking about the effect retained or the effect lost. It has a nice symmetry in that sense.


DR. SLEDGE: Actually I very definitely was doing it on a practical matter. To use one of the examples that Raje gave, a 3,500-patient metastatic trial would just simply not be doable in the United States in any sort of timely fashion. I can't imagine any drug company that would want to do it in the metastatic setting.


DR. TEMPLE: The tradition of how to set the non-inferiority margin is not lengthy and is still being worked on, and different answers have been concluded for thrombolytics and for other things. In oncology, we've actually felt that we could be a little less conservative because you do get to measure response rate, which is another independent measure of activity, and you might also feel similarly inclined, if drugs are within the same class, that you think you know something. You could call those Bayesian priors if you were inclined that way. So, we have, for example, not insisted on the lower bound of the 95 percent confidence interval which leads to stratospheric sample sizes and have instead, based on some internal analyses, done this gamma confidence interval, which leads to more reasonable sample sizes. They're still pretty large sometimes, but they're more reasonable.

The other observation I would make is that if you're talking about time to progression, remember we're only looking at the difference between the drug and tamoxifen. That's the figure we're looking at. So, preserving any portion of that doesn't get you down into the noise level really, whereas with response rates, if you had a half or 25 percent of 20 percent, you might not feel very reassured. This means that you are positive on what you're pretty sure is the time to progression effect of the other drug. So, there's less worry about meaningless things.


DR. PAZDUR: One of the other questions that I have that I'd like to see some discussion and potential vote on is two trials versus one trial. One of the aspects that we talked about in the presentations was that sloppiness obviously obscures a non-inferiority analysis here, and many times we have internally discussed the requisite of requiring two trials for confirmation of the results. We're talking about large numbers of patients here with the time to progression and a 50 percent retention, perhaps 1,500 patients in each trial.

Many sponsors have come up and wanted to split the patient numbers and combine these trials together. Obviously, there are problems in pooling results, which will be discussed on applications you'll see coming toward you this session of ODAC.

But when we're talking about confirmation of results and requiring and saying we're not going to approve a drug, because we are looking for two trials, with this magnitude of the number of patients, is this realistic? And I'd like some discussion on this.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: Well, I agree with Dr. Henderson on this. I'd like to have my cake and eat it, too. I'd like a one well-designed, large, multi-center trial that's very compelling, and it would be nice to have a confirmatory smaller trial. We'd feel better. But I don't think we should mandate that.

Assuming that the application comes with other supportive data from biologic plausibility, particularly with these molecular targeting agents, phase II studies, some of this other data, I think in that setting, if they come in with one very well-designed and well-conducted large trial, I think that should be enough. I'd like it better if there were two, but I'd rather see that than what we're going to see later in this meeting of combining smaller trials and interpreting sort of apples and oranges. Then I think we really invoke the sloppiness point that Dr. Temple raised.

DR. NERENSTONE: My question is, how large is large? In this disease, 1,500 patients is not unreasonable. I think it probably is unreasonable to demand two of those kind of trials, but I think if you have an aggregate number, whether it's going to be that you have two trials that equal 1,500 or if you have one large trial, that's less important. The problem is some of the trials that we see are marginal to begin with and then they have marginal responses, and you never know, when you get marginal on top of marginal, what you have at the end.

DR. PAZDUR: When we do accept one trial, generally we do ask and discuss with the companies that there is going to be internal consistency and precision in doing the trial, both from a methodological and clinical trials' execution endpoint. So, that would be demanded basically.


DR. TEMPLE: We have a long document about when we are prepared to rely on single trials.

But one of the things that's often helpful is two really completely independent endpoints within the trial, so that, for example, if you're non-inferiority on time to progression and you also are looking good on response rate, that's two separate things. Those don't necessarily have anything that much to do with each other. They're sort of independent.

The other thing that's worth remembering is if you set the bar as requiring that you demonstrate retention of 50 percent of the effect, you may be statistically not so powerful for that finding, but you're very powerful for the presence of some effect.

So, those are all things we think about in an attempt to be reasonable.

DR. NERENSTONE: Dr. Henderson.

DR. HENDERSON: One thing I think is kind of a fact of life that the FDA has to deal with, but not necessarily a pleasant one, is that you're oftentimes not allowed as much flexibility as desirable. That is, what you do with one company you sort of have to do with all the others.

But the best of all possible worlds is if you could sit down -- for example, the letrozole trial. Sometimes I encourage people to actually write a paper based on what they think the outcome of a trial is going to be and then see if they want to redesign the trial once they've done that.

For example, the letrozole trial. If you had anticipated these results, at least that was one of the options, and you realized that this would be a sea change, certainly regardless of the issue of the companies, in terms of how society would be best served, society would have been better served if we had two letrozole trials of that size because we're talking about such a momentous effect.

So, I think again it depends a little bit on the setting. It's just like on the last question, nobody emphasized it but, in fact, on question 3 it does say for first-line treatment of metastatic breast cancer. It seems to me it's reasonable to say that the barrier should be higher there and that larger numbers are more reasonable and they are also obtainable. Let's say you have an initial approval. You already have now positive data for that drug. Then going on for your next indication as a first-line indication, I think it is reasonable to get two fairly large trials in that setting or you wouldn't require that for your initial approval, let's say, in second- or third- or fourth-line therapy.

DR. NERENSTONE: Dr. Przepiorka.

DR. PRZEPIORKA: A question. Now that we have multiple drugs that could potentially be comparators, would you accept, instead of two large studies that looked at exactly the same study design, two different comparators, one in each trial?


DR. TEMPLE: But still you have to be able to do it. So, one non-inferiority against letrozole and one beat tamoxifen? We'd be delighted.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: I think the issue with the one large trial and the issue of how large does it need to be, that's what I meant when I said by how compelling the data are and the issue of two independent endpoints. I think one large, well-done trial and large well-powered -- and these are kind of the discussions we have -- where the primary endpoint is significant, the important prespecified secondary endpoints are going in the right direction, this is part of the subjective reason we have this meeting. If it was boiler plate, we wouldn't need a meeting if they met these criteria, at least the ones that get brought to us. So, I think that if we determine that all of these are in the right direction and it's extremely compelling from all these directions, I don't think that we should require or mandate a second trial.


DR. GEORGE: I think it's clear. I think everyone understands that there's no definitive answer to the question of whether you should do one or two trials, but there are certain characteristics. That is, size does matter. It is good to have bigger trials.

But also, even if you do one trial, there are various kinds of internal consistency things that you would look for, you could look for, particularly in these multi-institutional trials, statistical techniques where you look at sampling again from the trial to see if you get the same kind of results and by institutions. And all these kinds of things that are normally done add weight to a single trial. If you saw something strange, for example, in it, that there was a vast difference by site, it may cause you to worry a little bit, more than if there was more consistency across sites and across other things.

So, there is no definitive answer, and I for one never have liked the idea that there has to be two trials. It certainly makes you feel more comfortable when you see two large, well-done trials, but that's an obvious point.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: And I agree. I think when there is one large trial like this, we should really torture it. We should look at it by site. We should do all of these internal controls to convince us that the quality is very high and there aren't factors that may be biasing it.

My concern with the two-trial issue, quite frankly, at least the kinds that come to the committee often with two trials, is they're two sort of borderline trials. The power is not great. There are a lot of issues and bias. So, it isn't usually the case where we get two definitive, large-scale trials. Certainly given that kind of scenario, I prefer a very large, well-done -- what we've talked about -- trial where we look at all the issues and internal consistency and so on.

DR. PAZDUR: One of the problems that we face is you don't know a priori what the results for the trial are. Obviously, everybody goes in wishing their drug has an unequivocal p value that everybody has confidence in, that there's internal consistency. Then once you have kind of an iffy trial, you're five years down the line here, and nobody wants to be overburdensome in requiring two trials.

But the flip side of the issue is that if you do have that iffy trial where things aren't looking right -- maybe the randomization code wasn't quite on par; there are questions within the trial -- then you're stuck, and it puts everybody in a quandary of is the drug really effective, should we approve this drug, should it not be approved. And it's kind of late to start going back after five years. It puts the development plan tremendously behind schedule, et cetera.

I just want to get the flip side of the issue. Nobody wants to be overburdensome in requiring trials, but the flip side of that is it could definitely hinder a drug getting approved, as well as denying the American public, obviously, of potential effective therapies.

DR. NERENSTONE: Dr. Sridhara.

DR. SRIDHARA: I just want to make sure that you understand that if there's superiority in one trial, probably we have less concern about it when we see some effect. But when it's a non-inferiority trial, we are basing on many assumptions, percent effect that we want to retain, and what's the effect size, and so on and so forth. So, it all depends on how you have estimated and what you're doing.

So, in the setting of a non-inferiority trial, we have more concern when we have just one trial. But if it is a superiority trial, as I said, if it's a large study proving something definitive, then it's not that much of an issue.


DR. SLEDGE: I must say, having sat on the committee now for a while, whenever I've seen more than one trial brought before the committee, it has virtually always been that there's one powerhouse trial and then one mediocre, second-rate, smaller trial. While I agree with the sentiment that two trials are generally better than one, I'm not always sure that that's the case, particularly if the second one has broad enough confidence intervals that I'm not quite sure how to interpret it. It's the old question of if you're looking at home run hitting, do you learn more from Babe Ruth or from all of the 1927 Yankees? I suspect you learn more from Babe Ruth.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: Exactly. That sort of indicated my point better than I did. In some cases that you may see, you get two small trials, but in many cases, you get one great one and one sort of gratuitous trial that's thrown in, huge confidence intervals, and you say it's consistent with almost anything you find in the big trial. So, I don't know if any applications have come in with two real pivotal trials designed in the same way.

But then the question about the statistics. You said that you would feel more comfortable with two trials with non-inferiority. Would you feel more comfortable with two small -- I won't say mediocre, but borderline -- trials, maybe the same size as the big one all together that demonstrate non-inferiority or one very large, multi-center trial that we've been talking about? It seems to me that the endpoint of whether it's superior or non-inferior really doesn't relate to the two versus one trial issue.

DR. PAZDUR: It does.

DR. SRIDHARA: It does very much. If you can flip over the slides actually, you'll see that for superiority, you don't require as many patients or as many events, but with non-inferiority, you do require larger events. Again, by their nature, the non-inferiority trials are always bigger than superiority trials. In whichever setting you are talking about, the non-inferiority trials are going to be larger.

But as I said, the fact is that we are basing it on some kind of estimates of what percent retention that we want to do, what's the effect size, none of which we have a good handle on. So, that's a kind of uneasiness that we would rather have two studies rather than one study.
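
Editorial illustration, not part of the transcript: a rough sketch of why the sample size of a non-inferiority trial is so sensitive to the percent retention. It uses a textbook normal-approximation formula for comparing two response rates and treats the comparator effect as known, which is exactly the assumption Dr. Sridhara says cannot be taken for granted. The control response rate of 30 percent, the 20 percent comparator effect, the one-sided alpha of 0.025, the 80 percent power, and the function name are all assumptions for illustration, not figures from the FDA presentation.

```python
import math
from statistics import NormalDist


def n_per_arm(control_rate, comparator_effect, fraction_retained,
              alpha=0.025, power=0.80):
    """Approximate patients per arm for a non-inferiority test on response rate.

    Assumes the new agent truly matches the control response rate and that
    the comparator effect is known exactly (both hypothetical simplifications).
    """
    # The margin is the part of the comparator effect the new agent may lose.
    margin = comparator_effect * (1.0 - fraction_retained)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # one-sided type I error
    z_beta = NormalDist().inv_cdf(power)          # desired power
    variance = 2.0 * control_rate * (1.0 - control_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


for fraction in (0.25, 0.50, 0.75):
    n = n_per_arm(control_rate=0.30, comparator_effect=0.20,
                  fraction_retained=fraction)
    print(f"retain {fraction:.0%}: about {n} patients per arm")
```

Because the margin enters the formula squared, halving it (for example, moving from 50 to 75 percent retention) roughly quadruples the required number of patients under these assumptions.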

DR. LIPPMAN: But just to clarify my point. It depends on how you design a study. If you have two smaller trials, based on non-inferiority on response rate, so they're larger than superiority trials, but not that much, not that large, would you prefer that situation? Again, two smaller trials with non-inferiority with broad confidence intervals and so on versus one very well-done large trial that shows non-inferiority.

DR. PAZDUR: It's impossible to answer that question. Basically, to summarize this, we look at the totality of the evidence that comes in, internal consistency within a trial, between the trials, et cetera. So, your question is very difficult to answer.

But getting back to I think a central point here, when we do have a non-inferiority analysis, it presents some difficulties to us. To put this in kind of crude terms, to show garbage equals garbage is not a difficult task. To show a superiority trial is a difficult task and it's an onerous one on the part of a drug to show that you're better. So, it gives us much more confidence.

But a poorly done trial, a poorly done control arm in a non-inferiority trial where the therapeutic effect may be relatively marginal, non-inferiority may not be that difficult to show because of the inconsistencies in the trial, the poor conduct of the trial, et cetera.

But one trial versus two trials. It's not so much the number here. At the end of the day, it's what is the convincing body of evidence that we have, and I think that is the focus that I wanted to end the discussion on.

DR. NERENSTONE: Dr. Henderson, did you have one other point?

DR. HENDERSON: It's just a quick one. George's analogy to baseball went by so fast I almost missed it. But I certainly wouldn't judge the future of Yankee batters on the basis of Babe Ruth.


DR. NERENSTONE: We have one more question I think, and we're running out of time.

There are many ongoing studies of hormonal agents for the first-line treatment of metastatic breast cancer. Some of these studies are designed to demonstrate non-inferiority to tamoxifen and some are designed to demonstrate non-inferiority to other approved first-line agents from various classes of hormonal therapies using response rate as the primary endpoint.

Are there any potential trial designs that would need to be changed based on the answers to the above questions? And the answer was yes, at least starting from now, depending on how retroactive you think it's appropriate to go back.

But let's open it up. Dr. Ohye.

MR. OHYE: It's Mr. Ohye.


MR. OHYE: Perhaps before I begin, I could say words of how I arrived on this committee because perhaps I'm a bit of a surprise or a mystery to some.

I was asked to serve on this committee by the Pharmaceutical Research and Manufacturing Association. They asked a group of retired pharmaceutical people if they would like to serve. And I specifically asked for this committee because I believe this committee and the Oncology Division have done more to relieve suffering of Americans than any other therapeutic group regulated by the FDA. I think you have done so by keeping the care of patients in mind primarily and not getting bogged down with the macro-elements of study data.

I also have a personal reason for being here: My mother succumbed to cancer at an early age, and I too am a cancer survivor.

That being said, I have only two points that the industry has asked me to bring forward. The companies are concerned about the consequences of changing either controls or endpoints to ongoing studies and how that might complicate evaluation, particularly since these studies take a long time to accrue patients. If changes are made midway in a study, particularly when some study centers are closed, et cetera, it would be very difficult to come up with really the profound data that we all would like to see.

They say if this is demanded of them, they would suggest that for these ongoing studies, that approval could still be obtained under the accelerated approval route where confirmatory phase IV studies would still be required.

Thank you.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: I was going to make this point when Dr. Henderson brought up the issue of the progestins and 50 patients to go. I really think we have to approach trials that are ongoing differently than new trials. I thought what we were talking about here is what to tell industry now are the designs that are acceptable based on 2001 September data. I think that if a drug came in that was done differently, we'd evaluate it differently.

I think the accelerated approval is an interesting concept in terms of how to handle that. I actually like that idea. I hadn't thought about that. But I think that that should come up as part of the discussion and that we shouldn't mandate changes in design in the middle of trial. That would be the kiss of death for a lot of the analyses we're talking about.

DR. NERENSTONE: And I would agree with that approach, which is if you're in trial, you stay in trial, especially large studies. You can't do that. You can't hold somebody responsible to a new level if it was already designed and started 5 or 10 years ago.

I think, though, for a practical matter, that the publicity is going to take care of itself, which is a drug that does not come up to the new standard is not going to be widely accepted in the oncologic community as a drug that has beaten the new standard. So, I think we don't want inactive drugs to be put out there, but I don't think that we have a fear that that will happen.

DR. PAZDUR: To summarize this, I think we would only demand changes if there was a very strong sense that we would see a safety issue that would emerge. From the discussions that we've had, I don't get that feeling especially from the first answer where there is not a consensus of changing to one single new comparator arm, et cetera. So, we appreciate your comments on this last question.

DR. BLAYNEY: Yes, I agree. I think it's unfair to change in the middle of the stream in the absence of a safety issue. I was going to raise that before you did. I don't see the safety issue. I think let's go.

DR. PAZDUR: Heaven forbid. We wouldn't want anyone to accuse the FDA of being inconsistent.



DR. TEMPLE: We try to keep those quiet.


DR. TEMPLE: We fairly recently approved capecitabine as an alternative to fluorouracil/leucovorin even though the studies didn't have any CPT-11 around. And the label says we don't know how this regimen interacts with CPT-11, but we didn't think it shouldn't be approved because of that. I guess there is a post-marketing obligation to do a trial in the presence of CPT-11.

DR. NERENSTONE: Further comments? Dr. Albain.

DR. ALBAIN: I'll go back to something Dr. Sledge said earlier about the tendency in many trials now to use, for response rate endpoints, CR, PR, plus a 6-month stable disease or some other interval. Is there a precedent for accepting that as a primary endpoint in any pivotal trial that you're aware of? Because it seems to have crept in, and I'm not quite sure that we know that stable disease for 6 months --

DR. PAZDUR: No. We have not traditionally accepted stable disease as being incorporated into the response rate because we look at that in a different fashion. Response rate is a unique endpoint in that all of the effect of a response rate is attributable to the therapy that is being introduced, and we look at that as kind of a different type of endpoint because of that.

Obviously, the problem with stable disease is what is the contribution to the natural history of the disease. Again, you could probably set a parameter there, whether it be 6 months, 8 months, 3 months. Here again, I think it would probably confound what we're actually after when we look at response rate. And that could be easily measured by an analysis of time to progression.

DR. NERENSTONE: Any other comments?

(No response.)

DR. NERENSTONE: Well, I'd like to thank everybody for a relatively free-wheeling discussion.

DR. ALBAIN: Stacy, there's one more question.

DR. NERENSTONE: Oh, sorry. Over-eager to go to lunch I guess. Okay.

Number 6. Please discuss whether these recommendations would change if patients treated with a hormonal therapy are found to have improved survival compared to patients treated with tamoxifen with respect to the comparator drug and endpoints.

Open for discussion. Are those data accruing so that we will have some legitimate survival endpoints?


DR. NERENSTONE: Dr. Henderson.

DR. HENDERSON: It comes back to what we were discussing before. I think if you know definitely that there's a survival difference associated with a class of drugs, I think it should change things. If you've got one trial with one drug and the rest of the class doesn't show that and no scientific explanation for that, I think you should be skeptical. I think you should always be skeptical of all clinical data to begin with, and then you break down that process of being skeptical as you get more and more evidence.

DR. PAZDUR: I think, since the time is late, what we would be looking at, obviously, is the consistency of the data. If it's one trial, how would that change versus having it confirmed in other trials, et cetera.

DR. HONIG: Right. And it probably, I assume, would depend on the magnitude of the survival benefit.

DR. NERENSTONE: I think it would be very important to know that, and I think it would then take the hormone drug discussion out of, "oh, well, it's not very toxic and it doesn't matter" to we have to look at survival as an endpoint much more seriously and make it much more like a cytotoxic where survival is a primary endpoint.

DR. PAZDUR: But here again, it's not only the magnitude, but how reliable we feel that endpoint is. Was the survival advantage really attributable to the study drug? What are the crossovers that, especially with this long natural history of many of these patients, would have contributed to a difference? So, I think the analysis could be quite complicated of a survival claim.

DR. NERENSTONE: Dr. Lippman.

DR. LIPPMAN: On the face of it, not knowing about the issues specific to hormonal therapy, survival in my view trumps all the other endpoints. So, it would be a no-brainer. But because of the reasons raised by Dr. Carpenter and we all know there's a lot of sequential therapy that goes on, it seems to me that it would be very difficult to prove it. But if you did and felt comfortable with it, I think it would be an easy answer.

DR. NERENSTONE: Dr. Blayney.

DR. BLAYNEY: And there are supportive therapies as well that have an impact on survival that may or may not be applied to any given patient population in one country or one center or another, which also confounds and introduces noise into this system.

DR. NERENSTONE: Okay. The committee needs to be back at 1:30. Thank you very much for your attention.

(Whereupon, at 12:20 p.m., the committee was recessed, to reconvene at 1:30 p.m., this same day.)