FOOD AND DRUG ADMINISTRATION















                            JOINT MEETING OF






                    THE ARTHRITIS ADVISORY COMMITTEE AND

              THE DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE



                               VOLUME III











                       Friday, February 18, 2005


                               8:07 a.m.










                          Hilton Gaithersburg

                           620 Perry Parkway

                         Gaithersburg, Maryland



                        P A R T I C I P A N T S


      Alastair J. Wood, M.D., Chair


      Arthritis Advisory Committee:


      Allan Gibofsky, M.D., J.D.

      Joan M. Bathon, M.D.

      Dennis W. Boulware, M.D.

      John J. Cush, M.D.

      Gary Stuart Hoffman, M.D.

      Norman T. Ilowite, M.D.

      Susan M. Manzi, M.D., M.P.H.


      Drug Safety and Risk Management Advisory Committee:


      Peter A. Gross, M.D.

      Stephanie Y. Crawford, Ph.D., M.P.H.

      Ruth S. Day, Ph.D.

      Curt D. Furberg, M.D., Ph.D.

      Jacqueline S. Gardner, Ph.D., M.P.H.

      Eric S. Holmboe, M.D.

      Arthur A. Levin, M.P.H., Consumer Representative

      Louis A. Morris, Ph.D.

      Richard Platt, M.D., M.Sc.

      Robyn S. Shapiro, J.D.

      Annette Stemhagen, Dr.P.H., Industry Representative


      FDA Consultants:


      Steven Abramson, M.D.

      Ralph B. D'Agostino, Ph.D.

      Robert H. Dworkin, Ph.D.

      John T. Farrar, M.D.

      Leona M. Malone, L.C.S.W., Patient Representative

      Thomas Fleming, Ph.D.

      Charles H. Hennekens, M.D.

      Steven Nissen, M.D.

      Emil Paganini, M.D., FACP, FRCP

      Steven L. Shafer, M.D.


      National Institutes of Health Participants:



      Richard O. Cannon, III, M.D.

      Michael J. Domanski, M.D.

      Lawrence Friedman, M.D.



                  P A R T I C I P A N T S (Continued)


      Guest Speakers (Non-Voting):


      Garret A. FitzGerald, M.D.

      Ernest Hawk, M.D., M.P.H.

      Bernard Levin, M.D.

      FDA Participants:


      Jonca Bull, M.D.

      David Graham, M.D., M.P.H.

      Brian Harvey, M.D.

      John Jenkins, M.D., F.C.C.P.

      Sandy Kweder, M.D.

      Robert O'Neill, Ph.D.

      Joel Schiffenbauer, M.D.

      Paul Seligman, M.D.

      Robert Temple, M.D.

      Anne Trontell, M.D., M.P.H.

      Lourdes Villalba, M.D.

      James Witter, M.D., Ph.D.

      Steve Galson, M.D.

      Kimberly Littleton Topper, M.S., Executive Secretary




                            C O N T E N T S


      Call to Order:

                Alastair J. Wood, M.D.


      Conflict of Interest Statement:

                Kimberly Littleton Topper, M.S.


      Naproxen Investigator Presentation

           Alzheimer's Prevention Study: ADAPT

           (Alzheimer's Disease Anti-Inflammatory

           Prevention Trial):

                Constantine Lyketsos, M.D.


      Additional Background Presentations

           Interpretation of Observed Differences

           in the Frequency of Events When the

           Number of Events is Small:

                Milton Packer, M.D.


      Clinical Trial Design and Patient Safety:

      Future Directions for COX-2 Selective NSAIDs

                Robert Temple, M.D.


      Issues in Projecting Increased Risk of

      Cardiovascular Events to the Exposed Population

                Robert O'Neill, Ph.D.


      Summary of Meeting Presentations:

                Sharon Hertz, M.D.


      Sponsor Responses


      Advisory Committee Discussion of Questions


                Question 1

                Question 2

                Question 3

                Question 4

                Question 5

                Question 6

                Question 8

                Question 7


      Meeting Wrap-up




                         P R O C E E D I N G S


                             Call to Order


                DR. WOOD:  Let's get started.  This is our


      third day and thanks to everybody for coming back.


      We have obviously entertained you sufficiently.


                Kimberly has a statement to read.


                     Conflict of Interest Statement


                MS. TOPPER:  The following announcement


      addresses the issue of conflict of interest with


      respect to this meeting and is made a part of the


      record to preclude even the appearance of such.


      Based on the agenda, it has been determined that


      the topics of today's meeting are issues of broad


      applicability and there are no products being




                Unlike issues before a committee in which


      a particular product is discussed, issues of


      broader applicability involve many industry


      sponsors and academic institutions.  All special


      government employees have been screened for their


      financial interests as they may apply to the


      general topics at hand.


                To determine if any conflict of interest


      existed, the agency has reviewed the agenda and all


      relevant financial interests reported by the




      meeting participants.  The Food and Drug


      Administration has granted general-matter waivers


      to the special government employees participating


      in this meeting who require a waiver under Title


      18, United States Code, Section 208.  A copy of the


      waiver statements may be obtained by submitting a


      written request to the agency's Freedom of


      Information Office, Room 12A-30, of the Parklawn Building.




                Because general topics impact so many


      entities, it is not practical to recite all


      potential conflicts of interest as they apply to


      each member, consultant and guest speaker.  FDA


      acknowledges that there may be some potential


      conflicts of interest but, because of the general


      nature of the discussions before the committee,


      these potential conflicts are mitigated.


                With respect to the FDA's invited industry


      representatives, we would like to disclose that Dr.




      Annette Stemhagen is participating in this meeting


      as a non-voting industry representative on behalf


      of regulated industry.  Dr. Stemhagen's role on


      this committee is to represent industry interests


      in general and not any one particular company.  Dr.


      Stemhagen is Vice President of Strategic Development


      Services for Covance Periapproval Services, Inc.


                In the event that the discussions involve


      any other products or firms not already on the


      agenda for which an FDA participant has a financial


      interest, the participants' involvement and their


      exclusion will be noted for the record.


                With respect to all other participants, we


      ask, in the interest of fairness, that they address


      any current or previous financial involvement with


      any firm whose product they may wish to comment upon.




                There is one administrative announcement.


      Would you please make sure that you take your phone


      calls outside.  It is messing up our audio and


      we would really appreciate it.  Thank you.


                DR. WOOD:  The other administrative thing




      that the sound person has asked me to say is, to


      the committee, try and remember to switch off your


      microphones when you are not using them.


      Apparently, it messes it up.


                MR. LEVIN:  Mr. Chairman?


                DR. WOOD:  Yes, Arthur?


                MR. LEVIN:  I wanted to express a concern


      I have in terms of the agenda for today's meeting.


      For those of us who have been at advisory committee


      meetings before, we know that there is often a


      tendency to sort of squeeze the most important part


      of these advisory committee meetings which is the


      discussion and answers to the questions and giving


      directions to FDA.


                My concern is that, given the lengthy


      discussions we have had over the past two days and,


      given the fact that this is the last day, that we will


      not have enough time to fully explore all of the


      questions that have been raised over the last two


      days and to give some definite direction to the FDA


      as to how to pursue these issues.


                So I would like to suggest to the group




      that we might shorten the presentations, or


      eliminate them entirely, in order to have adequate


      time to fully discuss all of our concerns and


      different points of view around the table.  I think


      it would be really unacceptable to leave here today


      unable, because of a time constraint, to give


      direction to the FDA on this issue.


                DR. WOOD:  Did you have any particular


      people you wanted to eliminate?  Or do you want to


      pass me a note, privately?


                MR. LEVIN:  It may be something the


      committee as a whole should decide.


                DR. WOOD:  Let me make a suggestion.  I


      think that is a reasonable approach.  I am sure the


      committee will want to hear the data from the ADAPT


      study and we should hear that in its totality.


      Milt Packer has come a long way so we should hear


      from him, I think.  Milt is always entertaining,




                Do we really need to hear from the two




                DR. TEMPLE:  I don't have any ego involved




      in this.  A fair amount of--some of what I am


      talking about is about the adverse consequences of


      blood-pressure elevation which I think I could


      skip.  So I could shorten it considerably.  But you


      guys decide.  It is there for you to read if you




                DR. WOOD:  Why don't you do this.  Why


      don't you distribute your talk to us.


                DR. TEMPLE:  I think it has been.


                DR. WOOD:  Right; I understand that.  I


      will take that as a given.  And both of you make


      whatever remarks you would like to make from your


      seats there at the times that you are allotted, but


      brief and pointed.  And let's not revisit all the


      things we have visited before.


                DR. TEMPLE:  That's fine.


                DR. WOOD:  Does that sound fair?  Dr. O'Neill?




                DR. O'NEILL:  Yes; that is fine.


                DR. WOOD:  That will save us some time.


      So that is a good thought.  In addition, we have


      got Sharon Hertz's talk which, I notice, has




      40-something slides here--45 slides--which is a lot


      to get through in a few minutes.  So I think, while


      we are sort of working up to that, she may want to


      look at that and decide what she really needs to


      say.  I mean, after all, it is very unusual for the


      FDA to summarize the meeting for the committee,


      which is partly what the committee is here to do, I




                So let's make sure that she can finish


      that, taking the time she has been allotted for it,


      which is 30 minutes.  She would do better to remove


      some slides rather than rush through it, I think.


                Having said all that, let's get to the


      first presentation.  Does anyone else have any


      thoughts on that?  Yes, Annette?


                DR. STEMHAGEN:  I would like to ask


      whether the manufacturers could have just one or


      two minutes to make some summary comments before we


      start our deliberations after lunch.


                DR. WOOD:  Do they want to do that now?


      Is that what you are asking?


                DR. STEMHAGEN:  No; I think after these






                DR. WOOD:  Okay.


                DR. STEMHAGEN:  Thank you.  I appreciate




                DR. WOOD:  Let's have some discussion


      amongst the committee.


                DR. CUSH:  What would be the purpose of


      their having--they have had lots of time already to


      present their data and had lots of mike time in the


      back already.


                DR. STEMHAGEN:  Just in terms of the


      deliberations that have gone on, there might be


      some clarifying comments.


                DR. CUSH:  I think, if we have questions,


      we can ask for clarifying comments.  I think that


      is what we--I would suggest--and I agree with


      Arthur Levin in that we should get on to discussion


      as quickly as possible.


                DR. STEMHAGEN:  I realize this is sort of


      in contrast to trying to shorten it.  But I would like


      to ask that that time be awarded.


                DR. WOOD:  Any other thoughts on that? 




      Let me get a sense of the committee.  What is the


      committee's pleasure about that?  Yes?


                DR. BOULWARE:  I actually support that


      recommendation, too, and would suggest you give


      them a limited time, like you did with the public


      comment where you will cut them off at two minutes,


      so we know it will be limited.  I would be


      interested in the direction they plan to take.  We


      heard some startling news yesterday about the


      possible remarketing of a product that they have




                DR. WOOD:  Does anyone object to them


      getting two minutes apart from Dr. Cush?  Then, I


      think, the answer on that is that that is fine.


      Remind them that, in contrast to most of their


      experiences in the past for senior managers, the


      microphone will be cut off.


                DR. STEMHAGEN:  Thank you very much.  I


      think we saw evidence of that yesterday.


                DR. WOOD:  Right.  So they got the


      message; right?  Okay.  Let's move along to the


      first speaker, Dr. Lyketsos.


                       Investigator Presentation


                  Alzheimer's Prevention Study: ADAPT


                DR. LYKETSOS:  Good morning, everyone.  I




      do not have slides.  My name is Constantine


      Lyketsos.  I am a professor at Hopkins and I am


      presenting here today on behalf of the ADAPT study,


      Alzheimer's Disease Anti-inflammatory Prevention


      Trial.  I would like to thank the committee for


      inviting us to present.  I am here today with my


      colleague, Steve Piantadosi, who is also on the


      steering committee and will be available to answer


      any questions that might come up later on as well.


                I have a prepared statement that will be


      distributed to the committee later on today.  I


      delivered it to the staff this morning as I was




                Before I get into the statement, I just


      wanted to take a few moments to remind us of the


      public-health importance of Alzheimer's disease, to


      set some context for how the ADAPT trial got


      started.  Alzheimer's, as we all


      know, is a major public-health problem.  It is a




      devastating disease, typically runs a ten-year


      course of neurodegeneration affecting probably


      close to 4 or 4-and-a-half million of our citizens


      at present and the number is expected to rise given


      the aging of the population over the next several


      decades to approach, perhaps, 12 to 15 million,


      based on current projections.


                Because of these public-health


      numbers, there has been a very significant effort


      in our field for the last several years to develop


      preventive strategies for Alzheimer's disease


      because, once neuronal degeneration has started,


      the evidence that treatments work, so far, is very




                These preventive strategies have centered


      on several possible treatments but the most


      supported by the observational literature have been


      nonsteroidals with over 24 studies right now


      including four prospective population studies


      suggesting substantial reductions of risk of


      Alzheimer's disease perhaps with risk ratios, in


      some cases, as low as 0.4 or 0.5.  So it is within




      that context that ADAPT was started with the


      support of the National Institute on Aging.


                I will move now to reading the prepared statement.




                The steering committee of the ADAPT study


      welcomes the opportunity to present the rationale


      for its decision, on December 17, 2004, to suspend


      the NSAID treatments in ADAPT.  This presentation


      is important because there is much public


      misunderstanding about our decisions and their




                The ADAPT Steering Committee is deeply


      committed to the safety of human subjects, even


      more so in the context of prevention trials where


      risks are typically not balanced by any promise of


      tangible near-term benefit.  In this notable way,


      prevention trials differ from treatment trials


      whose participants may hope for relief of symptoms


      or improved outcomes in a condition already




                The risk:benefit balance in prevention


      trials is even further removed from a comparison of




      the benefits of a proven treatment with its


      acknowledged risks.  Because ADAPT had not quite


      completed the process of auditing and tabulating


      the trial's cardiovascular safety data on the date of


      suspension, we cannot, today, present the trial


      safety results at the time of the decision to suspend.




                We defer that presentation to a


      peer-reviewed publication planned for the near


      future.  For today, we note that, even with the


      risk:benefit calculus of a prevention trial, these


      data would not, in themselves, have led to our


      decision to suspend either treatment.  In reality,


      those decisions were made in very unusual


      circumstances.  They reflected events external to


      ADAPT that raised strong concerns about the


      practicalities of continuing the treatments.


                As the advisory committee probably knows,


      ADAPT is a randomized, double-masked, multicenter


      trial of celecoxib, 200 milligrams twice daily, or


      naproxen sodium 220 milligrams twice daily versus


      placebo for the primary prevention of Alzheimer's




      dementia and for the prevention of age-related


      cognitive decline which is, in many instances, a


      prodrome of Alzheimer's disease.


                ADAPT also provides an opportunity to


      study the long-term safety of its treatments in a


      healthy elderly population.  Eligibility criteria


      include an age of 70 years or older at enrollment


      and a health history that excludes many of the


      known risk factors for adverse events with NSAID


      treatments; for example, we exclude those with


      preexisting uncontrolled hypertension, anemia or a


      history of gastrointestinal bleeding, perforation


      or obstruction.


                To provide independent recommendations


      regarding continuation of the trial, the ADAPT


      Treatment Effects Monitoring Committee, or TEMC,


      which, I suppose, is our term for a DSMB, meets


      twice a year.  In response to emerging concerns


      about cardiovascular risks with NSAIDs, membership


      of the TEMC was recently expanded to include Dr.


      Bruce Psaty, a physician with expertise in


      evaluation of cardiovascular risks in clinical trials.






                As an additional safeguard for participant


      safety, the ADAPT study officers and consultants


      also conduct reviews of safety data at intervals


      between TEMC meetings.  Amid the emerging


      controversy about the cardiovascular safety of


      selective COX-2 inhibitors, the ADAPT study officers


      had been relatively reassured by their periodic


      reviews of the celecoxib safety data. The study


      chair communicated this information in a telephone


      conversation on 15 October 2004 with Dr. Sharon


      Hertz at FDA.


                As of December 17, 2004, the date of


      suspension of treatments and enrollment in ADAPT,


      we had enrolled 2,528 participants.  Of these,


      2,463 had been randomized before October 1 of '04


      with some 20 months average duration of


      observation.  These participants contributed a


      total of 3,888 person-years of follow-up to


      analyses that were presented to the TEMC on


      December 10, 2004.


                Those analyses suggested a weak signal




      suggesting increased risks of cardiovascular and


      cerebrovascular events with naproxen.  Reviewing


      the data, however, we understood well the TEMC's


      evident conclusion that this signal was not


      sufficiently compelling or definitive to warrant a


      recommendation to suspend the treatment or to


      otherwise alter the protocol.  This was on December


      10, 2004.


                Thus, the study officers were surprised on


      December 17 by announcements that two trials of


      celecoxib for the prevention of recurrent


      adenomatous colon polyps had been suspended citing


      increased cardiovascular risks with treatment in


      one of these studies, the Adenoma Prevention with


      Celecoxib trial, or APC.  This news led to


      extensive discussion among the steering committee


      on that day centering on the following




                Number one; one arm of the APC trial had


      used the same celecoxib dosing as ADAPT, 200


      milligrams twice daily, but over a longer period of


      time.  News reports cited a relative risk of 2.5




      for cardiac events in this arm of APC.  Although


      this risk was reported as only "marginally


      significant," a greater cardiac-risk signal was


      reported with the higher APC dosage of 400


      milligrams twice daily.


                Thus, we took seriously the possibility of


      harm over time to ADAPT participants receiving


      celecoxib.  Especially in a prevention trial with


      no strong prospects of immediate benefit, we had


      strong misgivings about continuing celecoxib




                Knowing almost nothing at the time about


      the particulars of the APC trial and, in light of


      the apparent lack of risk with celecoxib in the


      other prevention trial, we might have discounted


      the APC data and continued celecoxib.  To do so,


      however, we would clearly have needed the


      concurrence of the seven IRBs that oversee ADAPT.


      These IRBs began almost immediately to question us


      about implications of the APC results and seemed


      likely to question a decision to continue.


                Even if we had persuaded them to permit




      continuation of celecoxib using a revised consent


      process, we would surely be involved in lengthy


      discussions with these IRBs.  In the meantime, we


      would be unable to offer much explanation to our


      participants, thereby endangering the relationship


      of trust that is vital to the success of long-term




                Number three; as is common in long-term


      trials, ADAPT was experiencing some difficulty with


      adherence to treatments.  This difficulty grew


      following the withdrawal of rofecoxib and we


      expected the announcement of the APC results to


      exacerbate the problem further with scores of


      participants stopping treatment, in effect, "voting


      with their feet."  This would erode statistical


      power and increase the potential for bias in ADAPT.


                Thus, even though the ADAPT safety data


      did not, themselves, warrant suspension of


      celecoxib treatments, there seemed little


      practical choice but to do so.


                We next confronted the dilemma of what to


      do about naproxen and its placebo.  As suggested




      above, we regarded the accumulated naproxen safety


      data as being somewhat more concerning than the


      celecoxib safety data.  Yet, they, also, were not


      compelling.  Although some post hoc data composites


      barely reached statistical significance for


      naproxen versus placebo, no single vascular event


      was clearly more frequent with naproxen versus


      placebo.


                Furthermore, vascular risks were not


      expected with naproxen treatment.  In fact, a


      substantial body of prior data at the time had


      suggested that naproxen offers some cardiovascular


      protection.  This lack of prior expectation cast


      further doubt on the meaning of the naproxen data


      in ADAPT which were vulnerable, in any case, to the


      problem of multiple comparisons.


                We could, therefore, have attempted to


      revise ADAPT to a two-armed trial of naproxen


      versus placebo, instructing our participants to stop


      taking their "white pills," as they are known in


      the study, which are celecoxib and its placebo, but




      continue to take their "blue pills," which contain


      naproxen and its placebo.


                However, the dangers were several.


      Participants might end up getting confused and


      taking the wrong pills and many would stop taking


      their treatments altogether.  We faced an ethical


      dilemma.  The suspension of celecoxib and


      continuation of naproxen would have created the


      impression among participants and among the general


      public that celecoxib was risky but naproxen was


      "safe."  At least based on the signals from the


      ADAPT data, this impression would have been




                What would we then tell participants about


      the risks with naproxen as we led them through the


      inevitable process of revised consent necessitated


      by the protocol revision?  Would the multiplicity


      of IRBs even allow us to follow this course?


                Finally, there was another risk to


      consider.  We began ADAPT expecting to see some


      increase with naproxen in gastrointestinal bleeding


      and other events.  Even though we attempted to




      reduce these excess G.I. risks by excluding


      participants with prominent risk factors other than


      age, the ADAPT data showed a notable increase in


      G.I. bleeding with naproxen versus placebo.


                Especially amid concerns that ADAPT was


      exposing its participants to potential risks that


      were immediate, while the trial's hoped-for


      benefits lay in the future, the totality of the


      above arguments led the steering committee to


      suspend both treatments and to also suspend


      enrollment into ADAPT.


                As noted above, we expect, within a few


      weeks, to submit a scientific paper for peer review


      and publication.  The paper's focus will be on the


      process and rationale underlying the decision to


      suspend treatments and enrollment in ADAPT.


      Because these decisions did rely, in some measure,


      on the ADAPT safety data as of 10 December, the


      paper will, also, disclose some of these data.


                We are also cooperating with ongoing


      efforts at the NIH to investigate the


      cardiovascular and cerebrovascular risks of NSAIDs.




      In addition, the NIA and the ADAPT Steering


      Committee are committed to a further two years of


      additional safety monitoring of our participants.


                In preparation for a later, more


      definitive discussion of the ADAPT safety data, we


      plan to revisit a number of the adverse events to


      collect additional information and then to submit


      all information available now or later to a process


      of expert adjudication.  Depending on particulars,


      the latter process will take months.  In the nearer


      term, we concur with the expert opinion that,


      having taken these widely publicized decisions, the


      steering committee must fulfill its obligation to


      disclose its reasons for doing so based upon the


      data available.


                At the same time, we are intent that our


      public presentation even of the current "working"


      data must be at the highest attainable standards of




                Thank you.


                DR. WOOD:  Thank you very much.  Are there


      questions directed to the speaker?  Dr. Nissen?


                DR. NISSEN:  I fully understand your


      rationale and I understand that the trial was


      fundamentally stopped because of an issue of




      futility.  You didn't think that you could keep


      people in the celecoxib arm.  That is all well and


      good.  The problem that occurred here is that a


      warning was issued on naproxen which had the effect


      of being the medical equivalent of screaming "fire"


      in a crowded auditorium.


                All over the country, many of us got calls


      from patients saying, "I want to stop my naproxen


      because it causes a cardiovascular risk."  I think,


      just a comment here, that it would have been far


      better to have announced that the trial was


      suspended for futility rather than for hazard when


      there was a non-statistically significant hazard.


      So, one man's comment.


                DR. WOOD:  I agree with that.  Any other


      comments?  Yes?


                DR. FARRAR:  I wonder if you could comment


      on the G.I. bleed component since, obviously, one


      of the deliberations we have to undertake is the




      relative problems with G.I. bleed versus


      cardiovascular risk.  Certainly, that was known a


      priori before starting the study.


                As you commented very carefully, that


      wasn't the only consideration.  But, in a drug


      trial where the outcome is unknown and the risk is


      really fairly well known, I wondered how you


      thought about that in terms of putting patients at


      risk of something on the order of a few percent


      over the course of a five-year trial who might have


      serious complications from the G.I. bleeding.


                DR. LYKETSOS:  I guess you are asking me a


      human-subjects question.


                DR. FARRAR:  I am asking how, in the


      design of the study, obviously the choice was made


      to accept that risk for the unknown potential


      benefit of reduction in Alzheimer's disease over


      the course of the same trial.  I am wondering if


      you have any insights into how that decision was


      made because, clearly, there are issues there about


      the use of these drugs and their risks.


                DR. LYKETSOS:  Well, I am glad you are




      asking the question.  It certainly is an issue that


      we have spent a lot of time discussing and which we


      discussed with study sections, IRBs, at quite some


      length and continue to discuss.


                I think the fundamental point that I would


      start with is where I started my presentation which


      is the devastation that Alzheimer's disease brings


      and the fact that all the study participants were


      individuals who had a first-degree relative with


      the disease and had, therefore, personal experience of it.




                In that context, we were very careful and


      very clear with them about what we thought at the


      time the known G.I. risks were so that, in the


      process of consent, and that was revealed through


      careful discussions in the consent process as well


      as the consent form, the risk of G.I. bleed was


      stated very clearly and that that, in some cases,


      might lead to death.


                So I think we felt that this was a


      decision that our participants could make, given


      that the risks were relatively small, and the risk




      that they would develop Alzheimer's disease was


      higher and that we felt they could make the


      decision for themselves if they were willing to


      take the risk:benefit calculus as we saw it.


                DR. WOOD:  Dr. Gibofsky?


                DR. GIBOFSKY:  I share Dr. Nissen's


      concern about this effect of crying fire in a


      crowded theater.  Many of our patients called and


      suggested that they were going to stop their


      celecoxib because of the concerns that were raised


      from ADAPT as well.  But you raised a very


      interesting concern that I confess I hadn't given


      enough thought to and that is the difference


      between a prevention trial and an outcome trial.


                Much of our discussion here later today, I


      suspect, is going to focus on what action should be


      taken, if any, to restrict drugs used for treatment


      based on data from prevention trials.  I would be very


      curious to hear you expound on that a bit more.


                DR. LYKETSOS:  That is an interesting


      question.  Let me just, if I could, because there


      have been three comments now--I just would like to




      refer you to the early part of my statement where I


      said the presentation is important because there is


      much public misunderstanding about our decisions


      and their rationale.


                Several of you pointed out that there was


      a cry of fire.  I don't believe that that came from


      the study.


                DR. WOOD:  We won't ask you to speculate


      where it came from.  There is certainly a view on that.




                DR. LYKETSOS:  I am not sure where it came


      from.  But, to address the other issue, I must say


      I have not given it much thought as to whether


      prevention-trial safety data would generalize in


      the way that you are thinking about it.  So I will


      defer on that because I think it would need a fair


      bit more thought by people who are more expert in that area.




                DR. WOOD:  Dr. Fleming.


                DR. FLEMING:  It is my understanding, from


      what you are saying, that the steering committee


      was particularly influenced by the APC prior data




      not by the internal data from ADAPT; i.e., there


      were, from what you were describing, some emerging


      trends that, in my words, were in the unfavorable


      direction but in the context of monitoring trials,


      we know that one has to be extremely cautious, when


      you are looking at data continually over time, not


      to overinterpret emerging trends that can easily


      ebb and flow.


                So my understanding, from what you are


      saying, is it wasn't that there were, at this


      point, some emerging trends that happen to be in


      the unfavorable direction on naproxen.  Rather, it


      was the external data on the APC trial for Celebrex


      that was the driving issue behind the decision.




                DR. WOOD:  Just to develop that question,


      what I understood you to say was you hadn't passed


      some stopping boundary; is that correct?


                DR. LYKETSOS:  I'm sorry?  I didn't hear


      the first--


                DR. WOOD:  You hadn't violated your


      stopping rule, or whatever stopping rules, you had




      for safety.


                DR. LYKETSOS:  I think that our TEMC, our


      DSMB, had opined the week before with the same data


      from within the trial that they felt that we should


      continue.  So it was interesting how the two events


      were back-to-back.


                DR. FLEMING:  I would like to come to that


      second.  I am leading to that.  But first I wanted


      to make sure that I understood what was the nature


      of the concern.  Is my interpretation correct?


                DR. LYKETSOS:  I think so.  Back to how I


      put it, the issue really was one of practicalities


      more than our internal data, is that we felt we


      would have to talk to IRBs and participants and


      tell them something about--


                DR. FLEMING:  Could I first understand


      what your sense of the evidence was.  I want to


      discuss that first, versus the practicality.


                DR. LYKETSOS:  The sense of the study




                DR. FLEMING:  The sense of the evidence


      that was the basis for the decision in terms of




      adverse effects.  I have heard two things.  One is


      the naproxen, but that was not compelling evidence.


      That was within the framework of emerging results


      that could be by chance alone when you are


      monitoring data frequently.  But external APC data


      was very influential to you.  That is what I am


      hearing.  Is that correct?


                DR. LYKETSOS:  Well, in fact, we didn't


      know all the details of the APC data, as I pointed


      out.  I think it was that plus the climate that had


      been created by rofecoxib coming off the market,


      the influence that that had to some extent on our


      participants, then the widely publicized APC


      results and the sense that, even though the data we


      were seeing and that our TEMC the week before had


      seen, did not compel us to stop treatment based on


      our own data, that there was now a climate created


      where, practically speaking, we had to stop and


      take stock and get more information, et cetera.


                So it was that sort of decision.  It


      was a complicated decision and that is why it takes


      a three-page statement to try and explain what went




      through our minds.


                DR. FLEMING:  There may not have been, to


      the steering committee at this time, access to data


      on PRECEPT for celecoxib or to the etoricoxib, the


      lumiracoxib, data on naproxen that were very


      favorable, but you did have access to the VIGOR


      data which was very reassuring for naproxen and you


      had evidence from the CLASS trial and some other


      data from Celebrex.


                I am perplexed that you would look at the


      totality of these data and say that the results


      were conclusive in terms of at least not being able


      to provide information to the IRBs and to the


      patients and caregivers in the trial representing


      the totality of the data when your data-monitoring


      committee had looked at the totality of the


      evidence for benefit to risk.


                On a data-monitoring committee, I have


      always argued, don't just show me the safety data,


      even if we are just looking at early assessments


      for safety.  It always has to be benefit to risk.


      Even though, as you are pointing out, this wasn't a




      therapeutic setting, prevention trials also provide


      major opportunity for benefit.  Preventing major


      diseases is also a very significant benefit.


                My understanding is your data-monitoring


      committee, in looking at the data, looking at the


      benefit as well as the risk, indicated the study


      should continue.  How did the steering committee


      judge, without access to ongoing data, that benefit


      to risk couldn't be sufficiently favorable and that


      a notification to the investigators, to the


      patients and to the IRBs, that the monitoring


      committee has carefully looked at benefit and risk


      and that the totality of the data is beyond the APC


      trial when you are looking at Celebrex and


      naproxen?  Why wasn't that strategy pursued?


                DR. LYKETSOS:  First, as I pointed out in


      my statement, some members of the steering


      committee did have access to the data that the DSMB


      had seen.  That is the first point.  The second


      point is, as you point out and as I think this


      whole discussion points out, is these are very


      difficult judgment calls.  They have to take into




      account evidence but also practical aspects of


      continuing to conduct this sort of a prevention


      trial in this sort of a population.


                I think it was the judgment call, and I


      can tell you, there was substantial discussion


      around this when we had the steering committee


      meeting, about these very issues.  It was the


      collective judgement at the time that this was the


      right thing to do, given the various issues that I


      have articulated in my statement.


                DR. FLEMING:  I will just pursue one more.


      I am dismayed to hear the steering committee, some


      steering committee members, had access to the data.


      That is also a violation of the principles of


      monitoring trials.  It should have been in the sole


      possession of the data-monitoring committee.


                I am also distressed because I am not


      hearing that monitoring committee was front and


      center in terms of having these issues brought back


      to it for reassessment.  So, to me, what I am


      hearing raises very significant concerns about


      putting at risk the integrity of studies with




      prejudgments using only access to partial external data.




                DR. WOOD:  There was one other thing,


      though, at least the word on the street was, and


      you sort of mentioned that as well, I understood


      there was a very large number of dropouts from the


      trial after the Vioxx withdrawal and others and


      that one of the perceptions was it was no longer


      possible to continue the trial.  Is that true?


                DR. LYKETSOS:  Let me clarify that.  The


      adherence had been declining on an annual basis


      even before rofecoxib was withdrawn from the


      market.  So adherence was perceived as an issue in


      that we felt that now there were data about one of


      the study drugs and that that would further erode


      adherence.  We did not see a huge erosion in


      adherence with rofecoxib, specifically, but there


      had already been an erosion that was concerning and


      we anticipated a further erosion.


                DR. WOOD:  Right.  But the question for


      this committee that Dr. Fleming is pursuing


      vigorously, and I agree with him, is that the




      announcement that you all made--the announcement,


      as it was picked up--maybe I should put it like


      that--was that this trial was being stopped for a


      safety signal.


                What I heard in your statement and what I


      hear from you now is that the trial was being


      stopped for operational problems in the trial and


      the safety signal was a convenient moment at which


      to do that.  But you had operational difficulties.


      That is a very different interpretation and a very


      different interpretation for the public and




                Is that what you are hearing, Tom?


                DR. FLEMING:  It certainly appears to be.


      It is part of what is concerning to me.


                DR. LYKETSOS:  I think my statement should


      speak for itself.  In terms of what the data were,


      as I have pointed out, they will be submitted very


      soon so that you can judge for yourselves.


                DR. WOOD:  Okay.  Any other questions?


      Sorry; Dr. Farrar.  I beg your pardon.  Dr. Farrar,


      go ahead.


                DR. FARRAR:  I think, actually, that this


      study provides some vitally important information


      with regards to our consideration of the entire




      class of drugs; namely, the NSAIDs.  I would like


      to just read one sentence from the statement.


                It said, "Although some post hoc data


      composites barely reached statistical significance


      for naproxen versus placebo."  Now, clearly, this


      discussion would be much clearer after the


      presentation of the data, a careful review of the


      data.  But Dr. Fleming noted that, in the VIGOR


      study, there was some reassurance about naproxen.


      I would like to just question that.


                What is very clear in the VIGOR study is


      that naproxen was safer than rofecoxib.  But it


      does not comment at all with regards to the


      potential risk compared to placebo.  In fact, I was


      surprised when I heard the statement by Dr. Fleming


      because, in fact, I have assumed, based on all the


      data that we have, that every NSAID will not fare


      well against a placebo.


                I think that this data, and probably will




      be supported by the publication although I don't


      want to try and foresee the future, but my guess is


      that naproxen will not fare particularly well


      against placebo in terms of its cardiovascular


      safety.  I think we need to be able to accept the


      fact that all of them have some risk with regards


      to cerebrovascular disease and this study is likely


      to provide the data to support that.


                DR. WOOD:  Dr. Nissen?


                DR. NISSEN:  I don't want to belabor this


      because we have got a lot more to discuss today,


      but I think it is extremely important that, as a


      medical community, we learn from this episode.  In


      the kind of media frenzy that was going on during


      that period of time, this announcement, this


      warning that was issued on a national basis about


      naproxen, was inappropriate, led to some panic


      amongst the public and we simply can't do business


      this way.


                We can't operate in this kind of a


      fashion.  I would urge any of the individuals who


      were involved in the decision to issue a warning to




      go back and look at what happened and try to ensure


      that we don't do this sort of thing again, because


      once this gets picked up by the media, it passes


      through generations of people and becomes the topic


      of extensive discussion and may lead patients who


      don't have the ability that we have around this


      table to filter data--they don't understand


      data-safety and monitoring boards.  They don't


      understand stopping rules.  And it caused a panic


      that was unnecessary and it shouldn't have


      happened, and I hope it doesn't happen again.


                DR. WOOD:  Thanks very much.  Let's move


      on to the next speaker, Dr. Packer.


                  Additional Background Presentations


             Interpretation of Observed Differences in the


                  Frequency of Events When the Number


                           of Events is Small


                DR. PACKER:  Thank you, Alastair, members


      of the advisory committee, FDA, ladies and


      gentlemen.  Today I have been invited by FDA to


      address a specific question, which is how should we


      interpret differences in the observed frequency of




      events in a clinical trial when the number of


      events is small.


                Let me just say arbitrarily that I will


      define, for purposes of today, what I mean by a


      small number of events and that would have provided


      less than 70 percent power to have detected a true


      treatment difference assuming an effect size


      similar to that generally encountered in clinical




                This is just a thought.  Just suppose you


      do a trial for a noncardiovascular indication and


      you note that there are 13 major adverse


      cardiovascular events in the placebo group and 33


      such events in the drug-treatment group.  How


      should this difference be interpreted?


                Many would simply perform a statistical


      test, derive the p-value, and get excited if the


      p-value were less than some arbitrary value such as


      0.05.  In this example, the p-value of 0.002 would


      suggest, to some, that this difference between 13


      and 33 in a trial of about 3,000 patients, would


      have been observed only two times out of 1,000, an




      effect unlikely to have been due to the play of chance.
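The talk does not say which test produced the quoted p-value, and the exact figure depends on that choice. One reasonable option, sketched below, assumes the roughly 3,000 patients were split equally between arms (an assumption) and conditions on the 46 total events: under the null hypothesis each event is equally likely to fall in either arm, so the count in one arm is Binomial(46, 0.5).

```python
from math import comb


def conditional_binomial_p(placebo_events: int, drug_events: int) -> float:
    """Two-sided exact p-value for equal event rates in two equal-sized arms.

    Conditional on the total n = placebo_events + drug_events, each event is
    assigned to either arm with probability 0.5 under the null hypothesis, so
    the count in one arm is Binomial(n, 0.5).
    """
    n = placebo_events + drug_events
    k = max(placebo_events, drug_events)
    # P(count >= k): a split at least as lopsided as the one observed
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Binomial(n, 0.5) is symmetric, so the two-sided p-value doubles the tail
    return min(1.0, 2 * upper_tail)


print(f"{conditional_binomial_p(13, 33):.4f}")
```

This exact test gives a p-value well below 0.01, the same order as the figure quoted; a chi-square or Fisher test on the full 2x2 table gives a similar but not identical number.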




                However, before getting excited, we should


      remember that p-values must be interpreted in some


      context.  P-values are most easily interpreted when


      they refer to predefined primary endpoints in


      trials adequately powered, more than 80, 90 percent


      power, to detect differences between treatments.


      However, even under such circumstances, p-values


      are not necessarily reproducible.


                Bob O'Neill and others have made the point


      that, if a p-value in the trial is 0.05, the


      likelihood of seeing 0.05 or less in a second identical


      trial is only about 50 percent.  It is only when


      the p-value in the first study is 0.001 that the


      likelihood of seeing 0.05 or less in the second


      identical trial is at least 90 percent.
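One standard way to arrive at figures like these assumes the true effect equals the effect observed in the first trial (an optimistic assumption). The second trial's z-score is then normally distributed around the first trial's z-score with unit variance, and the replication probability is the chance it clears 1.96 in the same direction. A minimal sketch under that assumption:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_probability(p_first: float, alpha: float = 0.05) -> float:
    """Chance that an identical second trial reaches two-sided p <= alpha in
    the same direction, assuming the true effect equals the effect observed
    in the first trial, so the second z-score is N(z_first, 1).
    """
    z_first = STD_NORMAL.inv_cdf(1 - p_first / 2)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(z_first - z_crit)


for p in (0.05, 0.01, 0.001):
    print(f"first-trial p = {p}: replication chance ~ {replication_probability(p):.2f}")
```

With a first-trial p of 0.05 the chance is exactly one half, and with 0.001 it is a little over 90 percent, matching the figures quoted.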


                These calculations are the basis of the


      frequent FDA guidance that, to demonstrate


      persuasive evidence for efficacy, a sponsor needs


      to provide two trials with 0.05 or less or one


      trial with a very, very small p-value.


                But what if the event was not the primary


      endpoint in the study?  What, in fact, if the event


      was not even precisely defined before the start of




      the trial?  What if the trial was not adequately


      powered to detect a treatment difference for the


      endpoint?  What does a p-value mean under these




                Unfortunately, this happens quite


      frequently in clinical trials under a variety of


      circumstances.  But it is particularly true in the


      analysis of adverse events.  So let's make a list of


      things to worry about when using p-values to


      compare the frequency of adverse events in a


      clinical trial.


                First, there are literally hundreds of


      adverse events in a clinical trial and, therefore,


      there are hundreds of possible comparisons that can


      be made.  Now, this is classically referred to as


      the multiple comparisons problem.  For example, if


      a typical large-scale clinical trial yields as many


      as 500 individual terms describing adverse events


      and if a p-value were calculated for each pairwise




      comparison, one would, of course, by chance alone,


      expect about 5 percent of the terms, or about 25


      events, at a p-value of 0.05 or less and 1 percent


      of the terms, or about 5 events, to have a p-value


      of 0.01 or less.
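The arithmetic above can be checked with a short sketch. It assumes no adverse-event term truly differs between arms and treats each comparison as an independent test with a continuous null distribution, so each p-value is Uniform(0, 1). Real adverse-event terms are correlated and rare-event tests are discrete, so actual counts will differ; the sketch only reproduces the expected-count reasoning.

```python
import random


def expected_false_signals(n_terms: int, alpha: float) -> float:
    """Expected number of nominally significant terms when no term truly differs."""
    return n_terms * alpha


def chance_of_any_false_signal(n_terms: int, alpha: float) -> float:
    """Probability that at least one term reaches p <= alpha, treating the
    comparisons as independent (a simplifying assumption)."""
    return 1 - (1 - alpha) ** n_terms


def simulate_false_signals(n_terms: int, alpha: float, seed: int) -> int:
    """One simulated trial: draw one Uniform(0, 1) null p-value per term and
    count how many fall at or below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() <= alpha for _ in range(n_terms))


for alpha in (0.05, 0.01):
    print(alpha, expected_false_signals(500, alpha), simulate_false_signals(500, alpha, seed=1))
print(chance_of_any_false_signal(500, 0.05))  # essentially 1
```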


                The second issue in interpreting


      comparison of frequency of adverse events is the


      fact that adverse events are spontaneous


      nonadjudicated reports.  Now, adverse events are


      reported at the discretion of the investigator and


      then translated into standardized terms.  There is


      little uniformity on how an event is identified,


      defined or reported and this uncertainty increases


      when the event is in a field remote from the


      investigator's focus.


                Now, some of you may believe that you can


      fix this problem by carrying out blinded


      adjudication of events after the fact.


      Unfortunately, the rules guiding post hoc


      adjudication are inevitably influenced by the


      knowledge that a treatment difference has been


      seen.  In fact, any bar set by a post hoc process




      is capable of magnifying or diluting an effect.


                For example, if you set very strict


      criteria, a committee could reduce the number of


      events and, therefore, reduce statistical power.


      By setting very loose criteria, the committee can


      include many questionable events and reduce the


      magnitude of a treatment difference.
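To see how a post hoc bar can move a result in either direction, the sketch below reapplies two hypothetical adjudication rules to the 33-versus-13 example from earlier in the talk. The retained counts and the 20 added nonspecific events per arm are invented for illustration, and the z-score assumes equal follow-up per arm and Poisson counts, so the variance of the log rate ratio is about 1/a + 1/b.

```python
from math import log, sqrt


def rate_ratio_z(drug_events: int, placebo_events: int) -> tuple[float, float]:
    """Rate ratio and Wald z for its logarithm, assuming equal exposure per
    arm and Poisson counts (Var(log ratio) ~ 1/a + 1/b)."""
    ratio = drug_events / placebo_events
    z = log(ratio) / sqrt(1 / drug_events + 1 / placebo_events)
    return ratio, z


# Hypothetical adjudication outcomes for the 33-versus-13 example
scenarios = {
    "as reported": (33, 13),
    "strict criteria (about 40% of events kept)": (13, 5),
    "loose criteria (20 nonspecific events added per arm)": (53, 33),
}
for label, (drug, placebo) in scenarios.items():
    ratio, z = rate_ratio_z(drug, placebo)
    print(f"{label}: ratio {ratio:.2f}, z {z:.2f}")
```

Strict criteria leave the ratio close to where it was but push z below 1.96 because fewer events remain; loose criteria keep z above 1.96 here but pull the ratio from about 2.5 toward 1.6.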


                To make things more complicated,


      adjudication committees do not generally examine


      individuals who did not report an event to make


      sure they didn't have an event.


                The third issue in interpreting


      comparisons of frequencies is that some signals are


      apparent only if adverse events are grouped


      together.  Now, that is not much of a problem if


      the difference is fairly straightforward and


      focuses on one single event.  But things can become


      a little bit more complicated if the analysis


      requires combining events and combining trends


      across two or more events in order to reach some


      magical level of statistical significance.


                Now, the problem is that these groupings




      are frequently constructed after the fact, making


      it possible to include only events that showed the


      trend the investigator is interested in.  For


      example, if an investigator believed the drug


      increased the risk of a major cardiovascular event,


      he or she might first look at myocardial infarction


      and stroke, but, finding little difference here, he


      or she might be tempted to look at other related


      events; for example, not seeing a difference in


      myocardial infarction, an investigator might be


      tempted to broaden the definition of a myocardial


      ischemic event to include sudden death or unstable


      angina if the differences between the groups


      supported some predetermined judgment.


                Similarly, not seeing a difference in


      stroke, an investigator might be tempted to broaden


      the definition to include a TIA.  But the


      possibilities for grouping are very, very large and


      the possibilities of finding something, if you want


      to be creative, are also quite large, even though


      these differences may be related to the play of




                As a result, the definition of grouping


      may vary from study to study.  Now, some


      investigators try to fix this problem by setting up




      a uniform definition to be used across all studies.


      But when the definition is developed after a


      concern has been raised, those creating the


      definition have frequently already looked at the


      data or have communicated with those who have


      looked at the data, and know either consciously or


      subconsciously what kind of definition is required


      to capture the events of interest.


                The fourth, and what I want to focus on


      the most in my presentation, is the issue of


      interpreting comparisons of frequency of adverse


      events because the number of adverse events is


      small and, because they are small, they result in


      extremely imprecise estimates.


                Now, you may think that investigators


      generally understand the difficulties of analyzing


      small numbers of events.  For example, most


      investigators know that, when the number of events


      is small, the lack of an observed difference does




      not rule out the existence of a true difference.


      We have been taught that this should be apparent by


      looking at the confidence interval and, as you can


      see here, the confidence interval is very wide and


      includes the possibility of benefit and harm.


                So investigators, basically, consider


      these kind of data to be inconclusive.  But what is


      generally not appreciated is that, when the number


      of events is small, the confidence interval is


      necessarily so wide that it may not truly represent


      the range of values that would include the true


      effect of the drug.  As a result, even the finding


      of an observed difference does not necessarily


      prove the existence of a true difference.


                To illustrate this point, this slide shows


      the effect size and confidence intervals required


      to reach statistical significance in a hypothetical


      trial of 3,000 patients assuming a range from a


      very small to a very large number of events.


                Now, assuming the trial shows a


      statistically significant effect--that means that


      we are only going to look at this if a p-value,




      let's say, is less than 0.05--the smaller the


      number of events, the larger must be the treatment


      effect in order for this effect to be statistically


      significant and the wider the confidence intervals


      have to be.


                Put it another way, if the number of


      events is small, the trial will show a significant


      difference only if the treatment effect is very


      large and the estimate of the effect is very




                Unfortunately, when you look at adverse


      events in a trial, the number of events will always


      be small.  This is because the trial, as you know,


      was designed to provide enough data to examine the


      primary endpoint, of which the trial produces a very precise


      estimate, but it is not powered to look at any


      other analyses and, therefore, at the end of the


      trial, you get generally a less precise estimate of


      the secondary endpoint and an extremely imprecise


      estimate of any specific adverse event.


                Now, you may ask, what is wrong with an


      imprecise estimate?  Well, imprecise estimates are




      fine if the intent is to withhold judgement until


      more data are collected to make the estimates more


      precise.  But imprecise estimates are problematic


      if the intent is to stop and reach a conclusion.


                That is because, when calculated in the


      usual manner, p-values and 95 percent confidence


      intervals are most easily interpreted in the


      context of a completed experiment.  Unfortunately,


      the adverse-event data generated in a typical trial


      is not the result of a completed experiment.  In


      fact, viewed from the amount of data needed for a


      precise estimate, the adverse-event data in a


      single study only represents a snapshot of an


      ongoing experiment to characterize the safety of


      the drug.


                As a result, performing an analysis of


      adverse-event data is akin to performing an interim


      analysis of primary endpoint data in an ongoing


      clinical trial.  Now, this is important because we


      know a fair amount of how to interpret interim


      analyses in a clinical trial and here I really must


      apologize to Tom Fleming because what I am going to




      review here very quickly is borrowed heavily from


      his extensive work in this area.


                But it is really important to think about


      small numbers of adverse events as an interim look


      on a global effort to characterize the safety of a drug.




                Now, as you know, when you look at interim


      analyses in a clinical trial, one plots the


      treatment difference represented by a z-score


      against the amount of information that we have, and


      that is generally represented by the fraction of


      expected events.


                We start the trial at zero effect and zero


      information.  At the end of each interim analysis,


      we add a point until we get to the end of


      the study.  Now, if we have assigned an alpha of


      0.05 to the endpoint, we want to make sure that we


      evaluate the treatment difference seen at the end


      of the trial against an alpha of about 0.05 which


      generally corresponds to a z-score of about 2.0.


                Now, some might think, naively, that,


      during the course of a study, the observed




      difference between treatments will be so


      predictable that we would observe a linear march


      between the start of the study and the end of the


      trial.  But we know that, when the amount of data is


      small, things tend to bounce around a lot, so much


      so that early results can be very misleading.


                It is sort of like the situation of trying


      to predict the results of an election when only 1


      percent of the precincts have been reported and


      they are not even representative.  So, as a result,


      if we got excited about any difference in z-score


      more than 2.0 early in the trial, we would be getting


      excited about effects that were not likely to be


      seen or sustained if we had more data even though a


      z-score of 2.0 would normally correspond to a


      p-value of less than 0.05.
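How often a trial with no true difference shows a nominal z above 2.0 at some look can be estimated with a short simulation. It is a minimal sketch, assuming equally spaced looks and a normally distributed test statistic (the Brownian-motion approximation commonly used for sequential monitoring): each look adds an equal block of information, so at look k the running sum of standard-normal increments is divided by the square root of k.

```python
import random
from math import sqrt

Z_CRIT = 1.96


def chance_of_nominal_signal(n_looks: int, n_sims: int = 20_000, seed: int = 0) -> float:
    """Share of null trials in which |z| exceeds Z_CRIT at ANY of `n_looks`
    equally spaced analyses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        running = 0.0
        for k in range(1, n_looks + 1):
            running += rng.gauss(0.0, 1.0)   # information added since last look
            if abs(running / sqrt(k)) > Z_CRIT:
                hits += 1
                break
    return hits / n_sims


for looks in (1, 2, 5, 10):
    print(looks, round(chance_of_nominal_signal(looks), 3))
```

With one look the rate is about 5 percent; with ten equally spaced looks it is roughly 19 percent, the classic result for repeated significance testing.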


                In fact, the smaller the amount of data,


      the more things can bounce around a lot, the more


      it is likely that what we will be seeing will be


      due to the play of chance.  Therefore, to prevent


      investigators from reaching a conclusion when the


      estimates are imprecise, statisticians,




      particularly Tom, have recommended that


      investigators refrain from getting excited about


      nominally significant z-scores when the amount of


      data is scarce.


                Specifically, they have proposed that


      boundaries must be crossed before we can feel


      comfortable that an effect seen early is likely to


      be present at the end of an experiment.


                Now, Tom, in particular, has proposed a


      curvilinear boundary like this.  There are many


      other boundaries that have been proposed by


      others.  But this is very, very commonly used in


      the United States.  This represents a boundary with


      an alpha of 0.05 for a primary endpoint.  It sort


      of looks like this.  Because it is curvilinear, to


      be significant at the 0.05 level, the treatment


      difference must be extreme when the amount of


      information is small as would be the case early in


      the study.


                However, as the trial proceeds, treatment


      differences required to conclude that there is an


      effect at the 0.05 level decrease and become




      closer and closer to a z-score of about 2.0 at the


      end of the study.
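Here is a minimal Python sketch of an O'Brien-Fleming-type boundary. It assumes the common approximation in which the critical z-score at information fraction t (the share of the planned events observed so far) equals the final critical value divided by sqrt(t). The exact O'Brien-Fleming constant depends on the number and timing of looks and is slightly larger than 1.96. The function names are hypothetical.

```python
import math

def obf_type_boundary(t: float, z_final: float = 1.96) -> float:
    """Approximate O'Brien-Fleming-type critical z at information fraction t.

    t is the fraction of the total planned information observed so far,
    0 < t <= 1. The boundary is very high early and falls toward z_final.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    return z_final / math.sqrt(t)

def crosses(z_observed: float, t: float) -> bool:
    """True if the observed z-score reaches the approximate boundary at t."""
    return abs(z_observed) >= obf_type_boundary(t)

for t in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}: critical z = {obf_type_boundary(t):.2f}")

# The z-score of 2.5 discussed next: significant at the end, not at t = 0.25.
print(crosses(2.5, 1.0), crosses(2.5, 0.25))
```

Under this approximation the last line prints True False, because the critical value at a quarter of the information is 3.92.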


                Now, this is a very different thought


      process and a very different approach than getting


      excited about a p-value less than 0.05 no matter


      when you observed it during the study.  For


      example, a z-score of 2.5--that is right


      here--would be meaningful if seen at the end of the


      study but it wouldn't be considered significant if


      seen early in the study even though the nominal


      p-value at this time is less than 0.05.


                Now, if the number of events is small, the


      difference would need to be far more extreme--say,


      a z-score up here--to be meaningful at the 0.05 level.




                Here is a specific example.  This is an


      old cardiovascular trial.  This is the Coronary


      Drug Project.  It was carried out more than 30


      years ago.  It included a comparison of clofibrate,


      a lipid-lowering drug, and placebo on coronary


      events.  At four separate times during the study,


      the difference in favor of clofibrate was




      statistically significant at a nominal p of 0.05 or


      less.  But, at the end of the trial, there was no


      difference between placebo and clofibrate.  The


      difference seen early in the trial was related to


      the imprecision inherent when analyzing small


      numbers of events.
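The clofibrate pattern, with several nominally significant looks followed by a null final result, is what repeated testing can produce even when there is no true effect. The simulation below is a hypothetical illustration, not a model of the Coronary Drug Project. It assumes five equally spaced looks at a trial with no treatment effect and counts how often any look crosses either the fixed 1.96 threshold or the approximate O'Brien-Fleming-type boundary (1.96 divided by the square root of the information fraction).

```python
import math
import random

def false_positive_rates(n_looks=5, n_sims=20000, z_final=1.96, seed=1):
    """Simulate trials with NO true effect and count how often any interim look
    appears significant under (a) a fixed nominal threshold and (b) an
    approximate O'Brien-Fleming-type boundary."""
    rng = random.Random(seed)
    nominal_hits = 0
    boundary_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        nominal = boundary = False
        for k in range(1, n_looks + 1):
            # Each block of data adds an independent N(0, 1) increment under the null.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(k)  # cumulative z-score after k blocks
            t = k / n_looks                 # information fraction at this look
            nominal = nominal or abs(z) >= z_final
            boundary = boundary or abs(z) >= z_final / math.sqrt(t)
        nominal_hits += nominal
        boundary_hits += boundary
    return nominal_hits / n_sims, boundary_hits / n_sims

nominal_rate, boundary_rate = false_positive_rates()
print(f"any look with |z| >= 1.96:      {nominal_rate:.3f}")
print(f"any look crossing the boundary: {boundary_rate:.3f}")
```

With five looks, the nominal rate should come out around 14 percent. The boundary keeps the rate much closer to 5 percent. The simple 1.96/sqrt(t) approximation runs slightly above 5 percent because the exact O'Brien-Fleming constant is a little larger than 1.96.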


                In fact, if a boundary had been used in


      this study, at no time during the trial would the


      treatment effect have crossed the boundary and led


      to the conclusion that clofibrate was better than placebo.




                Now, let me say this kind of fluctuation


      early in a study is very, very common.  There are


      even examples in which a treatment has been associated


      with a nominally significant adverse effect which


      later was reversed during the course of the trial


      and became statistically significant at the end of


      the study.


                Now, I should mention that the boundary


      that I have shown you is a boundary with an alpha


      of 0.05.  This means, when the boundary is crossed,


      the p-value for the treatment effect is less than




      0.05, not less than the nominal p-value that


      corresponds to the z-score that allowed the


      boundary to be crossed.


                Now, for each p-value or each alpha, there


      is a separate boundary.  As the requirement for


      strength of evidence becomes more stringent,


      the boundary is shifted upward and to the right.


                You might ask why am I going through all


      this.  Because analyzing data derived in an


      underpowered trial raises the same concerns as


      analyzing data derived from an underpowered interim


      analysis in an adequately powered study.


                The cardiovascular field is replete with


      examples of how misleading small numbers of events


      can be.  Let me give you a few examples.  For


      example, in an early pilot trial, the ACE/NEP


      inhibitor, Omapatrilat, reduced the risk of a major


      cardiovascular event by 47 percent when compared


      with an ACE inhibitor.  As you can see, the


      confidence intervals are extremely wide because the


      analysis here was based on only 39 events.
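To see why 39 events produce such wide intervals, one can use a standard large-sample approximation (Schoenfeld's). With 1:1 allocation, the standard error of a log hazard ratio is roughly 2 divided by the square root of the total number of events. The sketch applies this to a hazard ratio of 0.53 (a 47 percent reduction) at 39 and at 1,900 events. These are approximations, not the intervals reported in either omapatrilat trial. The 1,900-event case simply holds the same point estimate fixed to show how the width shrinks.

```python
import math

def approx_ci(hazard_ratio, n_events, z=1.96):
    """Approximate 95% CI for a hazard ratio from the total event count,
    using SE(log HR) ~= 2 / sqrt(events) (1:1 allocation assumed)."""
    se = 2.0 / math.sqrt(n_events)
    log_hr = math.log(hazard_ratio)
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)

for d in (39, 1900):
    lo, hi = approx_ci(0.53, d)  # 0.53 = a 47 percent risk reduction
    print(f"{d:5d} events: HR 0.53, approx 95% CI {lo:.2f} to {hi:.2f}")
```

With 39 events the approximate interval runs from about 0.28 to just under 1.0. With 1,900 events it narrows to roughly 0.48 to 0.58.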


                Later, a definitive trial was carried out




      that recorded nearly 1900 events.  There was no


      difference between Omapatrilat and the comparator


      ACE inhibitor on the same endpoint in the same population.




                Here is another example.  In an early


      pilot trial, amlodipine reduced the risk of a major


      cardiovascular event by 45 percent, small p-value


      but wide confidence intervals.  Later, in a


      definitive trial which recorded four times as many


      events, there was no effect of amlodipine on the


      same endpoint in the same population using the same




                There are even examples when the effect


      seen in a pilot trial was reversed when the


      definitive study was carried out.  Two examples.


      In two pilot trials, both in heart failure, one


      with the drug Vesnarinone, one with the drug


      Losartan, both drugs significantly reduced the risk


      of death--not a minor endpoint; death--by 50 to 60


      percent.  But these benefits were seen in trials


      that each recorded fewer than 50 events and


      thus produced treatment estimates with extremely




      wide confidence intervals.


                When both drugs were reevaluated in


      definitive trials that recorded ten times as many


      events, both drugs were associated with increased


      risks of death, in one case, significant at the


      less than 0.05 level.


                Now, notice that the confidence intervals


      of the treatment effect in the definitive trials do


      not overlap with the confidence intervals of the


      treatment effect in the early pilot studies.  So


      here we have two examples of an


      underpowered trial that showed a significant


      benefit whereas the adequately powered definitive study


      showed significant harm.


                Here is another example.  This is a


      meta-analysis of a small number of trials looking


      at the effect of magnesium in acute myocardial


      infarction.  A meta-analysis of a number of studies


      showed intravenous magnesium associated with a


      striking reduction in mortality, a 55 percent


      reduction in risk of death, but wide confidence


      intervals, a very small p-value, in a fairly large






                This effect appeared to be reinforced


      smaller treatment effect but wide confidence


      intervals and then, subsequently, in a definitive


      trial that recorded 4,000 deaths, there was a


      nearly significant adverse effect of magnesium on


      the same endpoint in the same population.


                Now, again, please note that the


      confidence intervals of the treatment estimate in


      this definitive study do not overlap at all with


      the confidence intervals of the estimates in the


      earlier moderately sized study, and not at all in


      the meta-analysis.  Again, this is really a


      reflection of the imprecision inherent in looking


      at small numbers of events.


                Let me give you one final example because


      it actually deals with an adverse effect.  In an


      early pilot trial with extended-release


      metoprolol--this is a study that looked at a very


      small number of events, about 20 events--there was a


      three-fold increase in the risk of hospitalization


      for heart failure in the metoprolol group compared




      with the placebo group.  Look at the confidence


      intervals here.  They go from about Washington to


      California; however, the treatment effect was


      nominally significant.


                When this trial was replicated in a


      similar population with exactly the same drug,


      exactly the same formulation, exactly the same


      dose, there was now a reduction in the frequency of


      hospitalization for heart failure.  Let me just


      emphasize, this was recorded as an adverse event in


      this earlier trial.


                So what have we learned from all this?


      Well, a couple of thoughts.  To achieve statistical


      significance in an underpowered analysis, the


      effect size must be extreme and the estimate must


      be imprecise.  Yet the more extreme the effect, the


      more imprecise the estimate, the less likely it


      will be reproduced in a definitive trial.  That is


      why I think, of all the things that we can worry


      about in looking at adverse events, the most


      worrisome is the imprecision inherent in the


      analysis of small numbers of events.


                Let me just close with a few final


      thoughts.  You might ask, based on all of this,


      what should we do.  Well, I think the first step,




      perhaps the most important first step, is to


      develop an approach to analyzing data in trials


      with small numbers of events that


      accurately reflects the true imprecision of the


      treatment effect estimate and its statistical significance.




                Let me just emphasize one thing, and I


      just want to put this as a proposal.  In no way,


      would I propose this as a definitive solution but,


      to get the discussion going, this might be an


      interesting first way of thinking about this.


                The conventional way of comparing small


      numbers of events is to calculate 95 percent


      confidence intervals followed by the derivation of


      the p-value.  However, the conventional calculation


      of the confidence intervals incorporates into it a


      z-score that the investigator designates as the


      target value for statistical significance.  For


      example, most statisticians, in calculating a




      confidence interval, would simply use a z-score of


      about 2.0.


                And they would do that because that is the


      critical value for the z-score at the end of an


      adequately powered trial with an alpha of 0.05.  So


      what they would do is they would take this z-score


      and they will use it to calculate the confidence


      interval.  What a lot of people, I think, fail to


      realize is that this z-score is not the critical


      value for decision making if one looks early in the


      same experiment.


                Early in that experiment, the critical


      value for a z-score should be determined by the


      interim monitoring boundary appropriate for the


      information content, not the z-score at the end of the study.




                Now, if one uses the boundary z-score in


      the calculation of the 95 percent confidence


      intervals, the confidence intervals here will be


      much, much wider resulting in a p-value that will


      no longer be statistically significant.  Now this


      is important because everyone talks about p-values




      at these meetings.  I showed you these data before.


      Conventionally calculated, the p-value would be


      0.002 meaning the likelihood of chance alone being


      2 in 1000.


                Well, if one recognizes that


      the data here really result in a very imprecise


      estimate and one incorporates the thinking process


      of an O'Brien-Fleming boundary into this, as a


      reflection of this imprecision, then the confidence


      intervals now truly reflect the imprecision in the


      estimate, and now the p-value is a lot less interesting


      than it was before.
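The proposal above amounts to replacing 1.96 in the usual confidence-interval formula with the boundary critical value for the information available. The transcript does not say exactly how the presenter computed his adjusted intervals, so this is a minimal sketch under stated assumptions: a ratio estimate analyzed on the log scale, a hypothetical relative risk of 2.5 with a conventional p-value of 0.002 (z of about 3.09), and a hypothetical information fraction of 0.25. At that fraction, the approximate O'Brien-Fleming-type critical value is 1.96/sqrt(0.25) = 3.92.

```python
import math
from statistics import NormalDist

def ci_from_estimate(ratio, p_two_sided, critical_z):
    """CI for a ratio, recovering SE(log ratio) from its two-sided p-value,
    then using `critical_z` (1.96 conventionally, or a boundary value)."""
    z_obs = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)  # z implied by the p-value
    se = abs(math.log(ratio)) / z_obs                      # implied SE of log(ratio)
    half_width = critical_z * se
    return math.exp(math.log(ratio) - half_width), math.exp(math.log(ratio) + half_width)

ratio, p = 2.5, 0.002             # hypothetical relative risk and conventional p-value
t = 0.25                          # hypothetical information fraction
boundary_z = 1.96 / math.sqrt(t)  # approximate O'Brien-Fleming-type critical value

print("conventional:      %.2f to %.2f" % ci_from_estimate(ratio, p, 1.96))
print("boundary-adjusted: %.2f to %.2f" % ci_from_estimate(ratio, p, boundary_z))
```

The conventional interval excludes 1.0 (roughly 1.4 to 4.5). The boundary-adjusted interval, roughly 0.8 to 8.0, includes it.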


                Now, the use of boundary-adjusted


      confidence intervals would, I think, appropriately


      describe the great uncertainty inherent in the


      analysis of small numbers of events, hopefully


      markedly reducing the false-positive error rate.


                Even with a boundary-adjusted


      confidence interval, adverse effects that are known


      to be characteristic of specific drugs would


      generally remain statistically significant.


      However, this approach, and it is just a thought




      experiment, would not provide a way to interpret


      trends observed in imprecise data.


                So, lastly, let me just conclude with some


      thoughts about what we should do with worrisome


      trends in imprecise data.  The first thing we could


      do is believe in those that are biologically


      plausible.  However, we need to be very careful


      here.  Everyone knows physicians can always be


      relied on to propose a biological mechanism to


      explain the validity of an unexpected and


      potentially preposterous finding simply because it


      happens to have an interesting p-value.  Anyone who


      doesn't believe this, you know, I would be happy to


      show you overwhelming evidence that this is the case.




                Second, is we could look for confirmatory


      evidence in other studies, remembering that we


      shouldn't be selective.  But, even if every study


      showed the same trend, how would you know that you


      had enough evidence to reach a conclusion?  Some


      have proposed doing a cumulative meta-analysis in


      which each trial is considered to represent an




      interim analysis on the way to a final judgement.


                Indeed, Salim Yusuf has proposed that, as


      each trial is added to the meta-analysis, that one


      use interim monitoring boundaries to interpret this


      cumulative meta-analysis.  This has, certainly, a


      considerable amount of appeal.
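Treating each added trial as an interim look can be sketched as a fixed-effect, inverse-variance cumulative meta-analysis in which the pooled z-score is compared with an O'Brien-Fleming-type boundary at each step. Everything here is a hypothetical illustration. The trial estimates are invented. The information fraction is cumulative information (sum of 1/SE squared) divided by an assumed target. The boundary uses the 1.96/sqrt(t) approximation. This is not presented as Yusuf's specific procedure.

```python
import math

def cumulative_meta(trials, target_information, z_final=1.96):
    """Fixed-effect (inverse-variance) cumulative meta-analysis.

    trials: list of (log_relative_risk, standard_error), in chronological order.
    target_information: hypothetical total information (sum of 1/SE^2) judged
    necessary for a definitive answer; it sets the information fraction t.
    Returns (pooled RR, pooled z, boundary, crossed) after each trial.
    """
    sum_w = 0.0
    sum_wy = 0.0
    rows = []
    for log_rr, se in trials:
        w = 1.0 / se ** 2
        sum_w += w
        sum_wy += w * log_rr
        pooled = sum_wy / sum_w
        z = pooled * math.sqrt(sum_w)  # pooled estimate / its SE (1 / sqrt(sum_w))
        t = min(sum_w / target_information, 1.0)
        boundary = z_final / math.sqrt(t)
        rows.append((math.exp(pooled), z, boundary, abs(z) >= boundary))
    return rows

# Hypothetical trials: three small ones suggesting benefit, then one large null trial.
trials = [(-0.9, 0.40), (-0.7, 0.35), (-0.4, 0.25), (0.02, 0.045)]
for rr, z, b, crossed in cumulative_meta(trials, target_information=600.0):
    print(f"pooled RR {rr:.2f}  z {z:+.2f}  boundary {b:.2f}  crossed: {crossed}")
```

In this made-up example, the first three pooled z-scores exceed 2.0 in absolute value, yet none crosses the boundary. The large fourth trial then pulls the pooled relative risk to about 1.0. Passing a larger z_final would mimic a criterion stricter than 0.05.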


                Let me just emphasize.  Salim has, in


      fact, underscored the fact that the conditions here


      are not identical to those that exist for a true


      interim analysis.  In the case of a true interim


      analysis, we generally know that the types of


      patients in studies are similar at all observation


      points.  Here it is different.


                In the case of a cumulative meta-analysis,


      the types of patients in studies differ across the


      various trials.  So, as a result, Salim has


      proposed that, when reaching a conclusion based on


      data that has been combined across trials, that a


      boundary more strict than 0.05 be used.


                Now, he has specifically outlined the


      importance of this using the example of intravenous


      magnesium.  I showed you the data on intravenous




      magnesium in myocardial infarction.  When the early


      trials with magnesium were carried out, the z-score


      of greater than 2.0 was crossed early.  As the


      evidence accumulated, the initial boundary


      of 0.05 was crossed.


                But then a large study, when added to the


      other cumulative analyses, brought this treatment


      effect down to a 0 level.  So Salim, and others, in


      fact, have emphasized that, when you are using a


      meta-analysis approach and using interim monitoring


      boundaries, that maybe one should require a p-value


      of less than 0.05 or even, perhaps, a small




                Let me say that most of the effects the


      committee has seen over the past two days would not


      come even close to meeting these criteria.


                Now, some of you may say, why not avoid


      all of this uncertainty and simply carry out an


      adequately powered definitive trial with the


      adverse event as the primary endpoint.  Is this


      crazy?  No; it is not crazy at all.  Sponsors


      pursue encouraging trends.  Most are disappointed,




      but they will pursue them.  Sponsors, therefore,


      should have an obligation to pursue discouraging


      trends realizing that most of them probably won't


      be confirmed either.


                Only a definitive trial can address


      ascertainment and classification biases as well as


      concerns about multiplicity of comparisons and


      imprecision of the data.  However, can we really


      expect sponsors to pursue every adverse trend?


      There are some obvious limitations to doing this.


      Furthermore, if you could decide which adverse


      trend you wanted to pursue, how easy would it be to


      carry out the trial intended to definitively


      evaluate an increased risk of an adverse effect?


                Can you imagine the consent forms for the


      IRBs for such a study?  Some may say that we are


      being too stringent here, that the criteria for


      raising a safety concern need not be as stringent


      as the criteria for establishing efficacy.  But I


      am not so sure that the criteria for establishing


      efficacy and safety should be that different.


                As a rule, we are very strict in reaching




      conclusions about efficacy because saying that


      there is a benefit when there is none means that


      millions will be treated unnecessarily and subject


      to side effects and cost.  Now, although some might


      advocate being less strict in reaching conclusions


      about safety, please remember: saying that there is


      an adverse effect when there is none means that


      millions will be deprived of an effective treatment.




                In conclusion, the findings of controlled


      trials are most easily interpreted when they


      represent the principal intent of the study.  A


      non-principal finding is subject to many


      interpretive difficulties many of which we have


      reviewed: ascertainment biases, inflated


      false-positive rates due to multiplicity of


      comparisons and, the one I have emphasized the


      most, the imprecision of estimates inherent in the


      analysis of small numbers.


                I think FDA, industry and academia remain


      in a quandary as to how to respond in a responsible


      fashion to observed differences in the reported




      frequency of adverse events.  Let me just


      emphasize, my presentation shouldn't be construed


      as favoring one particular side in all the


      discussions that have occurred.  In my view,


      regardless of one's position, it is critical to


      understand the limitations of what we know and to


      resist the temptation to reach conclusions before


      we are justified to do so.


                I think only by recognizing our ignorance


      will we be able to take the first step towards


      developing a rational approach that is in the


      interest of all patients.


                Thank you.  I will be happy to answer any questions.




                DR. WOOD:  Dr. D'Agostino?


                DR. D'AGOSTINO:  Thank you very much,


      Milt.  I have a couple of questions that I think, I


      hope, are relevant to our deliberations.  In terms


      of your sense of large and the idea of chasing


      after a safety event and making more out of it than


      one should, we have the APPROVe study, where there


      was a serious up-front prestated deliberation to




      make sure they had good ascertainment and


      adjudication of cardiovascular events, and they


      come up with 45 versus 25 events, carefully adjudicated.




                I am struck by that being small, but I


      am also struck by the carefulness with which it was


      done, say, as opposed to the APC trial, where they did an


      interim analysis that has those problems.  Could


      you comment on, say, the APPROVe study?


                DR. PACKER:  I think that, when you have


      incomplete data, as you would if you have


      small numbers of events, you need to be a lot more


      careful about the thinking process.  That doesn't


      mean you can't make judgments.  It doesn't mean you


      can't incorporate a set of principles that would


      guide decision making by looking at the totality of


      the evidence and bringing to the process what you


      inherently believe.  I think that is what the


      committee needs to do today.


                What I really wanted to address, however,


      is how hard this is and that the normal


      reliance--as you know, clinical investigators,




      because they don't understand p-values, rely on


      them.  What I am trying to do is to explain that,


      in fact, we are less certain about what we know


      here than we, perhaps, should be.


                DR. D'AGOSTINO:  But on the


      APPROVe study, it was reasonable, too.


                DR. PACKER:  I think you need to take that


      in the totality of the carefulness with which it was


      done, the prospective nature of it.  But, remember,


      in all the examples that I showed you, there


      were sometimes very striking trends in early


      pilot trials with prespecified, adjudicated


      endpoints but, because they involved small numbers of


      events with very imprecise estimates, the


      definitive trial was non-confirmatory.


                So just because it is up-front and




                DR. D'AGOSTINO:  That is my question, yes.


      That is my question.  You still end up with small


      numbers.  Let me have just a couple of other


      questions.  The second question is really bothering


      me very much in terms of how we would recommend




      trials.  If you decide--if the group decides and


      suggests to the FDA that there should be more


      trials, more randomized clinical trials, the


      sponsors are, then, going to have to go back and


      say, well, they are going to set up a trial saying


      the null hypothesis that the relative risk is 1.0


      versus the relative risk is not 1.0.


                Now, the best thing a sponsor can do is to


      run a very sloppy study and they will accept that


      null hypothesis because the confidence intervals


      will be so wide that they will contain 1.0.  The


      alternative is to sort of do a noninferiority type


      idea: at the end of the study, you end up with a


      confidence interval, and the upper limit of that interval


      has to be below something like 1.3.
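Dr. D'Agostino's question can be made concrete with a standard approximate events calculation (a Schoenfeld-type formula for 1:1 allocation). The calculation is for a trial designed so that the upper 95 percent confidence limit falls below a relative-risk margin when the true ratio is 1.0. The margin of 1.3 and the 1 percent event rate come from the question. The formula, power levels, and function name are assumptions for illustration.

```python
import math
from statistics import NormalDist

def events_to_exclude_margin(margin, alpha_two_sided=0.05, power=0.8, true_ratio=1.0):
    """Approximate total events needed so the upper confidence limit of the
    hazard ratio falls below `margin` with the given power, assuming 1:1
    allocation (Schoenfeld-type approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_two_sided / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    effect = math.log(margin) - math.log(true_ratio)
    return math.ceil(4.0 * (z_alpha + z_beta) ** 2 / effect ** 2)

for power in (0.8, 0.9):
    d = events_to_exclude_margin(1.3, power=power)
    # At an event rate of 1 percent per patient-year (from the question above):
    print(f"power {power:.0%}: about {d} events, roughly {d / 0.01:,.0f} patient-years")
```

With 80 percent power, this comes to a little over 450 events, or roughly 45,000 patient-years at 1 percent per year. That scale helps explain why such trials are hard to carry out.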


                Do you have advice for us if you did this


      sort of second approach?  We are dealing with rates


      like 1 percent.  Could we live with ruling out only


      a 1.3 relative risk?


      People may be dying if you do that.  So how do you


      respond to that?


                DR. PACKER:  I wish I knew the answer to




      that.  I think that it depends on the type of


      adverse reaction.  It depends on the particular


      drug.  It depends on the vulnerability of the


      patient population.  All of these need to be


      factored together with the actual feasibility of


      doing the study.


                The one thing I would say is that one


      learns very little by doing a lousy trial.  So,


      doing a good trial is the only way to get a


      reasonable answer or reasonable estimate of the




                DR. D'AGOSTINO:  Just one more.  I will


      make it quick.  In these trials, in many of these


      trials, people just won't stay in the trial.  Can


      you give us some advice on how to deal with the


      drop-out--now, there are rules that you could say,


      the individual wants to leave, has decided to leave


      because the blood pressure is building up or


      because of G.I. problems building up.


                To say, we are only going to look at that


      individual for 14 more days after they leave, to


      me, is a problem because if the blood pressure is




      building up, they may be on their way and it may


      take two or three months before they get an M.I.


      and so forth.  So you have got the sort of


      dropouts, terminations, that are part of the


      protocol but you also have the individuals who just


      stop coming.  And they could be substantial.  So,


      any advice to us?


                DR. PACKER:  Gee, as you know, when we do


      trials for superiority, the effort that we put into


      adherence is extreme.  We really want people to


      stay on treatment and we organize the trials to do


      everything we can to ethically and reasonably


      maintain adherence.


                I take your point that, if the trial were


      a noninferiority trial, it is possible that the


      investigators and sponsor might be less motivated


      recognizing that poor adherence works in their


      favor.  I think that there needs to be a reasonable


      effort--I mean, you can maintain adherence in most


      trials if you really, really want to.


                DR. D'AGOSTINO:  Thank you.


                DR. WOOD:  I suspect we are not going to




      solve that problem today.  Dr. Shapiro?


                MS. SHAPIRO:  Just a comment on your


      comment.  We all know, of course, that the Federal


      Regulations require that participants be allowed to


      withdraw and not be badgered into staying.  But


      what I really wanted to talk about was your


      observations about how it is wrong to suggest that


      we should not chase safety quite as rigorously


      because we will, then, deprive ourselves and others


      of information and access to effective treatment.


                I don't think it is as simplistic as that,


      in that, when we are looking at potential harm or


      safety problems, we have to look not only at


      the likelihood that it exists but also prevalence and




                So I think that your response to that


      approach has to take account of those factors as well.




                DR. PACKER:  Let me try to reframe my


      response.  You can't isolate benefit from risk.


      The judgment as to whether a drug should be used on


      an individual basis or on a population basis has to




      be the relative value of benefit to risk.  You may


      decide that you don't even want to pursue a safety


      trend in a non-fatal event when you know the drug


      prolongs life.  That would be a very reasonable




                On the other hand, you might want to


      vigorously pursue a very serious safety issue in a


      drug for a symptomatic or cosmetic condition.  So


      the risk-to-benefit relationship is the one that


      has to be vigorously defined.


                MS. SHAPIRO:  Right.  I am sure you will


      agree with this; you also have to factor in


      prevalence of the condition and likely use of that


      drug in the population.


                DR. PACKER:  That's right.  But it is


      always--it is risk to benefit.  The goal here is


      not to alter the risk-to-benefit relationship


      simply because you want to emphasize one part or


      another; it has to be considered in the context of the


      clinical problem and looked at from


      the patient's point of view.


                DR. WOOD:  Dr. Cush?


                DR. CUSH:  I have two questions.  One, I


      need some education.  You were frequently referring


      to very wide confidence intervals where it didn't




      seem so wide.  It was only, like, 0.3 and 0.4


      where, obviously, when it ranged from 1.0 to 8.0,


      that is very wide.  But you used those terms in


      both situations.  Could you explain the differences




                DR. PACKER:  Actually, I have used "wide"


      to refer to extremely wide, moderately wide and




                DR. CUSH:  And narrow would be--


                DR. PACKER:  Narrow is less than wide.


                DR. CUSH:  Okay.


                DR. PACKER:  Let me try.  All the examples


      that I showed you that I characterized as wide


      truly reflected estimates that had a high degree of


      uncertainty associated with them.  On the benefit


      side, benefits that range from an 80 percent


      reduction in risk on the high side to a 20 percent


      reduction in risk--remember, and I guess I should


      emphasize this and I guess Tom would reinforce this




      dramatically, what these intervals


      look like in terms of width is not


      symmetrical on both sides of 1.0.  The lowest you


      can go below 1.0 is 0.  So wide confidence


      intervals below 1.0 can be 0.2 to 0.8.  Those would


      be wide confidence intervals.  There is no limit


      for estimates greater than 1.0, so you can have 1.0


      to 24 on the adverse side of this.  So you have to


      sort of think about what is wide differently when


      you are looking at estimates below 1.0 than when


      you are looking at estimates above 1.0.  Maybe that


      would be helpful.
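The asymmetry described above follows from how ratio confidence intervals are usually built: symmetric on the log scale, then exponentiated. The sketch computes the geometric midpoint and log-scale width of a few intervals, including the two mentioned in the answer. The helper name is hypothetical.

```python
import math

def log_scale_summary(lower, upper):
    """Geometric midpoint and log-scale width of a ratio confidence interval."""
    return math.sqrt(lower * upper), math.log(upper) - math.log(lower)

for lower, upper in ((0.2, 0.8), (1.25, 5.0), (1.0, 24.0)):
    mid, width = log_scale_summary(lower, upper)
    print(f"CI {lower}-{upper}: geometric midpoint {mid:.2f}, log-scale width {width:.2f}")
```

On the log scale, 0.2 to 0.8 and its mirror image, 1.25 to 5.0, have the same width (about 1.39). By contrast, 1.0 to 24 is more than twice as wide (about 3.18).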


                DR. CUSH:  That does help.  Secondly, you


      have told us that when we are dealing with


      low numbers of adverse events and their being very


      imprecise and hard to make conclusions from, is it


      even less valid or even greater error to, then,


      take that data derived in one situation, like in an


      Alzheimer's trial, and then try to generalize that


      to the general population?


                DR. PACKER:  But we do that all the time.


      There is a general sense that efficacy is not




      extrapolatable across diseases but safety that is


      not disease-specific is extrapolatable.


                Let me put it this way.  If we didn't do


      that, the problem that I put forward would be


      really impossible, really impossible.  So I


      actually feel comfortable extrapolating safety data


      across indications as long as the safety item is


      not disease-specific.


                DR. WOOD:  Dr. Shafer?


                DR. SHAFER:  Thanks.  That was actually a


      very informative presentation and I can confirm the


      distance from Washington to California.


                There are really two questions here that I


      think we need to bifurcate.  One of them involves


      the scientific question of getting at the truth,


      whatever that is.  I appreciate everything you say


      and, prior to a drug being approved, at least


      ideally, there would be adequate time and resources


      to do exactly what you are proposing.


                But there is a second question which is


      how to inform clinical and regulatory decision


      making based on imprecise information following




      approval because, in that setting, a daily decision


      is being made by patients and their physicians as


      to whether or not they need to take the drug.


                One question about how to approach these


      sorts of imprecise data when, in fact, a daily


      decision is occurring, is can you take the


      confidence bounds for both the risk and the benefit


      and integrate those over the public-health hazard


      and the public-health benefit to try to incorporate


      the entire--both the point estimates but also the


      uncertainty about them into the regulatory


      decision-making process?


                DR. PACKER:  Oh, wow.  Just a couple of


      comments.  One, the estimates on


      efficacy are almost always more precise, much more


      precise, than the estimates on safety.  So you have


      this very precise estimate on efficacy.  You have


      this very imprecise estimate, in general, on


      safety.  And you try to sort of integrate them and


      you have to now weigh them because it could be that


      the efficacy thing you are looking at is really


      important and the safety is sort of not very




      important.  Or it could be the other way around, the


      efficacy is sort of very small--the efficacy is


      small, but the safety is a big risk.
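Dr. Shafer's question, together with Dr. Packer's point that efficacy estimates are usually much more precise than safety estimates, can be made concrete with a small Monte Carlo sketch. It assumes that each relative risk has a roughly lognormal sampling distribution recoverable from its 95 percent confidence interval, that benefit and harm are independent, and that baseline risks are known. Every number is hypothetical, and the function names are illustrative, not a method proposed at the meeting.

```python
import math
import random


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def net_events(benefit, harm, n_exposed, n_draws=50_000, seed=0):
    """Simulate (events prevented - events caused) among n_exposed patients.

    benefit and harm are (rr, ci_lower, ci_upper, baseline_risk) tuples.
    Each RR is drawn from a lognormal distribution matching its CI, and the
    two draws are independent (an assumption). Returns the 2.5th, 50th and
    97.5th percentiles of the simulated net benefit.
    """
    rng = random.Random(seed)
    rr_b, lo_b, hi_b, base_b = benefit
    rr_h, lo_h, hi_h, base_h = harm
    se_b = log_se_from_ci(lo_b, hi_b)
    se_h = log_se_from_ci(lo_h, hi_h)
    draws = []
    for _ in range(n_draws):
        sampled_rr_b = math.exp(rng.gauss(math.log(rr_b), se_b))
        sampled_rr_h = math.exp(rng.gauss(math.log(rr_h), se_h))
        prevented = n_exposed * base_b * (1 - sampled_rr_b)
        caused = n_exposed * base_h * (sampled_rr_h - 1)
        draws.append(prevented - caused)
    draws.sort()
    return tuple(draws[int(q * (n_draws - 1))] for q in (0.025, 0.5, 0.975))


# Hypothetical inputs: a precise benefit on one outcome, an imprecise harm on another.
low, median, high = net_events(
    benefit=(0.50, 0.40, 0.625, 0.02),  # RR 0.50 (0.40 to 0.625), 2% baseline risk
    harm=(2.00, 1.00, 4.00, 0.01),      # RR 2.00 (1.00 to 4.00), 1% baseline risk
    n_exposed=10_000,
)
print(f"Net events prevented per 10,000: median {median:.0f}, "
      f"95% interval {low:.0f} to {high:.0f}")
```

In this hypothetical case the interval spans both net benefit and net harm, even though the benefit is estimated precisely; the width comes almost entirely from the imprecise harm estimate.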


                DR. SHAFER:  That is exactly the question.


                DR. PACKER:  You might think that someone


      in the world might be clever enough to create a


      statistical model that would allow that to take


      place.  I am actually much more comfortable with


      people doing that than statistical models doing


      that.  Somehow, people have the ability to


      integrate all of this, especially a group of people


      have an ability to integrate this, much better than


      any mathematical model.


                I would be very uncomfortable if someone


      were actually to propose a mathematical model that


      replaced the human, very important human, element




                DR. WOOD:  Dr. Farrar.


                DR. FARRAR:  Every example that I have


      seen to date in looking at the risks in


      overinterpreting data seems to go from being a


      positive study to a negative study.  I wonder about




      the other way around and whether there are any


      inherent differences in thinking about it the other


      way around, the bottom line being that if you have


      ten studies that show no safety issue with a


      well-measured process, whether you can then say,


      well, maybe the 11th study is going to show it




                DR. PACKER:  I think you need to find out


      how much information there is in each study, how


      easily or how appropriate it is to combine the data


      across the studies to determine how precise the


      estimates are after you have collected and integrated


      all of the data, and put that into a judgement as


      to how much data you actually need to be confident


      about the precision of the estimate.


                So there isn't a uniform way of thinking


      about it.  It is not like you will know it when you


      see it.  There is some guidance, some mathematical


      guidance, that needs to be incorporated into the


      thinking process.


                DR. WOOD:  Dr. Domanski.


                DR. DOMANSKI:  You know, I am not nearly




      as sophisticated, really, Milton, as you are about


      this sort of thing, or as some of the people in


      the room, but I am a little bit concerned about


      some of the examples.  I will give you one.  I


      don't think ISIS 4 was a definitive trial of


      magnesium, because I know something about that.  We


      did the MAGIC study which was a very large study.


                Like ISIS 4, it was negative, but ISIS 4


      was substantially different methodologically in


      terms of when that was given.  I think that example


      actually, to be honest, is fairly misleading as a


      result.  I think it is an example of a stopped


      clock being right twice a day.  But, yeah; it came out




                But I am worried if that is the basis for


      this--that kind of thing is the basis for this


      discussion across more of the landscape.


                DR. PACKER:  Let me emphasize, Mike, that


      I knew that if I picked one study and gave you an


      example of one study that I would be at great risk


      because everyone knows something about these


      studies more than what I know about these studies




      although some of the studies I actually mentioned


      were studies I was personally involved with and


      I think I know a little more about them.


                So I just wanted to--I would not


      overemphasize--and, in fact, one might


      appropriately underemphasize--the magnesium


      example.  But the other examples, time and time and


      time and time again.  It is just like reaching


      conclusions during a very early part of a study


      based on interim monitoring.  When you have small


      numbers of events, the estimates are very imprecise


      and may not reflect what happens at the end of a


      complete experiment.  That is just a general




                I take your point about ISIS 4 but the


      number of examples here is just overwhelming.
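Dr. Packer's point that small numbers of events give very imprecise estimates can be illustrated with the usual large-sample confidence interval for a rate ratio. This sketch assumes equal person-time in the two arms and Poisson-distributed event counts; the counts themselves are hypothetical. The same two-to-one imbalance is far less informative with 15 events than with 300.

```python
import math


def rate_ratio_ci(events_treated, events_control, z=1.96):
    """Rate ratio and approximate 95% CI, assuming equal person-time per arm.

    Uses the large-sample standard error of log(RR): sqrt(1/a + 1/b).
    """
    rr = events_treated / events_control
    se = math.sqrt(1 / events_treated + 1 / events_control)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical counts with the same 2:1 imbalance at increasing amounts of information.
for a, b in [(10, 5), (40, 20), (200, 100)]:
    rr, lo, hi = rate_ratio_ci(a, b)
    print(f"{a:>3} vs {b:>3} events: RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With 10 versus 5 events the interval runs from below 1 to nearly 6; with 200 versus 100 events it excludes 1 comfortably.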


                DR. WOOD:  It is important, Milton, to


      remember, we have replication for two of these


      drugs and these safety signals here.  So it is not


      just single studies.


                Dr. Furberg.


                DR. FURBERG:  Milton, I think that was a




      great presentation.  I think, for balance, it would


      be nice if you can have examples showing the other


      side, how trends in smaller studies were confirmed


      in definitive trials.  And I know plenty of those.


                DR. PACKER:  Oh, yes.


                DR. FURBERG:  That was never discussed.


      You are painting a dark picture saying you can't


      trust smaller studies.  You are right.  You never


      know where you are going to end up and you need to


      be careful.  But don't say that you can't rely on




                DR. WOOD:  I was actually on the advisory


      committee that turned down vesnarinone, that looked


      at that study.  There were lots of issues that came


      up at that time that led us to do that.  So it


      wasn't just that there was a study that was


      compelling and that people went with that.


                Dr. Nissen?


                DR. PACKER:  Curt, let me just say that--I


      think your point is very, very important.  What I


      have not done is shown many, many examples of


      interim monitoring in trials where the early




      results were reflective of the endpoint.  I have


      not shown a whole host, probably more than I could


      think of, of all of the pilot trials where the


      initial trends encouraged someone to pursue it and


      that the second study was, in fact, very




                Let me just make my point clear.  It is


      just not as reliable as we think it is.  It is not


      that it is worthless.  I do not want to say that.


      If I have implied that, then I do not want to imply


      that.  I just want to say that the risk of error


      early when you have small-number events is much,


      much greater than when you have a much more precise


      estimate at the end of the trial.


                My plea here is that when you don't know,


      the best thing you can do is say, "I don't know."


      And that is my only plea.


                DR. WOOD:  Milt, when you have two trials


      that replicate one another, with a p-value of less


      than 0.05, if that was an efficacy endpoint, we


      would approve on the basis of that; correct?


                DR. PACKER:  That's right.


                DR. WOOD:  But you are telling us that,


      when it is a safety endpoint, we should not act on


      that.  I think it is counterintuitive.




                DR. PACKER:  No, no, no.


                DR. WOOD:  Hang on.  That seems to me


      counterintuitive.  We have, for two of these drugs,


      two randomized trials that replicate the outcome.


      In three of the four trials, the outcome was


      predefined, adjudicated and so on.  That is about


      as good as any drug that has been approved on the


      U.S. market that I can think of.


                DR. PACKER:  Let me just add one


      dimension, Alastair, to the thinking process and


      that is that when you have a p less than 0.05 on


      two trials, on the primary endpoint because it is


      efficacy, you have two trials that were designed


      for the endpoint and have fairly narrow confidence


      intervals and precise estimates.


                That is not the same concept as having a p


      less than 0.05 on two imprecise estimates which are


      combined together.


                DR. WOOD:  No; I understand that very




      well.  I think we all do.  The issue here is both


      of the second trials--both of the second


      trials--were designed to test the safety issue that


      was in the first trial even though they were


      efficacy studies.  So it is not like they were just


      two trials that fell on the ground from Mars that


      arrived with something.  These were designed, at


      least according to the sponsors, to check for that




                So I think you are overselling the point a




                Let's move on.  Dr. Jenkins?


                DR. JENKINS:  I found the presentation


      very interesting and I wanted to probe a little bit


      further on the APPROVe study because that is the


      one that I think we were feeling very comfortable


      with the finding in APPROVe.  Yet, I went back to


      Merck's presentation, and their prospective plan


      was actually to combine three studies that were


      going to be placebo versus rofecoxib in three


      different populations.


                Their plan was to have 25,000 patients to




      evaluate the cardiovascular signal.  Now, in


      APPROVe, presumably, they had stopping rules, and


      the Data Safety Monitoring Committee saw an extreme


      effect that met those criteria so they stopped the


      study.  But I am just interested in hearing your


      thoughts about how should we interpret APPROVe


      where the stopping rule is met for an individual


      study when the prespecified plan was to have three


      studies combined for 25,000 patients.


                DR. PACKER:  Gee, I must say that I am


      delighted to have everyone ask me the hard


      questions for this afternoon.  I sort of think that


      this is what this committee has to do.  I only


      wanted to add a dimension to the thinking process


      here.  I don't come with any answers on how to put


      all of the data together.  All of the points on how


      to synthesize these data, I am very comfortable


      with the human process of doing so as long as the


      human process incorporates an understanding of how


      difficult and imprecise this is and the fact that,


      in the past, although it has led to predictions


      that came true, it also led to predictions that did




      not come true.


                DR. JENKINS:  I think, more specifically,


      the point I was trying to get you to comment on is


      not the overall interpretation of the rofecoxib


      data but the fact that there was a plan for 25,000


      patients in three studies.  What I am trying to


      understand is how should we, then, interpret a


      finding from one of those three studies where an


      interim analysis crossed the stopping boundary and


      met the criteria for stopping the study.  What


      weight should we give to that finding in that


      single study?


                DR. PACKER:  I don't think there is a


      precise answer to that.  Any time you deviate from


      your preplanned attack on the conduct and analysis


      of a trial, you weaken, to varying degrees, the


      precision of the estimate and the confidence you


      have in the data that you are looking at.
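One reason departures from a prespecified analysis plan matter is that repeatedly testing accumulating data against an unadjusted 1.96 boundary inflates the chance of a false-positive finding. The simulation below is a sketch of that general phenomenon under the null hypothesis, assuming equally spaced looks and normally distributed data increments. It is not a model of the APPROVe monitoring plan, whose formal stopping boundaries would be designed to control this error.

```python
import math
import random


def false_positive_rate(n_looks, n_sims=20_000, z_crit=1.96, seed=1):
    """Fraction of null trials crossing |z| > z_crit at any of n_looks equally spaced looks."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)  # one equal block of data; no true effect
            z = running_sum / math.sqrt(look)   # cumulative z-statistic at this look
            if abs(z) > z_crit:
                crossings += 1
                break
    return crossings / n_sims


for looks in (1, 2, 5, 10):
    print(f"{looks:>2} unadjusted looks: false-positive rate {false_positive_rate(looks):.3f}")
```

With several unadjusted looks the false-positive rate comes out well above the nominal 5 percent, which is why group-sequential designs use stricter boundaries early in a trial.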


                DR. WOOD:  Dr. Nissen?


                DR. NISSEN:  Milt, there is an additional


      subtlety here.  Let me see if I can drill down with


      you on it.  What we have here is a class of drugs




      where we have multiple trials within the class.  So


      what we are asked to do is not necessarily, in some


      respects, for each individual drug, say, well, do


      we have replication or not.


                But if we take the position that this is a


      class effect, then we have got four, or perhaps,


      five trials.  This came up once before.  It was


      kind of controversial.  I think you may have been


      on the committee at the time when we had the


      angiotensin-receptor blockers for renal protection.


      What the two companies did with two different drugs


      is they stipulated that the other could use the


      data from the other company's trials as supportive.


                So the reason that this is really much


      harder is that we have a lot of trials here.  We


      may not have reached all the evidence in an


      individual drug, but we have trials across the


      class of drugs.  I wonder if you have any thoughts


      about this because it is obviously a difference


      between studying a single agent and studying a


      class of agents.


                DR. PACKER:  I think that, Steve--I mean,




      that is why the process works best when there are


      human beings involved in the thinking process.


      There is no predetermined sense that one should


      bring to the process--that you confine the analysis


      only to one drug.  What you should allow yourself


      to do is look at the data with one drug, look at


      the data with drugs that you think are related.


                If there are data that you think are in a


      drug that really isn't related, you might want to


      analyze that separately or do it both ways to see


      if it is consistent.  There is no statistical


      formula that can guide the very important human


      process here.


                My major point is that the precision that


      most clinical investigators think exists here isn't


      as precise as we think it is.  But that doesn't


      mean that you--and Curt would emphasize this--that


      doesn't mean that you can't put together your own


      picture of the totality of the data and bring to it


      a sense of whether it reaches some critical level


      of concern.


                In the absence of precision, you have got




      to do that.  But don't forget inherently that the


      data are imprecise.
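Dr. Packer's suggestion to look at one drug alone, then at related drugs together, and to "do it both ways to see if it is consistent" corresponds roughly to a fixed-effect inverse-variance meta-analysis with a heterogeneity check. In this sketch the trial results are hypothetical, the fixed-effect model assumes a single common true effect, and Cochran's Q (compared informally with its degrees of freedom) is only a rough signal of inconsistency.

```python
import math


def pool_fixed_effect(log_rrs, ses):
    """Inverse-variance fixed-effect pooling of log relative risks, plus Cochran's Q."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    return pooled, pooled_se, q


def summarize(label, rrs, ses):
    pooled, se, q = pool_fixed_effect([math.log(r) for r in rrs], ses)
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
    print(f"{label}: RR {math.exp(pooled):.2f} ({lo:.2f} to {hi:.2f}), "
          f"p={p:.3g}, Q={q:.2f} on {len(rrs) - 1} df")


# Hypothetical trial results (relative risks and SEs of log RR), not the meeting's data.
drug_a = ([2.0, 1.9], [0.35, 0.30])
drug_b = ([1.3, 2.5], [0.40, 0.45])
summarize("drug A only", *drug_a)
summarize("drug B only", *drug_b)
summarize("class (A + B)", drug_a[0] + drug_b[0], drug_a[1] + drug_b[1])
```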


                DR. WOOD:  Curt, do you want to say


      something else?  No.  Then let's move on.  The next


      speaker is Bob Temple who we are going to confine


      to his seat.


                DR. TEMPLE:  Alastair, I have a question.


      What am I supposed to do about my slides?  Can


      someone show them for me?  I will delete many of




                DR. WOOD:  Okay.  You can come up here if


      you do it quickly.


                DR. TEMPLE:  I don't care where I'm from.


      I really don't.


                DR. WOOD:  Then Kimberly will work the


      slides for you.


                DR. TEMPLE:  Okay; if Kimberly will do




                 Issues in Projecting Increased Risk of


            Cardiovascular Events to the Exposed Population


                DR. TEMPLE:  I was not in any way trying


      to address the main issues the committee is




      grappling with, which is what to do about these


      drugs.  But it seems to me you can't help noticing


      that there is some data we would all like to have


      that we don't have and that is what I was trying to




                Obviously, the main thing we are worried


      about is the effect of the COX-2-selective NSAIDs


      on cardiovascular outcomes, notably death, stroke


      and heart attack.  But we are particularly interested


      in the single drug effects, whether they are all


      the same.  We are interested in whether we are


      looking at true class effects or differences.


                We also can't help noticing there is not a


      lot of long-term data on the nonselective NSAIDs


      and, of course, as has been pointed out repeatedly, some


      of them are sort of selective anyway.


                There is major interest in possible


      differences in the subpopulations that might be at


      different risk.  I think there are mechanistic


      considerations, how much of this is really likely


      to be platelets and could there be a blood-pressure


      effect.  The importance of that, to me, is that it




      is not quite clear what to do about platelet


      effects, but, conceivably, you could manage a


      blood-pressure effect if that was a problem.


                There is a lot of importance and interest


      in the dose and dose interval.  And it is important


      to think about how long studies have to be to


      detect these things.  Obviously, some of the trials


      seem to have shown things in a matter of seven or


      eight months.  There is some suggestion that some


      of the effects need much longer to detect.


                Skip the next one.


                With respect to cardiovascular effects,


      the main question is whether everything is really


      answered.  You know, there are lots of studies, as


      Alastair was pointing out.  They are not perfectly


      consistent, maybe, but there are a number of


      studies with a number of drugs that seem to be


      showing the same thing.


                I guess, to me, they don't seem entirely


      consistent.  There are a number of possible reasons


      for that.  One is that there really are differences


      between drugs, or at least between doses.  Another




      is that even the best controlled studies sometimes


      give different answers.  Another is that small


      effects are difficult to evaluate in epidemiologic


      and even controlled studies.  Then the last is that


      the effects may be population-dependent.  That has


      been discussed.


                So it does seem to me there is more to


      learn.  Skip the next.  We all know that.  Platelet




                One of the things that seems important to


      pin down and I don't think it has been pinned down


      yet is the possibility that blood pressure is a


      significant part of all this, that there is some


      impression that Vioxx has bigger blood-pressure


      effects than the other drugs, but I don't think


      there is what we would call adequate data on the


      effects of all these.


                By adequate data, I mean data that gives


      you information about the effect of the drug over the


      entire dosing interval, that has pinned down dose


      response and that has pinned down the effect of


      different dosing intervals.  There is an




      impression, though, that these drugs can reverse


      the effect of other antihypertensives, perhaps,


      especially ones that work through the


      renin-angiotensin system.  They seem to have, at least


      some of them, an effect on blood pressure generally


      and then there are isolated reports of hypertension


      in trials reported as adverse reactions, clearly


      more common in the treated groups.


                I have a bunch of slides showing that


      elevated blood pressure is bad for you.  You can


      deduce that from epidemiologic effects, from a


      mountain of clinical studies.  The most recent


      study of interest, which I will not


      describe--keep going--in detail is a study that


      Steve Nissen knows about called CAMELOT which you


      can read as saying that a change in blood pressure


      of even 5 millimeters of mercury systolic and 3


      diastolic might have a reduction of about


      33 percent in the kinds of events we are talking


      about in people whose diastolic pressure is only


      about 100.


                That is not definitive.  This is a subset




      of the data and you can look at my slide to see


      what I did.


                As I said, we don't know as much about the


      blood pressure as we should.


                So a crucial question in the larger


      assessment of cardiovascular effects is what we can


      really study further.  My own view is that, given


      VIGOR and fairly consistent epidemiologic findings,


      it would be difficult to study 50 milligrams of


      rofecoxib.  I doubt you could write a proper


      informed consent.


                I take Milton's concern to heart but I


      guess my own view is there is probably enough


      information about that.  But what you could do with


      respect to other things depends on what you




                Suppose you believe that the


      cardiovascular risk of 200 or 400 milligrams of celecoxib is


      not entirely clear.  One polyp study says yes and


      other studies are not so clear.  And you believe,


      also, that a class effect is uncertain or, more


      particularly, that the effect might not apply to




      certain doses and certain dose intervals even if


      you are inclined to believe that the class does


      have a problem.


                If you also believe that more needs to be


      known about the long-term use of all NSAIDs,


      including those that are nominally COX-2-selective


      and those that are not, if you believe that new


      COX-2-selective agents conceivably could be


      developed with appropriate information, and if you


      believe the pharmacology gives hypotheses that need


      to be tested, not necessarily just believed--sorry


      Garret--then here is what you might be able to do.


                Again, I am not, in any way, saying who


      should do this.  This will be a massive


      undertaking.  But it does seem to me that there is


      information we all collectively need as a


      community.  So I am calling it an ALLHAT study for


      anti-inflammatory drugs.


                This is just one of what people could


      dream up as what might be compared.  The drugs, it


      seems to me, one might think about putting in it


      include ibuprofen, which we think probably ought to




      be neutral, not bad.  It may not have the platelet


      effects you want.  Naproxen--I am embarrassed to


      say this but I am letting myself be affected by the


      epidemiology studies.  Naproxen sort of looks good.


      You might even say it is at least a placebo, but I


      am not quite ready to say that.


                Diclofenac seems a good model of a regular


      NSAID that is really COX-2-selective, at least to a


      degree.  Celecoxib possibly at more than one dose,


      although, maybe for caution, one would want to


      think about the lower dose first.  Then I have two


      other groups that I will be interested in people's


      comments on, and I am not totally sure you could


      bring these off.


                But could one include a full-dose aspirin


      group accompanied by a proton pump inhibitor?  We


      know aspirin is an effective agent in arthritis.


      Now, you would have to first show that proton pump


      inhibitors really do block the ulcerogenic effects


      of aspirin.  That is a short-term study and maybe


      one could do that.  So I will be interested in


      whether people think you can bring that off.


                The reason for doing it is we know the


      effects of aspirin are not unfavorable and we think


      they are probably favorable in at least many




      populations, in populations at high risk and


      probably not unfavorable in people at low risk.


                The last one that seems worth considering,


      and my understanding is that, in many parts of the


      world, at least osteoarthritis is treated this way,


      is to use acetaminophen plus codeine added as needed


      and try to do something about the constipation.


                That would be as close to a true placebo


      group as I think you can get in a setting like


      this.  So it seems quite interesting.


                It is worth saying if one had a new single


      agent, my suggestion, and one still thought that


      drugs like this should be developed, that the


      single agent might be compared to naproxen and I


      would still hope for one of the last two


      comparisons as a true placebo.


                Obviously, these are all people who need


      chronic pain medications.  You would want O.A. and


      R.A. stratified.  I don't believe you could use the




      APAP group for rheumatoid arthritis but others may


      not agree with that.  You probably want to study a


      range of cardiovascular risks but you probably


      would want to study the lower-risk people first.


                The reason I say that is anyone with known


      coronary-artery disease really has to be given


      aspirin just because that is part of treatment and


      it isn't clear yet, to me, how aspirin interacts


      with the COX-2-selective drugs.  You would think it


      would make them unselective but the data don't seem


      to necessarily say that.


                A good question is how big the sample


      would have to be and that depends on what you want


      to find out.  If you are really trying to compare


      the drugs with a true placebo, they wouldn't have


      to be that large to rule out, say, a two-fold risk


      or something like that.  We have seen studies with


      about 1,000 per group that have distinguished


      between drugs.  So that is not so huge.


                But if you really wanted to get at whether


      one drug is a little bit different from another,


      you are talking about studies of a massive kind.  I




      have asked various numerically qualified people and


      the general impression is that if you wanted to


      rule out a 20 or 30 percent difference, you are


      talking about 50,000 per group.  That is beyond my


      hopes even for ALLHAT 2.
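Dr. Temple's contrast between ruling out a two-fold risk and ruling out a 20 or 30 percent difference follows from the number of events needed scaling with 1/(log hazard ratio)^2. The sketch below uses the Schoenfeld approximation for a 1:1 comparison at two-sided alpha of 0.05 and 80 percent power, which also applies approximately to excluding a hazard ratio of that size when the true ratio is 1. The 3 percent and 1 percent cumulative event risks are hypothetical assumptions; because the required size is inversely proportional to that risk, low event rates are what push the totals into the tens of thousands.

```python
import math
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.80):
    """Total events for a 1:1 comparison (Schoenfeld approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


def patients_per_group(hazard_ratio, cumulative_event_risk, **kwargs):
    """Patients per arm, assuming a similar cumulative event risk in both arms."""
    total = events_needed(hazard_ratio, **kwargs) / cumulative_event_risk
    return math.ceil(total / 2)


for hr in (2.0, 1.3, 1.2):
    print(f"HR {hr}: {events_needed(hr)} events; "
          f"{patients_per_group(hr, 0.03):,} per group at 3% risk, "
          f"{patients_per_group(hr, 0.01):,} per group at 1% risk")
```

Under these assumptions, a doubling of risk needs only about 66 events (roughly 1,100 per group at a 3 percent risk), while a hazard ratio of 1.2 at a 1 percent risk needs roughly 47,000 per group, the same order of magnitude as the figures Dr. Temple quoted.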


                Obviously, the outcomes of major interest


      are cardiovascular death, stroke, AMI and bleeding.


      I have heard some thoughts that maybe heart failure


      should be looked at in addition but I wouldn't make


      that the primary endpoint.  I think you can look at


      that separately.


                A big problem is what to do about blood


      pressure.  My first thought was that you would


      monitor it and treat anything over 120 over 80, but


      that really isn't standard practice.  So a question


      I would raise is whether one could leave people to


      go to 130 over 90, would that be acceptable.


                A question one could raise is why do this


      at all?  Do you really need these drugs?  We have


      heard fairly strong feelings that G.I. intolerance


      is not trivial.  But my answer is more that we


      really don't know enough about the whole range of




      these drugs.  There is no question that people are


      going to get something for their arthritis.  I am


      not entirely comfortable with looking at the data


      and saying we know what we need to.


                You could sort of deduce that naproxen


      usually looks pretty good.  It usually beats what


      is there except we just heard about a study where


      it was a little worse.  But it is not clear where


      ibuprofen comes out.  It doesn't show the same thing.


      It seems to me there is a serious population need


      to find out about these things and to understand


      more whether all selectivity is the same.


                We have been through diclofenac at length


      and it is not clear what one needs.  So I think the


      idea of doing a large study has weight.


                If you believe that it is really all


      settled, that cardiovascular risk is clearly


      increased with all of the COX-2-selective agents,


      ignoring for now which ones are actually selective,


      there still are things one might want to know.


                It might be of interest to do a study that


      still would have the ibuprofen and naproxen groups




      and might still have my aspirin or APAP groups.


      One might consider trying celecoxib with the


      addition of aspirin.  I know the results of that


      have not shown that any adverse effect seems to be


      mitigated, but that still doesn't make much sense


      and it might be something one could still want to


      test.  It would seem that if you added aspirin to a


      selective agent, you ought to have a de facto


      unselective agent.  Of course, that presumes


      mechanism and you shouldn't presume mechanism.  You


      should test it.


                Anyway, those are my thoughts.  I think my


      main point is that there is really a very important


      need for better information on the whole array of


      these drugs and the kind of study needed to do that


      is mind-bogglingly large.  However, people are


      already undertaking studies with 25,000 and 30,000


      patients.  So it is not as outlandish as I


      would have said it was before we started this




                Thank you.


                DR. WOOD:  Okay.  I am just interested,




      why didn't you suggest a PPI with naproxen?  For


      your ALLHAT study, why didn't you suggest a PPI


      with naproxen?


                DR. TEMPLE:  That is a fair question.  I


      think the answer on--what did I suggest it with?


                DR. WOOD:  With aspirin.  It doesn't




                DR. TEMPLE:  I will tell you the reason.


      Full-dose aspirin is just plainly impossible to use


      because of massive G.I. intolerance.  I believe,


      historically based, it is worse than we expect with


      naproxen.  So I thought you had to do it there


      urgently.  You could do it with naproxen, too.


      That would be okay.


                I have to point out that we do not have


      definitive labeling or evidence that those drugs


      really do prevent this but we have heard about some


      studies that suggest it.  I do think that is an


      early thing to discover.


                DR. WOOD:  Okay.  Understood.  Let's move


      straight on to Bob O'Neill's presentation who also


      is going to do it from his seat.


                  Issues in Projecting Increased Risk


           of Cardiovascular Events to the Exposed Population


                DR. O'NEILL:  I won't go through the




      slides.  I might point your attention to a few of


      them.  I will try and do this in five or ten




                DR. WOOD:  Do you want us to have the


      slides up, Bob?


                DR. O'NEILL:  What I was asked to do is


      essentially provide a framework.  This is a very


      difficult problem of projecting risk to the


      population.  Very little has been published about


      how to do this appropriately, so I was intending to go


      through sort of the logic and the framework of how


      you might think about this.


                It requires the integration of exposure


      data at the national population level and it needs


      information relative to how long people are on


      drugs and it uses information from the clinical


      trials as well as from the epidemiology studies to


      the extent that they are relevant to the question


      that is being asked.


                This is a very difficult problem.  It was


      not intended to give any estimate, any single


      number.  It was intended to show how hard it is to


      get there and, at the end of the day, how variable


      and sensitive the estimate might be to all the


      assumptions you have to make.




                So I used the Vioxx VIGOR and APPROVe


      studies as an example of the process that one might


      go through.  I made the point that event


      definitions and many things matter.  But I guess if


      there is anything that I would like people to take


      home, it is that time matters.  The hazard rate


      matters.  And the hazard ratio matters as a


      function of time when you do any of these




                I would just recall two slides.  One would


      be the VIGOR study which is Slide 12 so that


      everybody could remind themselves and Slide 16.


      The VIGOR study shows a separation of curves.


      Behind that is what is called a hazard rate.  I


      believe the data supports that the escalation of


      the risk increases with duration of exposure. 




      Merck and we have talked about this in the past and


      sort of have different views of this, but we seem


      to feel that that risk does escalate.


                That does not mean that there is no risk


      in that picture early on.  I think David Graham has


      made this point that it may be a power issue but,


      nonetheless, it is what it is and I am not


      convinced that the epidemiological studies at this


      stage add anything to our knowledge about early


      risk for the points I made yesterday because I


      think time zero matters in terms of looking at the


      risk, in terms of how long you are on.


                The next slide is Slide 16 which is the


      APPROVe study.  Similar pattern, only delayed a


      year.  So instead of the curve separating at


      approximately six months, four months, they


      separate a little later on.  The idea here is that


      the summary relative risks for both of these trials,


      approximately 2.28 for thrombotic events in VIGOR


      and approximately 1.92 for confirmed thrombotic


      events in APPROVe, are relative risks averaged over


      all the time points, whereas the relative risk at


      any particular time is itself a function of time.


                That is an important concept when, then,


      you go and you look at the national projection of


      how many people are exposed for how long a period


      of time.  I won't go through that because they are


      in the slides.  But we have no data in the United


      States to do this.  So we did a projection based


      upon the IMS National Prescription data, another


      separate database that allowed us to look at how


      long exposure, successive exposures, might be to


      get an idea of how long individuals may stay on the




                Surprisingly enough, a very small


      percentage of the millions of people that are


      prescribed the drug are on the drug for more than a


      year.  That is in one of the slides on the


      Caremark.  So what this meant is you multiply all


      these estimates which, essentially, are time.  We


      calculated a time-specific difference in absolute


      incidence rates for the different trials, made a


      projection and essentially used in that projection




      a number of assumptions many of which are not


      verifiable, and then came up with some crude


      estimate of what might even be an upper bound on a


      confidence interval for any estimate.


                We probably don't believe it because there


      is no real methodology to support that estimate but


      nonetheless to say that an estimate is very




                So the bottom line, and the conclusions


      here, given the time frame, is that the purpose of the


      projection effort was essentially just to


      provide--this is the last slide; it is Slide 47--it


      is essentially to provide a framework for


      considering how you would think about developing an


      estimate and to provide a range of estimates and,


      also, essentially, to point out that there are many


      limitations to any estimate that you would provide.


                We are not supporting any, or putting


      forward any, one estimate but I do believe that we


      need to understand this problem by moving away from


      summarizing nonproportional hazards in


      person-years.  It is not a good idea.  It begs the




      question as to whether the risk is constant or


      whether the risk is dependent on time.


                If there is one problem with the


      epidemiological literature, it constantly reports


      person-year risk, whereas every one of the


      clinical trials we have seen presents a


      Kaplan-Meier curve that looks at the time-dependent


      risk.  Unless you understand that, you can't come


      to grips with comparing one drug to another.


                You can't come to grips with comparing a


      drug to itself.  If you look at the VIGOR study


      relative to the APPROVe study, they are in


      different populations.  One is in a population of


      R.A.  The other is in a polyp prevention trial.


      One is at 50 milligrams.  The other is at 25 milligrams.




                There are many things that need to be


      sorted out.  So the point here is that this is a


      very difficult exercise to project.  This was just


      a framework to say, here is how you might think


      about it.  Most of the estimates are fraught with a


      lot of danger and have to have many caveats placed




      on them were you to bank on any one estimate alone.


                That is pretty much my bottom line.
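Dr. O'Neill's point that a single summary relative risk averages over a hazard ratio that changes with time can be illustrated with a short sketch. The hazards below are invented piecewise-constant rates, not VIGOR or APPROVe data, and the names INTERVALS, cumulative_hazard and summary_ratio are hypothetical.

```python
# Hypothetical piecewise-constant annual hazards (events per person-year)
# for two arms over three follow-up intervals; illustrative numbers only.
INTERVALS = [(0.0, 0.5), (0.5, 1.5), (1.5, 3.0)]  # years
HAZARD_CONTROL = [0.010, 0.010, 0.010]
HAZARD_DRUG = [0.010, 0.020, 0.030]


def cumulative_hazard(hazards, t):
    """Integrate a piecewise-constant hazard from time 0 to time t."""
    total = 0.0
    for (start, end), h in zip(INTERVALS, hazards):
        if t <= start:
            break
        total += h * (min(t, end) - start)
    return total


def summary_ratio(t):
    """Ratio of average hazards over [0, t], a person-time style summary.

    With low event rates this is close to a ratio of events per
    person-year, and it says nothing about when the excess occurs.
    """
    return cumulative_hazard(HAZARD_DRUG, t) / cumulative_hazard(HAZARD_CONTROL, t)


for (start, end), hd, hc in zip(INTERVALS, HAZARD_DRUG, HAZARD_CONTROL):
    print(f"interval {start}-{end} y: hazard ratio {hd / hc:.1f}")
for t in (0.5, 1.5, 3.0):
    print(f"summary ratio over 0-{t} y: {summary_ratio(t):.2f}")
```

With these made-up numbers the interval-specific ratios are 1.0, 2.0 and 3.0, while the summary ratio is 1.00 at half a year, about 1.67 at 1.5 years and about 2.33 at 3 years. The same drug yields different summary figures depending only on how long follow-up ran, which is why a time-dependent display such as a Kaplan-Meier curve carries information a single person-year figure does not.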


                DR. WOOD:  Bob, just to make sure


      everybody in the audience understands what you are


      talking about with estimates, what you are talking


      about are absolute numbers of people--


                DR. O'NEILL:  An estimate of the absolute


      numbers of individuals that might have been at risk


      and had these events if they were exposed--if they


      were exposed.  This is a model projection.


                DR. WOOD:  Right.  I just wanted to


      clarify that.  So it is not the relative risk.  It


      is not the same as what Milt was talking about.


                DR. O'NEILL:  Right.  Exactly.  This is a


      long discussion to get into the concept of


      attributable risk in its own right.  Given the


      time, I wouldn't be able to do that.


                DR. WOOD:  So you are talking about the


      number of people, these sort of numbers that are


      out there.


                DR. O'NEILL:  Right; to go through that


      exercise.  It is hard enough to interpret a single




      study or a collection of studies.  To go to an


      estimate of what the increased number of events


      might be at the exposed level is what this effort


      was about, all the different, five different


      separate interlinked but disparate databases that


      you would need to get there to make this kind of an




                DR. WOOD:  Okay.  Good.  Thanks.
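Dr. O'Neill's projection can be sketched in a few lines: multiply the number of people exposed for each duration by a time-specific difference in absolute risk, then sum the products. The duration bands, counts and excess risks below are placeholders, not IMS, Caremark or trial figures, and the function projected_excess_events is hypothetical.

```python
# Hypothetical exposure-duration distribution: number of people whose
# total exposure falls in each duration band (placeholder counts).
EXPOSED_BY_DURATION = {
    "<3 months": 6_000_000,
    "3-12 months": 3_000_000,
    ">12 months": 1_000_000,
}

# Hypothetical excess cumulative risk (drug minus comparator) of a
# cardiovascular event for someone exposed that long. In practice this
# would be derived from time-specific absolute incidence differences.
EXCESS_RISK_BY_DURATION = {
    "<3 months": 0.000,
    "3-12 months": 0.002,
    ">12 months": 0.010,
}


def projected_excess_events(exposed, excess_risk, scale=1.0):
    """Sum over duration bands of (people exposed) x (excess cumulative risk).

    `scale` multiplies every excess risk, a crude way to see how sensitive
    the total is to that single assumption.
    """
    return sum(exposed[band] * excess_risk[band] * scale for band in exposed)


for scale in (0.5, 1.0, 2.0):
    total = projected_excess_events(EXPOSED_BY_DURATION, EXCESS_RISK_BY_DURATION, scale)
    print(f"scale {scale}: about {total:,.0f} excess events")
```

With these placeholders the total ranges from about 8,000 to 32,000 excess events across one scaling assumption alone. The 6 million short-term users contribute nothing only because their excess risk was set to zero, so a different assumption about early risk would change the answer substantially, which illustrates the sensitivity to assumptions described above.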


                DR. WOOD:  We will take a few minutes, a


      very few minutes, for questions to the last two


      speakers and then we will take a break and be back.


      So the panel needs to remember that they are eating


      into their break.


                Dr. Nissen?


                DR. NISSEN:  Quickly, Bob, Bob Temple.


      The difficulty, of course, in the ALLHAT study is


      that it is very--it seems unlikely that it will get


      done.  So the question is, putting some constraints


      on this, and I thought about this last night in


      some detail into the wee hours of the morning, it


      seems to me that what we really need for this class


      of drugs is a reference standard.  That reference




      standard, unlike many studies, can't be placebo


      because you can't treat arthritis patients with placebo.




                So I would submit to you that, if you are


      going to do comparisons, that the reference


      standard, the best reference standard we have, is


      naproxen because we know as much about it as


      anything else.  We think it is, at worst, neutral


      and maybe a little better than neutral.


                So I would argue that, if you want to do


      ALLHAT light, then what you do is you test every


      agent, both those that stay on the market and those that are


      proposed to be brought onto the market, against naproxen


      with an adequately sized trial and you set an upper


      bound, which we have to talk about, about what the


      upper bound of hazard you are willing to accept is,


      and the test that you run is on efficacy and on


      cardiovascular hazard.


                If your drug is beaten by naproxen, you


      don't make it.  If you can show equivalence within


      a reasonable upper bound of naproxen, then we would


      be pretty comfortable--I think I would be pretty




      comfortable that the drug is not going to create a




                What do you think about that strategy?
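Dr. Nissen's proposal is, in effect, a noninferiority comparison against naproxen. The sketch below shows only the decision rule and rests on several assumptions. It uses a rate ratio from event counts and person-time as a stand-in for the hazard ratio, which presumes roughly constant hazards, the very assumption Dr. O'Neill cautions about. It uses a large-sample confidence interval on the log scale. Its margin of 1.3 is chosen purely for illustration, since the speakers left the upper bound open. The event counts are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% interval


def rate_ratio_ci(events_new, events_ref, persontime_new, persontime_ref):
    """Rate ratio (new drug vs. reference) with an approximate 95% CI.

    Uses the usual large-sample standard error of the log rate ratio,
    sqrt(1/events_new + 1/events_ref). Assumes constant hazards.
    """
    ratio = (events_new / persontime_new) / (events_ref / persontime_ref)
    se_log = math.sqrt(1 / events_new + 1 / events_ref)
    lower = math.exp(math.log(ratio) - Z * se_log)
    upper = math.exp(math.log(ratio) + Z * se_log)
    return ratio, lower, upper


def passes_margin(upper, margin):
    """Noninferior only if the upper confidence limit is below the margin."""
    return upper < margin


MARGIN = 1.3  # illustrative only; the speakers did not choose a value

for events_new, events_ref in [(60, 50), (500, 500)]:
    ratio, lower, upper = rate_ratio_ci(events_new, events_ref, 10_000, 10_000)
    verdict = "passes" if passes_margin(upper, MARGIN) else "fails"
    print(f"{events_new} vs {events_ref} events: ratio {ratio:.2f}, "
          f"95% CI {lower:.2f}-{upper:.2f}, {verdict} a margin of {MARGIN}")
```

The small hypothetical trial fails even though its point estimate is only 1.2, because its upper limit (about 1.75) exceeds the margin. The larger one with equal event rates passes, with an upper limit of about 1.13. In this sense the trial has to be "adequately sized": the number of events, not the point estimate alone, determines whether the upper bound can be excluded.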


                DR. TEMPLE:  That is actually--I went


      through it very fast, but that is actually what I


      said at the bottom of one slide.  I still would


      like to know better whether the naproxen is less


      bad or is really good.  Therefore, as I said on the


      slide, in my heart, I would like to see somebody


      try to give full-dose aspirin for a while because


      we are really pretty sure that won't be bad.


                I think the community, in the long run,


      needs that.  Who is going to do it?  That is a


      perfectly good question.  I do want to point out,


      though, that the way some of the trials were done,


      like TARGET, they could have given answers on some


      of this, or at least closer.  But, because they did


      separate trials, instead of randomizing to each of


      the treatments, that was obscured.


                You could have had a very substantial


      naproxen-ibuprofen comparison, but you didn't get


      it because of the structure of the trials.  So I




      think it is very important to randomize to each of


      the treatments, obviously, whatever it is.  But


      that would be my best guess at the moment.  But, in


      line with what Alastair asked before, when you do


      naproxen and you are looking at G.I. effects, do


      you add a proton pump inhibitor?  I think you need


      a little more information before you do that, but


      you might say that, which then raises the


      fundamental question of how much help you get from


      being COX-2-selective.


                DR. WOOD:  Dr. Cryer?


                DR. CRYER:  I wanted to comment on several


      of the questions, Dr. Temple, that you raised as


      well to ask a question.  I guess I will just ask


      the question first.  When you say "full-dose


      aspirin," are you referring to full


      anti-inflammatory doses of aspirin, 3.9 grams a day




                DR. TEMPLE:  Which I assume most people


      will not tolerate and there will be huge bleeding.


      So you have got to do something.


                DR. CRYER:  Right.  See, I think that is a




      impractical experimental design and I think we have


      come a long way from 3.9 grams of aspirin per day,


      particularly because of the concerns of the adverse


      events, the salicylism, the G.I. events.  Clearly,


      100 percent of those people are going to have


      gastric ulcerations assessed endoscopically.


                So I also would prefer one of the newer


      NSAIDs, traditional NSAIDs, in that comparison.


                With regard to--


                DR. TEMPLE:  Actually, before you leave


      that, do you know what would happen if you added a


      proton pump inhibitor to aspirin?


                DR. CRYER:  Not at 3.9 grams a day.  I


      don't think anybody thought that would be a


      feasible design.


                DR. TEMPLE:  Short term, then, just to


      look at endoscopic ulcers.


                DR. CRYER:  I don't know and I don't think


      that it will ever be known.


                DR. TEMPLE:  Then I won't get the answer.


                DR. CRYER:  What I do know is that, if you


      give 3.9 grams of aspirin per day in the




      short-term, greater than 90 percent of your


      patients who take aspirin will have endoscopic


      ulceration.  I don't know what the effect of the


      PPI would be.


                I wanted to address your last kind of


      question that you threw out there of whether or not


      a short-term study would show that celecoxib plus


      80 milligrams of aspirin would have a favorable


      effect, a G.I. effect, compared to a non-selective


      NSAID.  Those experiments have been done.


                With respect to endoscopic ulcer, COX-2


      plus aspirin equals traditional NSAID.  With regard


      to hospitalizations, having said that, there is a


      recent study not yet published, epidemiologic study


      from Canada, indicating that hospitalizations for


      COX-2 plus aspirin are fewer than


      hospitalizations for non-selective NSAIDs plus


      aspirin.  Then we have outcome studies not yet


      fully published, only in abstract form, which indicate


      that events on COX-2 plus aspirin are similar to


      events on non-selective NSAID plus aspirin--G.I.




                DR. TEMPLE:  It is possible that if you


      add aspirin--I mean, it is sort of what I would


      expect--is that you would get something that is a




      lot closer to being--in a cardiovascular sense, a


      lot closer to being just a regular NSAID and maybe


      you would still have some residual advantage in a


      G.I. sense.


                But, I must say, the data so far don't


      show that.  But they didn't seem definitive to me.


                It raises the question of--you know, the


      idea of COX-2 selectivity is, at least, in part, a


      conceptual and promotional idea.  As Garret pointed


      out the first day, five or six of those old drugs


      that aren't coxibs are COX-2-selective.  So there


      is a whole range.  My feeling is we need to


      understand the consequences of what all that means


      and there is a somewhat artificial separation


      between the coxibs and the others because those old


      drugs at least are partially selective and may have


      some of the same properties.


                So one of my hopes is that we could look at a


      range of these.


                DR. CRYER:  With respect to your last


      comment, I am entirely in agreement with that.


                DR. WOOD:  Let's move on.  Dr. Cush?


                DR. CUSH:  ALLHAT, I like the intention of


      it.  I would suggest, though, that if you are going


      to have a study long enough to pick up some of




      these events, a year or two, it is going to be


      very, very hard to keep O.A. patients on one of


      those drugs.


                So maybe actually stratifying according to


      pure COX-2-specific drugs to COX-2-selective drugs


      to the non-selective drugs that are more


      predominantly COX-1 and then having a totally


      nonsteroidal, non-nonsteroidal group, which would


      be the Tylenol group you talked about, or other


      analgesic agents might work over the long term.


                DR. TEMPLE:  That would answer a lot of


      the questions.  My real hope--you have a better


      idea whether it is possible than I do--is that you


      could actually find a population that could be


      given what we are pretty sure is a


      cardiovascular-neutral treatment.  That is really




      the only way to pin this down and it does seem


      worth pinning down.


                DR. WOOD:  Dr. Hennekens?


                DR. HENNEKENS:  I think I gleaned from Dr.


      O'Neill that if we determine there is a class


      effect that it varies not just by drug and dose but


      by duration of therapy.  From Dr. Temple, the


      comment that--I am very attracted to the concept of


      what I would call a large simple trial rather than


      an ALLHAT trial.  I think there is merit in seeing


      aspirin studied in therapeutic doses and I think


      there is evidence that anti-inflammatory effects


      are seen at doses far lower than the 3.9 grams.


                But the question I have for Bob is there


      are three currently marketed FDA-approved coxibs.


      So would you include valdecoxib and 25 milligrams


      of rofecoxib in your design?


                DR. TEMPLE:  Part of the reason I didn't


      address that is I figured that is what the


      committee is going to talk about.  I was willing to


      say that the celecoxib data look funny enough so


      that you might consider it.


                DR. WOOD:  That is part of what we are


      going to discuss.


                DR. TEMPLE:  That is what you are going to




      discuss so I didn't address it.


                DR. WOOD:  Let's move that to later.  Dr. Domanski?




                DR. DOMANSKI:  I will pass.


                DR. WOOD:  Dr. Abramson?


                DR. ABRAMSON:  Thank you.  I want to


      probably say something rather naive in support of


      the study, Bob, and that is that we are at a moment


      where we can do a paradigm shift, meaning that


      study that you propose is an important one but it


      is very large and it is going to be very hard to


      get any resources to do that.


                I think we are at a moment for the


      companies and the FDA and the government to think


      about a collaborative study where, if you have a


      drug that has some--this information is important,


      that we put together a collaboration among industry


      to do a multi-arm study of multiple drugs.  It is


      something, you know, in the osteoarthritis field,




      the companies have supported largely this


      osteoarthritis initiative through the NIH to look


      at outcomes in large numbers of patients.


                I think what we need is a similar COX-2


      initiative where either with the FDA or the NIH


      participating, with collaboration among industry,


      we are doing a multi-armed large study with


      biomarkers, with pharmacogenomics studies, with


      genetics and other blood pressure, but try and do


      it in a utopian way.


                I think everyone here wants to get the


      right answer, whether it is in industry or here at


      the table.  This could be a good opportunity to do


      something very different than we have done before


      in a large trial.


                DR. TEMPLE:  I don't disagree at all.  I


      mean, some of the drugs are generic.  They don't


      have any company that is massively interested in


      them.  So it is going to be a mixture of


      government, generosity and a wide variety of other


      things that are scarce.  So I don't know how


      to--you noticed I didn't have a slide on how to do






                DR. WOOD:  Dr. Ilowite?


                DR. ILOWITE:  Just a minor point.  I


      understand the need for a cardiovascular-neutral


      anti-inflammatory drug in an ALLHAT study.  But I


      was a little confused because I am aware of some


      literature directed at people who are interested in


      Kawasaki disease suggesting that high-dose


      anti-inflammatory aspirin is actually prothrombotic


      because of differential effects on prostacyclin


      and thromboxane.


                DR. TEMPLE:  There are aspirin studies


      going back to at least moderate doses that show


      beneficial effects.  It is not just 80 milligrams.


      It is certainly at least a gram a day.  Some of the


      early ones were more than that.  That is worth


      thinking about.  I am encouraged by the thought


      that you might be able to get away with doses less


      than 3 grams.  So I didn't know that it was


      considered prothrombotic.  I thought aspirin always


      looked good.  But that is not up to grams.  I don't


      think any of the studies have done anything like






                DR. WOOD:  We will give Dr. Fleming the


      last word.


                DR. FLEMING:  I am just debating whether


      to do it now or after the break.


                DR. WOOD:  Let me help you.  Go ahead.


                DR. FLEMING:  Now?


                DR. WOOD:  After the break will be great.


                DR. FLEMING:  All right.  I will wait.


                DR. WOOD:  We will take a break and then


      we will be back here in ten minutes.




                DR. WOOD:  Okay, folks.  Let's get


      started.  The next presentation will be given by


      Sharon Hertz who is Deputy Director of the




                DR. HERTZ:  Thank you.  I am just going to


      spend a very few minutes summarizing some of our--


                DR. WOOD:  Let me, in fact, just before


      Sharon begins--Sharon Hertz has passed out a


      handout that includes a lot of her slides.  In the


      interest of time, she has graciously agreed to




      delete some of these slides and just focus on a


      smaller subset of what is in the handout.


                However, the committee does have the


      handout and the committee may find that handout


      useful for referring to some of the data.


                DR. HENNEKENS:  Alastair, a quick comment.


      I want to make a quick clarification on the earlier


      comment about prothrombotic effects of high


      doses of aspirin.


                DR. WOOD:  Sorry; I missed that.  About




                DR. HENNEKENS:  In the randomized trials,


      135 randomized trials with over 212,000 randomized


      subjects, whether the doses of aspirin are 75


      milligrams or up to 2 grams a day, there are


      significant cardiovascular benefits to aspirin even


      at high doses.  The issue, as Bob pointed out, at


      the high doses, is not that there is a reversal of


      the benefit but that the side effects are




                So I think that is an important point to




                DR. ILOWITE:  I just wanted to say that in


      pediatrics, we think of anti-inflammatory doses as


      100 milligrams per kilogram.  So those are the




      doses I was speaking of.


                DR. GIBOFSKY:  Finally, the high-dose


      aspirin that would be necessary to treat patients


      with rheumatoid arthritis of 3.9 grams or greater


      would have significant problems on the stomach, as


      Dr. Cryer said, significant problems on the hearing


      of the patient and significant problems, perhaps,


      on other organ systems as well.  It is not a study


      that could be easily undertaken.


                DR. HENNEKENS:  I won't debate the value


      of the study of 3.9 grams of aspirin but, from the


      perspective of anti-inflammatory effects, they have


      been observed at doses of 2 grams of aspirin a day


      and, in fact, there are randomized studies going on


      directly testing whether somewhat higher doses of


      maybe 1 to 1-and-a-half grams a day might have


      significant anti-inflammatory as well as


      anti-atherogenic effects as measured by endothelial


      function, nitric oxide formation and other






                So I don't think that the traditionally


      high doses are the ones that necessarily would need


      to be done.  But I don't want to debate whether we


      should be studying doses of 4 grams of aspirin.


                DR. WOOD:  What you are telling us,


      Charlie, is that you are comfortable that there is


      an antithrombotic effect at the high doses of


      aspirin.  Is that right?  Okay.  Good.


                Dr. Cush wants to say something.


                DR. CUSH:  Again, you need not


      anti-inflammatory doses but analgesic doses which


      can be substantially lower.  I do want to make a


      statement with regard to a study that wasn't


      presented here that I think is germane and we


      should know about it, and this is quick.  There is


      a very large trial that is NIH supported that is


      called the GATE study, glucosamine in


      osteoarthritis of the knee.


                This is a 1588-patient study that is completed and


      is currently being analyzed.  The Data Safety


      Monitoring Board of the study has analyzed it for




      cardiovascular risk because there is a Celebrex


      arm.  There are five arms in this 1500-patient


      study: placebo, Celebrex 200 milligrams once a day,


      glucosamine only, chondroitin sulfate only, and


      glucosamine and chondroitin sulfate.


                The outcome here, in a six-month trial, is


      pain reduction in osteoarthritis in the knee.


      Because of all this press and what not, they have


      looked at the safety outcomes and they have not


      shown any increase in cardiovascular events


      including M.I., or any difference between the Celebrex


      group and the other four control groups.


                DR. WOOD:  Let's move on to the program.


      Dr. Hertz?


                    Summary of Meeting Presentations


                DR. HERTZ:  There are now several versions


      of my slides around and you are free to look at


      whichever interests you.  There is one correction


      on the lumiracoxib slides from the original set


      where I substituted the word diclofenac for


      ibuprofen.  So those of you looking at those slides


      just be aware of that, please.


                What I am really just going to do now is


      just focus again on some of the reasons why we


      are here.  This would not be the current slide set.




      Any help here?


                Looking at the most recent set that were


      handed out, and we will just work from there


      because there is not a lot of data anymore to


      present, but, basically, I want to just point out


      that we are here because we do recognize that pain


      drugs are critically important, that the


      COX-2-selective NSAIDs have been extensively


      studied and there are, over time, studies that


      revealed new potential uses as well as new risks.


                We need to determine how we feel about


      these risks.  Are they limited to individual


      products?  Are they applicable across the group of


      COX-2 selectives and how far does this extend to


      the nonselective anti-inflammatories?


                There is a slide that describes--


                DR. WOOD:  Sharon, apparently everybody


      has hard copies of your slides.


                DR. HERTZ:  Right.


                DR. WOOD:  So if you want to just go


      through them and refer to the slide number, that


      would probably be helpful to people.


                DR. HERTZ:  Okay.  If we go to the third


      slide, you can get a sense of the sizes of the


      databases that were presented in the individual




      reviewer descriptions of FDA reviews.


                A couple of points.  The numbers there


      reflect predominantly patients on the drug of


      interest as opposed to the entire database.  The


      outcome studies are more reflective of the entire


      populations including comparators.  These drugs


      were assessed and have been assessed over time in


      fairly large numbers of patients.


                I think it is useful to note that we have


      not approved, in this country, all of the


      COX-2-selective NSAIDs that have come to us in


      applications for a variety of reasons.  Some of


      these may be related to cardiovascular-risk


      assessment.  Some may be related to


      non-cardiovascular-risk assessment which we really


      haven't gotten into in this setting.


                In addition, you may also note that


      parecoxib has not yet been approved in this country


      although it has been approved elsewhere.  So I


      think that we have a lot of issues to consider with


      these products.


                When we reviewed the studies that have


      been presented, we see that there is some increased


      risk for cardiovascular events but one of the key


      issues here is that the results are not consistent




      across studies and across situations.  We also have


      seen that there is risk that is being associated


      with some of the nonselective products.


                So we have a story of conflicting data.  I


      am up to Slide 5.  We have data that has been


      presented across short- and long-term studies, the


      epidemiologic studies.  The challenge is to compare


      across populations, across comparators.  It is


      striking that sometimes very similar study designs


      have very different results.


                It is possible there is more than one


      mechanism.  Again, the data has been inconsistent


      with the NSAIDs.  We also have conflicting




      information coming back on what occurs in the


      context of concurrent aspirin use.  It is really


      unclear if aspirin use has a truly meaningful


      effect on whether there is any G.I. benefit of the


      COX-2-selective products.  That has not been clear




                I have been asked to point out that, in


      addition, time to onset of risk is something that


      we need to consider very importantly, too, which,


      again, is something that is evident when we look at


      the study data and important in our deliberations


      for this.


                So, in spite of this conflicting data and


      the many questions, we have to move forward.  We


      have to determine what the role is of approved


      products on the market today, what additional


      studies are necessary, what studies would be most




                I am going to summarize and combine some


      of the questions that we have posed.  These are


      questions on which we would dearly like input from the


      committee.  To start, if we think about the first




      three questions, does the available data support a


      conclusion that celecoxib, rofecoxib and valdecoxib


      significantly increase the risk of cardiovascular


      events?  Does the overall risk-versus-benefit


      profile for each of these support marketing in the


      U.S.?  If yes, in whom?  And which of the potential


      benefits of celecoxib or the others outweigh the


      potential risks and what actions would you


      recommend that we consider implementing to ensure


      safe use?


                I think it is also important to understand


      that some of these answers are going to depend on


      whether we think that this is a fairly uniform class


      effect and, if not, we are going to have to weigh the


      amount of information available for each of the


      products.  It is not the same.  We don't have the


      longer outcome studies, for instance, with


      valdecoxib at this point.


                Question 4 asks if the available data


      support a conclusion that one or more of the


      COX-2-selective agents increase the risk of


      cardiovascular events and what is the role of




      concomitant aspirin in attempting to mitigate that


      risk.  What additional clinical trials or


      observational studies, if any, would you recommend


      as essential for us to further evaluate celecoxib,


      rofecoxib and valdecoxib?


                What about studies to further evaluate the


      potential G.I. benefits for these same products?


      Would you recommend that the labeling for these


      products include information regarding the absence


      of long-term controlled clinical-trial data


      assessing potential cardiovascular effects and, if


      so, how you would recommend that it be conveyed


      in terms of warnings, boxes and such?


                What additional trials would be essential


      to evaluate the nonselective nonsteroidal


      anti-inflammatory drugs particularly with respect


      to cardiovascular risk?  Similarly, what will now


      become essential, prior to approval, for products


      under development to gain approval?


                We have to determine what studies would be


      necessary to evaluate the cardiovascular risk of


      these products and how much information we need


      about the gastrointestinal risk.  If


      preapproval studies recommended as essential do not


      demonstrate an increased risk for a cardiovascular


      event, how would you propose the FDA handle that


      information in the labeling?  Would the absence of


      a cardiovascular-risk signal preclude the need for


      any warnings or precautions in the labeling of a


      new product or should we rely more on a class


      warning or precaution in the absence of a signal of


      increased risk in the preapproval databases?


                If you think a class warning is


      appropriate, please advise with particular


      attention to whether you recommend it apply to all


      NSAIDs or only COX-2-selective NSAIDs.


                So I want to thank everybody here for


      their time and their commitment to helping us


      through this extremely challenging program and we


      really look forward to hearing your deliberations


      and your recommendations.


                Thank you.


                DR. WOOD:  Thank you very much.


                The companies have also asked for two




      minutes to respond.  We all heard the rules


      yesterday, so it is two minutes.  The microphone gets


      turned off after two minutes, so just keep moving.


                           Sponsor Responses


                DR. HARRIGAN:  Could I have Slide No. 1.


      This is Harrigan from Pfizer.  What I would like to


      do is first to summarize what we know about


      celecoxib and what we think that tells us about the


      benefit:risk equation for that drug.


                I make the point in this slide that


      Celebrex has been extensively studied and remind


      the committee of the contrast with the very widely


      used nonspecific NSAIDs.  On the next point, we see


      that efficacy has been demonstrated in arthritis


      pain and familial adenomatous polyposis.  Our


      prescription data and observational study data tell


      us that approximately three-quarters of patients


      who are taking celecoxib are receiving daily doses


      of 200 milligrams or less.


                Celebrex does have a favorable G.I. safety


      profile, a point emphasized by the very relevant


      G.I. safety findings that we heard about this




      morning from ADAPT compared to over-the-counter


      doses of naproxen.


                Cardiovascular risk was not detected in


      the setting of treating arthritis patients,


      recognizing all the caveats about those data that


      we have heard over the past two days.  In APC, an


      increase in cardiovascular risk was reported,


      apparently in a dose-related pattern.  In contrast,


      two additional long-term placebo-controlled trials


      did not find evidence of increased cardiovascular


      risk at daily doses of 400 milligrams.


                The comment about the ADAPT findings is


      supported by the initial announcements from the


      National Institute on Aging.  We await those data


      with great interest, particularly given the size,


      duration and elderly population of the study, which


      would lead us to expect that the number


      of events in that trial will exceed the number of


      events in either or both of the other two trials




                The final ADAPT data and the polyp


      efficacy data will make significant contributions