
 

                DEPARTMENT OF HEALTH AND HUMAN SERVICES

 

                      FOOD AND DRUG ADMINISTRATION

 

                CENTER FOR DRUG EVALUATION AND RESEARCH

 

 

 

 

 

 

 

 

 

 

 

 

                            JOINT MEETING OF

 

                  THE ARTHRITIS ADVISORY COMMITTEE AND

 

                  THE DRUG SAFETY AND RISK MANAGEMENT

 

                           ADVISORY COMMITTEE

 

 

                               VOLUME III

 

 

 

 

 

 

 

 

 

 

                       Friday, February 18, 2005

 

                               8:07 a.m.

 

 

 

 

 

 

 

 

 

                          Hilton Gaithersburg

                           620 Perry Parkway

                         Gaithersburg, Maryland


 

                        P A R T I C I P A N T S

 

      Alastair J. Wood, M.D., Chair

 

      Arthritis Advisory Committee:

 

      Allan Gibofsky, M.D., J.D.

      Joan M. Bathon, M.D.

      Dennis W. Boulware, M.D.

      John J. Cush, M.D.

      Gary Stuart Hoffman, M.D.

      Norman T. Ilowite, M.D.

      Susan M. Manzi, M.D., M.P.H.

 

      Drug Safety and Risk Management Advisory Committee:

 

      Peter A. Gross, M.D.

      Stephanie Y. Crawford, Ph.D., M.P.H.

      Ruth S. Day, Ph.D.

      Curt D. Furberg, M.D., Ph.D.

      Jacqueline S. Gardner, Ph.D., M.P.H.

      Eric S. Holmboe, M.D.

      Arthur A. Levin, M.P.H., Consumer Representative

      Louis A. Morris, Ph.D.

      Richard Platt, M.D., M.Sc.

      Robyn S. Shapiro, J.D.

      Annette Stemhagen, Dr.P.H., Industry Representative

 

      FDA Consultants:

 

      Steven Abramson, M.D.

      Ralph B. D'Agostino, Ph.D.

      Robert H. Dworkin, Ph.D.

      John T. Farrar, M.D.

      Leona M. Malone, L.C.S.W., Patient Representative

      Thomas Fleming, Ph.D.

      Charles H. Hennekens, M.D.

      Steven Nissen, M.D.

      Emil Paganini, M.D., FACP, FRCP

      Steven L. Shafer, M.D.

 

      National Institutes of Health Participants

      (Voting):

 

      Richard O. Cannon, III, M.D.

      Michael J. Domanski, M.D.

      Lawrence Friedman, M.D.


 


 

      Guest Speakers (Non-Voting):

 

      Garret A. FitzGerald, M.D.

      Ernest Hawk, M.D., M.P.H.

      Bernard Levin, M.D.

      FDA Participants:

 

      Jonca Bull, M.D.

      David Graham, M.D., M.P.H.

      Brian Harvey, M.D.

      John Jenkins, M.D., F.C.C.P.

      Sandy Kweder, M.D.

      Robert O'Neill, Ph.D.

      Joel Schiffenbauer, M.D.

      Paul Seligman, M.D.

      Robert Temple, M.D.

      Anne Trontell, M.D., M.P.H.

      Lourdes Villalba, M.D.

      James Witter, M.D., Ph.D.

      Steve Galson, M.D.

      Kimberly Littleton Topper, M.S., Executive

      Secretary


 

                            C O N T E N T S

 

      Call to Order:

                Alastair J. Wood, M.D.                           5

 

      Conflict of Interest Statement:

                Kimberly Littleton Topper, M.S.                  5

 

      Naproxen  Investigator Presentation

           Alzheimer Prevention Study: ADAPT

           (Alzheimer's Disease Anti-Inflammatory

           Prevention Trial):

                Constantine Lyketsos, M.D.                      14

 

      Additional Background Presentations

           Interpretation of Observed Differences

           in the Frequency of Events When the

           Number of Events is Small:

                Milton Packer, M.D.                             42

 

      Clinical Trial Design and Patient Safety:

      Future Directions for COX-2 Selective NSAIDS

                Robert Temple, M.D.                             95

 

      Issues in Projecting Increased Risk of

      Cardiovascular Events to the Exposed Population

                Robert O'Neill, Ph.D.                          109

 

      Summary of Meeting Presentations:

                Sharon Hertz, M.D.                             132

 

      Sponsor Responses                                        140

 

      Advisory Committee Discussion of Questions               147

 

                Question 1:                                    165

                Question 2:                                    284

                Question 3:                                    320

                Question 4:                                    356

                Question 5:                                    367

                Question 6:                                    391

                Question 8:                                    418

                Question 7:                                    432

 

      Meeting Wrap-up                                          438

 


 

                         P R O C E E D I N G S

 

                             Call to Order

 

                DR. WOOD:  Let's get started.  This is our

 

      third day and thanks to everybody for coming back.

 

      We have obviously entertained you sufficiently.

 

                Kimberly has a statement to read.

 

                     Conflict of Interest Statement

 

                MS. TOPPER:  The following announcement

 

      addresses the issue of conflict of interest with

 

      respect to this meeting and is made a part of the

 

      record to preclude even the appearance of such.

 

      Based on the agenda, it has been determined that

 

      the topics of today's meeting are issues of broad

 

      applicability and there are no products being

 

      approved.

 

                Unlike issues before a committee in which

 

      a particular product is discussed, issues of

 

      broader applicability involve many industry

 

      sponsors and academic institutions.  All special

 

      government employees have been screened for their

 

      financial interests as they may apply to the

 

      general topics at hand.

 

                To determine if any conflict of interest

 

      existed, the agency has reviewed the agenda and all

 

      relevant financial interests reported by the

 


 

      meeting participants.  The Food and Drug

 

      Administration has granted general-matter waivers

 

      to the special government employees participating

 

      in this meeting who require a waiver under Title

 

      18, United States Code, Section 208.  A copy of the

 

      waiver statements may be obtained by submitting a

 

      written request to the agency's Freedom of

 

      Information Office, Room 12A-30, of the Parklawn

 

      Building.

 

                Because general topics impact so many

 

      entities, it is not practical to recite all

 

      potential conflicts of interest as they apply to

 

      each member, consultant and guest speaker.  FDA

 

      acknowledges that there may be some potential

 

      conflicts of interest but, because of the general

 

      nature of the discussions before the committee,

 

      these potential conflicts are mitigated.

 

                With respect to the FDA's invited industry

 

      representatives, we would like to disclose that Dr.

 


 

      Annette Stemhagen is participating in this meeting

 

      as a non-voting industry representative on behalf

 

      of regulated industry.  Dr. Stemhagen's role on

 

      this committee is to represent industry interests

 

      in general and not any one particular company.  Dr.

 

      Stemhagen is Vice President of Strategic Development

 

      Services for Covance Periapproval Services, Inc.

 

                In the event that the discussions involve

 

      any other products or firms not already on the

 

      agenda for which an FDA participant has a financial

 

      interest, the participants' involvement and their

 

      exclusion will be noted for the record.

 

                With respect to all other participants, we

 

      ask, in the interest of fairness, that they address

 

      any current or previous financial involvement with

 

      any firm whose product they may wish to comment

 

      upon.

 

                There is one administrative announcement.

 

      Would you please make sure that you take your phone

 

      calls outside.  It is messing up with our audio and

 

      we would really appreciate it.  Thank you.

 

                DR. WOOD:  The other administrative thing

 


 

      that the sound person has asked me to say is, to

 

      the committee, try and remember to switch off your

 

      microphones when you are not using them.

 

      Apparently, it messes it up.

 

                MR. LEVIN:  Mr. Chairman?

 

                DR. WOOD:  Yes, Arthur?

 

                MR. LEVIN:  I wanted to express a concern

 

      I have in terms of the agenda for today's meeting.

 

      For those of us who have been at advisory committee

 

      meetings before, we know that there is often a

 

      tendency to sort of squeeze the most important part

 

      of these advisory committee meetings which is the

 

      discussion and answers to the questions and giving

 

      directions to FDA.

 

                My concern is that, given the lengthy

 

      discussions we have had over the past two days and,

 

      given the fact that this is the last day, that we will

 

      not have enough time to fully explore all of the

 

      questions that have been raised over the last two

 

      days and to give some definite direction to the FDA

 

      as to how to pursue these issues.

 

                So I would like to suggest to the group

 


 

      that we might shorten the presentations, or

 

      eliminate them entirely, in order to have adequate

 

      time to fully discuss all of our concerns and

 

      different points of view around the table.  I think

 

      it would be really unacceptable to leave here today

 

      unable, because of a time constraint, to give

 

      direction to the FDA on this issue.

 

                DR. WOOD:  Did you have any particular

 

      people you wanted to eliminate?  Or do you want to

 

      pass me a note, privately?

 

                MR. LEVIN:  It may be something the

 

      committee as a whole should decide.

 

                DR. WOOD:  Let me make a suggestion.  I

 

      think that is a reasonable approach.  I am sure the

 

      committee will want to hear the data from the ADAPT

 

      study and we should hear that in its totality.

 

      Milt Packer has come a long way so we should hear

 

      from him, I think.  Milt is always entertaining,

 

      anyway.

 

                Do we really need to hear from the two

 

      Bobs?

 

                DR. TEMPLE:  I don't have any ego involved

 


 

      in this.  A fair amount of--some of what I am

 

      talking about is about the adverse consequences of

 

      blood-pressure elevation which I think I could

 

      skip.  So I could shorten it considerably.  But you

 

      guys decide.  It is there for you to read if you

 

      want.

 

                DR. WOOD:  Why don't you do this.  Why

 

      don't you distribute your talk to us.

 

                DR. TEMPLE:  I think it has been.

 

                DR. WOOD:  Right; I understand that.  I

 

      will take that as a given.  And both of you make

 

      whatever remarks you would like to make from your

 

      seats there at the times that you are allotted, but

 

      brief and pointed.  And let's not revisit all the

 

      things we have visited before.

 

                DR. TEMPLE:  That's fine.

 

                DR. WOOD:  Does that sound fair?  Dr.

 

      O'Neill?

 

                DR. O'NEILL:  Yes; that is fine.

 

                DR. WOOD:  That will save us some time.

 

      So that is a good thought.  In addition, we have

 

      got Sharon Hertz's talk which, I notice, has

 


 

      40-something slides here--45 slides--which is a lot

 

      to get through in a few minutes.  So I think, while

 

      we are sort of working up to that, she may want to

 

      look at that and decide what she really needs to

 

      say.  I mean, after all, it is very unusual for the

 

      FDA to summarize the meeting for the committee,

 

      which is partly what the committee is here to do, I

 

      guess.

 

                So let's make sure that she can finish

 

      that taking the time she has been allotted for it

 

      which is 30 minutes.  She would be better to remove

 

      some slides rather than rush through it, I think.

 

                Having said all that, let's get to the

 

      first presentation.  Does anyone else have any

 

      thoughts on that?  Yes, Annette?

 

                DR. STEMHAGEN:  I would like to ask

 

      whether the manufacturers could have just one or

 

      two minutes to make some summary comments before we

 

      start our deliberations after lunch.

 

                DR. WOOD:  Do they want to do that now?

 

      Is that what you are asking?

 

                DR. STEMHAGEN:  No; I think after these

 


 

      presentations.

 

                DR. WOOD:  Okay.

 

                DR. STEMHAGEN:  Thank you.  I appreciate

 

      it.

 

                DR. WOOD:  Let's have some discussion

 

      amongst the committee.

 

                DR. CUSH:  What would be the purpose of

 

      their having--they have had lots of time already to

 

      present their data and had lots of mike time in the

 

      back already.

 

                DR. STEMHAGEN:  Just in terms of the

 

      deliberations that have gone on, there might be

 

      some clarifying comments.

 

                DR. CUSH:  I think, if we have questions,

 

      we can ask for clarifying comments.  I think that

 

      is what we--I would suggest--and I agree with

 

      Arthur Levin in that we should get on to discussion

 

      as quickly as possible.

 

                DR. STEMHAGEN:  I realize this is sort of

 

      in contrast to try to shorten it.  But I would like

 

      to ask that that time be awarded.

 

                DR. WOOD:  Any other thoughts on that? 

 


 

      Let me get a sense of the committee.  What is the

 

      committee's pleasure about that?  Yes?

 

                DR. BOULWARE:  I actually support that

 

      recommendation, too, and would suggest you give

 

      them a limited time, like you did with the public

 

      comment where you will cut them off at two minutes,

 

      so we know it will be limited.  I would be

 

      interested in the direction they plan to take.  We

 

      heard some startling news yesterday about the

 

      possible remarketing of a product that they have

 

      withdrawn.

 

                DR. WOOD:  Does anyone object to them

 

      getting two minutes apart from Dr. Cush?  Then, I

 

      think, the answer on that is that that is fine.

 

      Remind them that, in contrast to most of their

 

      experiences in the past for senior managers, the

 

      microphone will be cut off.

 

                DR. STEMHAGEN:  Thank you very much.  I

 

      think we saw evidence of that yesterday.

 

                DR. WOOD:  Right.  So they got the

 

      message; right?  Okay.  Let's move along to the

 

      first speaker, Dr. Lyketsos.

 

                       Investigator Presentation

 

                  Alzheimer's Prevention Study: ADAPT

 

                DR. LYKETSOS:  Good morning, everyone.  I

 


 

      do not have slides.  My name is Constantine

 

      Lyketsos.  I am a professor at Hopkins and I am

 

      presenting here today on behalf of the ADAPT study,

 

      Alzheimer's Disease Anti-inflammatory Prevention

 

      Trial.  I would like to thank the committee for

 

      inviting us to present.  I am here today with my

 

      colleague, Steve Piantadosi, who is also on the

 

      steering committee and will be available to answer

 

      any questions that might come up later on as well.

 

                I have a prepared statement that will be

 

      distributed to the committee later on today.  I

 

      delivered it to the staff this morning as I was

 

      arriving.

 

                Before I get into the statement, I just

 

      wanted to take a few moments to remind us of the

 

      public-health importance of Alzheimer's disease to

 

      somewhat set the context about how the ADAPT trial

 

      has started specifically.  Alzheimer's, as we all

 

      know, is a major public-health problem.  It is a

 


 

      devastating disease, typically runs a ten-year

 

      course of neurodegeneration affecting probably

 

      close to 4 or 4-and-a-half million of our citizens

 

      at present and the number is expected to rise given

 

      the aging of the population over the next several

 

      decades to approach, perhaps, 12 to 15 million,

 

      based on current projections.

 

                Because of these public-health

 

      numbers, there has been a very significant effort

 

      in our field for the last several years to develop

 

      preventive strategies for Alzheimer's disease

 

      because, once neuronal degeneration has started,

 

      the evidence that treatments work, so far, is very

 

      weak.

 

                These preventive strategies have centered

 

      on several possible treatments but the most

 

      supported by the observational literature have been

 

      nonsteroidals with over 24 studies right now

 

      including four prospective population studies

 

      suggesting substantial reductions of risk of

 

      Alzheimer's disease perhaps with risk ratios, in

 

      some cases, as much as 0.4 or 0.5.  So it is within

 


 

      that context that ADAPT was started with the

 

      support of the National Institute on Aging.

 

                I will move now to reading the prepared

 

      statement.

 

                The steering committee of the ADAPT study

 

      welcomes the opportunity to present the rationale

 

      for its decision, on December 17, 2004, to suspend

 

      the NSAID treatments in ADAPT.  This presentation

 

      is important because there is much public

 

      misunderstanding about our decisions and their

 

      rationale.

 

                The ADAPT Steering Committee is deeply

 

      committed to the safety of human subjects, even

 

      more so in the context of prevention trials where

 

      risks are typically not balanced by any promise of

 

      tangible near-term benefit.  In this notable way,

 

      prevention trials differ from treatment trials

 

      whose participants may hope for relief of symptoms

 

      or improved outcomes in a condition already

 

      diagnosed.

 

                The risk:benefit balance in prevention

 

      trials is even further removed from a comparison of

 


 

      the benefits of a proven treatment with its

 

      acknowledged risks.  Because ADAPT has not quite

 

      completed the process of auditing and tabulating

 

      the trial's cardiovascular safety data on the date of

 

      suspension, we cannot, today, present the trial

 

      safety results at the time of the decision to

 

      suspend.

 

                We defer that presentation to a

 

      peer-reviewed publication planned for the near

 

      future.  For today, we note that, even with the

 

      risk:benefit calculus of a prevention trial, these

 

      data would not, in themselves, have led to our

 

      decision to suspend either treatment.  In reality,

 

      those decisions were made in very unusual

 

      circumstances.  They reflected events external to

 

      ADAPT that raised strong concerns about the

 

      practicalities of continuing the treatments.

 

                As the advisory committee probably knows,

 

      ADAPT is a randomized, double-masked, multicenter

 

      trial of celecoxib, 200 milligrams twice daily, or

 

      naproxen sodium 220 milligrams twice daily versus

 

      placebo for the primary prevention of Alzheimer's

 


 

      dementia and for the prevention of age-related

 

      cognitive decline which is, in many instances, a

 

      prodrome of Alzheimer's disease.

 

                ADAPT also provides an opportunity to

 

      study the long-term safety of its treatments in a

 

      healthy elderly population.  Eligibility criteria

 

      include an age of 70 years or older at enrollment

 

      and a health history that excludes many of the

 

      known risk factors for adverse events with NSAID

 

      treatments; for example, we exclude those with

 

      preexisting uncontrolled hypertension, anemia or a

 

      history of gastrointestinal bleeding, perforation

 

      or obstruction.

 

                To provide independent recommendations

 

      regarding continuation of the trial, the ADAPT

 

      Treatment Effects Monitoring Committee, or TEMC,

 

      which, I suppose, is our term for a DSMB, meets

 

      twice a year.  In response to emerging concerns

 

      about cardiovascular risks with NSAIDs, membership

 

      of the TEMC was recently expanded to include Dr.

 

      Bruce Psaty, a physician with expertise in

 

      evaluation of cardiovascular risks in clinical

 


 

      trials.

 

                As an additional safeguard for participant

 

      safety, the ADAPT study officers and consultants

 

      also conduct reviews of safety data at intervals

 

      between TEMC meetings.  Amid the emerging

 

      controversy about the cardiovascular safety of

 

      selective COX-2 inhibitors, the ADAPT study officers

 

      had been relatively reassured by their periodic

 

      reviews of the celecoxib safety data. The study

 

      chair communicated this information in a telephone

 

      conversation on 15 October 2004 with Dr. Sharon

 

      Hertz at FDA.

 

                As of December 17, 2004, the date of

 

      suspension of treatments and enrollment in ADAPT,

 

      we had enrolled 2,528 participants.  Of these,

 

      2,463 had been randomized before October 1 of '04

 

      with some 20 months average duration of

 

      observation.  These participants contributed a

 

      total of 3,888 person years of follow up to

 

      analyses that were presented to the TEMC on

 

      December 10, 2004.

 

                Those analyses suggested a weak signal

 


 

      suggesting increased risks of cardiovascular and

 

      cerebrovascular events with naproxen.  Reviewing

 

      the data, however, we understood well the TEMC's

 

      evident conclusion that this signal was not

 

      sufficiently compelling or definitive to warrant a

 

      recommendation to suspend the treatment or to

 

      otherwise alter the protocol.  This was on December

 

      10, 2004.

 

                Thus, the study officers were surprised on

 

      December 17 by announcements that two trials of

 

      celecoxib for the prevention of recurrent

 

      adenomatous colon polyps had been suspended citing

 

      increased cardiovascular risks with treatment in

 

      one of these studies, the Adenoma Prevention with

 

      Celecoxib trial, or APC.  This news led to

 

      extensive discussion among the steering committee

 

      on that day centering on the following

 

      considerations.

 

                Number one; one arm of the APC trial had

 

      used the same celecoxib dosing as ADAPT, 200

 

      milligrams twice daily, but over a longer period of

 

      time.  News reports cited a relative risk of 2.5

 


 

      for cardiac events in this arm of APC.  Although

 

      this risk was reported as only "marginally

 

      significant," a greater cardiac-risk signal was

 

      reported with the higher APC dosage of 400

 

      milligrams twice daily.

 

                Thus, we took seriously the possibility of

 

      harm over time to ADAPT participants receiving

 

      celecoxib.  Especially in a prevention trial with

 

      no strong prospects of immediate benefit, we had

 

      strong misgivings about continuing celecoxib

 

      treatments.

 

                Knowing almost nothing at the time about

 

      the particulars of the APC trial and, in light of

 

      the apparent lack of risk with celecoxib in the

 

      other prevention trial, we might have discounted

 

      the APC data and continued celecoxib.  To do so,

 

      however, we would clearly have needed the

 

      concurrence of the seven IRBs that oversee ADAPT.

 

      These IRBs began almost immediately to question us

 

      about implications of the APC results and seemed

 

      likely to question a decision to continue.

 

                Even if we had persuaded them to permit

 


 

      continuation of celecoxib using a revised consent

 

      process, we would surely be involved in lengthy

 

      discussions with these IRBs.  In the meantime, we

 

      would be unable to offer much explanation to our

 

      participants, thereby endangering the relationship

 

      of trust that is vital to the success of long-term

 

      trials.

 

                Number three; as is common in long-term

 

      trials, ADAPT was experiencing some difficulty with

 

      adherence to treatments.  This difficulty grew

 

      following the withdrawal of rofecoxib and we

 

      expected the announcement of the APC results to

 

      exaggerate the problem further with scores of

 

      participants stopping treatment, in effect, "voting

 

      with their feet."  This would erode statistical

 

      power and increase the potential for bias in ADAPT.

 

                Thus, even though the ADAPT safety data

 

      did not, themselves, warrant suspension of

 

      celecoxib treatments, there seemed little

 

      practical choice but to do so.

 

                We next confronted the dilemma of what to

 

      do about naproxen and its placebo.  As suggested

 


 

      above, we regarded the accumulated naproxen safety

 

      data as being somewhat more concerning than the

 

      celecoxib safety data.  Yet, they, also, were not

 

      compelling.  Although some post hoc data composites

 

      barely reached statistical significance--these are

 

      post hoc data composites that barely reached statistical

 

      significance for naproxen versus placebo--no

 

      singular vascular event was clearly more frequent

 

      with naproxen versus placebo.

 

                Furthermore, vascular risks were not

 

      expected with naproxen treatment.  In fact, a

 

      substantial body of prior data at the time had

 

      suggested that naproxen offers some cardiovascular

 

      protection.  This lack of prior expectation cast

 

      further doubt on the meaning of the naproxen data

 

      in ADAPT which were vulnerable, in any case, to the

 

      problem of multiple comparisons.

 

                We could, therefore, have attempted to

 

      have revised ADAPT to a two-armed trial of naproxen

 

      versus placebo, instructing our participants to stop

 

      taking their "white pills," as they are known in

 

      the study, which are celecoxib and its placebo, but

 


 

      continue to take their "blue pills," which contain

 

      naproxen and its placebo.

 

                However, the dangers were several.

 

      Participants might end up getting confused and

 

      taking the wrong pills and many would stop taking

 

      their treatments altogether.  We faced an ethical

 

      dilemma.  The suspension of celecoxib and

 

      continuation of naproxen would have created the

 

      impression among participants and among the general

 

      public that celecoxib was risky but naproxen was

 

      "safe."  At least based on the signals from the

 

      ADAPT data, this impression would have been

 

      misleading.

 

                What would we then tell participants about

 

      the risks with naproxen as we led them through the

 

      inevitable process of revised consent necessitated

 

      by the protocol revision?  Would the multiplicity

 

      of IRBs even allow us to follow this course?

 

                Finally, there was another risk to

 

      consider.  We began ADAPT expecting to see some

 

      increase with naproxen in gastrointestinal bleeding

 

      and other events.  Even though we attempted to

 


 

      reduce these excess G.I. risks by excluding

 

      participants with prominent risk factors other than

 

      age, the ADAPT data showed a notable increase in

 

      G.I. bleeding with naproxen versus placebo.

 

                Especially amid concerns that ADAPT was

 

      exposing its participants to potential risks that

 

      were immediate, while the trial's hoped-for

 

      benefits lay in the future, the totality of the

 

      above arguments led the steering committee to

 

      suspend both treatments and to also suspend

 

      enrollment into ADAPT.

 

                As noted above, we expect, within a few

 

      weeks, to submit a scientific paper for peer review

 

      and publication.  The paper's focus will be on the

 

      process and rationale underlying the decision to

 

      suspend treatments and enrollment in ADAPT.

 

      Because these decisions did rely, in some measure,

 

      on the ADAPT safety data as of 10 December, the

 

      paper will, also, disclose some of these data.

 

                We are also cooperating with ongoing

 

      efforts at the NIH to investigate the

 

      cardiovascular and cerebrovascular risks of NSAIDs.

 


 

      In addition, the NIA and the ADAPT Steering

 

      Committee are committed to a further two years of

 

      additional safety monitoring of our participants.

 

                In preparation for a later, more

 

      definitive discussion of the ADAPT safety data, we

 

      plan to revisit a number of the adverse events to

 

      collect additional information and then to submit

 

      all information available now or later to a process

 

      of expert adjudication.  Depending on particulars,

 

      the latter process will take months.  In the nearer

 

      term, we concur with the expert opinion that,

 

      having taken these widely publicized decisions, the

 

      steering committee must fulfill its obligation to

 

      disclose its reasons for doing so based upon the

 

      data available.

 

                At the same time, we are intent that our

 

      public presentation even of the current "working"

 

      data must be at the highest attainable standards of

 

      accuracy.

 

                Thank you.

 

                DR. WOOD:  Thank you very much.  Are there

 

      questions directed to the speaker?  Dr. Nissen?

 

                DR. NISSEN:  I fully understand your

 

      rationale and I understand that the trial was

 

      fundamentally stopped because of an issue of

 

      futility.  You didn't think that you could keep

 

      people in the celecoxib arm.  That is all well and

 

      good.  The problem that occurred here is that a

 

      warning was issued on naproxen which had the effect

 

      of being the medical equivalent of screaming "fire"

 

      in a crowded auditorium.

 

                All over the country, many of us got calls

 

      from patients saying, "I want to stop my naproxen

 

      because it causes a cardiovascular risk."  I think,

 

      just a comment here, that it would have been far

 

      better to have announced that the trial was

 

      suspended for futility rather than for hazard when

 

      there was a non-statistically significant hazard.

 

      So, one man's comment.

 

                DR. WOOD:  I agree with that.  Any other

 

      comments?  Yes?

 

                DR. FARRAR:  I wonder if you could comment

 

      on the G.I. bleed component since, obviously, one

 

      of the deliberations we have to undertake is the

 

      relative problems with G.I. bleed versus

 

      cardiovascular risk.  Certainly, that was known a

 

      priori before starting the study.

 

                As you commented very carefully, that

 

      wasn't the only consideration.  But, in a drug

 

      trial where the outcome is unknown and the risk is

 

      really fairly well known, I wondered how you

 

      thought about that in terms of putting patients at

 

      risk of something on the order of a few percent

 

      over the course of a five-year trial who might have

 

      serious complications from the G.I. bleeding.

 

                DR. LYKETSOS:  I guess you are asking me a

 

      human-subjects question.

 

                DR. FARRAR:  I am asking how, in the

 

      design of the study, obviously the choice was made

 

      to accept that risk for the unknown potential

 

      benefit of reduction in Alzheimer's disease over

 

      the course of the same trial.  I am wondering if

 

      you have any insights into how that decision was

 

      made because, clearly, there are issues there about

 

      the use of these drugs and their risks.

 

                DR. LYKETSOS:  Well, I am glad you are

 

      asking the question.  It certainly is an issue that

 

      we have spent a lot of time discussing and which we

 

      discussed with study sections, IRBs, at quite some

 

      length and continue to discuss.

 

                I think the fundamental point that I would

 

      start with is where I started my presentation which

 

      is the devastation that Alzheimer's disease brings

 

      and the fact that all the study participants were

 

      individuals who had a first-degree relative with

 

      the disease and had, therefore, personal

 

      experience.

 

                In that context, we were very careful and

 

      very clear with them about what we thought at the

 

      time the known G.I. risks were so that, in the

 

      process of consent, and that was revealed through

 

      careful discussions in the consent process as well

 

      as the consent form, the risk of G.I. bleed was

 

      stated very clearly and that that, in some cases,

 

      might lead to death.

 

                So I think we felt that this was a

 

      decision that our participants could make, given

 

      that the risks were relatively small, and the risk

 

      that they would develop Alzheimer's disease was

 

      higher and that we felt they could make the

 

      decision for themselves if they were willing to

 

      take the risk:benefit calculus as we saw it.

 

                DR. WOOD:  Dr. Gibofsky?

 

                DR. GIBOFSKY:  I share Dr. Nissen's

 

      concern about this effect of crying fire in a

 

      crowded theater.  Many of our patients called and

 

      suggested that they were going to stop their

 

      celecoxib because of the concerns that were raised

 

      from ADAPT as well.  But you raised a very

 

      interesting concern that I confess I hadn't given

 

      enough thought to and that is the difference

 

      between a prevention trial and an outcome trial.

 

                Much of our discussion here later today, I

 

      suspect, is going to focus on what action should be

 

      taken, if any, to restrict drugs used for treatment

 

      based on data from prevention trials.  I would be very

 

      curious to hear you expound on that a bit more.

 

                DR. LYKETSOS:  That is an interesting

 

      question.  Let me just, if I could, because there

 

      have been three comments now--I just would like to

 

      refer you to the early part of my statement where I

 

      said the presentation is important because there is

 

      much public misunderstanding about our decisions

 

      and their rationale.

 

                Several of you pointed out that there was

 

      a cry of fire.  I don't believe that that came from

 

      the study.

 

                DR. WOOD:  We won't ask you to speculate

 

      where it came from.  There is certainly a view on

 

      that.

 

                DR. LYKETSOS:  I am not sure where it came

 

      from.  But, to address the other issue, I must say

 

      I have not given it much thought as to whether

 

      prevention-trial safety data would generalize in

 

      the way that you are thinking about it.  So I will

 

      defer on that because I think it would need a fair

 

      bit more thought by people who are more expert in

 

      that.

 

                DR. WOOD:  Dr. Fleming.

 

                DR. FLEMING:  It is my understanding, from

 

      what you are saying, that the steering committee

 

      was particularly influenced by the APC prior data

 

      not by the internal data from ADAPT; i.e., there

 

      were, from what you were describing, some emerging

 

      trends that, in my words, were in the unfavorable

 

      direction but in the context of monitoring trials,

 

      we know that one has to be extremely cautious, when

 

      you are looking at data continually over time, not

 

      to overinterpret emerging trends that can easily

 

      ebb and flow.

 

                So my understanding, from what you are

 

      saying, is it wasn't that there were, at this

 

      point, some emerging trends that happen to be in

 

      the unfavorable direction on naproxen.  Rather, it

 

      was the external data on the APC trial for Celebrex

 

      that was the driving issue behind the

 

      recommendation.

 

                DR. WOOD:  Just to develop that question,

 

      what I understood you to say was you hadn't passed

 

      some stopping boundary; is that correct?

 

                DR. LYKETSOS:  I'm sorry?  I didn't hear

 

      the first--

 

                DR. WOOD:  You hadn't violated your

 

      stopping rule, or whatever stopping rules, you had

 

      for safety.

 

                DR. LYKETSOS:  I think that our TEMC, our

 

      DSMB, had opined the week before with the same data

 

      from within the trial that they felt that we should

 

      continue.  So it was interesting how the two events

 

      were back-to-back.

 

                DR. FLEMING:  I would like to come to that

 

      second.  I am leading to that.  But first I wanted

 

      to make sure that I understood what was the nature

 

      of the concern.  Is my interpretation correct?

 

                DR. LYKETSOS:  I think so.  Back to how I

 

      put it, the issue really was one of practicalities

 

      more than our internal data, is that we felt we

 

      would have to talk to IRBs and participants and

 

      tell them something about--

 

                DR. FLEMING:  Could I first understand

 

      what your sense of the evidence was.  I want to

 

      discuss that first, versus the practicality.

 

                DR. LYKETSOS:  The sense of the study

 

      evidence.

 

                DR. FLEMING:  The sense of the evidence

 

      that was the basis for the decision in terms of

 

      adverse effects.  I have heard two things.  One is

 

      the naproxen, but that was not compelling evidence.

 

      That was within the framework of emerging results

 

      that could be by chance alone when you are

 

      monitoring data frequently.  But external APC data

 

      was very influential to you.  That is what I am

 

      hearing.  Is that correct?

 

                DR. LYKETSOS:  Well, in fact, we didn't

 

      know all the details of the APC data, as I pointed

 

      out.  I think it was that plus the climate that had

 

      been created by rofecoxib coming off the market,

 

      the influence that that had to some extent on our

 

      participants, then the widely publicized APC

 

      results and the sense that, even though the data we

 

      were seeing and that our TEMC the week before had

 

      seen, did not compel us to stop treatment based on

 

      our own data, that there was now a climate created

 

      where, practically speaking, we had to stop and

 

      take stock and get more information, et cetera.

 

                So it was that sort of decision.  It

 

      was a complicated decision and that is why it takes

 

      a three-page statement to try and explain what went

 

      through our minds.

 

                DR. FLEMING:  There may not have been, to

 

      the steering committee at this time, access to data

 

      on PreSAP for celecoxib or to the etoricoxib, the

 

      lumiracoxib, data on naproxen that were very

 

      favorable, but you did have access to the VIGOR

 

      data which was very reassuring for naproxen and you

 

      had evidence from the CLASS trial and some other

 

      data from Celebrex.

 

                I am perplexed that you would look at the

 

      totality of these data and say that the results

 

      were conclusive in terms of at least not being able

 

      to provide information to the IRBs and to the

 

      patients and caregivers in the trial representing

 

      the totality of the data when your data-monitoring

 

      committee had looked at the totality of the

 

      evidence for benefit to risk.

 

                On a data-monitoring committee, I have

 

      always argued, don't just show me the safety data,

 

      even if we are just looking at early assessments

 

      for safety.  It always has to be benefit to risk.

 

      Even though, as you are pointing out, this wasn't a

 

      therapeutic setting, prevention trials also provide

 

      major opportunity for benefit.  Preventing major

 

      diseases is also a very significant benefit.

 

                My understanding is your data-monitoring

 

      committee, in looking at the data, looking at the

 

      benefit as well as the risk, indicated the study

 

      should continue.  How did the steering committee

 

      judge, without access to ongoing data, that benefit

 

      to risk couldn't be sufficiently favorable and that

 

      a notification to the investigators, to the

 

      patients and to the IRBs, that the monitoring

 

      committee has carefully looked at benefit and risk

 

      and that the totality of the data is beyond the APC

 

      trial when you are looking at Celebrex and

 

      naproxen?  Why wasn't that strategy pursued?

 

                DR. LYKETSOS:  First, as I pointed out in

 

      my statement, some members of the steering

 

      committee did have access to the data that the DSMB

 

      had seen.  That is the first point.  The second

 

      point is, as you point out and as I think this

 

      whole discussion points out, is these are very

 

      difficult judgment calls.  They have to take into

 

      account evidence but also practical aspects of

 

      continuing to conduct this sort of a prevention

 

      trial in this sort of a population.

 

                I think it was the judgment call, and I

 

      can tell you, there was substantial discussion

 

      around this when we had the steering committee

 

      meeting, about these very issues.  It was the

 

      collective judgement at the time that this was the

 

      right thing to do, given the various issues that I

 

      have articulated in my statement.

 

                DR. FLEMING:  I will just pursue one more.

 

      I am dismayed to hear the steering committee, some

 

      steering committee members, had access to the data.

 

      That is also a violation of the principles of

 

      monitoring trials.  It should have been in the sole

 

      possession of the data-monitoring committee.

 

                I am also distressed because I am not

 

      hearing that the monitoring committee was front and

 

      center in terms of having these issues brought back

 

      to it for reassessment.  So, to me, what I am

 

      hearing raises very significant concerns about

 

      putting at risk the integrity of studies with

 

      prejudgments using only access to partial external

 

      information.

 

                DR. WOOD:  There was one other thing,

 

      though, at least the word on the street was, and

 

      you sort of mentioned that as well, I understood

 

      there was a very large number of dropouts from the

 

      trial after the Vioxx withdrawal and others and

 

      that one of the perceptions was it was no longer

 

      possible to continue the trial.  Is that true?

 

                DR. LYKETSOS:  Let me clarify that.  The

 

      adherence had been declining on an annual basis

 

      even before rofecoxib was withdrawn from the

 

      market.  So adherence was perceived as an issue in

 

      that we felt that now there were data about one of

 

      the study drugs and that that would further erode

 

      adherence.  We did not see a huge erosion in

 

      adherence with rofecoxib, specifically, but there

 

      had already been an erosion that was concerning and

 

      we anticipated a further erosion.

 

                DR. WOOD:  Right.  But the question for

 

      this committee that Dr. Fleming is pursuing

 

      vigorously, and I agree with him, is that the

 

      announcement that you all made--the announcement,

 

      as it was picked up--maybe I should put it like

 

      that--was that this trial was being stopped for a

 

      safety signal.

 

                What I heard in your statement and what I

 

      hear from you now is that the trial was being

 

      stopped for operational problems in the trial and

 

      the safety signal was a convenient moment at which

 

      to do that.  But you had operational difficulties.

 

      That is a very different interpretation and a very

 

      different interpretation for the public and

 

      patients.

 

                Is that what you are hearing, Tom?

 

                DR. FLEMING:  It certainly appears to be.

 

      It is part of what is concerning to me.

 

                DR. LYKETSOS:  I think my statement should

 

      speak for itself.  In terms of what the data were,

 

      as I have pointed out, they will be submitted very

 

      soon so that you can judge for yourselves.

 

                DR. WOOD:  Okay.  Any other questions?

 

      Sorry; Dr. Farrar.  I beg your pardon.  Dr. Farrar,

 

      go ahead.

 

                DR. FARRAR:  I think, actually, that this

 

      study provides some vitally important information

 

      with regards to our consideration of the entire

 

      class of drugs; namely, the NSAIDs.  I would like

 

      to just read one sentence from the statement.

 

                It said, "Although some post hoc data

 

      composites barely reached statistical significance

 

      for naproxen versus placebo."  Now, clearly, this

 

      discussion would be much clearer after the

 

      presentation of the data, a careful review of the

 

      data.  But Dr. Fleming noted that, in the VIGOR

 

      study, there was some reassurance about naproxen.

 

      I would like to just question that.

 

                What is very clear in the VIGOR study is

 

      that naproxen was safer than rofecoxib.  But it

 

      does not comment at all with regards to the

 

      potential risk compared to placebo.  In fact, I was

 

      surprised when I heard the statement by Dr. Fleming

 

      because, in fact, I have assumed, based on all the

 

      data that we have, that every NSAID will not fare

 

      well against a placebo.

 

                I think that this data, and probably will

 

      be supported by the publication although I don't

 

      want to try and foresee the future, but my guess is

 

      that naproxen will not fare particularly well

 

      against placebo in terms of its cardiovascular

 

      safety.  I think we need to be able to accept the

 

      fact that all of them have some risk with regards

 

      to cerebrovascular disease and this study is likely

 

      to provide the data to support that.

 

                DR. WOOD:  Dr. Nissen?

 

                DR. NISSEN:  I don't want to belabor this

 

      because we have got a lot more to discuss today,

 

      but I think it is extremely important that, as a

 

      medical community, we learn from this episode.  In

 

      the kind of media frenzy that was going on during

 

      that period of time, this announcement, this

 

      warning that was issued on a national basis about

 

      naproxen, was inappropriate, led to some panic

 

      amongst the public and we simply can't do business

 

      this way.

 

                We can't operate in this kind of a

 

      fashion.  I would urge any of the individuals who

 

      were involved in the decision to issue a warning to

 

      go back and look at what happened and try to ensure

 

      that we don't do this sort of thing again, because

 

      once this gets picked up by the media, it passes

 

      through generations of people and becomes the topic

 

      of extensive discussion and may lead patients who

 

      don't have the ability that we have around this

 

      table to filter data--they don't understand

 

      data-safety and monitoring boards.  They don't

 

      understand stopping rules.  And it caused a panic

 

      that was unnecessary and it shouldn't have

 

      happened, and I hope it doesn't happen again.

 

                DR. WOOD:  Thanks very much.  Let's move

 

      on to the next speaker, Dr. Packer.

 

                  Additional Background Presentations

 

             Interpretation of Observed Differences in the

 

                  Frequency of Events When the Number

 

                           of Events is Small

 

                DR. PACKER:  Thank you, Alastair, members

 

      of the advisory committee, FDA, ladies and

 

      gentlemen.  Today I have been invited by FDA to

 

      address a specific question which is how should we

 

      interpret differences in the observed frequency of

 

      events in a clinical trial when the number of

 

      events is small.

 

                Let me just say arbitrarily that I will

 

      define, for purposes of today, what I mean by a

 

      small number of events and that would have provided

 

      less than 70 percent power to have detected a true

 

      treatment difference assuming an effect size

 

      similar to that generally encountered in clinical

 

      research.

 

                This is just a thought.  Just suppose you

 

      do a trial for a noncardiovascular indication and

 

      you note that there are 13 major adverse

 

      cardiovascular events in the placebo group and 33

 

      such events in the drug-treatment group.  How

 

      should this difference be interpreted?

 

                Many would simply perform a statistical

 

      test, derive the p-value, and get excited if the

 

      p-value were less than some arbitrary value such as

 

      0.05.  In this example, the p-value of 0.002 would

 

      suggest, to some, that this difference between 13

 

      and 33 in a trial of about 3,000 patients, would

 

      have been observed only two times out of 1,000, an

 

      effect unlikely to have been due to the play of

 

      chance.
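
      [Editorial illustration, not part of the spoken record.  A minimal
      sketch of the kind of test the example refers to.  The talk does
      not say how the 3,000 patients were divided or which test produced
      the quoted p-value, so the even 1,500-per-arm split and the pooled
      two-proportion z-test below are assumptions, and the result need
      not match the quoted figure exactly.]

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-sided z-test for a difference in event proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumption: 3,000 patients split evenly, 1,500 per arm.
z, p = two_proportion_z_test(13, 1500, 33, 1500)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

      [Under these assumptions the p-value falls well below 0.05, which
      is the point of the example; a different split or an exact test
      would shift the figure somewhat.]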

 

                However, before getting excited, we should

 

      remember that p-values must be interpreted in some

 

      context.  P-values are most easily interpreted when

 

      they refer to predefined primary endpoints in

 

      trials adequately powered, more than 80, 90 percent

 

      power, to detect differences between treatments.

 

      However, even under such circumstances, p-values

 

      are not necessarily reproducible.

 

                Bob O'Neill and others have made the point

 

      that, if a p-value in the trial is 0.05, the

 

      likelihood of seeing 0.05 or less in a second identical

 

      trial is only about 50 percent.  It is only when

 

      the p-value in the first study is 0.001 that the

 

      likelihood of seeing 0.05 or less in the second

 

      identical trial is at least 90 percent.

 

                These calculations are the basis of the

 

      frequent FDA guidance that, to demonstrate

 

      persuasive evidence for efficacy, a sponsor needs

 

      to provide two trials with 0.05 or less or one

 

      trial with a very, very small p-value.
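
      [Editorial illustration, not part of the spoken record.  One common
      way to arrive at replication figures like those quoted above is to
      assume that the true effect equals the estimate from the first
      trial, that the second trial is the same size, and that the test
      statistic is approximately normal.  The talk does not state how its
      figures were derived, so these assumptions and the function name
      are illustrative only.]

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_probability(p_first, alpha=0.05):
    """Chance that an identical second trial reaches two-sided p <= alpha
    in the same direction, ASSUMING the true effect equals the first
    trial's observed effect."""
    z_first = STD_NORMAL.inv_cdf(1 - p_first / 2)  # z-score behind the first p-value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)     # threshold for the second trial
    return STD_NORMAL.cdf(z_first - z_crit)


for p_first in (0.05, 0.01, 0.001):
    print(f"first-trial p = {p_first}: replication probability "
          f"{replication_probability(p_first):.2f}")
```

      [With a first-trial p-value exactly at the threshold, the second
      trial's statistic is as likely to fall below the threshold as above
      it, which is where the figure of about 50 percent comes from.]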

 

                But what if the event was not the primary

 

      endpoint in the study?  What, in fact, if the event

 

      was not even precisely defined before the start of

 

      the trial?  What if the trial was not adequately

 

      powered to detect a treatment difference for the

 

      endpoint?  What does a p-value mean under these

 

      circumstances?

 

                Unfortunately, this happens quite

 

      frequently in clinical trials under a variety of

 

      circumstances.  But it is particularly true in the

 

      analysis of adverse events.  So let's make a list of

 

      things to worry about when using p-values to

 

      compare the frequency of adverse events in a

 

      clinical trial.

 

                First, there are literally hundreds of

 

      adverse events in a clinical trial and, therefore,

 

      there are hundreds of possible comparisons that can

 

      be made.  Now, this is classically referred to as

 

      the multiple comparisons problem.  For example, if

 

      a typical large-scale clinical trial yields as many

 

      as 500 individual terms describing adverse events

 

      and if a p-value were calculated for each pairwise

 

      comparison, one would, of course, by chance alone,

 

      expect about 5 percent of the terms, or about 25

 

      events, at a p-value of 0.05 or less and 1 percent

 

      of the terms, or about 5 events, to have a p-value

 

      of 0.01 or less.
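
      [Editorial illustration, not part of the spoken record.  The
      arithmetic behind the multiple-comparisons example, as a short
      sketch.  The expected counts hold when no true differences exist;
      the "at least one" figure additionally assumes the comparisons are
      independent, which real adverse-event terms are not, so it is
      indicative only.]

```python
def expected_false_positives(n_comparisons, alpha):
    """Expected number of comparisons with p <= alpha when no true
    difference exists for any of them."""
    return n_comparisons * alpha


def prob_at_least_one(n_comparisons, alpha):
    """Chance of at least one p <= alpha, ASSUMING independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: expect about "
          f"{expected_false_positives(500, alpha):.0f} of 500 terms by chance; "
          f"P(at least one) = {prob_at_least_one(500, alpha):.3f}")
```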

 

                The second issue in interpreting

 

      comparison of frequency of adverse events is the

 

      fact that adverse events are spontaneous

 

      nonadjudicated reports.  Now, adverse events are

 

      reported at the discretion of the investigator and

 

      then translated into standardized terms.  There is

 

      little uniformity on how an event is identified,

 

      defined or reported and this uncertainty increases

 

      when the event is in a field remote from the

 

      investigator's focus.

 

                Now, some of you may believe that you can

 

      fix this problem by carrying out blinded

 

      adjudication of events after the fact.

 

      Unfortunately, the rules guiding post hoc

 

      adjudication are inevitably influenced by the

 

      knowledge that a treatment difference has been

 

      seen.  In fact, any bar set by a post hoc process,

 

      is capable of magnifying or diluting an effect.

 

                For example, if you set very strict

 

      criteria, a committee could reduce the number of

 

      events and, therefore, reduce statistical power.

 

      By setting very loose criteria, the committee can

 

      include many questionable events and reduce the

 

      magnitude of a treatment difference.
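
      [Editorial illustration, not part of the spoken record.  A sketch
      of how the adjudication bar can move a result in either direction.
      It reuses the 13-versus-33 example from earlier in the talk; the
      "strict" and "loose" counts are hypothetical, equal-sized arms are
      assumed, and the z-score is a simple approximation that treats each
      event as equally likely to fall in either arm under no effect.]

```python
import math


def split_summary(control_events, treated_events):
    """Event ratio and approximate z-score, assuming equal-sized arms."""
    total = control_events + treated_events
    ratio = treated_events / control_events
    # Under no effect with equal arms, each event is a fair coin flip between arms.
    z = (treated_events - control_events) / math.sqrt(total)
    return ratio, z


scenarios = {
    "as reported": (13, 33),
    "strict criteria (hypothetical: about a third confirmed)": (4, 11),
    "loose criteria (hypothetical: 40 doubtful events added per arm)": (53, 73),
}
for label, (control, treated) in scenarios.items():
    ratio, z = split_summary(control, treated)
    print(f"{label}: {control} vs {treated}, ratio {ratio:.2f}, z {z:.2f}")
```

      [In this sketch the strict bar keeps the ratio but loses
      significance through fewer events, while the loose bar keeps the
      absolute difference but shrinks the ratio toward 1.]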

 

                To make things more complicated,

 

      adjudication committees do not generally examine

 

      individuals who did not report an event to make

 

      sure they didn't have an event.

 

                The third issue in interpreting

 

      comparisons of frequencies is that some signals are

 

      apparent only if adverse events are grouped

 

      together.  Now, that is not much of a problem if

 

      the difference is fairly straightforward and

 

      focuses on one single event.  But things can become

 

      a little bit more complicated if the analysis

 

      requires combining events and combining trends

 

      across two or more events in order to reach some

 

      magical level of statistical significance.

 

                Now, the problem is that these groupings

 

      are frequently constructed after the fact, making

 

      it possible to include only events that showed the

 

      trend the investigator is interested in.  For

 

      example, if an investigator believed the drug

 

      increased the risk of a major cardiovascular event,

 

      he or she might first look at myocardial infarction

 

      and stroke, but, finding little difference here, he

 

      or she might be tempted to look at other related

 

      events; for example, not seeing a difference in

 

      myocardial infarction, an investigator might be

 

      tempted to broaden the definition of a myocardial

 

      ischemic event to include sudden death or unstable

 

      angina if the differences between the groups

 

      supported some predetermined judgment.

 

                Similarly, not seeing a difference in

 

      stroke, an investigator might be tempted to broaden

 

      the definition to include a TIA.  But the

 

      possibilities of grouping are very, very large and

 

      the possibilities of finding something, if you want

 

      to be creative, are also quite large, even though

 

      these differences may be related to the play of

 

      chance.
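
      [Editorial illustration, not part of the spoken record.  A sketch
      of how quickly the number of possible groupings grows.  The five
      terms are the ones named in the example above; treating every
      non-empty combination as a candidate composite is an assumption
      made for illustration.]

```python
from itertools import combinations

# Event terms mentioned in the example above.
terms = ["myocardial infarction", "stroke", "sudden death",
         "unstable angina", "TIA"]

# Every non-empty combination of terms is a candidate composite endpoint.
composites = [combo
              for size in range(1, len(terms) + 1)
              for combo in combinations(terms, size)]

print(f"{len(terms)} terms give {len(composites)} possible composites")
print(f"20 terms would give {2 ** 20 - 1} possible composites")
```

      [Each candidate composite chosen after seeing the data is another
      chance for a difference to appear by the play of chance.]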

 

                As a result, the definition of grouping

 

      may vary from study to study.  Now, some

 

      investigators try to fix this problem by setting up

 

      a uniform definition to be used across all studies.

 

      But when the definition is developed after a

 

      concern has been raised, those creating the

 

      definition have frequently already looked at the

 

      data or have communicated with those who have

 

      looked at the data, and know either consciously or

 

      subconsciously what kind of definition is required

 

      to capture the events of interest.

 

                The fourth, and what I want to focus on

 

      the most in my presentation, is the issue of

 

      interpreting comparisons of frequency of adverse

 

      events because the number of adverse events is

 

      small and, because they are small, they result in

 

      extremely imprecise estimates.

 

                Now, you may think that investigators

 

      generally understand the difficulties of analyzing

 

      small numbers of events.  For example, most

 

      investigators know that, when the number of events

 

      is small, the lack of an observed difference does

 

      not rule out the existence of a true difference.

 

      We have been taught that this should be apparent by

 

      looking at the confidence interval and, as you can

 

      see here, the confidence interval is very wide and

 

      includes the possibility of benefit and harm.

 

                So investigators, basically, consider

 

      these kinds of data to be inconclusive.  But what is

 

      generally not appreciated is that, when the number

 

      of events is small, the confidence interval is

 

      necessarily so wide that it may not truly represent

 

      the range of values that would include the true

 

      effect of the drug.  As a result, even the finding

 

      of an observed difference does not necessarily

 

      prove the existence of a true difference.

 

                To illustrate this point, this slide shows

 

      the effect size and confidence intervals required

 

      to reach statistical significance in a hypothetical

 

      trial of 3,000 patients assuming a range from a

 

      very small to a very large number of events.

 

                Now, assuming the trial shows a

 

      statistically significant effect--that means that

 

      we are only going to look at this if a p-value,

 

                                                                51

 

      let's say, is less than 0.05--the smaller the

 

      number of events, the larger must be the treatment

 

      effect in order for this effect to be statistically

 

      significant and the wider the confidence intervals

 

      have to be.

 

                Put it another way, if the number of

 

      events is small, the trial will show a significant

 

      difference only if the treatment effect is very

 

      large and the estimate of the effect is very

 

      imprecise.
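
      As a rough illustration of the point above, here is a minimal sketch of how lopsided the split of events must be before a conventional test calls it significant. It assumes two equal-sized arms, rare events, and a simple normal approximation to the split of events between arms. The function name and the example event counts are hypothetical and are not taken from the slide.

```python
import math

def min_significant_split(total_events, z_crit=1.96):
    """Smallest imbalance (a vs. b events, equal-sized arms) with z >= z_crit.

    Under the null, each event is equally likely to fall in either arm, so
    a ~ Binomial(total_events, 0.5) and z = (2a - total) / sqrt(total).
    Returns (a, b, observed_ratio), or None if no split can reach z_crit.
    """
    for a in range(math.ceil(total_events / 2), total_events + 1):
        z = (2 * a - total_events) / math.sqrt(total_events)
        if z >= z_crit:
            b = total_events - a
            ratio = a / b if b > 0 else float("inf")
            return a, b, ratio
    return None

for total in (20, 100, 1000):
    print(total, min_significant_split(total))
```

      Under these assumptions, 20 total events need a split of 15 versus 5 (an observed three-fold ratio) to reach nominal significance, while 1,000 events need only 531 versus 469 (a ratio of about 1.13).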

 

                Unfortunately, when you look at adverse

 

      events in a trial, the number of events will always

 

      be small.  This is because the trial, as you know,

 

      was designed to provide enough data to examine the

 

      primary endpoint, which the trial produces a very precise

 

      estimate of, but it is not powered to look at any

 

      other analyses and, therefore, at the end of the

 

      trial, you get generally a less precise estimate of

 

      the secondary endpoint and an extremely imprecise

 

      estimate of any specific adverse event.

 

                Now, you may ask, what is wrong with an

 

      imprecise estimate?  Well, imprecise estimates are

 

                                                                52

 

      fine if the intent is to withhold judgement until

 

      more data are collected to make the estimates more

 

      precise.  But imprecise estimates are problematic

 

      if the intent is to stop and reach a conclusion.

 

                That is because, when calculated in the

 

      usual manner, p-values and 95 percent confidence

 

      intervals are most easily interpreted in the

 

      context of a completed experiment.  Unfortunately,

 

      the adverse-event data generated in a typical trial

 

      is not the result of a completed experiment.  In

 

      fact, viewed from the amount of data needed for a

 

      precise estimate, the adverse-event data in a

 

      single study only represents a snapshot of an

 

      ongoing experiment to characterize the safety of

 

      the drug.

 

                As a result, performing an analysis of

 

      adverse-event data is akin to performing an interim

 

      analysis of primary endpoint data in an ongoing

 

      clinical trial.  Now, this is important because we

 

      know a fair amount of how to interpret interim

 

      analyses in a clinical trial and here I really must

 

      apologize to Tom Fleming because what I am going to

 

                                                                53

 

      review here very quickly is borrowed heavily from

 

      his extensive work in this area.

 

                But it is really important to think about

 

      small numbers of adverse events as an interim look

 

      on a global effort to characterize the safety of a

 

      drug.

 

                Now, as you know, when you look at interim

 

      analyses in a clinical trial, one plots the

 

      treatment difference represented by a z-score

 

      against the amount of information that we have, and

 

      that is generally represented by the fraction of

 

      expected events.

 

                We start the trial at zero effect and zero

 

      information.  At the end of each interim analysis,

 

      we add a point until we get to the end of

 

      the study.  Now, if we have assigned an alpha of

 

      0.05 to the endpoint, we want to make sure that we

 

      evaluate the treatment difference seen at the end

 

      of the trial against an alpha of about 0.05 which

 

      generally corresponds to a z-score of about 2.0.

 

                Now, some might think, naively, that,

 

      during the course of a study, the observed

 

                                                                54

 

      difference between treatments will be so

 

      predictable that we would observe a linear march

 

      between the start of the study and the end of the

 

      trial.  But we know that when the amount of data is

 

      small, things tend to bounce around a lot, so much

 

      so that early results can be very misleading.

 

                It is sort of like the situation of trying

 

      to predict the results of an election when only 1

 

      percent of the precincts have been reported and

 

      they are not even representative.  So, as a result,

 

      if we got excited about any difference in z-score

 

      of more than 2.0 early in the trial, we would be getting

 

      excited about effects that were not likely to be

 

      seen or sustained if we had more data even though a

 

      z-score of 2.0 would normally correspond to a

 

      p-value of less than 0.05.
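
      The "bouncing around" described here can be sketched with a small simulation. This is only an illustration under stated assumptions: there is no true treatment effect, looks are equally spaced, and the z-score at look k is the running sum of k independent standard normal increments divided by the square root of k. The function name, number of simulations, and seed are hypothetical choices.

```python
import math
import random

def prob_any_nominal_crossing(n_looks, z_crit=1.96, n_sims=20000, seed=0):
    """Estimate the chance that |z| >= z_crit at one or more of n_looks
    equally spaced looks when there is truly no treatment difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(k)  # z-score at the k-th look
            if abs(z) >= z_crit:
                hits += 1
                break  # count each simulated trial at most once
    return hits / n_sims

for looks in (1, 5, 10):
    print(looks, prob_any_nominal_crossing(looks))
```

      With a single look the estimate should sit near 5 percent. With repeated looks it rises well above that (about 14 percent for five looks is the commonly quoted figure), which is why a nominal z-score of 2.0 seen early is weak evidence.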

 

                In fact, the smaller the amount of data,

 

      the more things can bounce around a lot, the more

 

      it is likely that what we will be seeing will be

 

      due to the play of chance.  Therefore, to prevent

 

      investigators from reaching a conclusion when the

 

      estimates are imprecise, statisticians,

 

                                                                55

 

      particularly Tom, have recommended that

 

      investigators refrain from getting excited about

 

      nominally significant z-scores when the amount of

 

      data is scarce.

 

                Specifically, they have proposed that

 

      boundaries must be crossed before we can feel

 

      comfortable that an effect seen early is likely to

 

      be present at the end of an experiment.

 

                Now, Tom, in particular, has proposed a

 

      curvilinear boundary like this.  There are many

 

      other boundaries that have been proposed by

 

      others.  But this is very, very commonly used in

 

      the United States.  This represents a boundary with

 

      an alpha of 0.05 for a primary endpoint.  It sort

 

      of looks like this.  Because it is curvilinear, to

 

      be significant at the 0.05 level, the treatment

 

      difference must be extreme when the amount of

 

      information is small as would be the case early in

 

      the study.

 

                However, as the trial proceeds, treatment

 

      differences required to conclude that there is an

 

      effect at the 0.05 level decrease and become

 

                                                                56

 

      closer and closer to a z-score of about 2.0 at the

 

      end of the study.

 

                Now, this is a very different thought

 

      process and a very different approach than getting

 

      excited about a p-value less than 0.05 no matter

 

      when you observed it during the study.  For

 

      example, a z-score of 2.5--that is right

 

      here--would be meaningful if seen at the end of the

 

      study but it wouldn't be considered significant if

 

      seen early in the study even though the nominal

 

      p-value at this time is less than 0.05.

 

                Now, if the number of events is small, the

 

      difference would need to be far more extreme--say,

 

      a z-score up here--to be meaningful at the 0.05

 

      level.
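
      A minimal sketch of the shape of such a curvilinear boundary follows. It assumes an O'Brien-Fleming-type form in which the critical z-score scales as one over the square root of the information fraction, and it uses the speaker's round figure of 2.0 for the end-of-study value. The exact constant depends on the number of planned looks, so the numbers here are illustrative only, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def obf_shape_boundary(info_fraction, z_final=2.0):
    """Critical z-score at a given information fraction (0 < t <= 1),
    assuming an O'Brien-Fleming-type shape: z_final / sqrt(t)."""
    if not 0.0 < info_fraction <= 1.0:
        raise ValueError("info_fraction must be in (0, 1]")
    return z_final / math.sqrt(info_fraction)

def nominal_two_sided_p(z):
    """Conventional two-sided p-value for a z-score, ignoring the boundary."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

observed_z = 2.5
for t in (0.10, 0.25, 0.50, 1.00):
    boundary = obf_shape_boundary(t)
    print(f"t={t:.2f} boundary={boundary:.2f} "
          f"crossed={observed_z >= boundary} "
          f"nominal_p={nominal_two_sided_p(observed_z):.4f}")
```

      In this sketch a z-score of 2.5 has a nominal p-value of about 0.012 at every look, yet it crosses the boundary only at the end of the study. At a quarter of the information the assumed boundary is 4.0.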

 

                Here is a specific example.  This is an

 

      old cardiovascular trial.  This is the Coronary

 

      Drug Project.  It was carried out more than 30

 

      years ago.  It included a comparison of clofibrate,

 

      a lipid-lowering drug, and placebo on coronary

 

      events.  At four separate times during the study,

 

      the difference in favor of clofibrate was

 

                                                                57

 

      statistically significant at a nominal p of 0.05 or

 

      less.  But, at the end of the trial, there was no

 

      difference between placebo and clofibrate.  The

 

      difference seen early in the trial was related to

 

      the imprecision inherent when analyzing small

 

      numbers of events.

 

                In fact, if a boundary had been used in

 

      this study, at no time during the trial would the

 

      treatment effect have crossed the boundary and led

 

      to the conclusion that clofibrate was better than

 

      placebo.

 

                Now, let me say this kind of fluctuation

 

      early in a study is very, very common.  There are

 

      even examples where a treatment has been associated

 

      with a nominally significant adverse effect which

 

      later was reversed during the course of the trial

 

      and became statistically significant at the end of

 

      the study.

 

                Now, I should mention that the boundary

 

      that I have shown you is a boundary with an alpha

 

      of 0.05.  This means, when the boundary is crossed,

 

      the p-value for the treatment effect is less than

 

                                                                58

 

      0.05 not less than the nominal p-value that

 

      corresponds to the z-score that allowed the

 

      boundary to be crossed.

 

                Now, for each p-value or each alpha, there

 

      is a separate boundary.  As the requirement for

 

      strength of evidence becomes more stringent,

 

      the boundary is shifted upward and to the right.

 

                You might ask why am I going through all

 

      this.  Because analyzing data derived in an

 

      underpowered trial raises the same concerns as

 

      analyzing data derived from an underpowered interim

 

      analysis in an adequately powered study.

 

                The cardiovascular field is replete with

 

      examples of how misleading small numbers of events

 

      can be.  Let me give you a few examples.  For

 

      example, in an early pilot trial, the ACE/NEP

 

      inhibitor, Omapatrilat, reduced the risk of a major

 

      cardiovascular event by 47 percent when compared

 

      with an ACE inhibitor.  As you can see, the

 

      confidence intervals are extremely wide because the

 

      analysis here was based on only 39 events.

 

                Later, a definitive trial was carried out

 

                                                                59

 

      that recorded nearly 1900 events.  There was no

 

      difference between Omapatrilat and the comparator

 

      ACE inhibitor on the same endpoint in the same

 

      population.

 

                Here is another example.  In an early

 

      pilot trial, amlodipine reduced the risk of a major

 

      cardiovascular event by 45 percent, small p-value

 

      but wide confidence intervals.  Later, in a

 

      definitive trial which recorded four times as many

 

      events, there was no effect of amlodipine on the

 

      same endpoint in the same population using the same

 

      investigators.

 

                There are even examples when the effect

 

      seen in a pilot trial was reversed when the

 

      definitive study was carried out.  Two examples.

 

      In two pilot trials, both in heart failure, one

 

      with the drug Vesnarinone, one with the drug

 

      Losartan, both drugs significantly reduced the risk

 

      of death--not a minor endpoint; death--by 50 to 60

 

      percent.  But these benefits were seen in trials

 

      that each recorded fewer than 50 events and

 

      thus produced treatment estimates with extremely

 

                                                                60

 

      wide confidence intervals.

 

                When both drugs were reevaluated in

 

      definitive trials that recorded ten times as many

 

      events, both drugs were associated with increased

 

      risks of death, in one case, significant at the

 

      less than 0.05 level.

 

                Now, notice that the confidence intervals

 

      of the treatment effect in the definitive trials do

 

      not overlap with the confidence intervals of the

 

      treatment effect in the early pilot studies.  So

 

      here we have, in effect, two examples of an

 

      underpowered trial that showed a significant

 

      benefit whereas the definitively powered study

 

      showed significant harm.

 

                Here is another example.  This is a

 

      meta-analysis of a small number of trials looking

 

      at the effect of magnesium in acute myocardial

 

      infarction.  A meta-analysis of a number of studies

 

      showed intravenous magnesium associated with a

 

      striking reduction in mortality, a 55 percent

 

      reduction in risk of death, but wide confidence

 

      intervals, a very small p-value, in a fairly large

 

                                                                61

 

      study.

 

                This effect appeared to be reinforced in a moderately sized study,

 

      smaller treatment effect but wide confidence

 

      intervals and then, subsequently, in a definitive

 

      trial that recorded 4,000 deaths, there was a

 

      nearly significant adverse effect of magnesium on

 

      the same endpoint in the same population.

 

                Now, again, please note that the

 

      confidence intervals of the treatment estimate in

 

      this definitive study do not overlap at all, with

 

      the confidence intervals of the estimates in the

 

      earlier moderately sized study, and not at all in

 

      the meta-analysis.  Again, this is really a

 

      reflection of the imprecision inherent in looking

 

      at small numbers of events.

 

                Let me give you one final example because

 

      it actually deals with an adverse effect.  In an

 

      early pilot trial with extended-release

 

      metoprolol--this is a study that looked at a very

 

      small number of events, about 20 events, showed a

 

      three-fold increase in the risk of hospitalization

 

      for heart failure in the metoprolol group compared

 

                                                                62

 

      with the placebo group.  Look at the confidence

 

      intervals here.  They go from about Washington to

 

      California.  It is, however, a nominally significant

 

      treatment effect.

 

                When this trial was replicated in a

 

      similar population with exactly the same drug,

 

      exactly the same formulation, exactly the same

 

      dose, there was now a reduction in the frequency of

 

      hospitalization for heart failure.  Let me just

 

      emphasize, this was recorded as an adverse event in

 

      this earlier trial.

 

                So what have we learned from all this?

 

      Well, a couple of thoughts.  To achieve statistical

 

      significance in an underpowered analysis, the

 

      effect size must be extreme and the estimate must

 

      be imprecise.  Yet the more extreme the effect, the

 

      more imprecise the estimate, the less likely it

 

      will be reproduced in a definitive trial.  That is

 

      why I think, of all the things that we can worry

 

      about in looking at adverse events, the most

 

      worrisome is the imprecision inherent in the

 

      analysis of small numbers of events.

 

                Let me just close with a few final

 

      thoughts.  You might ask, based on all of this,

 

      what should we do.  Well, I think the first step,

 

                                                                63

 

      perhaps the most important first step, is to

 

      develop an approach to analyzing data in trials

 

      with small numbers of events which actually

 

      accurately reflects the true imprecision of the

 

      treatment effect estimate and its statistical

 

      significance.

 

                Let me just emphasize one thing, and I

 

      just want to put this as a proposal.  In no way,

 

      would I propose this as a definitive solution but,

 

      to get the discussion going, this might be an

 

      interesting first way of thinking about this.

 

                The conventional way of comparing small

 

      numbers of events is to calculate 95 percent

 

      confidence intervals followed by the derivation of

 

      the p-value.  However, the conventional calculation

 

      of the confidence intervals incorporates into it a

 

      z-score that the investigator designates as the

 

      target value for statistical significance.  For

 

      example, most statisticians, in calculating a

 

                                                                64

 

      confidence interval, would simply use a z-score of

 

      about 2.0.

 

                And they would do that because that is the

 

      critical value for the z-score at the end of an

 

      adequately powered trial with an alpha of 0.05.  So

 

      what they would do is they would take this z-score

 

      and they will use it to calculate the confidence

 

      interval.  What a lot of people, I think, fail to

 

      realize is that this z-score is not the critical

 

      value for decision making if one looks early in the

 

      same experiment.

 

                Early in that experiment, the critical

 

      value for a z-score should be determined by the

 

      interim monitoring boundary appropriate for the

 

      information content, not the z-score at the end of the

 

      study.

 

                Now, if one uses the boundary z-score in

 

      the calculation of the 95 percent confidence

 

      intervals, the confidence intervals here will be

 

      much, much wider resulting in a p-value that will

 

      no longer be statistically significant.  Now this

 

      is important because everyone talks about p-values

 

                                                                65

 

      at these meetings.  I showed you these data before.

 

      Conventionally calculated, the p-value would be

 

      0.002 meaning the likelihood of chance alone being

 

      2 in 1000.

 

                Well, if, in fact, if one recognized that

 

      the data here really result in a very imprecise

 

      estimate and one incorporates the thinking process

 

      of an O'Brien-Fleming boundary into this, as a

 

      reflection of this imprecision, then the confidence

 

      intervals now truly reflect the imprecision in the

 

      estimate and now the p-value is a lot less interesting

 

      than it was before.
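
      The proposal can be sketched as follows, purely as an illustration. The sketch assumes two equal-sized arms with rare events, so that the log of the event ratio has a standard error of roughly the square root of (1/a + 1/b). The event counts and the boundary z-score of 4.0 are hypothetical and are not the data shown on the slide.

```python
import math

def ratio_ci(events_a, events_b, z_crit):
    """Confidence interval for the ratio of event counts between two
    equal-sized arms, using exp(log(a/b) +/- z_crit * sqrt(1/a + 1/b))."""
    log_ratio = math.log(events_a / events_b)
    se = math.sqrt(1.0 / events_a + 1.0 / events_b)
    return (math.exp(log_ratio - z_crit * se),
            math.exp(log_ratio + z_crit * se))

# Hypothetical counts: 15 events on drug, 5 on comparator.
conventional = ratio_ci(15, 5, z_crit=1.96)  # end-of-study critical value
adjusted = ratio_ci(15, 5, z_crit=4.0)       # assumed early-boundary value

print("conventional:", conventional)
print("boundary-adjusted:", adjusted)
```

      With these hypothetical counts the conventional interval excludes 1.0, while the boundary-adjusted interval is much wider and includes it. That is the sense in which the adjusted interval reflects the imprecision of a small number of events.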

 

                Now, the use of boundary-adjusted

 

      confidence intervals would, I think, appropriately

 

      describe the great uncertainty inherent in the

 

      analysis of small numbers of events, hopefully

 

      markedly reducing the false-positive error rate.

 

                In spite of using a boundary-adjusted

 

      confidence interval, adverse effects that are known

 

      to be characteristic of specific drugs would

 

      generally remain statistically significant.

 

      However, this approach, and it is just a thought

 

                                                                66

 

      experiment, would not provide a way to interpret

 

      trends observed in imprecise data.

 

                So, lastly, let me just conclude with some

 

      thoughts about what we should do with worrisome

 

      trends in imprecise data.  The first thing we could

 

      do is believe in those that are biologically

 

      plausible.  However, we need to be very careful

 

      here.  Everyone knows physicians can always be

 

      relied on to propose a biological mechanism to

 

      explain the validity of an unexpected and

 

      potentially preposterous finding simply because it

 

      happens to have an interesting p-value.  Anyone who

 

      doesn't believe this, you know, I would be happy to

 

      show you overwhelming evidence that this is the

 

      case.

 

                Second, we could look for confirmatory

 

      evidence in other studies, remembering that we

 

      shouldn't be selective.  But, even if every study

 

      showed the same trend, how would you know that you

 

      had enough evidence to reach a conclusion?  Some

 

      have proposed doing a cumulative meta-analysis in

 

      which each trial is considered to represent an

 

                                                                67

 

      interim analysis on the way to a final judgement.

 

                Indeed, Salim Yusuf has proposed that, as

 

      each trial is added to the meta-analysis, that one

 

      use interim monitoring boundaries to interpret this

 

      cumulative meta-analysis.  This has, certainly, a

 

      considerable amount of appeal.

 

                Let me just emphasize.  Salim has, in

 

      fact, underscored the fact that the conditions here

 

      are not identical to those that exist for a true

 

      interim analysis.  In the case of a true interim

 

      analysis, we generally know that the types of

 

      patients in studies are similar at all observation

 

      points.  Here it is different.

 

                In the case of a cumulative meta-analysis,

 

      the types of patients in studies differ across the

 

      various trials.  So, as a result, Salim has

 

      proposed that, when reaching a conclusion based on

 

      data that has been combined across trials, that a

 

      boundary more strict than 0.05 be used.

 

                Now, he has specifically outlined the

 

      importance of this using the example of intravenous

 

      magnesium.  I showed you the data on intravenous

 

                                                                68

 

      magnesium in myocardial infarction.  When the early

 

      trials with magnesium were carried out, the z-score

 

      of greater than 2.0 was crossed early.  As the

 

      cumulative evidence occurred, the initial boundary

 

      of 0.05 was crossed.

 

                But then a large study, when added to the

 

      other cumulative analyses, brought this treatment

 

      effect down to a 0 level.  So Salim, and others, in

 

      fact, have emphasized that, when you are using a

 

      meta-analysis approach and using interim monitoring

 

      boundaries, that maybe one should require a p-value

 

      of less than 0.05 or even, perhaps, a smaller

 

      p-value.
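
      A cumulative meta-analysis read against a monitoring boundary can be sketched as below. This is a simplified illustration, not the published method. It assumes equal-sized arms within each trial, fixed-effect inverse-variance pooling of log event ratios, an information fraction equal to cumulative events over an assumed planned total, and a boundary of the form 2.0 over the square root of the information fraction. The trial counts are invented for illustration and are not the magnesium data.

```python
import math

def cumulative_meta(trials, planned_total_events, z_final=2.0):
    """trials: list of (events_treated, events_control) in chronological
    order. Returns one dict per cumulative step with the pooled z-score,
    the assumed boundary, and whether each threshold was reached."""
    sum_w = 0.0    # running sum of inverse-variance weights
    sum_wy = 0.0   # running sum of weight * log ratio
    events_so_far = 0
    rows = []
    for a, b in trials:
        weight = 1.0 / (1.0 / a + 1.0 / b)
        sum_w += weight
        sum_wy += weight * math.log(a / b)
        events_so_far += a + b
        z = sum_wy / math.sqrt(sum_w)  # pooled estimate / its std. error
        info = min(1.0, events_so_far / planned_total_events)
        boundary = z_final / math.sqrt(info)
        rows.append({"z": z, "boundary": boundary,
                     "nominal": abs(z) >= 1.96,
                     "crossed": abs(z) >= boundary})
    return rows

# Hypothetical: three small favorable trials, then one large neutral trial.
hypothetical_trials = [(2, 8), (5, 12), (10, 18), (1000, 980)]
for row in cumulative_meta(hypothetical_trials, planned_total_events=2035):
    print(row)
```

      In this invented sequence the pooled z-score is nominally significant after the second and third small trials, never comes near the assumed boundary, and falls to about zero once the large trial is added.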

 

                Let me say that most of the effects the

 

      committee has seen over the past two days would not

 

      come even close to meeting these criteria.

 

                Now, some of you may say, why not avoid

 

      all of this uncertainty and simply carry out an

 

      adequately powered definitive trial with the

 

      adverse event as the primary endpoint.  Is this

 

      crazy?  No; it is not crazy at all.  Sponsors

 

      pursue encouraging trends.  Most are disappointed,

 

                                                                69

 

      but they will pursue them.  Sponsors, therefore,

 

      should have an obligation to pursue discouraging

 

      trends realizing that most of them probably won't

 

      be confirmed either.

 

                Only a definitive trial can address

 

      ascertainment and classification biases as well as

 

      concerns about multiplicity of comparisons and

 

      imprecision of the data.  However, can we really

 

      expect sponsors to pursue every adverse trend?

 

      There are some obvious limitations to doing this.

 

      Furthermore, if you could decide which adverse

 

      trend you wanted to pursue, how easy would it be to

 

      carry out the trial intended to definitively

 

      evaluate an increased risk of an adverse effect?

 

                Can you imagine the consent forms for the

 

      IRBs for such a study?  Some may say that we are

 

      being too stringent here, that the criteria for

 

      raising a safety concern need not be as stringent

 

      as the criteria for establishing efficacy.  But I

 

      am not so sure that the criteria for establishing

 

      efficacy and safety should be that different.

 

                As a rule, we are very strict in reaching

 

                                                                70

 

      conclusions about efficacy because saying that

 

      there is a benefit when there is none means that

 

      millions will be treated unnecessarily and subject

 

      to side effects and cost.  Now, although some might

 

      advocate being less strict in reaching conclusions

 

      about safety, please remember; saying that there is

 

      an adverse effect when there is none means that

 

      millions will be deprived of an effective

 

      treatment.

 

                In conclusion, the findings of controlled

 

      trials are most easily interpreted when they

 

      represent the principal intent of the study.  A

 

      non-principal finding is subject to many

 

      interpretive difficulties many of which we have

 

      reviewed; ascertainment biases, inflated

 

      false-positive rates due to multiplicity of

 

      comparisons and, the one I have emphasized the

 

      most, the imprecision of estimates inherent in the

 

      analysis of small numbers.

 

                I think FDA, industry and academia remain

 

      in a quandary as to how to respond in a responsible

 

      fashion to observed differences in the reported

 

                                                                71

 

      frequency of adverse events.  Let me just

 

      emphasize, my presentation shouldn't be construed

 

      as favoring one particular side in all the

 

      discussions that have occurred.  In my view,

 

      regardless of one's position, it is critical to

 

      understand the limitations of what we know and to

 

      resist the temptation to reach conclusions before

 

      we are justified to do so.

 

                I think only by recognizing our ignorance

 

      will we be able to take the first step towards

 

      developing a rational approach that is in the

 

      interest of all patients.

 

                Thank you.  I will be happy to answer any

 

      questions.

 

                DR. WOOD:  Dr. D'Agostino?

 

                DR. D'AGOSTINO:  Thank you very much,

 

      Milt.  I have a couple of questions that I think, I

 

      hope, are relevant to our deliberations.  In terms

 

      of your sense of large and the idea of chasing

 

      after a safety event and making more out of it than

 

      one should, we have a study, APPROVe, where there

 

      was a serious up-front prestated deliberation to

 

                                                                72

 

      make sure they had good ascertainment and

 

      adjudication of cardiovascular events, and they

 

      come up with 45 versus 25 events, carefully

 

      collected.

 

                I am struck by that's being small, but I

 

      am also struck by the carefulness in which it was

 

      done, say, as opposed to the APC where they did an

 

      interim analysis that has those problems.  Could

 

      you comment on, say, the APPROVe study?

 

                DR. PACKER:  I think that, when you have

 

      incomplete data, as you would if you have

 

      small numbers of events, you need to be a lot more

 

      careful about the thinking process.  That doesn't

 

      mean you can't make judgments.  It doesn't mean you

 

      can't incorporate a set of principles that would

 

      guide decision making by looking at the totality of

 

      the evidence and bringing to the process what you

 

      inherently believe.  I think that is what the

 

      committee needs to do today.

 

                What I really wanted to address, however,

 

      is how hard this is and that the normal

 

      reliance--as you know, clinical investigators,

 

                                                                73

 

      because they don't understand p-values, rely on

 

      them.  What I am trying to do is to explain that,

 

      in fact, we are less certain about what we know

 

      here than we, perhaps, should be.

 

                DR. D'AGOSTINO:  But that is on the

 

      APPROVe study; it was reasonable, too.

 

                DR. PACKER:  I think you need to take that

 

      in the totality of the carefulness in which it was

 

      done, the prospective nature of it.  But, remember,

 

      in all the examples that I showed you, we saw

 

      trends, sometimes very striking trends, in early

 

      pilot trials that were prespecified, adjudicated

 

      endpoints but, because they were small-number

 

      events with very imprecise estimates, the

 

      definitive trial was non-confirmatory.

 

                So just because it is up-front and

 

      predefined--

 

                DR. D'AGOSTINO:  That is my question, yes.

 

      That is my question.  You still end up with small

 

      numbers.  Let me have just a couple of other

 

      questions.  The second question is really bothering

 

      me very much in terms of how we would recommend

 

                                                                74

 

      trials.  If you decide--if the group decides and

 

      suggests to the FDA that there should be more

 

      trials, more randomized clinical trials, the

 

      sponsors are, then, going to have to go back and

 

      say, well, they are going to set up a trial saying

 

      the null hypothesis that the relative risk is 1.0

 

      versus the relative risk is not 1.0.

 

                Now, the best thing a sponsor can do is to

 

      run a very sloppy study and they will accept that

 

      null hypothesis because the confidence intervals

 

      will be so wide and they will contain 1.0.  The

 

      alternative is to sort of do a noninferiority type

 

      idea where you end the study, you end up with the

 

      confidence interval, and that confidence interval

 

      has to be below something like 1.3.

 

                Do you have advice for us if you did this

 

      sort of second approach?  We are dealing with rates

 

      like 1 percent.  Could we live with a 1.3 relative

 

      risk that you rule out, a 1.3 relative risk?

 

      People may be dying if you do that.  So how do you

 

      respond to that?

 

                DR. PACKER:  I wish I knew the answer to

 

                                                                75

 

      that.  I think that it depends on the type of

 

      adverse reaction.  It depends on the particular

 

      drug.  It depends on the vulnerability of the

 

      patient population.  All of these need to be

 

      factored together with the actual feasibility of

 

      doing the study.

 

                The one thing I would say is that one

 

      learns very little by doing a lousy trial.  So,

 

      doing a good trial is the only way to get a

 

      reasonable answer or reasonable estimate of the

 

      answer.

 

                DR. D'AGOSTINO:  Just one more.  I will

 

      make it quick.  In these trials, in many of these

 

      trials, people just won't stay in the trial.  Can

 

      you give us some advice on how to deal with the

 

      drop-out--now, there are rules that you could say,

 

      the individual wants to leave, has decided to leave

 

      because the blood pressure is building up or

 

      because of G.I. problems building up.

 

                To say, we are only going to look at that

 

      individual for 14 more days after they leave, to

 

      me, is a problem because if the blood pressure is

 

                                                                76

 

      building up, they may be on their way and it may

 

      take two or three months before they get an M.I.

 

      and so forth.  So you have got the sort of

 

      dropouts, terminations, that are part of the

 

      protocol but you also have the individuals who just

 

      stop coming.  And they could be substantial.  So,

 

      any advice to us?

 

                DR. PACKER:  Gee, as you know, when we do

 

      trials for superiority, the effort that we put into

 

      adherence is extreme.  We really want people to

 

      stay on treatment and we organize the trials to do

 

      everything we can to ethically and reasonably

 

      maintain adherence.

 

                I take your point that, if the trial were

 

      a noninferiority trial, it is possible that the

 

      investigators and sponsor might be less motivated

 

      recognizing that poor adherence works in their

 

      favor.  I think that there needs to be a reasonable

 

      effort--I mean, you can maintain adherence in most

 

      trials if you really, really want to.

 

                DR. D'AGOSTINO:  Thank you.

 

                DR. WOOD:  I suspect we are not going to

 

                                                                77

 

      solve that problem today.  Dr. Shapiro?

 

                MS. SHAPIRO:  Just a comment on your

 

      comment.  We all know, of course, that the Federal

 

      Regulations require that participants be allowed to

 

      withdraw and not be badgered into staying.  But

 

      what I really wanted to talk about was your

 

      observations about how it is wrong to suggest that

 

      we should not chase safety quite as rigorously

 

      because we will, then, deprive ourselves and others

 

      of information and access to effective treatment.

 

                I don't think it is as simplistic as that,

 

      in that, when we are looking at potential harm or

 

      safety problems, we have to look not only at

 

      likelihood that it exists but prevalence and

 

      severity.

 

                So I think that your response to that

 

      approach has to take account of those factors as

 

      well.

 

                DR. PACKER:  Let me try to reframe my

 

      response.  You can't isolate benefit from risk.

 

      The judgment as to whether a drug should be used on

 

      an individual basis or on a population basis has to

 

                                                                78

 

      be the relative value of benefit to risk.  You may

 

      decide that you don't even want to pursue a safety

 

      trend in a non-fatal event when you know the drug

 

      prolongs life.  That would be a very reasonable

 

      judgment.

 

                On the other hand, you might want to

 

      vigorously pursue a very serious safety issue in a

 

      drug for a symptomatic or cosmetic condition.  So

 

      the risk-to-benefit relationship is the one that

 

      has to be vigorously defined.

 

                MS. SHAPIRO:  Right.  I am sure you will

 

      agree with this; you also have to factor in

 

      prevalence of the condition and likely use of that

 

      drug in the population.

 

                DR. PACKER:  That's right.  But it is

 

      always--it is risk to benefit.  The goal here is

 

      not to say that the risk-to-benefit relationship

 

      can be altered, simply because you want to

 

      emphasize one part or another; it has to be in the

 

      context of the clinical problem and looked at from

 

      the patient point of view.

 

                DR. WOOD:  Dr. Cush?

 

                DR. CUSH:  I have two questions.  One, I

 

      need some education.  You were frequently referring

 

      to very wide confidence intervals where it didn't

 

                                                                79

 

      seem so wide.  It was only, like, 0.3 and 0.4

 

      where, obviously, when it ranged from 1.0 to 8.0,

 

      that is very wide.  But you used those terms in

 

      both situations.  Could you explain the differences

 

      there?

 

                DR. PACKER:  Actually, I have used "wide"

 

      to refer to extremely wide, moderately wide and

 

      wide.

 

                DR. CUSH:  And narrow would be--

 

                DR. PACKER:  Narrow is less than wide.

 

                DR. CUSH:  Okay.

 

                DR. PACKER:  Let me try.  All the examples

 

      that I showed you that I characterized as wide

 

      truly reflected estimates that had a high degree of

 

      uncertainty associated with it.  On the benefit

 

      side, benefits that range from an 80 percent

 

      reduction in risk on the high side to a 20 percent

 

      reduction in risk--remember, and I guess I should

 

      emphasize this and I guess Tom would reinforce this

 

                                                                80

 

      dramatically, the concept of how these curves

 

      looked like in terms of the width is not

 

      symmetrical on both sides of 1.0.  The lowest you

 

      can go below 1.0 is 0.  So wide confidence

 

      intervals below 1.0 can be 0.2 to 0.8.  Those would

 

      be wide confidence intervals.  There is no limit

 

      for estimates greater than 1.0, so you can have 1.0

 

      to 24 on the adverse side of this.  So you have to

 

      sort of think about what is wide differently when

 

      you are looking at estimates below 1.0 than when

 

      you are looking at estimates above 1.0.  Maybe that

 

      would be helpful.
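
      [Illustration.  One way to make Dr. Packer's point about asymmetry concrete is to compare intervals on the logarithmic scale, where a ratio and its reciprocal sit at equal distances from 1.0.  The sketch below is an assumption-laden illustration and not a method presented at the meeting.  The function names are hypothetical, and the intervals are the ones mentioned in the discussion.]

```python
import math


def log_width(lower, upper):
    """Width of a ratio interval on the log scale: log(upper / lower)."""
    return math.log(upper / lower)


def mirror_about_one(lower, upper):
    """Reflect a ratio interval through 1.0 by taking reciprocals."""
    return 1 / upper, 1 / lower


if __name__ == "__main__":
    benefit = (0.2, 0.8)   # interval below 1.0 quoted in the discussion
    harm = (1.0, 24.0)     # interval above 1.0 quoted in the discussion

    # The reflected benefit interval has the same log-scale width as the original.
    print(mirror_about_one(*benefit))
    print(log_width(*benefit), log_width(*mirror_about_one(*benefit)))
    print(log_width(*harm))
```

      [On this scale, 0.2 to 0.8 spans a fourfold range, the same as its mirror image above 1.0, even though the arithmetic difference of 0.6 looks small.  That is why the raw width of an interval below 1.0 cannot be compared directly with one above 1.0.]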

 

                DR. CUSH:  That does help.  Secondly, you

 

      have told us that when we are dealing with

 

      low-numbers adverse events and that being very

 

      imprecise and hard to make conclusions from, is it

 

      even less valid or even greater error to, then,

 

      take that data derived in one situation, like in an

 

      Alzheimer's trial, and then try to generalize that

 

      to the general population?

 

                DR. PACKER:  But we do that all the time.

 

      There is a general sense that efficacy is not

 

                                                                81

 

      extrapolatable across diseases but safety that is

 

      not disease-specific is extrapolatable.

 

                Let me put it this way.  If we didn't do

 

      that, the problem that I put forward would be

 

      really impossible, really impossible.  So I

 

      actually feel comfortable extrapolating safety data

 

      across indications as long as the safety item is

 

      not disease-specific.

 

                DR. WOOD:  Dr. Shafer?

 

                DR. SHAFER:  Thanks.  That was actually a

 

      very informative presentation and I can confirm the

 

      distance from Washington to California.

 

                There are really two questions here that I

 

      think we need to bifurcate.  One of them involves

 

      the scientific question of getting at the truth,

 

      whatever that is.  I appreciate everything you say

 

      and, prior to a drug being approved, at least

 

      ideally, there would be adequate time and resources

 

      to do exactly what you are proposing.

 

                But there is a second question which is

 

      how to inform clinical and regulatory decision

 

      making based on imprecise information following

 

                                                                82

 

      approval because, in that setting, a daily decision

 

      is being made by patients and their physicians as

 

      to whether or not they need to take the drug.

 

                One question about how to approach these

 

      sorts of imprecise data when, in fact, a daily

 

      decision is occurring, is can you take the

 

      confidence bounds for both the risk and the benefit

 

      and integrate those over the public-health hazard

 

      and the public-health benefit to try to incorporate

 

      the entire--both the point estimates but also the

 

      uncertainty about them into the regulatory

 

      decision-making process?

 

                DR. PACKER:  Oh, wow.  Just a couple of

 

      comments.  One, the precision of the estimates on

 

      efficacy is almost always more precise, much more

 

      precise, than the estimates on safety.  So you have

 

      this very precise estimate on efficacy.  You have

 

      this very imprecise estimate, in general, on

 

      safety.  And you try to sort of integrate them and

 

      you have to now weigh them because it could be that

 

      the efficacy thing you are looking at is really

 

      important and the safety is sort of not very

 

                                                                83

 

      important.  Or it could be the other way around, the

 

      efficacy is sort of very small--the efficacy is

 

      small, but the safety is a big risk.

 

                DR. SHAFER:  That is exactly the question.

 

                DR. PACKER:  You might think that someone

 

      in the world might be clever enough to create a

 

      statistical model that would allow that to take

 

      place.  I am actually much more comfortable with

 

      people doing that than statistical models doing

 

      that.  Somehow, people have the ability to

 

      integrate all of this, especially a group of people

 

      have an ability to integrate this, much better than

 

      any mathematical model.

 

                I would be very uncomfortable if someone

 

      were actually to propose a mathematical model that

 

      replaced the human, very important human, element

 

      here.

 

                DR. WOOD:  Dr. Farrar.

 

                DR. FARRAR:  Every example that I have

 

      seen to date in looking at the risks in

 

      overinterpreting data seem to go from being a

 

      positive study to a negative study.  I wonder about

 

                                                                84

 

      the other way around and whether there are any

 

      inherent differences in thinking about it the other

 

      way around, the bottom line being that if you have

 

      ten studies that show no safety issue with a

 

      well-measured process, whether you can then say,

 

      well, maybe the 11th study is going to show it

 

      somehow.

 

                DR. PACKER:  I think you need to find out

 

      how much information there is in each study, how

 

      easily or how appropriate it is to combine the data

 

      across the studies to determine how precise the

 

      estimates, after you have collected and integrated

 

      all of the data, and put that into a judgement as

 

      to how much data you actually need to be confident

 

      about the precision of the estimate.

 

                So there isn't a uniform way of thinking

 

      about it.  It is not like you will know it when you

 

      see it.  There is some guidance, some mathematical

 

      guidance, that needs to be incorporated into the

 

      thinking process.

 

                DR. WOOD:  Dr. Domanski.

 

                DR. DOMANSKI:  You know, I am not nearly

 

                                                                85

 

      as sophisticated, really, Milton, as you are about

 

      this sort of thing nor about some of the people in

 

      the room, but I am a little bit concerned about

 

      some of the examples.  I will give you one.  I

 

      don't think ISIS 4 was a definitive trial of

 

      magnesium, because I know something about that.  We

 

      did the MAGIC study which was a very large study.

 

                Like ISIS 4, it was negative, but ISIS 4

 

      was substantially different methodologically in

 

      terms of when that was given.  I think that example

 

      actually, to be honest, is fairly misleading as a

 

      result.  I think it is an example of a stopped

 

      clock is right twice a day.  But, yeah; it came out

 

      right.

 

                But I am worried if that is the basis for

 

      this--that kind of thing is the basis for this

 

      discussion across more of the landscape.

 

                DR. PACKER:  Let me emphasize, Mike, that

 

      I knew that if I picked one study and gave you an

 

      example of one study that I would be at great risk

 

      because everyone knows something about these

 

      studies more than what I know about these studies

 

                                                                86

 

      although some of the studies I actually mentioned

 

      were studies I was personally involved with and

 

      think that I know a little more about them.

 

                So I just wanted to--I would not

 

      overemphasize--and, in fact, one might

 

      appropriately underemphasize--the magnesium

 

      example.  But the other examples, time and time and

 

      time and time again.  It is just like reaching

 

      conclusions during a very early part of a study

 

      based on interim monitoring.  When you have small

 

      numbers of events, the estimates are very imprecise

 

      and may not reflect what happens at the end of a

 

      complete experiment.  That is just a general

 

      principle.

 

                I take your point about ISIS 4 but the

 

      number of examples here is just overwhelming.
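
      [Illustration.  The general principle Dr. Packer states, that estimates based on small numbers of events are very imprecise, can be sketched numerically.  The example below assumes two equal-sized arms and rare events, so that the standard error of the log relative risk is approximately the square root of 1/a + 1/b for event counts a and b.  The event counts are invented for illustration and do not come from any trial discussed at the meeting.  The function name is hypothetical.]

```python
import math


def rr_confidence_interval(events_treated, events_control, z=1.96):
    """Approximate 95% CI for the relative risk, assuming equal-sized arms
    and rare events, so that se(log RR) ~= sqrt(1/a + 1/b)."""
    rr = events_treated / events_control
    se = math.sqrt(1 / events_treated + 1 / events_control)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


if __name__ == "__main__":
    # Same observed ratio of 2.0, with ten times as many events in the second case.
    for treated, control in [(10, 5), (100, 50)]:
        rr, lower, upper = rr_confidence_interval(treated, control)
        print(treated, control, round(rr, 2), round(lower, 2), round(upper, 2))
```

      [With the same observed doubling of risk, the interval based on 15 total events includes 1.0, while the interval based on 150 events excludes it and is far narrower.]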

 

                DR. WOOD:  It is important, Milton, to

 

      remember, we have replication for two of these

 

      drugs and these safety signals here.  So it is not

 

      just single studies.

 

                Dr. Furberg.

 

                DR. FURBERG:  Milton, I think that was a

 

                                                                87

 

      great presentation.  I think, for balance, it would

 

      be nice if you can have examples showing the other

 

      side, how trends in smaller studies were confirmed

 

      in definitive trials.  And I know plenty of those.

 

                DR. PACKER:  Oh, yes.

 

                DR. FURBERG:  That was never discussed.

 

      You are painting a dark picture saying you can't

 

      trust smaller studies.  You are right.  You never

 

      know where you are going to end up and you need to

 

      be careful.  But don't say that you can't rely on

 

      those.

 

                DR. WOOD:  I was actually on the advisory

 

      committee that turned down Vesnarinone, that looked

 

      at that study.  There were lots of issues that came

 

      up at that time that led us to do that.  So it

 

      wasn't just that there was a study that was

 

      compelling and that people went with that.

 

                Dr. Nissen?

 

                DR. PACKER:  Curt, let me just say that--I

 

      think your point is very, very important.  What I

 

      have not done is shown many, many examples of

 

      interim monitoring in trials where the early

 

                                                                88

 

      results were reflective of the endpoint.  I have

 

      not shown a whole host, probably more than I could

 

      think of, of all of the pilot trials where the

 

      initial trends encouraged someone to pursue it and

 

      that the second study was, in fact, very

 

      confirmatory.

 

                Let me just make my point clear.  It is

 

      just not as reliable as we think it is.  It is not

 

      that it is worthless.  I do not want to say that.

 

      If I have implied that, then I do not want to imply

 

      that.  I just want to say that the risk of error

 

      early when you have small-number events is much,

 

      much greater than when you have a much more precise

 

      estimate at the end of the trial.

 

                My plea here is that when you don't know,

 

      the best thing you can do is say, "I don't know."

 

      And that is my only plea.

 

                DR. WOOD:  Milt, when you have two trials

 

      that replicate one another, with a p-value of less

 

      than 0.05, if that was an efficacy endpoint, we

 

      would approve on the basis of that; correct?

 

                DR. PACKER:  That's right.

 

                DR. WOOD:  But you are telling us that,

 

      when it is a safety endpoint, we should not act on

 

      that.  I think it is counterintuitive.

 

                                                                89

 

                DR. PACKER:  No, no, no.

 

                DR. WOOD:  Hang on.  That seems to me

 

      counterintuitive.  We have, for two of these drugs,

 

      two randomized trials that replicate the outcome.

 

      In three of the four trials, the outcome was

 

      predefined, adjudicated and so on.  That is about

 

      as good as any drug that has been approved on the

 

      U.S. market that I can think of.

 

                DR. PACKER:  Let me just add one

 

      dimension, Alastair, to the thinking process and

 

      that is that when you have a p less than 0.05 on

 

      two trials, on the primary endpoint because it is

 

      efficacy, you have two trials that were designed

 

      for the endpoint and have fairly narrow confidence

 

      intervals and precise estimates.

 

                That is not the same concept as having a p

 

      less than 0.05 on two imprecise estimates which are

 

      combined together.

 

                DR. WOOD:  No; I understand that very

 

                                                                90

 

      well.  I think we all do.  The issue here is both

 

      of the second trials--both of the second

 

      trials--were designed to test the safety issue that

 

      was in the first trial even though they were

 

      efficacy studies.  So it is not like they were just

 

      two trials that fell on the ground from Mars that

 

      arrived with something.  These were designed, at

 

      least according to the sponsors, to check for that

 

      outcome.

 

                So I think you are overselling the point a

 

      bit.

 

                Let's move on.  Dr. Jenkins?

 

                DR. JENKINS:  I found the presentation

 

      very interesting and I wanted to probe a little bit

 

      further on the APPROVe study because that is the

 

      one that I think we were feeling very comfortable

 

      with the finding in APPROVe.  Yet, I went back to

 

      Merck's presentation, and their prospective plan

 

      was actually to combine three studies that were

 

      going to be placebo versus rofecoxib in three

 

      different populations.

 

                Their plan was to have 25,000 patients to

 

                                                                91

 

      evaluate the cardiovascular signal.  Now, in

 

      APPROVe, presumably, they had stopping rules that

 

      the Data Safety Monitoring Committee saw an extreme

 

      effect that met those criteria so they stopped the

 

      study.  But I am just interested in hearing your

 

      thoughts about how should we interpret APPROVe

 

      where the stopping rule is met for an individual

 

      study when the prespecified plan was to have three

 

      studies combined for 25,000 patients.

 

                DR. PACKER:  Gee, I must say that I am

 

      delighted to have everyone ask me the hard

 

      questions for this afternoon.  I sort of think that

 

      this is what this committee has to do.  I only

 

      wanted to add a dimension to the thinking process

 

      here.  I don't come with any answers on how to put

 

      all of the data together.  All of the points on how

 

      to synthesize these data, I am very comfortable

 

      with the human process of doing so as long as the

 

      human process incorporates an understanding of how

 

      difficult and imprecise this is and the fact that,

 

      in the past, although it has led to predictions

 

      that came true, it also led to predictions that did

 

                                                                92

 

      not come true.

 

                DR. JENKINS:  I think, more specifically,

 

      the point I was trying to get you to comment on is

 

      not the overall interpretation of the rofecoxib

 

      data but the fact that there was a plan for 25,000

 

      patients in three studies.  What I am trying to

 

      understand is how should we, then, interpret a

 

      finding from one of those three studies where an

 

      interim analysis crossed the stopping boundary and

 

      met the criteria for stopping the study.  What

 

      weight should we give to that finding in that

 

      single study?

 

                DR. PACKER:  I don't think there is a

 

      precise answer to that.  Any time you deviate from

 

      your preplanned attack on the conduct of analysis

 

      of a trial, you weaken, to varying degrees, the

 

      precision of the estimate and the confidence you

 

      have in the data that you are looking at.

 

                DR. WOOD:  Dr. Nissen?

 

                DR. NISSEN:  Milt, there is an additional

 

      subtlety here.  Let me see if I can drill down with

 

      you on it.  What we have here is a class of drugs

 

                                                                93

 

      where we have multiple trials within the class.  So

 

      what we are asked to do is not necessarily, in some

 

      respects, for each individual drug, say, well, do

 

      we have replication or not.

 

                But if we take the position that this is a

 

      class effect, then we have got four, or perhaps,

 

      five trials.  This came up once before.  It was

 

      kind of controversial.  I think you may have been

 

      on the committee at the time when we had the

 

      angiotensin-receptor blockers for renal protection.

 

      What the two companies did with two different drugs

 

      is they stipulated that the other could use the

 

      data from the other company's trials as supportive.

 

                So the reason that this is really much

 

      harder is that we have a lot of trials here.  We

 

      may not have reached all the evidence in an

 

      individual drug, but we have trials across the

 

      class of drugs.  I wonder if you have any thoughts

 

      about this because it is obviously a difference

 

      between studying a single agent and studying a

 

      class of agents.

 

                DR. PACKER:  I think that, Steve--I mean,

 

                                                                94

 

      that is why the process works best when there are

 

      human beings involved in the thinking process.

 

      There is no predetermined sense that one should

 

      bring to the process--that you confine the analysis

 

      only to one drug.  What you should allow yourself

 

      to do is look at the data with one drug, look at

 

      the data with drugs that you think are related.

 

                If there are data that you think are in a

 

      drug that really isn't related, you might want to

 

      analyze that separately or do it both ways to see

 

      if it is consistent.  There is no statistical

 

      formula that can guide the very important human

 

      process here.

 

                My major point is that the precision that

 

      most clinical investigators think exists here isn't

 

      as precise as we think it is.  But that doesn't

 

      mean that you--and Curt would emphasize this--that

 

      doesn't mean that you can't put together your own

 

      picture of the totality of the data and bring to it

 

      a sense of whether it reaches some critical level

 

      of concern.

 

                In the absence of precision, you have got

 

                                                                95

 

      to do that.  But don't forget inherently that the

 

      data are imprecise.

 

                DR. WOOD:  Curt, do you want to say

 

      something else?  No.  Then let's move on.  The next

 

      speaker is Bob Temple who we are going to confine

 

      to his seat.

 

                DR. TEMPLE:  Alastair, I have a question.

 

      What am I supposed to do about my slides?  Can

 

      someone show them for me?  I will delete many of

 

      them.

 

                DR. WOOD:  Okay.  You can come up here if

 

      you do it quickly.

 

                DR. TEMPLE:  I don't care where I'm from.

 

      I really don't.

 

                DR. WOOD:  Then Kimberly will work the

 

      slides for you.

 

                DR. TEMPLE:  Okay; if Kimberly will do

 

      that.

 

                 Issues in Projecting Increased Risk of

 

            Cardiovascular Events to the Exposed Population

 

                DR. TEMPLE:  I was not in any way trying

 

      to address the main issues the committee is

 

                                                                96

 

      grappling with, which is what to do about these

 

      drugs.  But it seems to me you can't help noticing

 

      that there is some data we would all like to have

 

      that we don't have and that is what I was trying to

 

      address.

 

                Obviously, the main thing we are worried

 

      about is the effect of the COX-2-selective NSAIDs

 

      on cardiovascular outcomes, notably death, stroke

 

      and heart attack.  But we are particularly interested

 

      in the single drug effects, whether they are all

 

      the same.  We are interested in whether we are

 

      looking at true class effects or differences.

 

                We also can't help noticing there is not a

 

      lot of long-term data on the nonselective NSAIDs

 

      and, of course, as has been pointed out repeatedly, some

 

      of them are sort of selective anyway.

 

                There is major interest in possible

 

      differences in the subpopulations that might be at

 

      different risks.  I think there are mechanistic

 

      considerations, how much of this is really likely

 

      to be platelets and could there be a blood-pressure

 

      effect.  The importance of that, to me, is that it

 

                                                                97

 

      is not quite clear what to do about platelet

 

      effects, but, conceivably, you could manage a

 

      blood-pressure effect if that was a problem.

 

                There is a lot of importance and interest

 

      in the dose and dose interval.  And it is important

 

      to think about how long studies have to be to

 

      detect these things.  Obviously, some of the trials

 

      seem to have shown things in a matter of seven or

 

      eight months.  There is some suggestion that some

 

      of the effects need much longer to detect.

 

                Skip the next one.

 

                With respect to cardiovascular effects,

 

      the main question is whether everything is really

 

      answered.  You know, there are lots of studies, as

 

      Alastair was pointing out.  They are not perfectly

 

      consistent, maybe, but there are a number of

 

      studies with a number of drugs that seem to be

 

      showing the same thing.

 

                I guess, to me, they don't seem entirely

 

      consistent.  There are a number of possible reasons

 

      for that.  One is that there really are differences

 

      between drugs, or at least between doses.  Another

 

                                                                98

 

      is that even the best controlled studies sometimes

 

      give different answers.  Another is that small

 

      effects are difficult to evaluate in epidemiologic

 

      and even controlled studies.  Then the last is that

 

      the effects may be population-dependent.  That has

 

      been discussed.

 

                So it does seem to me there is more to

 

      learn.  Skip the next.  We all know that.  Platelet

 

      effects.

 

                One of the things that seems important to

 

      pin down and I don't think it has been pinned down

 

      yet is the possibility that blood pressure is a

 

      significant part of all this, that there is some

 

      impression that Vioxx has bigger blood-pressure

 

      effects than the other drugs, but I don't think

 

      there is what we would call adequate data on the

 

      effects of all these.

 

                By adequate data, I mean data that gives

 

      you information about the effect of drug over the

 

      entire dosing interval, that has pinned down dose

 

      response and that has pinned down the effect of

 

      different dosing intervals.  There is an

 

                                                                99

 

      impression, though, that these drugs can reverse

 

      the effect of other anti-hypertensives, perhaps,

 

      especially, ones that work through the renal and

 

      angiotensin system.  They seem to have, at least

 

      some of them, an effect on blood pressure generally

 

      and then there are isolated reports of hypertension

 

      in trials reported as adverse reactions, clearly

 

      more common in the treated groups.

 

                I have a bunch of slides showing that

 

      elevated blood pressure is bad for you.  You can

 

      deduce that from epidemiologic effects, from a

 

      mountain of clinical studies.  The most recent

 

      study that is of interest, which I will not

 

      describe--keep going--in detail is a study that

 

      Steve Nissen knows about called CAMELOT which you

 

      can read as saying that a change in blood pressure

 

      of even 5 millimeters of mercury systolic and 3

 

      diastolic might have a reduction of about

 

      33 percent in the kinds of events we are talking

 

      about in people whose diastolic pressure is only

 

      about 100.

 

                That is not definitive.  This is a subset

 

      of the data and you can look at my slide to see

 

      what I did.

 

                As I said, we don't know as much about the

 

      blood pressure as we should.

 

                So a crucial question is in the larger

 

      assessment of cardiovascular effects; what can we

 

      really study more.  My own view is that, given

 

      VIGOR and fairly consistent epidemiologic findings,

 

      it would be difficult to study 50 milligrams of

 

      rofecoxib.  I doubt you could write a proper

 

      informed consent.

 

                I take Milton's concern to heart but I

 

      guess my own view is there is probably enough

 

      information about that.  But what you could do with

 

      respect to other things depends on what you

 

      believe.

 

                Suppose you believe that the

 

      cardiovascular risk of 200, 400, of celecoxib is

 

      not entirely clear.  One polyp study says yes and

 

      other studies are not so clear.  And you believe,

 

      also, that a class effect is uncertain or, more

 

      particularly, that the effect might not apply to

 

      certain doses and certain dose intervals even if

 

      you are inclined to believe that the class does

 

      have a problem.

 

                If you also believe that more needs to be

 

      known about the long-term use of all NSAIDs,

 

      including those that are nominally COX-2-selective

 

      and those that are not, if you believe that new

 

      COX-2-selective agents conceivably could be

 

      developed with appropriate information, and if you

 

      believe the pharmacology gives hypotheses that need

 

      to be tested, not necessarily just believed--sorry

 

      Garret--then here is what you might be able to do.

 

                Again, I am not, in any way, saying who

 

      should do this.  This will be a massive

 

      undertaking.  But it does seem to me that there is

 

      information we all collectively need as a

 

      community.  So I am calling it an ALLHAT study for

 

      anti-inflammatory drugs.

 

                This is just one of what people could

 

      dream up as what might be compared.  The drugs, it

 

      seems to me, one might think about putting in it

 

      include ibuprofen, which we think probably ought to

 

      be neutral, not bad.  It may not have the platelet

 

      effects you want.  Naproxen--I am embarrassed to

 

      say this but I am letting myself be affected by the

 

      epidemiology studies.  Naproxen sort of looks good.

 

      You might even say it is at least a placebo, but I

 

      am not quite ready to say that.

 

                Diclofenac seems a good model of a regular

 

      NSAID that is really COX-2-selective, at least to a

 

      degree.  Celecoxib possibly at more than one dose,

 

      although, maybe for caution, one would want to

 

      think about the lower dose first.  Then I have two

 

      other groups that I will be interested in people's

 

      comments on, and I am not totally sure you could

 

      bring these off.

 

                But could one include an aspirin full-dose

 

      study--we know it is an effective agent in

 

      arthritis--accompanied by a proton pump inhibitor?

 

      Now, you would have to first show that proton pump

 

      inhibitors really do block the ulcerogenic effects

 

      of aspirin.  That is a short-term study and maybe

 

      one could do that.  So I will be interested in

 

      whether people think you can bring that off.

 

                The reason for doing it is we know the

 

      effects of aspirin are not unfavorable and we think

 

      they are probably favorable in at least many

 

      populations, in populations at high risk and

 

      probably not unfavorable in people at low risk.

 

                The last one that seems worth considering,

 

      and my understanding is that, in many parts of the

 

      world, at least osteoarthritis is treated this way,

 

      to use acetaminophen plus codeine added as needed

 

      and try to do something about the constipation.

 

                That would be as close to a true placebo

 

      group as I think you can get in a setting like

 

      this.  So it seems quite interesting.

 

                It is worth saying if one had a new single

 

      agent, my suggestion, and one still thought that

 

      drugs like this should be developed, that the

 

      single agent might be compared to naproxen and I

 

      would still hope for one of the other last two

 

      comparisons as a true placebo.

 

                Obviously, these are all people who need

 

      chronic pain medications.  You would want O.A. and

 

      R.A. stratified.  I don't believe you could use the

 

      APAP group for rheumatoid arthritis but others may

 

      not agree with that.  You probably want to study a

 

      range of cardiovascular risks but you probably

 

      would want to study the lower-risk people first.

 

                The reason I say that is anyone with known

 

      coronary-artery disease really has to be given

 

      aspirin just because that is part of treatment and

 

      it isn't clear yet, to me, how aspirin interacts

 

      with the COX-2-selective drugs.  You would think it

 

      would make them unselective but the data don't seem

 

      to necessarily say that.

 

                A good question is how big the sample

 

      would have to be and that depends on what you want

 

      to find out.  If you are really trying to compare

 

      the drugs with a true placebo, they wouldn't have

 

      to be that large to rule out, say, a two-fold risk

 

      or something like that.  We have seen studies with

 

      about 1,000 per group that have distinguished

 

      between drugs.  So that is not so huge.

 

                But if you really wanted to get at whether

 

      one drug is a little bit different from another,

 

      you are talking about studies of a massive kind.  I

 

      have asked various numerically qualified people and

 

      the general impression is that if you wanted to

 

      rule out a 20 or 30 percent difference, you are

 

      talking about 50,000 per group.  That is beyond my

 

      hopes even for ALLHAT 2.
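
A rough sketch of why the margin to be ruled out drives the sample size so strongly. It uses Schoenfeld's approximation for the number of events needed in a two-arm time-to-event comparison, assuming 1:1 allocation and no true difference between the arms. The function names, the 1 percent annual event rate, and the two-year follow-up are illustrative assumptions, not figures from the presentation, so the results need not match the per-group numbers quoted above.

```python
import math
from statistics import NormalDist


def events_to_rule_out(margin_hr, alpha_one_sided=0.025, power=0.90):
    """Total events needed so the upper confidence bound on the hazard
    ratio excludes margin_hr (Schoenfeld approximation, 1:1 allocation,
    true hazard ratio assumed to be 1.0)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(margin_hr) ** 2)


def patients_per_group(margin_hr, annual_event_rate, years, **kwargs):
    """Convert required events into patients per arm, assuming a constant
    (hypothetical) event rate and no dropout."""
    events = events_to_rule_out(margin_hr, **kwargs)
    prob_event = 1 - math.exp(-annual_event_rate * years)
    return math.ceil(events / (2 * prob_event))


if __name__ == "__main__":
    # Hypothetical inputs: 1% of patients per year have an event, 2 years of follow-up.
    for margin in (2.0, 1.3, 1.2):
        n = patients_per_group(margin, annual_event_rate=0.01, years=2)
        print(f"rule out HR >= {margin}: about {n} patients per group")
```

Because the required number of events scales with one over the square of the log margin, tightening the margin from a two-fold risk to a 20 percent difference multiplies the trial size by more than ten under these assumptions.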

 

                Obviously, the outcomes of major interest

 

      are cardiovascular death, stroke, AMI and bleeding.

 

      I have heard some thoughts that maybe heart failure

 

      should be looked at in addition but I wouldn't make

 

      that the primary endpoint.  I think you can look at

 

      that separately.

 

                A big problem is what to do about blood

 

      pressure.  My first thought was that you would

 

      monitor it and treat anything over 120 over 80, but

 

      that really isn't standard practice.  So a question

 

      I would raise is whether one could leave people to

 

      go to 130 over 90, would that be acceptable.

 

                A question one could raise is why do this

 

      at all?  Do you really need these drugs?  We have

 

      heard fairly strong feelings that G.I. intolerance

 

      is not trivial.  But my answer is more that we

 

      really don't know enough about the whole range of

 

      these drugs.  There is no question that people are

 

      going to get something for their arthritis.  I am

 

      not entirely comfortable with looking at the data

 

      and saying we know what we need to.

 

                You could sort of deduce that naproxen

 

      usually looks pretty good.  It usually beats what

 

      is there except we just heard about a study where

 

      it was a little worse.  But it is not clear where

 

      ibuprofen comes.  It doesn't show the same thing.

 

      It seems to me there is a serious population need

 

      to find out about these things and to understand

 

      more whether all selectivity is the same.

 

                We have been through diclofenac at length

 

      and it is not clear what one needs.  So I think the

 

      idea of doing a large study has weight.

 

                If you believe that it is really all

 

      settled, that cardiovascular risk is clearly

 

      increased with all of the COX-2-selective agents,

 

      ignoring for now which ones are actually selective,

 

      there still are things one might want to know.

 

                It might be of interest to do a study that

 

      still would have the ibuprofen and naproxen groups

 

      and might still have my aspirin or APAP groups.

 

      One might consider trying a celecoxib with the

 

      addition of aspirin.  I know the results of that

 

      have not shown that any adverse effect seems to be

 

      mitigated, but that still doesn't make much sense

 

      and it might be something one could still want to

 

      test.  It would seem that if you added aspirin to a

 

      selective agent, you ought to have a de facto

 

      unselective agent.  Of course, that presumes

 

      mechanism and you shouldn't presume mechanism.  You

 

      should test it.

 

                Anyway, those are my thoughts.  I think my

 

      main point is that there is really a very important

 

      need for better information on the whole array of

 

      these drugs and the kind of study needed to do that

 

      is mind-bogglingly large.  However, people are

 

      already undertaking studies with 25,000 and 30,000

 

      patients already.  So it is not as outlandish as I

 

      would have said it was before we started this

 

      process.

 

                Thank you.

 

                DR. WOOD:  Okay.  I am just interested,

 

      why didn't you suggest a PPI with naproxen?  For

 

      your ALLHAT study, why didn't you suggest a PPI

 

      with naproxen?

 

                DR. TEMPLE:  That is a fair question.  I

 

      think the answer on--what did I suggest it with?

 

                DR. WOOD:  With aspirin.  It doesn't

 

      matter.

 

                DR. TEMPLE:  I will tell you the reason.

 

      Full-dose aspirin is just plainly impossible to use

 

      because of massive G.I. intolerance.  I believe,

 

      historically based, it is worse than we expect with

 

      naproxen.  So I thought you had to do it there

 

      urgently.  You could do it with naproxen, too.

 

      That would be okay.

 

                I have to point out that we do not have

 

      definitive labeling or evidence that those drugs

 

      really do prevent this but we have heard about some

 

      studies that suggest it.  I do think that is an

 

      early thing to discover.

 

                DR. WOOD:  Okay.  Understood.  Let's move

 

      straight on to Bob O'Neill's presentation who also

 

      is going to do it from his seat.

 

                  Issues in Projecting Increased Risk

 

           of Cardiovascular Events to the Exposed Population

 

                DR. O'NEILL:  I won't go through the

 

      slides.  I might point your attention to a few of

 

      them.  I will try and do this in five or ten

 

      minutes.

 

                DR. WOOD:  Do you want us to have the

 

      slides up, Bob?

 

                DR. O'NEILL:  What I was asked to do is

 

      essentially provide a framework.  This is a very

 

      difficult problem of projecting risk to the

 

      population.  Very little has been published about

 

      how to do this appropriately so I was intending to go

 

      through sort of the logic and the framework of how

 

      you might think about this.

 

                It requires the integration of exposure

 

      data at the national population level and it needs

 

      information relative to how long people are on

 

      drugs and it uses information from the clinical

 

      trials as well as from the epidemiology studies to

 

      the extent that they are relevant to the question

 

      that is being asked.

 

                This is a very difficult problem.  It was

 

      not intended to give any estimate, any single

 

      number.  It was intended to show how hard it is to

 

      get there and, at the end of the day, how variable

 

      and sensitive the estimate might be to all the

 

      assumptions you have to make.

 

                So I used the Vioxx VIGOR and APPROVe

 

      studies as an example of the process that one might

 

      go through.  I made the point that event

 

      definitions and many things matter.  But I guess if

 

      there is anything that I would like people to take

 

      home is that time matters.  The hazard rate

 

      matters.  And the hazard ratio matters as a

 

      function of time when you do any of these

 

      projections.

 

                I would just recall two slides.  One would

 

      be the VIGOR study which is Slide 12 so that

 

      everybody could remind themselves and Slide 16.

 

      The VIGOR study shows a separation of curves.

 

      Behind that is what is called a hazard rate.  I

 

      believe the data supports that the escalation of

 

      the risk increases with duration of exposure. 

 

      Merck and we have talked about this in the past and

 

      sort of have different views of this, but we seem

 

      to feel that that risk does escalate.

 

                That does not mean that there is no risk

 

      in that picture early on.  I think David Graham has

 

      made this point that it may be a power issue but,

 

      nonetheless, it is what it is and I am not

 

      convinced that the epidemiological studies at this

 

      stage add anything to our knowledge about early

 

      risk for the points I made yesterday because I

 

      think time zero matters in terms of looking at the

 

      risk, in terms of how long you are on.

 

                The next slide is Slide 16 which is the

 

      APPROVe study.  Similar pattern, only delayed a

 

      year.  So instead of the curve separating at

 

      approximately six months, four months, they

 

      separate a little later on.  The idea here is that

 

      the relative risks that are summary relative risks

 

      for both of these trials, for VIGOR, for thrombotic

 

      event, it is approximately 2.28 and, for APPROVe,

 

      it is approximately 1.92 for confirmed thrombotic

 

      events is an average relative risk averaged over

 

      all the time points so that the relative risk at

 

      different times is a function of time.
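
A small sketch of the averaging point: when the relative risk changes over follow-up, a single summary ratio can hide both the early and the late picture. The interval counts below are invented for illustration and are not VIGOR or APPROVe data; the function name and the interval boundaries are likewise assumptions.

```python
def rate_ratios(intervals):
    """intervals: list of (label, events_drug, person_years_drug,
    events_control, person_years_control).

    Returns the rate ratio within each interval and the single
    person-year-weighted ratio over all intervals combined."""
    per_interval = {}
    ev_drug = py_drug = ev_ctrl = py_ctrl = 0.0
    for label, e_d, p_d, e_c, p_c in intervals:
        per_interval[label] = (e_d / p_d) / (e_c / p_c)
        ev_drug += e_d
        py_drug += p_d
        ev_ctrl += e_c
        py_ctrl += p_c
    overall = (ev_drug / py_drug) / (ev_ctrl / py_ctrl)
    return per_interval, overall


# Hypothetical counts chosen so that the curves separate only after 6 months.
HYPOTHETICAL = [
    ("0-6 mo", 5, 1000, 5, 1000),
    ("6-18 mo", 12, 900, 6, 900),
    ("18-36 mo", 20, 800, 5, 800),
]

per_interval, overall = rate_ratios(HYPOTHETICAL)

if __name__ == "__main__":
    for label, ratio in per_interval.items():
        print(f"{label}: rate ratio {ratio:.2f}")
    print(f"overall (person-year average): {overall:.2f}")
```

In this made-up example the overall ratio sits between the interval-specific ratios, so applying it to people exposed for only a few months would overstate their risk, and applying it to long-term users would understate it.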

 

                That is an important concept when, then,

 

      you go and you look at the national projection of

 

      how many people are exposed for how long a period

 

      of time.  I won't go through that because they are

 

      in the slides.  But we have no data in the United

 

      States to do this.  So we did a projection based

 

      upon the IMS National Prescription data, another

 

      separate database that allowed us to look at how

 

      long exposure, successive exposures, might be to

 

      get an idea of how long individuals may stay on the

 

      drug.

 

                Surprisingly enough, a very small

 

      percentage of the millions of people that are

 

      prescribed the drug are on the drug for more than a

 

      year.  That is in one of the slides on the

 

      Caremark.  So what this meant is you multiply all

 

      these estimates which, essentially, are time.  We

 

      calculated a time-specific difference in absolute

 

      incidence rates for the different trials, made a

 

      projection and essentially used in that projection

 

      a number of assumptions many of which are not

 

      verifiable, and then came up with some crude

 

      estimate of what might even be an upper bound on a

 

      confidence interval for any estimate.
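
The multiplication described here can be sketched as follows: the number of people exposed for each duration is combined with a time-specific difference in absolute incidence rates. Every number below is hypothetical, as are the function and variable names; none of it comes from the IMS or Caremark data or from the slides.

```python
def projected_excess_events(exposure, excess_rate_by_period):
    """exposure: list of (n_people, years_on_drug).
    excess_rate_by_period: list of (start_year, end_year,
    excess events per person-year, drug minus comparator).

    Each person contributes excess risk only for the part of each
    period during which they were actually on the drug."""
    total = 0.0
    for n_people, years in exposure:
        for start, end, rate in excess_rate_by_period:
            overlap = max(0.0, min(years, end) - start)
            total += n_people * overlap * rate
    return total


# Hypothetical exposure pattern: most people stay on the drug only briefly.
EXPOSURE = [(1_000_000, 0.25), (200_000, 1.0), (50_000, 2.0)]

# Assumption A: no excess risk in the first six months, rising afterwards.
DELAYED = [(0.0, 0.5, 0.0), (0.5, 1.5, 0.005), (1.5, 3.0, 0.010)]

# Assumption B: the same constant excess risk from the first day.
CONSTANT = [(0.0, 3.0, 0.005)]

delayed_total = projected_excess_events(EXPOSURE, DELAYED)
constant_total = projected_excess_events(EXPOSURE, CONSTANT)

if __name__ == "__main__":
    print(f"delayed-onset assumption:  {delayed_total:,.0f} excess events")
    print(f"constant-risk assumption:  {constant_total:,.0f} excess events")
```

With the same exposure data, the two assumptions about early risk give projections that differ severalfold, which is the sensitivity to unverifiable assumptions that the speaker is describing.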

 

                We probably don't believe it because there

 

      is no real methodology to support that estimate but

 

      nonetheless to say that an estimate is very

 

      variable.

 

                So the bottom line, and the conclusions

 

      here, given the time frame, is that the purpose of the

 

      projection effort was essentially just to

 

      provide--this is the last slide; it is Slide 47--it

 

      is essentially to provide a framework for

 

      considering how you would think about developing an

 

      estimate and to provide a range of estimates and,

 

      also, essentially, to point out that there are many

 

      limitations to any estimate that you would provide.

 

                We are not supporting any, or putting

 

      forward any,  one estimate but I do believe that we

 

      need to understand this problem by moving away from

 

      summarizing nonproportional hazards in person

 

      years.  It is not a good idea.  It begs the

 

      question as to whether the risk is constant or

 

      whether the risk is dependent on time.

 

                If there is one problem with the

 

      epidemiological literature, it constantly reports

 

      person-year risk as opposed to every one of the

 

      clinical trials we have seen presents a

 

      Kaplan-Meier curve that looks at the time-dependent

 

      risk.  Unless you understand that, you can't come

 

      to grips with comparing one drug to another.

 

                You can't come to grips with comparing a

 

      drug to itself.  If you look at the VIGOR study

 

      relative to the APPROVe study, they are in

 

      different populations.  One is in a population of

 

      R.A.  The other is in a polyp prevention trial.

 

      One is at 50 milligrams.  The other is at 25

 

      milligrams.

 

                There are many things that need to be

 

      sorted out.  So the point here is that this is a

 

      very difficult exercise to project.  This was just

 

      a framework to say, here is how you might think

 

      about it.  Most of the estimates are fraught with a

 

      lot of danger and have to have many caveats placed

 

      on them were you to bank on any one estimate alone.

 

                That is pretty much my bottom line.

 

                DR. WOOD:  Bob, just to make sure

 

      everybody in the audience understands what you are

 

      talking about with estimates, what you are talking

 

      about are absolute numbers of people--

 

                DR. O'NEILL:  An estimate of the absolute

 

      numbers of individuals that might have been at risk

 

      and had these events if they were exposed--if they

 

      were exposed.  This is a model projection.

 

                DR. WOOD:  Right.  I just wanted to

 

      clarify that.  So it is not the relative risk.  It

 

      is not the same as what Milt was talking about.

 

                DR. O'NEILL:  Right.  Exactly.  This is a

 

      long discussion to get into the concept of

 

      attributable risk in its own right.  Given the

 

      time, I wouldn't be able to do that.

 

                DR. WOOD:  So you are talking about the

 

      number of people, these sort of numbers that are

 

      out there.

 

                DR. O'NEILL:  Right; to go through that

 

      exercise.  It is hard enough to interpret a single

 

      study or a collection of studies.  To go to an

 

      estimate of what the increased number of events

 

      might be at the exposed level is what this effort

 

      was about, all the different, five different

 

      separate interlinked but disparate databases that

 

      you would need to get there to make this kind of an

 

      estimate.

 

                DR. WOOD:  Okay.  Good.  Thanks.

 

                DR. WOOD:  We will take a few minutes, a

 

      very few minutes, for questions to the last two

 

      speakers and then we will take a break and be back.

 

      So the panel needs to remember that they are eating

 

      into their break.

 

                Dr. Nissen?

 

                DR. NISSEN:  Quickly, Bob, Bob Temple.

 

      The difficulty, of course, in the ALLHAT study is

 

      that it is very--it seems unlikely that it will get

 

      done.  So the question is, putting some constraints

 

      on this, and I thought about this last night in

 

      some detail into the wee hours of the morning, it

 

      seems to me that what we really need for this class

 

      of drugs is a reference standard.  That reference

 

      standard, unlike many studies, can't be placebo

 

      because you can't treat arthritis patients with

 

      placebo.

 

                So I would submit to you that, if you are

 

      going to do comparisons, that the reference

 

      standard, the best reference standard we have, is

 

      naproxen because we know as much about it as

 

      anything else.  We think it is, at worst, neutral

 

      and maybe a little better than neutral.

 

                So I would argue that, if you want to do

 

      ALLHAT light, then what you do is you test every

 

      agent both that stay on the market and that are

 

      proposed to bring onto the market against naproxen

 

      with an adequately sized trial and you set an upper

 

      bound, which we have to talk about, about what the

 

      upper bound of hazard you are willing to accept is,

 

      and the test that you run is on efficacy and on

 

      cardiovascular hazard.

 

                If your drug is beaten by naproxen, you

 

      don't make it.  If you can show equivalence within

 

      a reasonable upper bound of naproxen, then we would

 

      be pretty comfortable--I think I would be pretty

 

      comfortable that the drug is not going to create a

 

      hazard.
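
One way to read the proposed test against naproxen is as a non-inferiority check on the upper confidence bound of the hazard ratio. The sketch below assumes equal follow-up in both arms, so the ratio of event counts approximates the hazard ratio, and uses the usual normal approximation on the log scale. The margin of 1.3 and the event counts are hypothetical; the transcript leaves the acceptable upper bound open.

```python
import math


def hazard_ratio_upper_bound(events_new, events_ref, z=1.96):
    """Approximate upper 95% bound on the hazard ratio (new drug vs
    reference), assuming equal person-time in the two arms."""
    log_hr = math.log(events_new / events_ref)
    std_err = math.sqrt(1 / events_new + 1 / events_ref)
    return math.exp(log_hr + z * std_err)


def within_margin(events_new, events_ref, margin):
    """True if the upper bound falls below the pre-specified margin."""
    return hazard_ratio_upper_bound(events_new, events_ref) < margin


if __name__ == "__main__":
    # Hypothetical trials with identical event counts in both arms.
    for events in (100, 400):
        bound = hazard_ratio_upper_bound(events, events)
        verdict = within_margin(events, events, margin=1.3)
        print(f"{events} events per arm: upper bound {bound:.2f}, "
              f"within margin 1.3: {verdict}")
```

Even when the two arms have exactly the same number of events, the smaller hypothetical trial cannot exclude the margin, which is why the choice of upper bound and the trial size have to be settled together.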

 

                What do you think about that strategy?

 

                DR. TEMPLE:  That is actually--I went

 

      through it very fast, but that is actually what I

 

      said at the bottom of one slide.  I still would

 

      like to know better whether the naproxen is less

 

      bad or is really good.  Therefore, as I said on the

 

      slide, in my heart, I would like to see somebody

 

      try to give full-dose aspirin for a while because

 

      we are really pretty sure that won't be bad.

 

                I think the community, in the long run,

 

      needs that.  Who is going to do it?  That is a

 

      perfectly good question.  I do want to point out,

 

      though, that the way some of the trials were done,

 

      like TARGET, they could have given answers on some

 

      of this, or at least closer.  But, because they did

 

      separate trials, instead of randomizing to each of

 

      the treatments, that was obscured.

 

                You could have had a very substantial

 

      naproxen-ibuprofen comparison, but you didn't get

 

      it because of the structure of the trials.  So I

 

      think it is very important to randomize to each of

 

      the treatments, obviously, whatever it is.  But

 

      that would be my best guess at the moment.  But, in

 

      line with what Alastair asked before, when you do

 

      naproxen and you are looking at G.I. effects, do

 

      you add a proton pump inhibitor?  I think you need

 

      a little more information before you do that, but

 

      you might say that, which then raises the

 

      fundamental question of how much help you get from

 

      being COX-2-selective.

 

                DR. WOOD:  Dr. Cryer?

 

                DR. CRYER:  I wanted to comment on several

 

      of the questions, Dr. Temple, that you raised as

 

      well to ask a question.  I guess I will just ask

 

      the question first.  When you say "full-dose

 

      aspirin," are you referring to full

 

      anti-inflammatory doses of aspirin, 3.9 grams a day

 

      or--okay.

 

                DR. TEMPLE:  Which I assume most people

 

      will not tolerate and there will be huge bleeding.

 

      So you have got to do something.

 

                DR. CRYER:  Right.  See, I think that is a

 

      non-practical experiment design and I think we have

 

      come a long way from 3.9 grams of aspirin per day,

 

      particularly because of the concerns of the adverse

 

      events, the salicylism, the G.I. events.  Clearly,

 

      100 percent of those people are going to have

 

      gastric ulcerations assessed endoscopically.

 

                So I also would prefer one of the newer

 

      NSAIDs, traditional NSAIDs, in that comparison.

 

                With regard to--

 

                DR. TEMPLE:  Actually, before you leave

 

      that, do you know what would happen if you added a

 

      proton pump inhibitor to aspirin?

 

                DR. CRYER:  Not at 3.9 grams a day.  I

 

      don't think anybody thought that would be a

 

      feasible design.

 

                DR. TEMPLE:  Short term, then, just to

 

      look at endoscopic ulcers.

 

                DR. CRYER:  I don't know and I don't think

 

      that it will ever be known.

 

                DR. TEMPLE:  Then I won't get the answer.

 

                DR. CRYER:  What I do know is that, if you

 

      give 3.9 grams of aspirin per day in the

 

      short-term, greater than 90 percent of your

 

      patients who take aspirin will have endoscopic

 

      ulceration.  I don't know what the effect of the

 

      PPI would be.

 

                I wanted to address your last kind of

 

      question that you threw out there of whether or not

 

      a short-term study would show that celecoxib plus

 

      80 milligrams of aspirin would have a favorable

 

      effect, a G.I. effect, compared to a non-selective

 

      NSAID.  Those experiments have been done.

 

                With respect to endoscopic ulcer, COX-2

 

      plus aspirin equals traditional NSAID.  With regard

 

      to hospitalizations, having said that, there is a

 

      recent study not yet published, epidemiologic study

 

      from Canada, indicating that COX-2 plus aspirin,

 

      hospitalizations for that are less than

 

      hospitalizations for non-selective NSAIDs plus

 

      aspirin.  Then we have outcome studies not yet

 

      fully published in the abstract form which indicate

 

      that events on COX-2 plus aspirin are similar to

 

      events on non-selective NSAID plus aspirin--G.I.

 

      events.

 

                DR. TEMPLE:  It is possible that if you

 

      add aspirin--I mean, it is sort of what I would

 

      expect--is that you would get something that is a

 

      lot closer to being--in a cardiovascular sense, a

 

      lot closer to being just a regular NSAID and maybe

 

      you would still have some residual advantage in a

 

      G.I. sense.

 

                But, I must say, the data so far don't

 

      show that.  But they didn't seem definitive to me.

 

                It raises the question of--you know, the

 

      idea of COX-2 selectivity is, at least, in part, a

 

      conceptual and promotional idea.  As Garret pointed

 

      out the first day, five or six of those old drugs

 

      that aren't coxibs are COX-2-selective.  So there

 

      is a whole range.  My feeling is we need to

 

      understand the consequences of what all that means

 

      and there is a somewhat artificial separation

 

      between the coxibs and the others because those old

 

      drugs at least are partially selective and may have

 

      some of the same properties.

 

                So one of my hopes is that we could look at a

 

      range of these.

 

                DR. CRYER:  With respect to your last

 

      comment, I am entirely in agreement with that.

 

                DR. WOOD:  Let's move on.  Dr. Cush?

 

                DR. CUSH:  ALLHAT, I like the intention of

 

      it.  I would suggest, though, that if you are going

 

      to have a study long enough to pick up some of

 

      these events, a year or two, it is going to be

 

      very, very hard to keep O.A. patients on one of

 

      those drugs.

 

                So maybe actually stratifying according to

 

      pure COX-2-specific drugs to COX-2-selective drugs

 

      to the non-selective drugs that are more

 

      predominantly COX-1 and then having a totally

 

      nonsteroidal, non-nonsteroidal group, which would

 

      be the Tylenol group you talked to or other

 

      analgesic agents might work over the long term.

 

                DR. TEMPLE:  That would answer a lot of

 

      the questions.  My real hope--you have a better

 

      idea whether it is possible than I do--is that you

 

      could actually find a population that could be

 

      given what we are pretty sure is a

 

      cardiovascular-neutral treatment.  That is really

 

      the only way to pin this down and it does seem

 

      worth pinning down.

 

                DR. WOOD:  Dr. Hennekens?

 

                DR. HENNEKENS:  I think I gleaned from Dr.

 

      O'Neill that if we determine there is a class

 

      effect that it varies not just by drug and dose but

 

      by duration of therapy.  From Dr. Temple, the

 

      comment that--I am very attracted to the concept of

 

      what I would call a large simple trial rather than

 

      an ALLHAT trial.  I think there is merit in seeing

 

      aspirin studied in therapeutic doses and I think

 

      there is evidence that anti-inflammatory effects

 

      are seen at doses far lower than the 3.9 grams.

 

                But the question I have for Bob is there

 

      are three currently marketed FDA-approved coxibs.

 

      So would you include valdecoxib and 25 milligrams

 

      of rofecoxib in your design?

 

                DR. TEMPLE:  Part of the reason I didn't

 

      address that is I figured that is what the

 

      committee is going to talk about.  I was willing to

 

      say that the celecoxib data look funny enough so

 

      that you might consider it.

 

                DR. WOOD:  That is part of what we are

 

      going to discuss.

 

                DR. TEMPLE:  That is what you are going to

 

      discuss so I didn't address it.

 

                DR. WOOD:  Let's move that to later.  Dr.

 

      Domanski?

 

                DR. DOMANSKI:  I will pass.

 

                DR. WOOD:  Dr. Abramson?

 

                DR. ABRAMSON:  Thank you.  I want to

 

      probably say something rather naive in support of

 

      the study, Bob, and that is that we are at a moment

 

      where we can do a paradigm shift, meaning that

 

      study that you propose is an important one but it

 

      is very large and it is going to be very hard to

 

      get any resources to do that.

 

                I think we are at a moment where for the

 

      companies and the FDA and the government to think

 

      about a collaborative study where, if you have a

 

      drug that has some--this information is important,

 

      that we put together a collaboration among industry

 

      to do a multi-arm study of multiple drugs.  It is

 

      something, you know, in the osteoarthritis field,

 

      the companies have supported largely this

 

      osteoarthritis initiative through the NIH to look

 

      at outcomes in large numbers of patients.

 

                I think what we need is a similar COX-2

 

      initiative where either with the FDA or the NIH

 

      participating, with collaboration among industry,

 

      we are doing a multi-armed large study with

 

      biomarkers, with pharmacogenomics studies, with

 

      genetics and other blood pressure, but try and do

 

      it in a utopian way.

 

                I think everyone here wants to get the

 

      right answer, whether it is in industry or here at

 

      the table.  This could be a good opportunity to do

 

      something very differently than we have done before

 

      in a large trial.

 

                DR. TEMPLE:  I don't disagree at all.  I

 

      mean, some of the drugs are generic.  They don't

 

      have any company that is massively interested in

 

      them.  So it is going to be a mixture of

 

      government, generosity and a wide variety of other

 

      things that are scarce.  So I don't know how

 

      to--you noticed I didn't have a slide on how to do

 

      this.

 

                DR. WOOD:  Dr. Ilowite?

 

                DR. ILOWITE:  Just a minor point.  I

 

      understand the need for a cardiovascular-neutral

 

      anti-inflammatory drug in an ALLHAT study.  But I

 

      was a little confused because I am aware of some

 

      literature directed at people who are interested in

 

      Kawasaki disease suggesting that high-dose

 

      anti-inflammatory aspirin is actually prothrombotic

 

      because of differential effects on prostacyclin

 

      and thromboxane.

 

                DR. TEMPLE:  There are aspirin studies

 

      going back to at least moderate doses that show

 

      beneficial effects.  It is not just 80 milligrams.

 

      It is certainly at least a gram a day.  Some of the

 

      early ones were more than that.  That is worth

 

      thinking about.  I am encouraged by the thought

 

      that you might be able to get away with doses less

 

      than 3 grams.  So I didn't know that it was

 

      considered prothrombotic.  I thought aspirin always

 

      looked good.  But that is not up to grams.  I don't

 

      think any of the studies have done anything like

 

      that.

 

                DR. WOOD:  We will give Dr. Fleming the

 

      last word.

 

                DR. FLEMING:  I am just debating whether

 

      to do it now or after the break.

 

                DR. WOOD:  Let me help you.  Go ahead.

 

                DR. FLEMING:  Now?

 

                DR. WOOD:  After the break will be great.

 

                DR. FLEMING:  All right.  I will wait.

 

                DR. WOOD:  We will take a break and then

 

      we will be back here in ten minutes.

 

                (Break.)

 

                DR. WOOD:  Okay, folks.  Let's get

 

      started.  The next presentation will be given by

 

      Sharon Hertz who is Deputy Director of the

 

      Division.

 

                DR. HERTZ:  Thank you.  I am just going to

 

      spend a very few minutes summarizing some of our--

 

                DR. WOOD:  Let me, in fact, just before

 

      Sharon begins--Sharon Hertz has passed out a

 

      handout that includes a lot of her slides.  In the

 

      interest of time, she has graciously agreed to

 

      delete some of these slides and just focus on a

 

      smaller subset of what is in the handout.

 

                However, the committee does have the

 

      handout and the committee may find that handout

 

      useful for referring to some of the data.

 

                DR. HENNEKENS:  Alastair, a quick comment.

 

      I want to make a quick clarification on the earlier

 

      comment about pro-inflammatory effects of high

 

      doses of aspirin.

 

                DR. WOOD:  Sorry; I missed that.  About

 

      what?

 

                DR. HENNEKENS:  In the randomized trials,

 

      135 randomized trials with over 212,000 randomized

 

      subjects, whether the doses of aspirin are 75

 

      milligrams or up to 2 grams a day, there are

 

      significant cardiovascular benefits to aspirin even

 

      at high doses.  The issue, as Bob pointed out, at

 

      the high doses, is not that there is a reversal of

 

      the benefit but that the side effects are

 

      increased.

 

                So I think that is an important point to

 

      make.

 

                DR. ILOWITE:  I just wanted to say that in

 

      pediatrics, we think of anti-inflammatory doses as

 

      100 milligrams per kilogram.  So those are the

 

      doses I was speaking of.

 

                DR. GIBOFSKY:  Finally, the high-dose

 

      aspirin that would be necessary to treat patients

 

      with rheumatoid arthritis of 3.9 grams or greater

 

      would have significant problems on the stomach, as

 

      Dr. Cryer said, significant problems on the hearing

 

      of the patient and significant problems, perhaps,

 

      on other organ systems as well.  It is not a study

 

      that could be easily undertaken.

 

                DR. HENNEKENS:  I won't debate the value

 

      of the study of 3.9 grams of aspirin but, from the

 

      perspective of anti-inflammatory effects, they have

 

      been observed at doses of 2 grams of aspirin a day

 

      and, in fact, there are randomized studies going on

 

      directly comparing that somewhat higher doses of

 

      maybe 1 to 1-and-a-half grams a day might have

 

      significant anti-inflammatory as well as

 

      anti-atherogenic effects as measured by endothelial

 

      function, nitric oxide formation and other

 

      parameters.

 

                So I don't think that the traditionally

 

      high doses are the ones that necessarily would need

 

      to be done.  But I don't want to debate whether we

 

      should be studying doses of 4 grams of aspirin.

 

                DR. WOOD:  What you are telling us,

 

      Charlie, is that you are comfortable that there is

 

      an antithrombotic effect at the high doses of

 

      aspirin.  Is that right?  Okay.  Good.

 

                Dr. Cush wants to say something.

 

                DR. CUSH:  Again, you need not

 

      anti-inflammatory doses but analgesic doses which

 

      can be substantially lower.  I do want to make a

 

      statement with regard to a study that wasn't

 

      presented here that I think is germane and we

 

      should know about it, and this is quick.  There is

 

      a very large trial that is NIH supported that is

 

      called the GAIT study, glucosamine in

 

      osteoarthritis of the knee.

 

                This is a 1588-patient study that is completed and

 

      is currently being analyzed.  The Data Safety

 

      Monitoring Board of the study has analyzed it for

 

      cardiovascular risk because there is a Celebrex

 

      arm.  There are five arms in this 1500-patient

 

      study; placebo, Celebrex 200 milligrams once a day,

 

      glucosamine only, chondroitin sulfate only, and

 

      glucosamine and chondroitin sulfate.

 

                The outcome here, in a six-month trial, is

 

      pain reduction in osteoarthritis in the knee.

 

      Because of all this press and what not, they have

 

      looked at the safety outcomes and they have not

 

      shown any increase in cardiovascular events

 

      including M.I., any difference between the Celebrex

 

      group and the other four control groups.

 

                DR. WOOD:  Let's move on to the program.

 

      Dr. Hertz?

 

                    Summary of Meeting Presentations

 

                DR. HERTZ:  There are now several versions

 

      of my slides around and you are free to look at

 

      whichever interests you.  There is one correction

 

      on the lumiracoxib slides from the original set

 

      where I substituted the word diclofenac for

 

      ibuprofen.  So those of you looking at those slides

 

      just be aware of that, please.

 

                What I am really just going to do now is

 

      just focus down again on some of the reasons why we

 

      are here.  This would not be the current slide set.

 

      Any help here?

 

                Looking at the most recent set that were

 

      handed out, and we will just work from there

 

      because there is not a lot of data anymore to

 

      present, but, basically, I want to just point out

 

      that we are here because we do recognize that pain

 

      drugs are critically important, that the

 

      COX-2-selective NSAIDs have been extensively

 

      studied and there are, over time, studies that

 

      revealed new potential uses as well as new risks.

 

                We need to determine how we feel about

 

      these risks.  Are they limited to individual

 

      products?  Are they applicable across the group of

 

      COX-2 selectives and how far does this extend to

 

      the nonselective anti-inflammatories?

 

                There is a slide that describes--

 

                DR. WOOD:  Sharon, apparently everybody

 

      has hard copies of your slides.

 

                DR. HERTZ:  Right.

 

                DR. WOOD:  So if you want to just go

 

      through them and refer to the slide number, that

 

      would probably be helpful to people.

 

                DR. HERTZ:  Okay.  If we go to the third

 

      slide, you can get a sense of the sizes of the

 

      databases that were presented in the individual

 

      reviewer descriptions of FDA reviews.

 

                A couple of points.  The numbers there

 

      reflect predominantly patients on the drug of

 

      interest as opposed to the entire database.  The

 

      outcome studies are more reflective of the entire

 

      populations including comparators.  These drugs

 

      were assessed and have been assessed over time in

 

      fairly large numbers of patients.

 

                I think it is useful to note that we have

 

      not approved, in this country, all of the

 

      COX-2-selective NSAIDs that have come to us in

 

      applications for a variety of reasons.  Some of

 

      these may be related to cardiovascular-risk

 

      assessment.  Some may be related to

 

      non-cardiovascular-risk assessment which we really

 

      haven't gotten into in this setting.

 

                In addition, you may also note that

 

      parecoxib has not yet been approved in this country

 

      although it has been approved elsewhere.  So I

 

      think that we have a lot of issues to consider with

 

      these products.

 

                When we reviewed the studies that have

 

      been presented, we see that there is some increased

 

      risk for cardiovascular events but one of the key

 

      issues here is that the results are not consistent

 

      across studies and across situations.  We also have

 

      seen that there is risk that is being associated

 

      with some of the nonselective products.

 

                So we have a story of conflicting data.  I

 

      am up to Slide 5.  We have data that has been

 

      presented across short- and long-term studies, the

 

      epidemiologic studies.  The challenge is to compare

 

      across populations, across comparators.  It is

 

      striking that sometimes very similar study designs

 

      have very different results.

 

                It is possible there is more than one

 

      mechanism.  Again, the data has been inconsistent

 

      with the NSAIDs.  We also have conflicting

 

      information coming back on what occurs in the

 

      context of concurrent aspirin use.  It is really

 

      unclear if aspirin use has a truly meaningful

 

      effect on whether there is any G.I. benefit of the

 

      COX-2-selective products.  That has not been clear

 

      either.

 

                I have been asked to point out that, in

 

      addition, time to onset of risk is something that

 

      we need to consider very importantly, too, which,

 

      again, is something that is evident when we look at

 

      the study data and important in our deliberations

 

      for this.

 

                So, in spite of this conflicting data and

 

      the many questions, we have to move forward.  We

 

      have to determine what the role of approved

 

      products are on the market today, what additional

 

      studies are necessary, what studies would be most

 

      helpful.

 

                I am going to summarize and combine some

 

      of the questions that we have posed.  These are

 

      questions we dearly would like input from the

 

      committee.  To start, if we think about the first

 

      three questions, does the available data support a

 

      conclusion that celecoxib, rofecoxib and valdecoxib

 

      significantly increase the risk of cardiovascular

 

      events?  Does the overall risk-versus-benefit

 

      profile for each of these support marketing in the

 

      U.S.?  If yes, in whom?  And which of the potential

 

      benefits of celecoxib or the others outweigh the

 

      potential risks and what actions would you

 

      recommend that we consider implementing to ensure

 

      safe use?

 

                I think it is also important to understand

 

      that some of these answers are going to depend on

 

      if we think that this is a fairly uniform class

 

      effect and, if not, we are going to have to weigh the

 

      amount of information available for each of the

 

      products.  It is not the same.  We don't have the

 

      longer outcome studies, for instance, with

 

      valdecoxib at this point.

 

                Question 4 asks if the available data

 

      support a conclusion that one or more of the

 

      COX-2-selective agents increase the risk of

 

      cardiovascular events and what is the role of

 

      concomitant aspirin in attempting to mitigate that

 

      risk.  What additional clinical trials or

 

      observational studies, if any, would you recommend

 

      as essential for us to further evaluate celecoxib,

 

      rofecoxib and valdecoxib?

 

                What about to further evaluate the

 

      potential G.I. benefits for these same products?

 

      Would you recommend that the labeling for these

 

      products include information regarding the absence

 

      of long-term controlled clinical-trial data

 

      assessing potential cardiovascular effects and if

 

      you have a recommendation for how that should be

 

      conveyed in terms of warnings, boxes and such.

 

                What additional trials would be essential

 

      to evaluate the nonselective nonsteroidal

 

      anti-inflammatory drugs particularly with respect

 

      to cardiovascular risk?  Similarly, what will now

 

      become essential for products under development

 

      prior to approval to help gain approval?

 

                We have to determine what studies would be

 

      necessary to evaluate the cardiovascular risk of

 

      these products and how much information do we need

 

      to know about the gastrointestinal risk?  If

 

      preapproval studies recommended as essential do not

 

      demonstrate an increased risk for a cardiovascular

 

      event, how would you propose the FDA handle that

 

      information in the labeling?  Would the absence of

 

      a cardiovascular-risk signal preclude the need for

 

      any warnings or precautions in the labeling of a

 

      new product or should we rely more on a class

 

      warning or precaution in the absence of a signal of

 

      increased risk in the preapproval databases?

 

                If you think a class warning is

 

      appropriate, please advise with particular

 

      attention to whether you recommend it apply to all

 

      NSAIDs or only COX-2-selective NSAIDs.

 

                So I want to thank everybody here for

 

      their time and their commitment to helping us

 

      through this extremely challenging program and we

 

      really look forward to hearing your deliberations

 

      and your recommendations.

 

                Thank you.

 

                DR. WOOD:  Thank you very much.

 

                The companies have also asked for two

 

      minutes to respond.  We all heard the rules

 

      yesterday so it is two minutes.  Microphone gets

 

      turned off two minutes later and just keep moving.

 

                           Sponsor Responses

 

                DR. HARRIGAN:  Could I have Slide No. 1.

 

      This is Harrigan from Pfizer.  What I would like to

 

      do is first to summarize what we know about

 

      celecoxib and what we think that tells us about the

 

      benefit:risk equation for that drug.

 

                I make the point in this slide about

 

      Celebrex being extensively studied and to remind

 

      the committee of the contrast of the very widely

 

      used nonspecific NSAIDs.  On the next point, we see

 

      that efficacy has been demonstrated in arthritis

 

      pain and familial adenomatous polyposis.  Our

 

      prescription data and observational study data tell

 

      us that approximately three-quarters of patients

 

      who are taking celecoxib are receiving daily doses

 

      of 200 milligrams or less.

 

                Celebrex does have a favorable G.I. safety

 

      profile, a point emphasized by the very relevant

 

      G.I. safety findings that we heard about this

 

      morning from ADAPT compared to over-the-counter

 

      doses of naproxen.

 

                Cardiovascular risk was not detected in

 

      the setting of treating arthritis patients

 

      understanding all the caveats about that data that

 

      we have heard over the past two days.  In APC, an

 

      increase in cardiovascular risk was reported

 

      apparently in a dose-related pattern.  In contrast,

 

      two additional long-term placebo-controlled trials

 

      did not find evidence of increased cardiovascular

 

      risk at daily doses of 400 milligrams.

 

                The comment about the ADAPT findings is

 

      supported by the initial announcements from

 

      the National Institute on Aging.  We await that data

 

      with great interest, particularly given the size,

 

      the duration in the elderly population study which

 

      would lead us to believe, expect, that the number

 

      of events in that trial will exceed the number of

 

      events in either or both of the other two trials

 

      combined.

 

                The final ADAPT data and the polyp

 

      efficacy data will make significant contributions