1

 

                DEPARTMENT OF HEALTH AND HUMAN SERVICES

 

                      FOOD AND DRUG ADMINISTRATION

 

                CENTER FOR DRUG EVALUATION AND RESEARCH

 

 

 

 

 

 

 

 

 

 

 

 

                            JOINT MEETING OF

 

                  THE ARTHRITIS ADVISORY COMMITTEE AND

 

                  THE DRUG SAFETY AND RISK MANAGEMENT

 

                           ADVISORY COMMITTEE

 

 

                               VOLUME II

 

 

 

 

 

 

 

 

 

 

                      Thursday, February 17, 2005

 

                               8:00 a.m.

 

 

 

 

 

 

 

                          Hilton Gaithersburg

                           620 Perry Parkway

                         Gaithersburg, Maryland

                                                                 2

 

                        P A R T I C I P A N T S

 

      Alastair J.J. Wood, M.D., Chair

      Kimberly Littleton Topper, M.S., Executive Secretary

 

      ARTHRITIS ADVISORY COMMITTEE MEMBERS

 

      Allan Gibofsky, M.D., J.D., Chair

      Joan M. Bathon, M.D.

      Dennis W. Boulware, M.D.

      John J. Cush, M.D.

      Gary Stuart Hoffman, M.D.

      Norman T. Ilowite, M.D.

      Susan M. Manzi, M.D., M.P.H.

 

      DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE

      MEMBERS

 

      Peter A. Gross, M.D., Chair

      Stephanie Y. Crawford, Ph.D., M.P.H.

      Ruth S. Day, Ph.D.

      Curt D. Furberg, M.D., Ph.D.

      Jacqueline S. Gardner, Ph.D., M.P.H.

      Eric S. Holmboe, M.D.

      Arthur A. Levin, M.P.H., Consumer Rep.

      Louis A. Morris, Ph.D.

      Richard Platt, M.D., M.Sc.

      Robyn S. Shapiro, J.D.

      Annette Stemhagen, Dr.PH., Industry Rep.

 

      FDA CONSULTANTS (VOTING)

 

      Steven Abramson, M.D.

      Ralph B. D'Agostino, Ph.D.

      Robert H. Dworkin, Ph.D.

      Janet Elashoff, Ph.D.

      John T. Farrar, M.D.

      Leona M. Malone, L.C.S.W., Patient Rep.

      Thomas Fleming, Ph.D.

      Charles H. Hennekens, M.D.

      Steven Nissen, M.D.

      Emil Paganini, M.D., FACP, FRCP

      Steven L. Shafer, M.D.

      Alastair J.J. Wood, M.D. (Meeting Chair)

                                                                 3

 

                  P A R T I C I P A N T S (Continued)

 

      FDA CONSULTANTS (NON-VOTING)

 

      Byron Cryer, M.D. (Speaker and Discussant)

      Milton Packer, M.D. (Speaker only)

 

      NIH PARTICIPANTS (VOTING)

 

      Richard O. Cannon, III, M.D.

      Michael J. Domanski, M.D.

      Lawrence Friedman, M.D.

 

      GUEST SPEAKERS (Non-Voting)

 

      Garret A. FitzGerald, M.D.

      Ernest Hawk, M.D., M.P.H.

      Bernard Levin, M.D.

      Constantine Lyketsos, M.S., M.H.S.

 

      FDA (CDER)

 

      Jonca Bull, M.D.

      David Graham, M.D., M.P.H.

      Brian Harvey, M.D.

      Sharon Hertz, M.D.

      John Jenkins, M.D., F.C.C.P.

      Sandy Kweder, M.D.

      Robert O'Neil, Ph.D.

      Joel Schiffenbauer, M.D.

      Paul Seligman, M.D.

      Robert Temple, M.D.

      Anne Trontell, M.D., M.P.H.

      Lourdes Villalba, M.D.

      James Witter, M.D., Ph.D.

      Steven Galson, M.D.

      Kimberly Littleton Topper, M.S., Executive

      Secretary

                                                                 4

 

                            C O N T E N T S

 

      Call to Order:

                Alastair J.J. Wood, M.D., Chair                  5

 

      Conflict of Interest Statement:

                Kimberly Littleton Topper, M.S.                  5

 

      Interpretation of Observational Studies

      of Cardiovascular Risk of Non-steroidal Drugs

                Richard Platt, M.D., M.S.                        8

 

      Review of Epidemiologic Studies on

      Cardiovascular Risk with Selected NSAIDs

                David Graham, M.D., M.P.H.                      37

 

      Committee Questions to Speakers                           89

 

                          Arcoxia (etoricoxib)

                      Merck Research Laboratories

 

      Sponsor Presentation

                Sean P. Curtis, M.D.                           152

 

      FDA Presentation

                Joel Schiffenbauer, M.D.                       189

 

                              Lumiracoxib

                        Novartis Pharmaceuticals

 

      Sponsor Presentation

      Introduction

                Mathias Hukkelhoven, Ph.D.                     201

 

      Gastrointestinal and Cardiovascular Safety

      of Lumiracoxib, Ibuprofen, and Naproxen

                Patrice Matchaba, M.D.                         205

 

      Open Public Hearing                                      236

 

      FDA Presentation (Lumiracoxib)

                Lourdes Villalba, M.D.                         336

 

      Committee Questions to Speakers                          346

 

      Committee Discussion                                     410

 

                                                                 5

 

                         P R O C E E D I N G S

 

                             Call to Order

 

                DR. WOOD:  Let's get started and welcome

 

      back to another day.  We are going to begin as on

 

      the agenda seeing we worked late last night.

 

                A couple of housekeeping things first.  As

 

      they say in the movie theater, please turn off your

 

      cell phones. We don't have the one that sort of,

 

      you know, spars you into space if you do that, the

 

      ejector seat, but then please don't answer your

 

      calls in here, so we don't have to hear the

 

      beginning of your conversation.

 

                Kimberly, are you going to read the

 

      conflict of interest?  Okay.  Go ahead.

 

                     Conflict of Interest Statement

 

                MS. TOPPER:  The following announcement

 

      addresses the issue of conflict of interest with

 

      respect to this meeting and is made as part of the

 

      record to preclude even the appearance of such.

 

                Based on the agenda, it has been

 

      determined that the topics of today's meeting are

 

      issues of broad applicability and there are no

 

                                                                 6

 

      products being approved.  Unlike issues before a

 

      committee in which a particular product is

 

      discussed, issues of broader applicability involve

 

      many industrial sponsors and academic institutions.

 

      All special government employees have been screened

 

      for their financial interests as they may apply to

 

      the general topics at hand.

 

                To determine if any conflict of

 

      interest existed, the agency has reviewed the

 

      agenda and all relevant financial interests

 

      reported by the meeting participants. The Food and

 

      Drug Administration has granted general matter

 

      waivers to the special government employees

 

      participating in this meeting who require a waiver

 

      under Title 18, United States Code Section 208.

 

                A copy of the waiver statements may be

 

      obtained by submitting a written request to the

 

      agency's Freedom of Information Office, Room 12A-30

 

      of the Parklawn Building.

 

                Because general topics impact so many

 

      entities, it is not practical to recite all

 

      potential conflicts of interest as they apply to

 

                                                                 7

 

      each member, consultant, and guest speaker.  FDA

 

      acknowledges that there may be potential conflicts

 

      of interest, but because of the general nature of

 

      the discussions before the committee, these

 

      potential conflicts are mitigated.

 

                With respect to FDA's invited industry

 

      representative, we would like to disclose that Dr.

 

      Annette Stemhagen is participating in this meeting

 

      as a non-voting industry representative acting on

 

      behalf of regulated industry.

 

                Dr. Stemhagen's role on this committee is

 

      to represent industry interests in general, and not

 

      any one particular company.  Dr. Stemhagen is vice

 

      president of Strategic Development Services for

 

      Covance Periapproval Services, Inc.

 

                In the event that the discussions involve

 

      any other products or firms not already on the

 

      agenda for which FDA participants have a financial

 

      interest, the participants involved and their

 

      exclusion will be noted for the record.

 

                With respect to all other participants, we

 

      ask in the interest of fairness that they address

 

                                                                 8

 

      any current or previous financial involvement with

 

      any firm whose products they may wish to comment

 

      upon.

 

                Thank you.

 

                DR. WOOD:  Thank you.

 

                Let's go right to the first speaker, Dr.

 

      Platt, who is going to tell us about observational

 

      studies.

 

               Interpretation of Observational Studies of

 

               Cardiovascular Risk of Nonsteroidal Drugs

 

                       Richard Platt, M.D., M.S.

 

                DR. PLATT:  Thanks.  The framers of the

 

      meeting thought it would be useful at this point to

 

      have a discussion about observational studies to

 

      put us all on the same page.

 

                There was a view by some that the

 

      expertise around the table might be uneven and it

 

      would be worthwhile to have some discussion about

 

      some of the basics.  It is clear that that is not

 

      the case.

 

                I realize that a number of the people here

 

      have written a book and several of my teachers are

 

                                                                 9

 

      here, so to that extent, I think we can either make

 

      this a quick discussion or use this as an opportunity

 

      for a real interactive discussion, because there

 

      are some hard questions here and no matter how we

 

      sort them out, we are going to be left with less

 

      in the way of firm answers than we would like.

 

                I also understand that there is a point of

 

      view that says that there are lies, damn lies, and

 

      observational studies, so part of what I think is

 

      worth doing is using this time maybe to take our

 

      temperature about whether and under what

 

      circumstances we can put weight on observational

 

      studies.

 

                We saw a version of this slide last night

 

      actually in the last presentation about why perform

 

      observational studies at all, because I subscribe

 

      to the general view that all things being equal, a

 

      clinical trial, a randomized trial is more

 

      credible, provides more information than an

 

      observational study.

 

                The problem is all things aren't always

 

      equal and so there are reasons to ask what we can

 

                                                                10

 

      learn from observational studies.

 

                I think the most important of them is no

 

      matter how well a clinical trial is designed, the

 

      individuals who are recruited and consented to a

 

      clinical trial are inherently going to be different

 

      from the actual population of users, and if we want

 

      to understand how an agent performs among real

 

      users in the way they actually use the drug, then,

 

      I think there is no escape but to look to

 

      observational studies.

 

                Additionally, observational data is by

 

      definition there, so when a pressing question

 

      arises, sometimes observational data is the first

 

      way we can get insight into the relationship

 

      between the drugs we care about and the outcomes.

 

                I think in that regard, these studies can

 

      often be thought of as helping us identify the

 

      areas in which it would be most fruitful to invest

 

      in full-blown randomized trials.  We will never

 

      live in a world where we are able to do all the

 

      randomized trials we care about.

 

                I know that Charlie Hennekens' landmark

 

                                                                11

 

      randomized trial of aspirin was preceded by, as I

 

      recollect, Charlie, a large number of observational

 

      trials that made you think that it was reasonable to

 

      do those randomized trials, so observational

 

      studies can be useful in that regard.

 

                Finally, when we are talking about trying

 

      to understand effects that are relatively unusual,

 

      we stress even the largest clinical trials.  We

 

      talked yesterday about the fact that the most

 

      recent drug approvals have used much larger

 

      populations in the NDA phase than had been studied

 

      in the old days, and yet they are still small

 

      compared to the numbers needed to parse out

 

      relatively small differences.

 

                There are a lot of different kinds of

 

      observational trials.  I have listed a few of the

 

      most common.  The ones between the lines here are

 

      the ones that are really the subject for discussion

 

      here.

 

                Tom Fleming made the absolutely correct

 

      and somewhat counterintuitive point that it is

 

      often more difficult to do good observational

 

                                                                12

 

      studies of relatively common outcomes than rare

 

      ones, and because of that, the group of studies

 

      that I think at least are reasonable to consider

 

      for looking at relatively common outcomes are

 

      case-control studies, nested case-control studies

 

      and cohort studies.

 

                We have examples of each in the materials

 

      that have been handed to us.  The study by Kimmel

 

      is a pretty traditional case-control study.  The

 

      studies by Ray are cohort studies, as is the Aramis

 

      study.  The study by Dave Graham, the Solomon study

 

      are nested case-control studies.

 

                Just as a quick reminder, the

 

      distinguishing feature of cohort studies is the

 

      fact that the study population is defined on the

 

      basis of whether people are exposed to the drug or

 

      not, and then we look forward to what happens to

 

      them.  In that way, they are exactly comparable to

 

      clinical trials, with the big difference that the

 

      assignment to drug is not randomized.

 

                The strengths of those compared to

 

      case-control studies are you have a reasonable shot

 

                                                                13

 

      at the outset of selecting individuals who are

 

      representative of the group that you are trying to

 

      study, and if you organize the study properly, you

 

      have a reasonably good chance of getting unbiased

 

      exposure assessments.

 

                The weaknesses, particularly of

 

      observational cohort studies, are that just because

 

      individuals had the right drug exposure at the

 

      outset, they may change that.  You can deal with

 

      that with an intention-to-treat design, but you pay

 

      a price for that, and in observational studies,

 

      loss to followup is a big problem.

 

                We are particularly plagued by that

 

      because the large majority of the observational

 

      studies we are working in are ones that use

 

      administrative data from one sort of health plan or

 

      another, and individuals move in and out of health

 

      plans, so that it becomes difficult to follow them

 

      over time.

 

                Case-control studies, remember, are ones

 

      that start with individuals who have the outcome we

 

      care about, myocardial infarction or myocardial

 

                                                                14

 

      infarction and sudden death, and compare them to

 

      individuals who haven't had that experience.  Then

 

      you look back and ask what their drug exposures

 

      are.  The reasons for doing those studies are that

 

      they are, first of all, very efficient studies.

 

                You don't have to study thousands and

 

      thousands. You can study as many cases as you find

 

      and a reasonable number of controls, and you can

 

      look back and classify exposure however is most

 

      useful, and that is a very convenient and versatile

 

      feature of case-control studies.
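
The case-control comparison described above is usually summarized as an odds ratio. The following is a minimal sketch of that calculation, assuming a simple unmatched 2x2 table and a Woolf (log-based) confidence interval. The counts are hypothetical and are not taken from any study discussed at this meeting.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Approximate 95% interval using the Woolf standard error on the log scale."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 200 MI cases and 800 controls, classified by drug exposure.
counts = (40, 160, 100, 700)
estimate = odds_ratio(*counts)
low, high = odds_ratio_ci(*counts)
print(f"odds ratio = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A matched or nested design would normally call for a conditional analysis instead; the unmatched formula is used here only to keep the arithmetic visible.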

 

                The big weaknesses are that it is very

 

      hard to assure oneself that the cases and the

 

      controls are really representative of the

 

      populations that you care about, and for

 

      conventional case-control studies, for instance,

 

      the study by Kimmel that we are going to look at,

 

      it takes a lot of work to be sure that people who

 

      know that they have already experienced an MI don't

 

      differentially report their exposure to the drugs

 

      that we care about.

 

                That can be for all sorts of reasons and

 

                                                                15

 

      it might not even be wrong, but the individual who

 

      has had an MI might be just thinking harder

 

      about whether he or she had been exposed to a drug

 

      that we care about.

 

                By the way, nested case-control studies,

 

      for instance, the study that David Graham did, are a

 

      hybrid that really, in my view, draws many of the

 

      strengths from both designs, that is, because

 

      nested means the case-control study is nested in a

 

      defined population, so it has a lot of the

 

      strengths of cohort studies and some of the

 

      efficiencies of the case-control studies.

 

                The differences between the observational

 

      studies and randomized studies are pretty clear.

 

      Randomized trials have the tremendous advantage

 

      that there is lots more reason to expect the

 

      treated and untreated groups to be comparable to

 

      one another.

 

                There is a lot more opportunity to be sure

 

      that the outcome assessment and adherence to

 

      treatment are good or at least well known, and we

 

      have reviewed the difference for the observational

 

                                                                16

 

      studies.

 

                I think it is worth making the point that

 

      there are a substantial number of similarities

 

      between observational and randomized studies.  Just

 

      because we randomize individuals in randomized

 

      studies, it doesn't mean that the treated and

 

      untreated groups are comparable.

 

                We talked about a study yesterday that was

 

      a randomized trial where there was a substantial

 

      imbalance in important risk factors.  So, it is

 

      incumbent no matter what kind of study you do, I

 

      think to look for comparability, and both studies

 

      have as potential weaknesses that there are risks

 

      of false positive results and doing subgroup

 

      analyses and multiple comparisons increases that

 

      risk.
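
The way multiple comparisons inflate the false positive risk can be sketched with a short calculation. It assumes the comparisons are independent and that each is tested at the same significance level, which is a simplification; real subgroup analyses are usually correlated.

```python
def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when no true effect exists in any of them."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests

for k in (1, 5, 10, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:2d} comparisons: {rate:.3f}")
```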

 

                We talked a fair amount about that

 

      yesterday, and both are at risk for false negative

 

      results.  That can be partly because the studies

 

      may not be powered well enough either because there

 

      is insufficient sample size or individuals aren't

 

      studied for a long enough duration to see the

 

                                                                17

 

      biological effects that we care about, or a

 

      vulnerable group just isn't included.

 

                That is a problem with both kinds of

 

      studies and I think all studies have to be

 

      evaluated on their own merits, so let's just step

 

      through the various places where observational

 

      studies might get into trouble or at least the

 

      things that need careful assessment when we look at

 

      these studies.

 

                The first is are we studying the right

 

      outcomes. It is essentially impossible in any of

 

      these observational studies to use the kind of

 

      rigorous adjudication that is a hallmark of the

 

      randomized study, so I think we are going to have

 

      to ask ourselves are these outcomes good enough.

 

                The several kinds of outcomes in the

 

      studies that we have been asked to look at are

 

      hospitalized MIs.  The case-control study by Kimmel

 

      uses survivors.  It had to use survivors because

 

      they were collecting the exposure information by

 

      interview after the individuals had left the

 

      hospital, so if we care about all MIs, then, that

 

                                                                18

 

      study isn't going to tell us what we want to know.

 

                Some of the studies use MI and

 

      out-of-hospital sudden death by linking to vital

 

      statistics records.  I think that is probably the

 

      closest we can get in observational studies to the

 

      intention-to-treat all outcome designs of the

 

      randomized trials, and some of the studies use

 

      composite designs.

 

                You have to ask are these outcomes

 

      measured appropriately.  Most of the studies that

 

      we are looking at use some form of automated

 

      medical record or claims data that have been, in my

 

      view, reasonably well validated.  That is, there is

 

      a moderate literature showing that claims data are

 

      not so bad for studying acute myocardial

 

      infarction. They have sensitivities in the 90s and

 

      positive predictive values in the 90s.
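
Sensitivity and positive predictive value, as used above, can be computed from a validation sample in which claims-based diagnoses are checked against chart review. The sketch below uses hypothetical counts chosen only to land in the range the speaker describes.

```python
def sensitivity(true_positives, false_negatives):
    """Share of chart-confirmed MIs that the claims data also flag."""
    return true_positives / (true_positives + false_negatives)

def positive_predictive_value(true_positives, false_positives):
    """Share of claims-flagged MIs that chart review confirms."""
    return true_positives / (true_positives + false_positives)

# Hypothetical validation sample: 1,000 chart-confirmed MIs,
# of which claims flag 930, plus 60 claims-flagged events that are not MIs.
tp, fn, fp = 930, 70, 60
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"positive predictive value = {positive_predictive_value(tp, fp):.3f}")
```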

 

                So, they are not perfect and I think we

 

      will have to ask as we review the studies, can the

 

      amount of uncertainty that we know exists in those

 

      account for the effects that we see, or could they

 

      obliterate effects that we would like to see and

 

                                                                19

 

      which aren't there.

 

                My sense is that that is probably not a

 

      sufficient explanation to dismiss the studies that

 

      we are looking at. The issue of bias is one that I

 

      think always has to live as a sub-text, but quite

 

      frankly, in the studies that do outcomes in the way

 

      we have been describing, I don't think that is a

 

      serious problem.

 

                For cohort studies, we have to ask are we

 

      studying the right population, and here I think we

 

      really do have to stop and ask carefully.  One is

 

      are these people selected from the population under

 

      study.  I think in most of these examples, they are

 

      reasonably representative, that is, a study of the

 

      people of Ontario or members of a large health

 

      plan.

 

                I think that the data systems that are

 

      used to identify the individuals in the cohort are

 

      good enough to give us reasonable belief that we

 

      are identifying either all the people or a

 

      representative sample of them.

 

                I think there is a fair question of

 

                                                                20

 

      whether they are representative of the larger

 

      population.  We could ask are health plan members

 

      systematically different from the general

 

      population of individuals who are taking these

 

      medications.

 

                The range of studies we have include

 

      health plan members.  I think that there is

 

      reasonable information that they probably are

 

      representative, at least with respect to the drug

 

      myocardial infarction outcomes that are studied.

 

      Studies in Medicare and population-based studies,

 

      such as those in Canada, I think also give us

 

      reason to think that they are representative.

 

                But there is an important consideration

 

      about whether there are issues about the way

 

      clinicians practice in those settings that might

 

      have a serious impact on selecting individuals.  In

 

      particular, to the extent that formularies are

 

      restrictive of, say, newer or more expensive drugs

 

      like the COX-2 inhibitors, I think we have to

 

      ask very carefully whether the factors that would

 

      influence the prescribing of one class of drugs

 

                                                                21

 

      over another are likely to seriously impact the risk

 

      of these outcomes.

 

                Additionally, if there are cost

 

      differentials for these drugs, it may be that there

 

      is some form of self-selection that causes

 

      individuals who are sicker to receive these drugs,

 

      and I think that it is incumbent on us to expect

 

      that to be a problem in every one of these

 

      observational studies and to ask how well do these

 

      studies do in adjusting for that.  I will circle

 

      back to that in a moment.

 

                I think we have to be concerned about

 

      whether we are studying people who have had prior

 

      NSAID exposure, in which case we would be worried

 

      about survivor biases, of finding the individuals

 

      who are relatively immune to these problems.

 

                Finally, there are study design issues

 

      about whether there are restrictions of eligibility

 

      that might importantly color the data.  For

 

      instance, at least one of the studies we are

 

      looking at requires individuals to have received at

 

      least two dispensings of a nonsteroidal agent in

 

                                                                22

 

      order to be eligible.

 

                That means that you have to live long

 

      enough to have two dispensings, so it certainly

 

      doesn't tell us anything about the early effects of

 

      these drugs, and it might in an important way color

 

      the results with regard to later exposure.

 

                There is an important question which is

 

      not unique to the observational studies, which is

 

      who are the right comparators.  We had a number of

 

      discussions about that yesterday.  I think that all

 

      the issues that we discussed with regard to the

 

      clinical trials are applicable here.  In

 

      particular, there is a lot of reason to want to

 

      compare to other nonsteroidal users because that

 

      gives the best chance of having a group that is

 

      similar with regard to underlying disease status

 

      and presumably risk of myocardial infarction.

 

                Similarly, it is possible to say that if

 

      you really care about COX-2 selective agents, you

 

      should compare one COX-2 selective agent to

 

      another.

 

                That leaves us in the uncomfortable

 

                                                                23

 

      situation of not knowing what is the risk compared

 

      to no use at all, so we have some comparisons that

 

      do look at non-users or at least remote users, and

 

      that has its strengths.  It has the big weakness,

 

      of course, of putting us at risk of making

 

      comparisons against groups that are unrelated.

 

                So, we are really talking here mostly

 

      about a study like the Kimmel study, not the nested

 

      case-control study.  The other kinds of concerns

 

      that raise red flags are the real concern about

 

      losing cases, which makes the group that is studied

 

      unrepresentative.

 

                I would point out to you, for instance,

 

      that in the Kimmel study, only half of the MI

 

      survivors who were identified were actually

 

      interviewed and therefore part of the formal

 

      analysis.

 

                We already talked about the fact that

 

      since that study was limited to MI survivors, that

 

      restricts us to a less serious set of outcomes.

 

                The other problem that really bedevils

 

      conventional case-control studies is knowing

 

                                                                24

 

      whether the group of people who are selected as

 

      comparators are really comparable.

 

                I think that is one of the reasons that

 

      there is so much interest in doing nested

 

      case-control studies, because at the end of the day it

 

      is really extremely difficult to satisfy oneself

 

      that controls really are appropriate.

 

                Much of what we need to be concerned about

 

      in these studies is understanding exposures.  Part

 

      of the issue is understanding how to characterize

 

      exposure.  This is both a strength and a weakness

 

      of these studies.

 

                You will remember I made the point at the

 

      outset that if we want to understand how drugs work

 

      in actual practice, that we have to do

 

      observational studies.  On the other hand, that

 

      means we have to find a reasonable way to

 

      characterize these drugs.

 

                We talked yesterday I think about all the

 

      important issues of understanding whether we had to

 

      look at absolute dose or cumulative effects or

 

      whether the effects start early or whether they

 

                                                                25

 

      start late.

 

                I think that the best of the studies that

 

      we are looking at tackle a number of these issues.

 

      I will mention in a minute some of the ways that

 

      these studies have gone about that.

 

                I think in terms of ascertaining exposure,

 

      it is probably reasonable to put the most reliance

 

      on the studies that use administrative databases of

 

      pharmacy dispensing, but I will just make the point

 

      that we have to be clear that these studies are

 

      done in situations where we have reason to expect

 

      that the administrative databases are correct.

 

                I think all the studies we are reviewing

 

      are ones where the investigators were careful to

 

      know that the individuals really had a drug benefit

 

      that was operating at the moment, that would likely

 

      find the prescription drug exposures that we care

 

      about, but as a general proposition, you can't

 

      assume that that is the case.

 

                Most health plans have some kind of

 

      restrictions on benefits that might lead

 

      individuals to change their benefit status, so

 

                                                                26

 

      there would be periods of time when we might know

 

      that they had an MI, and we might not know what

 

      their drug exposure is at the moment.

 

                I will return to a point that we touched

 

      on yesterday, which is that although almost all of

 

      the studies that we are talking about report their

 

      results as relative risks, a 2-fold increase in

 

      risk, a 70 percent decrease in risk, what we

 

      really care about is the absolute difference in

 

      risk.

 

                So, that is not different between

 

      observational studies and randomized studies, but I

 

      think it is really a critical piece of our thinking

 

      about the problem that we are dealing with.
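
      [Illustration, not part of the spoken remarks.  A minimal sketch
      of the point about relative versus absolute risk: the same
      relative risk implies very different absolute differences
      depending on the baseline risk.  The baseline risks and the
      relative risk of 2 below are hypothetical, not taken from any
      study discussed at the meeting.]

```python
def risk_difference(baseline_risk, relative_risk):
    """Absolute risk difference implied by a relative risk at a given baseline."""
    return baseline_risk * (relative_risk - 1.0)


def excess_per_thousand(baseline_risk, relative_risk):
    """Same quantity expressed as extra events per 1,000 people."""
    return 1000.0 * risk_difference(baseline_risk, relative_risk)


# Hypothetical annual baseline risks of the outcome (assumed values).
low_baseline = excess_per_thousand(0.001, 2.0)    # low-risk patients
high_baseline = excess_per_thousand(0.020, 2.0)   # high-risk patients

print(f"RR 2.0 at 0.1% baseline: {low_baseline:.1f} extra events per 1,000")
print(f"RR 2.0 at 2.0% baseline: {high_baseline:.1f} extra events per 1,000")
```

      [The relative risk is identical in both lines; only the absolute
      difference shows how many additional people are affected.]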

 

                The second thing that is just worth

 

      recalling is that when we talk about a 95 percent

 

      confidence interval, that our expectation about

 

      where the true value lies is not uniformly

 

      distributed over that interval.

 

                Our best guess about where the true value

 

      lies is around the point estimate, and if that

 

      point estimate is wrong, the large majority of the

 

                                                                27

 

      uncertainty is pretty close to that point estimate,

 

      so that it is particularly not helpful, in my view,

 

      to pay enormous attention to p values.

 

                The difference between a p value of 0.05,

 

      as shown here, and a p value of 0.01 and a p value

 

      of 0.13 is not all that enormous in terms of the

 

      biological impact.

 

                I think one of the things that is a

 

      particular concern that we need to pay attention to

 

      in these studies is the fact that it is easy to

 

      look at a lot of different comparisons, and to the

 

      extent that we do that, we are going to have to

 

      just be careful to know that the strength of any

 

      one comparison is weaker than it appears to be.

 

                For instance, this is a quote from one of

 

      the studies that we are looking at.  We undertook

 

      an observational study examining the association

 

      between rofecoxib, celecoxib, other nonsteroidals

 

      and myocardial infarction.

 

                Well, there is no primary hypothesis

 

      there, and the results for all of the

 

      nonsteroidals, they are all interesting to look

 

                                                                28

 

      at, they are all associated with p values.  Those p

 

      values are all relatively too extreme given the

 

      fact that there are so many comparisons.
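
      [Illustration, not part of the spoken remarks.  A rough sketch of
      why p values look too extreme when many comparisons are made.
      It assumes the comparisons are independent, which is a
      simplification, and the count of 10 comparisons is hypothetical.]

```python
def familywise_error(alpha, comparisons):
    """Chance of at least one false-positive result among independent tests."""
    return 1.0 - (1.0 - alpha) ** comparisons


def bonferroni_threshold(alpha, comparisons):
    """Per-comparison threshold that keeps the overall error near alpha."""
    return alpha / comparisons


for k in (1, 5, 10):
    print(
        f"{k:2d} comparisons: "
        f"P(at least one false positive) = {familywise_error(0.05, k):.2f}, "
        f"Bonferroni threshold = {bonferroni_threshold(0.05, k):.4f}"
    )
```

      [Bonferroni is only one of several possible corrections; it is
      shown here because it is the simplest to state.]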

 

                It is a problem for randomized trials.  We

 

      talked about subgroup analyses.  It is important to

 

      do those studies, those subgroup analyses, but

 

      absent having specified a principal hypothesis at

 

      the outset, I think that we have difficulties in

 

      knowing how much weight to put on any particular

 

      one.

 

                We talked a lot about confounding.  That

 

      is one of the most important concerns in randomized

 

      trials.  I know you all know what confounding is.

 

      It wasn't obvious to me when I was making these

 

      slides that everyone knew that, but the example, so

 

      that we have it in mind is if what we know is drug

 

      A versus drug B, and MI or no MI, and we don't take

 

      into account important confounders, we can get

 

      importantly incorrect results.

 

                So, here is an example of an aggregate

 

      analysis with a relative risk of 1.5 among 2,000

 

      people who are exposed to two drugs.  If you break

 

                                                                29

 

      it apart and see that in the high-risk group, drug

 

      A accounted for 80 percent of the exposure, and in

 

      the low-risk group, drug B accounted for 80 percent

 

      of the exposure, you see that in each of those two

 

      categories, the high-risk group and the low-risk

 

      group, that, in fact, there is no association

 

      between drug and outcome, but you have to take them

 

      apart to do that.
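
      [Illustration, not part of the spoken remarks.  The slide itself
      is not reproduced in the transcript, so the counts below are
      hypothetical.  They were chosen only to match the pattern
      described: 2,000 people, an aggregate relative risk of 1.5, an
      80/20 split of drug A and drug B in the high-risk group, the
      reverse split in the low-risk group, and no association within
      either group.]

```python
# Each entry is (events, people) for drug A and drug B within a risk stratum.
strata = {
    "high_risk": {"A": (80, 800), "B": (20, 200)},  # 10% risk on either drug
    "low_risk": {"A": (10, 200), "B": (40, 800)},   # 5% risk on either drug
}


def relative_risk(events_a, n_a, events_b, n_b):
    """Risk on drug A divided by risk on drug B."""
    return (events_a * n_b) / (events_b * n_a)


def crude_rr(strata):
    """Relative risk after pooling the strata, ignoring the confounder."""
    events_a = sum(s["A"][0] for s in strata.values())
    n_a = sum(s["A"][1] for s in strata.values())
    events_b = sum(s["B"][0] for s in strata.values())
    n_b = sum(s["B"][1] for s in strata.values())
    return relative_risk(events_a, n_a, events_b, n_b)


def stratum_rrs(strata):
    """Relative risk computed separately inside each stratum."""
    return {name: relative_risk(*s["A"], *s["B"]) for name, s in strata.items()}


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary relative risk adjusted for the stratifying factor."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        (events_a, n_a), (events_b, n_b) = s["A"], s["B"]
        total = n_a + n_b
        numerator += events_a * n_b / total
        denominator += events_b * n_a / total
    return numerator / denominator


print("Crude RR:", crude_rr(strata))
print("Stratum RRs:", stratum_rrs(strata))
print("Adjusted (Mantel-Haenszel) RR:", mantel_haenszel_rr(strata))
```

      [In these made-up counts drug A is used mostly by high-risk
      people, so the pooled comparison shows an excess that disappears
      once the two risk groups are examined separately.]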

 

                Well, the good news is if you know what

 

      the confounders are, and you have measured them

 

      accurately, it is possible to adjust for them, and

 

      all of the studies we are looking at do a pretty

 

      good job of adjusting for the confounders that we know

 

      about, so I guess one of the questions is how well

 

      do they do at identifying the important

 

      confounders.

 

                I would say not bad on a lot of that.

 

      That is, if you take, for example, the Graham study

 

      or the studies that Wayne Ray did in Tennessee

 

      Medicaid, there are a number of strengths.  I will

 

      sort of stop and back up on the things that make

 

      these look like relatively more credible studies in

 

                                                                30

 

      the scheme of the factors that we care about.

 

                They are inception cohorts of nonsteroidal

 

      users, that is, they are individuals who had to

 

      have been members of the health plan for at least a

 

      year before they received their nonsteroidal.

 

                There was a lot of information about their

 

      underlying medical status that was available to the

 

      investigators using both claims data and medical

 

      record data to ascertain cardiovascular disease

 

      along a number of dimensions, utilization of

 

      procedures like surgery or angioplasty or

 

      diagnostic procedures that are intended to find

 

      cardiovascular disease, hospitalizations, emergency

 

      room visits, and a substantial amount of

 

      information about the medications that these

 

      individuals took that was related to or plausibly

 

      related to cardiovascular risk factors.

 

                That large number of factors was used to

 

      create separate risk models using only the

 

      unexposed, and then to use those risk models to

 

      create risk indexes for the individuals to use as

 

      an adjuster for underlying cardiovascular risk.

 

                Is it perfect?  No.  Is it pretty good?

 

      It seems to me that it meets the sniff test of

 

      saying that it has a reasonable chance of

 

                                                                31

 

      identifying important confounding.

 

                Unfortunately, there are a number of

 

      important confounders for which health care systems

 

      typically don't have good data, like smoking, OTC

 

      NSAID use, obesity, family history, and those are

 

      typically much more problematic.

 

                Some of these studies have worked pretty

 

      hard to try to either deal with it or understand

 

      whether it could be an important problem.  One of

 

      the handouts we had, for instance, was the study by

 

      Schneeweiss and colleagues who looked back at one

 

      of the studies by Solomon that was performed in the

 

      Medicare data set, and asked how important could

 

      these unmeasured confounders be.

 

                They actually had access to information

 

      from the Medicare Beneficiary Survey that asked

 

      representative Medicare beneficiaries detailed

 

      questions about many of the things that we would

 

      care about.  They weren't the people who were

 

                                                                32

 

      involved in that case-control study, but if you

 

      assume that the beneficiary survey members were

 

      representative and they gave plausible answers, it

 

      is possible to extrapolate back to the source

 

      population, and the take-home message from that

 

      work is that the answer didn't change very much, which is

 

      really what we want to know, not sort of the

 

      absolute difference, but whether those unmeasured

 

      confounders are important enough that they could

 

      cause a difference.

 

                I think we still have to be concerned at

 

      the end of the day, we still have to be concerned

 

      about residual confounding as a potentially

 

      important problem.

 

                One way I think that we can draw relative

 

      assurance from that work of adjusting for

 

      confounding is to ask how much did the estimate of

 

      risk change between the unadjusted and the adjusted

 

      result.

 

                I think there is a world of difference

 

      between an unadjusted result of 10 and an adjusted

 

      result of 1.5, and having an unadjusted result of

 

                                                                33

 

      1.6 and an adjusted result of 1.5.  The former, I

 

      think the reasonable assumption is we arguably

 

      haven't been able to deal with confounding in a way

 

      that would let us believe that 1.5 means something.

 

                I think there is a much stronger case to

 

      be made when adjusting for important confounders

 

      that we know about doesn't change the risk estimate

 

      very much, that that is a relatively more credible

 

      answer.

 

                Having said that, I think that

 

      observational studies are best at finding relative

 

      risks that are more than 2.  I think that I would

 

      pay some attention to relative risks of 1.5.  I get

 

      very nervous about adjusted relative risks of 1.2.

 

                That doesn't mean that they are not right

 

      and I don't ignore them, but if we ask is that for

 

      sure the answer, my response to that is I am just

 

      less certain about that.

 

                I think we are always left at the end,

 

      while we spend a lot of time thinking about and

 

      adjusting for confounding, and I think we can do a

 

      pretty good job of that, it is much harder to

 

                                                                34

 

      adjust for misclassification, and it is essentially

 

      impossible to adjust for bias.

 

                So, I think one of the things we have to

 

      ask about is are there plausible sources of

 

      misclassification and bias, and if there are, in

 

      which direction do they work and would they

 

      seriously change our interpretation.

 

                We talked about the fact that absolute

 

      differences are the important ones that we care

 

      about.  We have already started to look at data

 

      that talks about person level risk and population

 

      level risk, so beyond saying that at the end of the

 

      day, I think these are the answers that we really

 

      need to talk about, not about relative risk.

 

                Personally, I think that we need two kinds

 

      of answers.  One is what is the information that

 

      patients and their physicians need to have to make

 

      decisions for them personally about whether to

 

      accept certain kinds of treatments in exchange for

 

      certain kinds of anticipated benefits.

 

                I think there is a population level

 

      concern that we have to have that emerges from the

 

                                                                35

 

      same set of analyses, but takes on a different

 

      form.

 

                So, you will be pleased to know that I am

 

      wrapping it up now, and I would say that both the

 

      cohort and nested case-control designs, which are

 

      the bulk of the observational studies that we are

 

      looking at, are relatively strong ones and I think

 

      deserve the committee's real attention.

 

                I am sorry that not every one of these

 

      studies prespecified a primary hypothesis that we

 

      can attend to, but we should whenever possible do

 

      that.  Even though we don't find important effects

 

      in some of these studies, I think it is important

 

      to recognize that they don't exclude one.

 

                As I have said, I am least certain about

 

      attaching great weight to relatively small excess

 

      risks even understanding that when they are

 

      extrapolated to a large population, they could

 

      account for very important public health problems.

 

                Finally, I would say that the things that

 

      support the studies' conclusions are the fact that

 

      when we do subgroup analyses and look for

 

                                                                36

 

      dose-response effects, that they strengthen the

 

      cause-effect relationship, and I think that there

 

      is reason to look for consistency across studies.

 

                I take the point that was made yesterday

 

      that it is possible that a dozen studies of

 

      naproxen could all have the same underlying bias

 

      that shifts the point estimate in the same

 

      direction, but it is not so clear to me what that

 

      bias is.

 

                So, I think that we would have to have a

 

      reasonable idea of what might explain consistent

 

      differences across studies and ask if they are of

 

      sufficient magnitude to explain that.  As I say, I

 

      am not clear that there are those kinds of biases.

 

                I think we have to be cautious about the

 

      fact that residual confounding bias and

 

      misclassification are all issues with these

 

      studies.  So, I think that while they add to our

 

      discussion, they have to be considered in light of

 

      the fact that they are imperfect vehicles.

 

                Thanks.

 

                (Applause.)

 

                DR. WOOD:  Thanks very much.

 

                Let's just go straight on to the next

 

      speaker and then we will take questions for Dr.

 

                                                                37

 

      Platt after David Graham's talk.

 

                The next speaker is Dr. David Graham from

 

      the FDA.

 

                   Review of Epidemiologic Studies on

 

                Cardiovascular Risk with Selected NSAIDs

 

                       David Graham, M.D., M.P.H.

 

                DR. GRAHAM:  Good morning.  Today, I will

 

      give a review of epidemiologic studies and

 

      cardiovascular risk with selected NSAIDs.  I will

 

      be evaluating epidemiologic data from the published

 

      literature plus two currently unpublished studies

 

      that I have evaluated.

 

                My focus will be on providing estimates of

 

      risk of acute myocardial infarction in the setting

 

      of the use of COX-2 selective NSAIDs or naproxen,

 

      although I will have some comments in light of

 

      yesterday's discussion about other NSAIDs on those,

 

      as well.

 

                The methodology was to do a PubMed search

 

                                                                38

 

      by specific NSAIDs and then cross-check the

 

      citations in those articles to see if there are

 

      other articles I had missed.

 

                I would also like to take this moment to

 

      thank Dr. Crawford for his leadership in making it

 

      possible for me to present some of our preliminary

 

      data from a study in California Medicaid, which Dr.

 

      Gurkiepal Singh from Stanford and I recently

 

      completed.

 

                Before I get into the substance of my

 

      talk, I just want to comment a little bit on excess

 

      cases and projecting to the national population

 

      what was the impact of rofecoxib use, and I am

 

      doing this for two reasons - one, because it has

 

      been a source of controversy and concern.  We cite

 

      a number in a paper that I and others have

 

      published from Kaiser Permanente in which we made

 

      an estimate of the impact of rofecoxib use.

 

                Tomorrow, FDA will present its estimation

 

      of the number harmed by rofecoxib, modeling

 

      randomized clinical trial survival curves.  A

 

      couple of things I would like the Committee just to

 

                                                                39

 

      be aware of when they see that data tomorrow.  It

 

      assumes a grace period at the beginning of use that

 

      is based on the VIGOR study and the APPROVe, 6-week

 

      grace period in which there is no difference in MI

 

      or increased risk of MI in the first six weeks of

 

      high-dose use or the first 18 months of low-dose

 

      use of rofecoxib.

 

                As I will show later in my talk, I believe

 

      that this is unreliable due to low statistical

 

      power early on, because we are only talking about

 

      in each of these studies a handful of cases early

 

      on in the study.  Two or three cases of MI and wide

 

      confidence intervals, you could have divergence of

 

      the curves very early.

 

                The epi studies, however, that I will

 

      present will show that there are 3- to 50-fold

 

      more events to work with, more statistical power,

 

      and it suggests a different outcome.

 

                The second is that the patients

 

      enrolled in randomized clinical trials are

 

      generally healthier than patients in the real

 

      world.  So, if you are going to model what is the

 

                                                                40

 

      number of people who have been harmed in the

 

      population, you have got to assume what is the

 

      background rate that you are modeling off of.

 

                If you use a background rate from healthy

 

      people to model what is happening in the population

 

      of people who really aren't so healthy, who have a

 

      higher background rate, you will underestimate the

 

      actual population impact.
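
      [Illustration, not part of the spoken remarks.  A minimal sketch
      of the arithmetic behind this point: the projected number of
      excess cases scales directly with the background rate that is
      assumed.  The person-years, rates, and relative risk below are
      hypothetical and are not the figures used by FDA or in the
      Kaiser Permanente paper.]

```python
def excess_cases(person_years, background_rate, relative_risk):
    """Projected excess events: exposure time x background rate x (RR - 1)."""
    return person_years * background_rate * (relative_risk - 1.0)


# Assumed values, for illustration only.
person_years_of_use = 1_000_000
relative_risk = 1.5
trial_background = 0.005        # events per person-year in healthier trial patients
real_world_background = 0.010   # events per person-year in typical users

from_trial_rate = excess_cases(person_years_of_use, trial_background, relative_risk)
from_real_rate = excess_cases(person_years_of_use, real_world_background, relative_risk)

print(f"Using trial background rate:      {from_trial_rate:,.0f} excess cases")
print(f"Using real-world background rate: {from_real_rate:,.0f} excess cases")
```

      [With the same relative risk, halving the assumed background
      rate halves the projected number of people harmed.]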

 

                So, in any event, now on to the substance

 

      of my talk.

 

                The next three slides provide a very dense

 

      overview of the major features of each of the

 

      epidemiologic studies that I reviewed.  I am

 

      looking at COX-2 usage in acute myocardial

 

      infarction.

 

                You can see that they are grouped in

 

      several groups.  The top three studies I consider

 

      from an epidemiologic perspective to be stronger

 

      studies to have been done better.  In terms of the

 

      things that Dr. Platt just talked about, I thought

 

      that these studies were the stronger studies.

 

                The next two studies from the published

 

                                                                41

 

      literature I thought were less strong, and I will

 

      describe why.  Finally, I have separated out these

 

      last two studies, one submitted by Merck to the

 

      FDA, performed by Ingenix, and the other, the

 

      Medi-Cal study that Dr. Gurkiepal Singh and I have

 

      recently completed, as unpublished studies, so they

 

      are separated out from the group.

 

                You can see we are talking about different

 

      source populations, and so if we can see

 

      consistency of results across different

 

      populations, different age groups, and different

 

      study designs, I think that that adds support to

 

      the notion that there is a real effect.

 

                If we begin to see that there is a lack of

 

      consistency across the studies, then, many of the

 

      things that Dr. Platt talked about before need to

 

      be considered sort of at the individual level of the

 

      studies, so what might explain why one study shows

 

      something and another one doesn't.

 

                This next slide shows the case definitions

 

      and the number of cases that we were working with

 

      to come up with the relative risk estimates that I

 

                                                                42

 

      will show you.

 

                All of the studies began with hospitalized

 

      acute myocardial infarction.  Several of the

 

      studies were able to link members of their base

 

      cohorts to death certificate data to identify

 

      sudden cardiac deaths, as well.  So, those are the

 

      ones that have the +Sudden Cardiac Death.

 

                The asterisk next to the Kimmel study is

 

      to remind me and to remind you that the Kimmel

 

      study was based on nonfatal MIs only.  By their

 

      design, they had to interview their cases in

 

      person, so the patient had to survive their

 

      myocardial infarction to be interviewed.  So, there

 

      are those differences in study design.

 

                In the end, what is very important in an

 

      epidemiologic study in dealing with this issue I

 

      think in particular, is what is the statistical

 

      power of the study, and that is driven primarily by

 

      the number of events in the exposed group that we

 

      have to deal with.
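
      [Illustration, not part of the spoken remarks.  A rough sketch of
      why the number of events drives precision.  It uses a standard
      large-sample interval on the log scale for a ratio of event
      counts, assumes equal person-time in the two groups, and uses
      hypothetical event counts.]

```python
import math


def rate_ratio_ci(exposed_events, reference_events, z=1.96):
    """Approximate 95% CI for a ratio of event counts with equal person-time."""
    ratio = exposed_events / reference_events
    standard_error = math.sqrt(1.0 / exposed_events + 1.0 / reference_events)
    lower = ratio * math.exp(-z * standard_error)
    upper = ratio * math.exp(z * standard_error)
    return ratio, lower, upper


for events in (3, 30, 150):
    ratio, lower, upper = rate_ratio_ci(events, events)
    print(f"{events:3d} events per group: ratio {ratio:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

      [With a handful of events per group the interval is too wide to
      distinguish no effect from a large one; it narrows only as
      events accumulate.]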

 

                So, in this column here, you will see the

 

      total number of cases of myocardial infarction that

 

                                                                43

 

      were identified in each of the studies.  The

 

      asterisk next to the Ingenix study 628 is to remind

 

      me that in that study, they identified about 1,700

 

      MIs in total, but they excluded 1,100 of the MIs

 

      because they occurred in people who weren't exposed

 

      to an NSAID at the time of the myocardial

 

      infarction.  So, as a result, they left them out,

 

      because in the previous slide, when we look at the

 

      reference group, most of these studies used either

 

      non-use or remote use as the comparator.  The

 

      Ingenix study used active treatment with either

 

      diclofenac or ibuprofen.

 

                I would like to say one thing about

 

      reference groups.  Dr. Platt brought it up before.

 

      In this issue, I don't believe that there is a

 

      single best or optimal reference group.  What you

 

      really want to do is get as close as you can to a

 

      placebo group that has been randomized and has all

 

      the risk factors of the people who are getting the

 

      drug.

 

                In the observational world we can't get

 

      there, and so at the end of the day, if you want to

 

                                                                44

 

      do a study, you are in a sense forced to pick what

 

      you think is the least evil, and then it has

 

      to do with how you define things.

 

                So, non-users, for example, could be

 

      viewed as being close to the placebo group, they

 

      are not getting the drug.  The problem is people

 

      who don't use drugs tend to be healthier than

 

      people who do use drugs, so that raises a host of

 

      problems.

 

                Yes, we can try to adjust for confounding

 

      and the like, but you are still left with that

 

      concern that they may be, in some way that we can't

 

      measure, different from the people who get the

 

      drug.

 

                In the study I did, and in several other

 

      studies that people have done, we opted to use

 

      people who had been treated with NSAIDs in the

 

      past, but weren't currently taking an NSAID at the

 

      time of the event or the study, the reasoning there

 

      that whatever the selection factors are that lead

 

      to a patient getting an NSAID, that some of those

 

      selection factors are there in people who

 

                                                                45

 

      previously received NSAIDs.

 

                That is still not a perfect group, though,

 

      because you could argue that patients who are no

 

      longer taking NSAIDs might be healthier than people

 

      who are currently taking NSAIDs.

 

                Finally, the problem that is posed by

 

      using an active comparator.  If you have an active

 

      comparator, and I am comparing another drug to an

 

      active comparator, and I see a difference, I don't

 

      know what it means.  I need some place to anchor

 

      the result, and for that reason, although none of

 

      them are perfect, I believe that the non-use and

 

      the remote use analyses at least give us a way of

 

      pegging results, and if we want to compare one drug

 

      to another drug, if we had that common reference

 

      point, at least it allows us to accomplish that.

 

                The one other thing I would like to point

 

      out about the number of cases is that for

 

      rofecoxib, especially at the high doses of

 

      rofecoxib, most of these studies had relatively few

 

      exposed cases.  The exception is the California

 

      Medicaid study where we had 157 exposed cases to

 

                                                                46

 

      the higher dose of rofecoxib.

 

                Now, this is a very busy slide and I won't

 

      spend a lot of time going over it, but I will be

 

      happy to answer questions later.

 

                Basically, before we heard there are

 

      unmeasured risk factors in automated databases that

 

      frequently can't be accounted for, aspirin use and

 

      smoking are among the most common.  So, you can see

 

      here that most of these studies, that information

 

      isn't obtainable.

 

                Kimmel was able to get both because they

 

      interviewed the patients, the cases and the

 

      controls.  In the Medi-Cal study, it turns out that

 

      aspirin is reimbursed, and so we have a handle on

 

      it there.

 

                In the Graham study, a survey of controls

 

      was done to see what these unmeasured factors might

 

      look like in the source population.  The Solomon

 

      study did the same thing, relying on the Medicare

 

      Beneficiary Survey that Dr. Platt talked about

 

      before.

 

                Important limitations I think that need to

 

                                                                47

 

      be highlighted are that in the Mamdani study, they

 

      excluded patients who had less than 30 days of

 

      NSAID use, so the survivor bias Dr. Platt talked

 

      about before, in my view, is a big concern with this

 

      study, and for that reason I ranked it in sort of

 

      that category of low quality studies.

 

                In the Kimmel study, as Dr. Platt also

 

      mentioned, there was low participation rate.

 

      Basically, half of the cases and half of the

 

      controls who were approached volunteered to be in the

 

      study.  More importantly I think in that study, and

 

      it's unfortunate, is that there was what I would

 

      refer to as the potential for, in quotes, "reverse

 

      recall bias."

 

                Normally, with recall bias, we think oh, I

 

      have had a heart attack, I am going to remember

 

      more efficiently what happened to me immediately

 

      before the heart attack compared to some control

 

      where I say to the control what were you doing four

 

      months ago on this particular day.

 

                That is the classic recall bias.  This

 

      situation I think had what I would describe as

 

                                                                48

 

      reverse recall bias.  They interviewed the people

 

      who had heart attacks within four months of getting

 

      out of the hospital - what happened to you the day

 

      and the week before you had your heart attack four

 

      months ago.

 

                For the controls, they call them on the

 

      phone and they say what happened to you yesterday

 

      and the week before, so it is actually the reverse.

 

      The controls actually would have better recall of

 

      what they were actually doing than the cases

 

      potentially, and we will see how this is reflected

 

      in some of the results.

 

                Finally, with the Medi-Cal study, I think

 

      the single greatest concern for the committee in

 

      considering these data is (a) that it is preliminary

 

      data, and (b) that this is a new database for

 

      research purposes.

 

                For that reason, I am just including a

 

      slide to orient people to that.  The other

 

      databases are ones that have been used before.

 

      This is a database that only in the last two years

 

      has come online to be of a quality sufficient

 

                                                                49

 

      to begin contemplating doing studies.

 

                Its strengths are that it is very large,

 

      it captures aspirin use, it doesn't censor people

 

      by age.  It combines Medicare coverage when you go

 

      over the age of 65 with the prescription benefits

 

      of Medicaid, so you get the drugs and the outcomes.

 

                Matching has been done to multiple cause

 

      of death tape, so that we have death data in this

 

      database up through 2002.  We didn't include it in

 

      the data I will show today because we really want

 

      the information up through 2004.

 

                Once people get into Medicaid or Medicare,

 

      they don't tend to drop out.  The limitations are

 

      that we can't get medical records, and that is

 

      something to understand, and that it is a very

 

      complicated database.  Dr. Singh from Stanford who

 

      is the principal investigator for our Medi-Cal

 

      work, and who has worked to bring this database

 

      online, spent two years putting things together and

 

      working out the kinks in it before contemplating

 

      doing research with it, so at least you understand

 

      the limitations of that.

 

                There is always the concern about

 

      unmeasured risk factors and Dr. Platt talked about

 

      that.  I want to review for you very briefly some

 

                                                                50

 

      of the evidence from the published literature where

 

      efforts were made to look at what unmeasured

 

      confounding looked like and did it differ across

 

      NSAID type.

 

                In our study using Kaiser Permanente data,

 

      we did a survey, a survey of a random sample

 

      of controls, and we looked at aspirin use, smoking,

 

      and over-the-counter NSAID use.  You can see by

 

      NSAID that there really were not significant or

 

      substantial differences in the distribution of

 

      these risk factors.

 

                So, if they don't vary in the control

 

      group, they can't really confound that observation

 

      that you see very much.

 

                In the Solomon study, these are the data

 

      from the beneficiary survey.  Dr. Platt already

 

      mentioned a further analysis of these data that

 

      showed that the actual impact of all these

 

      unmeasured confounders on the measure of the

 

                                                                51

 

      relative risk at the end was measured in the

 

      hundredths of an odds ratio, so if the odds ratio

 

      was 1.34, adjusting for these things and projecting

 

      it out would change it to maybe 1.35 or 1.33.  We

 

      are talking about minuscule differences, not

 

      qualitatively important differences.

 

                Finally, in the Kimmel study, they also,

 

      through their interview, were able to see that for

 

      most of these factors, there was similarity across

 

      NSAID groups except for current smoking where the

 

      rofecoxib group had much lower current smoking than

 

      any of the other NSAID groups, but for past

 

      smoking, it was more than the other NSAID groups or

 

      the remote groups, and if you added these two

 

      together, the rofecoxib was very similar to these,

 

      but the celecoxib group had more smoking.

 

                My own conclusion from this is that yes,

 

      it is possible that some of these unmeasured risk

 

      factors could be influencing the results.  I don't

 

      think that there is strong evidence that there is a

 

      systemic bias that would sort of lead to

 

      interfering with trusting the results and thinking

 

                                                                52

 

      that these factors are confounding the observations

 

      that we see.

 

                So, first, I will talk about rofecoxib,

 

      then I will talk about celecoxib, then I will talk

 

      about valdecoxib in terms of epidemiologic data.

 

                These studies on the left, with their

 

      reference groups, are the ones that looked at

 

      myocardial infarction with rofecoxib.  What I have

 

      shown is for all doses and where it was present

 

      less than or equal to 25 milligrams and over 25

 

      milligrams, what the fully adjusted odds ratio and

 

      95 percent confidence intervals were.

 

                These studies varied in the extent of

 

      adjustment that they did.  The Ray and the Graham

 

      studies each adjusted for about 30 cardiovascular

 

      risk factors.  The Solomon study was a somewhat

 

      smaller number, Mamdani was a somewhat smaller

 

      number.  Kimmel, they adjusted for somewhere in the

 

      20s, the Ingenix study somewhere in the 20s, the

 

      Medi-Cal study adjusted for about 40 cardiovascular

 

      risk factors.

 

                What you can see is when you look across

 

                                                                53

 

      the All Doses is that, in general, the point

 

      estimates were elevated and for many the 95 percent

 

      confidence intervals excluded 1.

 

                More importantly, though, is looking at

 

      the low dose and the high dose data because we know

 

      from the clinical trials data, and we would suspect

 

      it on just pharmacologic grounds, that if there is

 

      an association that it might be worse with the

 

      higher dose than with the lower.

 

                So, four studies provide us estimates at

 

      the low and the high doses, the Wayne Ray study and

 

      our study from Kaiser Permanente, and then the

 

      two unpublished studies, one from Ingenix and the

 

      other from California Medicaid.

 

                We see there that in three of the four

 

      studies, there is an elevation in the point

 

      estimate.  In the Graham study, it included one.

 

      When we look over 25 mg, we see greater consistency

 

      although in the Ingenix study, there is this

 

      paradoxical finding of sort of basically a neutral

 

      relative risk.  I don't have an explanation for why

 

      that happened, but it makes me concerned to some

 

                                                                54

 

      extent about what was going on in that study,

 

      because it is a result that goes in a very

 

      unexpected direction.

 

                What I would like to point out, because I

 

      will come back to it again, is that when we are

 

      dealing with drug safety, and the goal now is what

 

      risk can I exclude, if my job is--now I am not

 

      talking about efficacy anymore, what I am talking

 

      about is safety--if my job is to protect the public

 

      from harm, what risk can I exclude based on the

 

      data that I have, I believe that is much more

 

      relevant to look at the upper bound of the

 

      confidence interval than the lower bound.

 

                What traditionally happens is we look at

 

      the lower bound of the confidence interval and we

 

      say if it includes one, there isn't a problem, but

 

      the biggest reason, as Dr. Platt showed in his

 

      previous slide, for a wide distribution and a wide

 

      confidence interval in your study, is that the

 

      study doesn't have enough statistical power to get

 

      you a narrow enough confidence interval to say that

 

      you have the 95 percent certainty that you want.

 

                So, if your mission is above all else I

 

      want to do no harm, that I want to protect patients

 

      from harm, then, based on the data you have, I

 

                                                                55

 

      would submit that the upper bound of the confidence

 

      interval provides greater assurance to patients,

 

      and then if you are going to compare a benefit to a

 

      drug, that you might want to consider that benefit

 

      against that upper bound of the confidence

 

      interval, because that is compatible with the data.

 

      In any event, that is my view, and not the FDA's.

 

                This is a slide from California Medicaid.

 

      It is preliminary data and I wanted to present it

 

      to you, because what it shows is a dose-response to

 

      rofecoxib from 12.5 mg up to and through 50 mg.

 

                You can see that we have very wide

 

      confidence intervals for some of them, and that is

 

      a reflection of the limited number of cases, but I

 

      want to point your attention to the very narrow

 

      confidence intervals in the 12 to 25 mg and in the

 

      25 to 50 mg, just to point out that in the previous

 

      slide here, where we are talking about what are

 

      these point estimates, that now you can see what we

 

                                                                56

 

      have done is we have fleshed them out a little bit

 

      more.

 

                Another comparison that I think is

 

      important to consider, certainly it was for us,

 

      when we did our study in Kaiser Permanente, was at

 

      the time there were two COX-2 selective inhibitors

 

      on the market, celecoxib and rofecoxib.

 

                The VIGOR study raised a question about

 

      high-dose rofecoxib.  Our question as researchers

 

      was, and public health scientists, was, well, let's

 

      suppose that rofecoxib increases the risk of

 

      myocardial infarction.

 

                We don't know that it does, but let's

 

      suppose that it does, what about celecoxib, because

 

      it actually had a larger share of the market, and

 

      if it turned out that these drugs have a benefit,

 

      and that benefit is worthwhile, then, it would make

 

      more sense from a practical perspective to use the

 

      drug that had a better safety profile.

 

                So, to us, it was very natural to want to

 

      compare rofecoxib to celecoxib, and so several of

 

      the epidemiologic studies felt similarly and in

 

                                                                57

 

      their design they included that analysis, and some

 

      of them it was, as Dr. Platt said, part of a we are

 

      going to make comparisons of everything against

 

      everything.

 

                The Solomon study, for example, did that.

 

      They did not state in that study what their prior

 

      hypothesis was. In our study, we did state it.  I

 

      mean yes, in a sense we had multiple comparisons,

 

      but we were interested in two different things.  We

 

      were interested in rofecoxib versus remote use, and

 

      we were interested in rofecoxib versus celecoxib,

 

      but we thought of it beforehand and we planned that

 

      analysis.

 

                But in any event, what we see is, when you

 

      look at the all dose analysis, in all of the

 

      published studies, rofecoxib increased the risk

 

      compared to celecoxib.  When we looked at low dose

 

      rofecoxib, we see the increased risk.  When we look

 

      at the high doses of rofecoxib to celecoxib, again,

 

      we see the same pattern.

 

                Dr. Platt, in his talk before, talked

 

      about relative risks, risk differences, individual

 

                                                                58

 

      risk, and population risk.  The next two slides are

 

      intended to address this at the level of the

 

      individual and at the level of population.

 

                What I have done on this slide--and these

 

      slides now, no one should interpret this as meaning

 

      this is what actually happened in the

 

      population--the next slide is going to have numbers

 

      on it that are for illustrative purposes only, to

 

      help the committee understand what does a relative

 

      risk of 1.3 translate into at the individual level

 

      and at the level of population.

 

                Your typical COX-2 user is somebody in

 

      their 60s who has several other health problems, so

 

      I went to the National Center for Health Statistics

 

      and got the myocardial infarction rate for 65- to

 

      74-year-old men in the United States.  That rate

 

      turns out to be 1 per 50 per year.

 

                What I did is I took that as the

 

      background rate and I said if I have an individual

 

      using this drug with that background rate and then

 

      I applied to that person the relative risks or odds

 

      ratios found in these studies that are shown in the

 

                                                                59

 

      previous slides, what would the excess risk to the

 

      person be, sort of what would that risk difference

 

      translate to for the individual.

 

                For example, in the Ray study, if you

 

      remember, for 25 mg or less, the odds ratio was

 

      1.02.  Basically, it doesn't change.  If we based

 

      it on the point estimate, that 0.02 would translate

 

      to 1 out of 2,500 in a year increased risk of heart

 

      attack.

 

                Another way to view that number is, is

 

      that is the number needed to harm.  If I treated

 

      2,500 65- to 74-year-old men for a year with

 

      rofecoxib, and the rate was 1.02 that Ray found,

 

      treating 2,500 patients would produce 1 extra heart

 

      attack.

 

                Now, with the other studies that found

 

      higher estimates for the lower doses of rofecoxib,

 

      you can see that the number needed to harm ranges

 

      from about 90 to 200.  That is saying for every 90

 

      people to every 200 people I treat with low-dose

 

      rofecoxib, I would generate 1 other case.

 

                For high doses, because the relative risks

 

                                                                60

 

      were higher, the number needed to harm becomes

 

      lower.

 

                I have also shown it based on the upper

 

      bound of 95 percent confidence interval to show you

 

      that based on the data we have at hand, these are

 

      the excess risks that are consistent with the data,

 

      and from a public policy perspective, from a public

 

      health perspective, that is what I react to, and

 

      when I want to see a benefit and say does benefit

 

      exceed the risks, well, I want to know what is a

 

      real benefit in the population in terms of reduced

 

      hospitalization, lives saved, and does that benefit

 

      exceed what I can say is possibly the risk of these

 

      products.

 

                At the population level, now we have gone

 

      from an individual.  Remember in the Wayne Ray

 

      study we said it is 1 out of 2,500.  Well, that

 

      would translate to 400 additional cases of heart

 

      attack if we treated a million men who were 65 to

 

      74 years old, and we treated them with rofecoxib

 

      low dose for a year.
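
The individual-level and population-level arithmetic described above can be sketched as follows. This is an illustration only: it assumes the background rate of 1 per 50 per year quoted in the testimony, treats each odds ratio as if it were a relative risk, and uses function names that are hypothetical and do not come from any of the studies.

```python
def excess_risk(background_rate, relative_risk):
    """Absolute excess risk per person-year (odds ratio treated as a relative risk)."""
    return background_rate * (relative_risk - 1.0)


def number_needed_to_harm(background_rate, relative_risk):
    """Persons treated for one year to produce one additional event."""
    return 1.0 / excess_risk(background_rate, relative_risk)


def excess_cases(background_rate, relative_risk, persons_treated):
    """Additional events per year among a treated population of the given size."""
    return excess_risk(background_rate, relative_risk) * persons_treated


# Background MI rate quoted for 65- to 74-year-old men: 1 per 50 per year.
background = 1 / 50

# Point estimates mentioned in the testimony (1.02, 1.23, 1.30, 1.4).
for rr in (1.02, 1.23, 1.30, 1.40):
    nnh = number_needed_to_harm(background, rr)
    cases = excess_cases(background, rr, 1_000_000)
    print(f"RR {rr:.2f}: NNH about {nnh:,.0f}; "
          f"excess MIs per million treated per year about {cases:,.0f}")
```

With a relative risk of 1.02 the sketch gives a number needed to harm of 2,500 and 400 excess cases per million treated, matching the figures quoted for the Ray study. Passing the upper bound of a confidence interval instead of the point estimate gives the corresponding upper-bound figures.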

 

                With the others, you can see that those

 

                                                                61

 

      relative risks that might not look so impressive,

 

      that 1.23, that 1.30, that 1.4, that it projects

 

      out to a substantial number when you multiply it by

 

      the large number of people who use these products.

 

                For high doses it ends up being even

 

      greater, and then if we focus on the upper bound of

 

      the confidence interval, we again see that the

 

      numbers are larger still.  This very high number in

 

      our study was the result of our having low

 

      statistical power in addressing the high dose

 

      rofecoxib.

 

                One other question that I think is

 

      important to consider is when does the risk of

 

      myocardial infarction with rofecoxib kick in.  Now,

 

      we have seen data yesterday presented by both FDA

 

      and by Merck of various survival curves.

 

                We saw the VIGOR curve that showed the

 

      separation after about 6 weeks with an overall

 

      relative risk of about 5.  We saw, for the APPROVe

 

      study, this close overlapping line at about 18

 

      months, and then they diverge with an overall

 

      composite hazard ratio of about 2.

 

                I would submit to the committee that the

 

      reason for the failure of these studies to show

 

      divergence of the line shortly after the drugs are

 

                                                                62

 

      used is low statistical power, that they just

 

      don't have enough events to show it, and as a

 

      result, you can interpret because of the low

 

      statistical power you basically--how to describe

 

      it--you presume that there is nothing there, and

 

      you err on the side of the drug rather than erring

 

      on the side of what could the risk be to the

 

      population.

 

                If you really want to know what is going

 

      on in the population, then, you want to reduce the

 

      uncertainty.  The more uncertainty you have, if you

 

      act basically on the lower bound of that confidence

 

      interval, which is what you are doing when you are

 

      saying the risk doesn't begin until 18 months, you

 

      are basically saying that the absence of evidence

 

      is evidence of absence.

 

                I would say that in safety, what it is, is

 

      you just don't have enough power.

 

                Looking at the epidemiologic studies, I

 

                                                                63

 

      think that we have evidence to suggest that the

 

      risk begins much earlier.  I will point it out, and

 

      you guys and women can consider it for yourselves.

 

                In the Graham study, when we looked at low

 

      dose and high doses of rofecoxib, 50 percent of our

 

      cases at the low dose and at the high dose had used

 

      at the time--remember these are inception cohorts,

 

      so these people, their total use, this was 1.8

 

      months, this was 2.7 months--50 percent of our

 

      cases occurred within 2 to 3 months of starting the

 

      drug.

 

                That is a lot of power and that really

 

      speaks against the notion that the risk is

 

      backloaded, you know, it is for the low dose, that

 

      the risk doesn't happen until after 18 months.

 

      Nobody in our study was on rofecoxib for more than

 

      about 15 months.  I think that was the longest

 

      duration of use we had in our study.

 

                Now, in the Solomon study, they looked at

 

      the low dose and the high dose, and they presented

 

      data in several ways.  One is that they grouped

 

      things in 1 to 90 days, and what they showed was

 

                                                                64

 

      that for both the low dose and the high dose, there

 

      was evidence of risk early on.

 

                The Kimmel study, for all its

 

      deficiencies, most of it was low dose rofecoxib,

 

      and almost all the patients used it for less than

 

      12 months.  So, their finding on rofecoxib, if

 

      anything, would also speak to that the low dose

 

      effect kicks in long before 18 months.

 

                Finally, the Solomon and the Ingenix study

 

      looked at the first 30 days of use of these

 

      products, and both of them found elevated odds

 

      ratios of 4 for cardiovascular risk in the first 30

 

      days.

 

                Now, in both of these studies, they didn't

 

      separate it out by low dose and high dose, so this

 

      is a composite, but in both studies, about 85

 

      percent of the use to 90 percent of the use was low

 

      dose.

 

                So, basically, what I am concluding from

 

      this slide is that risk of myocardial infarction

 

      with rofecoxib begins when rofecoxib use begins,

 

      and that the inability to separate out those curves

 

                                                                65

 

      is based on the fact that if you were to count the

 

      actual number of events in the VIGOR study in the

 

      first 6 weeks, we are probably talking about 3 or 4

 

      events, and if you look at the confidence

 

      intervals, you are going to see they are wide.
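
The point about wide intervals with very few events can be illustrated with a rough calculation. This is a sketch under stated assumptions, not a reanalysis of any trial: the event counts below are hypothetical, both arms are assumed to have equal person-time, and the interval uses the usual large-sample approximation on the log scale, which is itself crude at such small counts.

```python
import math


def rate_ratio_ci(events_exposed, events_comparator, z=1.96):
    """Approximate rate ratio and 95% CI, assuming equal person-time in both arms.

    Uses the log-scale standard error sqrt(1/a + 1/b) for a ratio of two counts.
    """
    ratio = events_exposed / events_comparator
    se_log = math.sqrt(1.0 / events_exposed + 1.0 / events_comparator)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Hypothetical split of 4 early events (3 versus 1); not taken from any study.
print(rate_ratio_ci(3, 1))

# Same hypothetical ratio with 100 times as many events.
print(rate_ratio_ci(300, 100))
```

With only four events the interval spans values well below and well above 1, so the curves cannot be shown to separate even if the underlying ratio is large. With the same ratio and many more events, the interval excludes 1.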

 

                For the APPROVe study, the same thing

 

      holds, that you have too few events.  The whole

 

      study had 45 events, and I don't recall how many of

 

      those were on rofecoxib and how many of those were

 

      on placebo, but when you think about it, compare

 

      that and then look at the epidemiologic studies,

 

      and look at the number of cases that were in the

 

      epidemiologic studies, and for all their problems,

 

      and we can talk about those, they suggest there is

 

      a big discordance, and I think the answer, the

 

      reason is absence of statistical power in the

 

      clinical trials.

 

                In the epidemiologic literature, this has

 

      been recognized, and people have written papers

 

      saying that when you are trying to summarize the

 

      overall risk from a survival study, and you want to

 

      look at specific time periods, that you are better

 

                                                                66

 

      off taking the overall risk estimate for the entire

 

      study than focusing on a small segment at a time

 

      because of this issue of low statistical power, so

 

      I didn't invent this.

 

                Now, switch over to celecoxib.  There are

 

      a number of studies that have been done to look at

 

      celecoxib risk.  What I have tried to do here is

 

      plot out for you the relative risk or the odds

 

      ratio, the author of the study, and then the point

 

      estimates in the 95 percent confidence intervals.

 

                What you will see basically is that for

 

      most of these studies, there is no evidence of a

 

      protective or an injurious effect except for the

 

      Kimmel study that found a substantial protective

 

      effect.

 

                Remember the Kimmel study and what I

 

      believe is this reverse recall bias, as well as the

 

      low participation rate, and I personally discount

 

      that study.  The committee can decide for

 

      themselves what they want to do.

 

                What about celecoxib lower dose versus

 

      higher dose?  Well, unfortunately, the only place

 

                                                                67

 

      where this is addressed, is looked at, is in the two

 

      unpublished studies. We have the Ingenix study and

 

      we have the Medi-Cal study.

 

                What I would focus your attention on are

 

      the low dose and high dose, the low dose and the

 

      high dose.  What we see is in both studies,

 

      evidence of a dose response.  Now, the 95 percent

 

      confidence interval in the Ingenix study includes

 

      1, but the point estimate is pretty elevated.  That

 

      is 1.18 or so at 400 mg.

 

                In the Medi-Cal study, we go from 1.01 up

 

      to about 1.24.  Here, you can see the 95 percent

 

      confidence intervals.

 

                What I would conclude from this, although

 

      they are unpublished studies, that there is

 

      evidence of a dose response and that the higher doses of

 

      celecoxib do confer an increased risk of myocardial

 

      infarction.

 

                I should point out that in the Medi-Cal

 

      study, the methodology that we used in that study

 

      is the exact methodology that we used in our Kaiser

 

      Permanente study that Dr. Platt before was gracious

 

                                                                68

 

      enough to say is one of the better done studies.

 

                There are no published studies on

 

      valdecoxib, so what do we do?  Well, preliminary

 

      data from Medi-Cal, we had 54 exposed cases and we

 

      found a point estimate of 0.99.  Now, this was

 

      mostly 10 and 20 mg use.  I think that out of all

 

      the patients that we had in the study, there were 2

 

      or 3 who had 40 mg valdecoxib use.

 

                In Medi-Cal, they only reimburse for the

 

      10-mg tablet, and they do this in an effort to try

 

      to discourage people having larger dose tablets and

 

      then taking more of it.

 

                So, this is all the epidemiologic

 

      information that I am aware of, that I have had an

 

      opportunity to review on valdecoxib.

 

                I will now move to naproxen.  The issue of

 

      naproxen is important for several reasons.  One,

 

      with the VIGOR study, the medical community was

 

      confronted with the hypothesis that naproxen was

 

      the single greatest and most effective

 

      cardio-protectant in the history of mankind, that

 

      it was far better than aspirin.

 

                We heard yesterday that aspirin reduces

 

      cardiovascular risk about 20 to 25 percent.

 

      Naproxen, if we were going to believe the VIGOR

 

                                                                69

 

      results, would have to reduce the risk of

 

      cardiovascular events by about 80 to 85 percent.
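
The reasoning behind the 80 percent figure can be sketched as follows. This is an illustration only: it takes the overall relative risk of about 5 quoted earlier in the testimony for rofecoxib versus naproxen in VIGOR, and assumes, for the sake of the argument, that rofecoxib itself has no effect on risk. The function name is hypothetical.

```python
def implied_comparator_reduction(observed_rr, drug_rr_vs_none=1.0):
    """Risk reduction the comparator must have to explain the observed relative risk.

    observed_rr: risk on the study drug divided by risk on the comparator.
    drug_rr_vs_none: assumed effect of the study drug versus no treatment
                     (1.0 means the study drug is assumed neutral).
    """
    comparator_rr_vs_none = drug_rr_vs_none / observed_rr
    return 1.0 - comparator_rr_vs_none


# Rofecoxib versus naproxen, relative risk of about 5, rofecoxib assumed neutral.
reduction = implied_comparator_reduction(5.0)
print(f"Implied naproxen risk reduction: {reduction:.0%}")
```

Under those assumptions a relative risk of 5 requires naproxen to cut risk by 80 percent, compared with the 20 to 25 percent quoted for aspirin.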

 

                So, this stimulated a lot of research.

 

      Here, I have summarized in the same fashion as I

 

      did for the rofecoxib studies, the various studies

 

      that have been done. Again, I have separated them

 

      out by the studies that I think are better done,

 

      the studies that have more significant limitations,

 

      and then the two unpublished studies.

 

                I point out the Rahme study to say that

 

      the only reason the Rahme study is listed among

 

      this group of suboptimal studies is that its

 

      reference group was other NSAIDs, primarily

 

      ibuprofen, because ibuprofen was the predominant

 

      other NSAID used in Quebec during the study.

 

                Again, we have the various outcomes that

 

      were done.  What I would point out is that you can see

 

      the number of cases that we had to work with in

 

      these various studies, and I would point out that

 

                                                                70

 

      for the Solomon study, they had about 240 MI cases

 

      that they studied overall, but as you will see in a

 

      few minutes, that exposure could occur anytime in

 

      the past 6 months, so they don't say in the paper

 

      how many people were actually on naproxen at the

 

      time they had their event, so I can't put down a

 

      list of how many people were currently exposed.

 

                The Watson study is the only study that

 

      used a composite outcome.  It included myocardial

 

      infarction, stroke, subarachnoid hemorrhage, and

 

      subdural hematoma.  Why subarachnoid hemorrhage and

 

      subdural hematomas are in there is beyond me.  In

 

      any event, 26 cases of that composite outcome and a

 

      much smaller number of actual myocardial

 

      infarctions.  So, that is why that asterisk is

 

      there.

 

                With the Ingenix study, the asterisk next

 

      to the 179 is that this included both prevalent and

 

      incident cases, and the best studies, the best

 

      results come if you base it on incident cases only

 

      or incident use only as opposed to prevalent use,

 

      because prevalent use can have survivor bias. But

 

                                                                71

 

      in any event, in the Ingenix study, they had a

 

      number of different analyses, and they didn't

 

      always use their full number of cases.

 

                There are important limitations to note.

 

      I think the thing to focus on is to realize, one, there is

 

      no perfect study, we have talked about that before,

 

      and, two, that among all the limitations listed

 

      here, I think the most important one to note was in

 

      the Watson study, was this composite outcome which

 

      really just makes it very difficult from an

 

      epidemiologic perspective to study things.

 

                Myocardial infarction is very well

 

      validated in claims data, and Dr. Platt has already

 

      gone over that with you.  Stroke is notoriously

 

      difficult to work with in claims data, and subdural

 

      hematomas most commonly occur because as people get

 

      older, their brains shrink.  They bump their heads

 

      and then they get a little bleeding on the surface

 

      of the brain.  What that has to do with myocardial

 

      infarction risk, which is what we are really

 

      concerned about today, is beyond me.

 

                I have got two slides on the results. 

 

                                                                72

 

      This slide shows the studies that found no

 

      protective effect.  There are four studies that

 

      found a protective effect, and I am saving them for

 

      a separate slide, because I want to look at those

 

      individually.

 

                What you can see from the majority of

 

      these studies, and I would point out that the

 

      studies that were the best done studies in the top

 

      tier, they are on this slide, that all of them sort

 

      of suggest that there is no cardio-protective

 

      effect of naproxen.  Several of the studies point

 

      to the possibility of a small increased risk with

 

      naproxen.

 

                But we have four studies of positive

 

      results, and we will probably all remember the

 

      Archives of Internal Medicine publishing three of

 

      the articles in the same issue with an accompanying

 

      editorial that stated the issue is solved, naproxen

 

      is cardio-protective.

 

                I want to look at those studies and just

 

      describe to you my view of them.  The top three

 

      studies were the ones that were--well, no, not the

 

                                                                73

 

      Kimmel study--Rahme, Solomon, and Watson were the

 

      Archive studies.

 

                In the Rahme study done in Quebec, they

 

      compared current naproxen use versus other NSAIDs.

 

      That other NSAID was, by and large, ibuprofen, and

 

      they found a protective effect. Well, if ibuprofen

 

      increases the risk of myocardial infarction, let's

 

      just say that it does, and naproxen doesn't,

 

      naproxen could look like it's protective compared

 

      to ibuprofen, but not be protective really.
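
The reference-group argument can be made concrete with a small calculation. The numbers here are purely hypothetical and are not taken from the Rahme paper; the sketch also assumes that the ratio of two odds ratios against non-use approximates the odds ratio of one drug against the other.

```python
def odds_ratio_vs_reference(or_drug_vs_nonuse, or_reference_vs_nonuse):
    """Approximate odds ratio of a drug against an active reference drug,
    given each drug's odds ratio against non-use."""
    return or_drug_vs_nonuse / or_reference_vs_nonuse


# Hypothetical values: naproxen neutral versus non-use, ibuprofen raising risk.
naproxen_vs_nonuse = 1.0
ibuprofen_vs_nonuse = 1.3

apparent = odds_ratio_vs_reference(naproxen_vs_nonuse, ibuprofen_vs_nonuse)
print(f"Naproxen versus ibuprofen: {apparent:.2f}")
```

A drug with no effect of its own shows an odds ratio below 1 whenever the reference drug raises risk, which is why the choice of reference group matters for interpreting the result.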

 

                The data presented in that paper, if we

 

      re-analyzed it versus non-use, we get an odds ratio

 

      of 1.28, statistically significant.  Now, this is

 

      not adjusted.  It is not possible from the data

 

      there for me to adjust this result, but based on

 

      what is in the paper, when you compared the

 

      unadjusted to the adjusted point estimates, they

 

      don't change very much, and what that suggests to

 

      me is that this effect, this 1.28, is probably not

 

      far off the mark.

 

                That would then make it comparable to the

 

      analyses I showed on the previous slide, that all

 

                                                                74

 

      of these studies use non-use or remote use, so then

 

      it would add a fourth study to an elevated point

 

      estimate for naproxen.

 

                Now, the Kimmel study, we have already

 

      talked about low participation rate and this

 

      reverse recall bias, and a small number of NSAID

 

      cases.  In fact, they don't even tell us in the

 

      paper how many cases they had.

 

                We move on to the Solomon study.  This was

 

      the result that was reported in the paper and was

 

      picked up by the press, a 16 percent reduction in

 

      heart attack risk with naproxen.  The problem, in

 

      my view, was that their definition of exposure in

 

      the study was any use of naproxen in the past 6

 

      months, which means that if I took naproxen 6

 

      months ago and stopped it, I could be included in

 

      this study as being exposed to naproxen.

 

                So, the question is then, you know, how do

 

      we interpret the study.  Well, Solomon was good

 

      enough to present data by current use and in recent

 

      use, and recent use included people who stopped

 

      their naproxen.  Their naproxen prescriptions day

 

                                                                75

 

      supply ran out between 1 day and 60 days before the

 

      MI or the index date for their controls, and remote

 

      users, their NSAID use, their naproxen use ended

 

      from 61 days to 180 days prior to the event.
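
      The exposure windows just described can be written down as
      a small classification rule.  This is a sketch under stated
      assumptions: "current" is taken to mean the supply still
      covered the index date, and anything beyond 180 days is
      treated as unexposed.  Neither detail is spelled out in the
      testimony, and the function names are hypothetical.

```python
def classify_exposure(days_since_supply_ended):
    """Map days between end of naproxen supply and index date to a window."""
    if days_since_supply_ended <= 0:
        return "current"    # assumption: supply still covered the index date
    if days_since_supply_ended <= 60:
        return "recent"     # supply ran out 1 to 60 days before
    if days_since_supply_ended <= 180:
        return "remote"     # supply ran out 61 to 180 days before
    return "unexposed"      # assumption: beyond 180 days counts as non-use


def exposed_any_use_past_6_months(days_since_supply_ended):
    """The pooled definition: any use in the past 6 months counts as exposed."""
    return classify_exposure(days_since_supply_ended) != "unexposed"


for days in (0, 30, 120, 200):
    print(days, classify_exposure(days), exposed_any_use_past_6_months(days))
```

      Under the pooled definition, someone whose supply ended 120
      days before the event is counted as exposed alongside a
      current user, which is the concern raised about the
      headline result.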

 

                So, let's look at what those results are

 

      then, and what we see is they are identical.  So,

 

      unless the committee is prepared to believe that

 

      naproxen confers lifetime immunity to

 

      cardiovascular disease, I think we have to conclude

 

      from these data that what we really have here is

 

      selection bias, and it is not the fault of the

 

      investigator. Dr. Platt talked about before that

 

      there are some things you can't adjust for.  You

 

      can't adjust for bias.  What you can try to do is

 

      identify bias, and if you identify it, then at

 

      least you know what you are dealing with.

 

                Here, I think we have what is classic

 

      selection bias.  It is not naproxen that protects

 

      you against myocardial infarction, it is some other

 

      factor that in this health plan, that they used to

 

      study this drug, the patients who were being

 

      treated with naproxen happened to have lower

 

                                                                76

 

      cardiovascular risk.

 

                I can't explain why that happened.  Dr.

 

      Solomon probably can't explain why it happened, but

 

      it's not due to naproxen.

 

                Finally, the Watson study.  This study was

 

      sponsored by Merck, and it was authored by Merck

 

      investigators.  The result that was published as

 

      being the basis for the conclusion was this top

 

      result, a 39 percent reduction in cardiovascular

 

      risk.

 

                First, I just want to remind everybody,

 

      composite outcome here, subarachnoid hemorrhage,

 

      subdural hematoma, stroke, as well as heart attack,

 

      26 events total, much smaller number of heart

 

      attacks.

 

                For this event, you can see the

 

      checkmarks.  These are the various variables that

 

      they adjusted for in the study.  The way they

 

      handled cardiovascular risk, if you read the paper,

 

      I would have to say that it doesn't measure up to

 

      the standards that were set by Dr. Wayne Ray.

 

                We modeled our study in Kaiser and in

 

                                                                77

 

      Medi-Cal, and Dr. Wayne Ray, I think that he has

 

      set the standard for how one needs to go about

 

      adjusting for cardiovascular risk. It is not enough

 

      to rely on diagnoses.  You have to use the

 

      medications, because medications are much more

 

      accurate predictors of disease than diagnoses in

 

      these administrative claims data.

 

                In any event, they didn't adjust for

 

      cardiovascular risk, and they didn't adjust for

 

      smoking although they had that data.  Then, they

 

      present later on another analysis that now includes

 

      cardiovascular risk and it is no longer, in quotes,

 

      "statistically significant," and then they include

 

      smoking, and again it is not statistically

 

      significant.

 

                My conclusion on the Watson study was that

 

      (1) they have got a composite outcome that, in my

 

      view, isn't very informative towards the question

 

      of myocardial infarction; (2) that it is very small

 

      numbers; (3) that a variety of approaches were used

 

      in the analysis that inadequately account for the

 

      risk factors that could confound the result, so I

 

                                                                78

 

      have discounted that, as well.

 

                So, a conclusion when I look at these, in

 

      quotes, "4 positive studies," I conclude that none

 

      of them provide credible evidence of a protective

 

      effect.

 

                In light of yesterday's discussion in the

 

      afternoon about other NSAIDs and what might explain

 

      the differences, let's say, celecoxib and rofecoxib

 

      studies, the rofecoxib studies used naproxen as a

 

      background, a comparator, the celecoxib studies

 

      using ibuprofen or diclofenac.

 

                Dr. FitzGerald is talking and saying,

 

      well, you know, all of these drugs could increase

 

      the risk because what is happening, you know,

 

      biochemically, with the balance of prostacyclin,

 

      could be influenced by these different drugs in

 

      ways that aren't immediately obvious or detectable

 

      in a clinical trial.

 

                I thought I would just share some of that

 

      information on other NSAIDs with the committee,

 

      recognizing a couple things that no single study is

 

      definitive and what you want to look for I think is

 

                                                                79

 

      consistency across studies, but as far as

 

      randomized trials go, I would like just to mention

 

      that they are generally too small, too few events,

 

      and you are not going to get the answers that you

 

      need from them unless you make these clinical

 

      trials substantially larger than anything people

 

      have contemplated up to now.

 

                So, from our California Medicaid study, it

 

      is all preliminary and it has not been published,

 

      for ibuprofen we found a small but statistically

 

      significant increased risk. For indomethacin we

 

      found a risk of 1.7.  I would like to say on

 

      indomethacin that we found an increased risk with

 

      indomethacin in our Kaiser Permanente study.  It

 

      was 1.3 and it was highly statistically

 

      significant.

 

                In at least two other studies that I

 

      reviewed in preparation for this advisory meeting,

 

      indomethacin is noted to have an increased risk of

 

      myocardial infarction.

 

                It is not commented on in the text because

 

      that wasn't a primary analysis, but what I am

 

                                                                80

 

      talking to you about now is consistency, and I

 

      would submit to the committee that indomethacin is

 

      a lot of smoke, there is a lot of smoke for

 

      indomethacin.

 

                In our study, in our Kaiser study, for

 

      example, we did not think in advance to look at

 

      indomethacin separately. I mean we knew we were

 

      going to look at it, but it wasn't a primary

 

      hypothesis.  We didn't adjust for gout.  I mean

 

      everyone knows that indomethacin gets used in gout.

 

      Gout increases the risk of cardiovascular disease.

 

                Well, in the Medi-Cal study, we adjusted

 

      for gout. Yes, gout increases the risk of

 

      myocardial infarction.  It didn't change the odds

 

      ratio here.

 

                I think this next finding, Meloxicam, is

 

      important.  Meloxicam is now the number one selling

 

      branded NSAID in the country.  With the removal

 

      from the market of rofecoxib, the medical

 

      community, shying away from the coxibs, are moving

 

      to other drugs that they perceive would have the

 

      advantages of COX-2 selectivity without the bad rep

 

                                                                81

 

      that coxibs appear to be acquiring.

 

                So, you now have a shift in the

 

      marketplace to Meloxicam.  There have been articles

 

      in the Wall Street Journal and the New York Times

 

      on this.  The company recently raised the price on

 

      the tablets.

 

                In any event, we are presenting these data

 

      just to say that we found an increased risk.  It is

 

      one study, but I think it is the only study.  We

 

      looked at this in Kaiser.  Meloxicam is almost not

 

      used in Kaiser, so we couldn't study it.

 

                In our California Medicaid study, we only

 

      looked at drugs that had more than 50 currently

 

      exposed cases.  Nabumetone came out in this study

 

      as not showing a whiff of a problem.  Sulindac,

 

      there was an increased risk.

 

                Regarding ibuprofen, in our Kaiser study,

 

      we found an increased risk of 1.06, which sounds really

 

      trivial.  It wasn't statistically significant, but

 

      the confidence intervals were pretty narrow.  It

 

      was 0.96 to 1.17.
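
      The figures quoted here (1.06, with an interval of 0.96 to
      1.17) are enough to recover the approximate standard error
      and a two-sided p-value.  The sketch below assumes the
      interval is a 95 percent interval that is symmetric on the
      log scale, which is the usual construction but is not
      stated in the testimony; the function names are
      hypothetical.

```python
import math


def se_from_interval(lower, upper, z=1.96):
    """Standard error of the log estimate implied by a 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def two_sided_p(estimate, se_log):
    """Two-sided normal p-value for the null hypothesis of a ratio of 1."""
    z_score = math.log(estimate) / se_log
    return math.erfc(abs(z_score) / math.sqrt(2))


se_log_estimate = se_from_interval(0.96, 1.17)
p_value = two_sided_p(1.06, se_log_estimate)
print(f"SE(log) = {se_log_estimate:.3f}, p = {p_value:.2f}")
```

      A small standard error with a p-value above 0.05 is the
      combination being described: the estimate is precise, yet
      the interval still includes 1.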

 

                My concern is, as Dr. Platt talked about,

 

                                                                82

 

      you know, above 2 you feel really comfortable,

 

      above 1.5, you can believe it, below that you begin

 

      to get really edgy.  The problem is most of the

 

      risks that we are probably facing, if it turns out

 

      that the non-coxib NSAIDs increase the risk of

 

      cardiovascular disease, that is where the risk

 

      level is going to be, and that is what we are going

 

      to have to contend with, because it has tremendous

 

      effects on the population.

 

                Finally, dose response.  This slide shows

 

      for diclofenac.  This is from California Medicaid.

 

      What we wanted to do was show evidence of dose

 

      response, consistency in the data.  Remember we

 

      pointed out diclofenac before.  Diclofenac in this

 

      study overall did not have an increased risk, but

 

      at the high doses there is a suggestion of a dose

 

      response.

 

                I will skip that.  This slide was to say

 

      that depending on your reference point, you can get

 

      different results, if I use an active comparator

 

      versus remote, and this is showing the three NSAIDs

 

      from California Medicaid compared to non-coxib

 

                                                                83

 

      NSAIDs, and you can see the rofecoxib is different

 

      than them, and the other two aren't necessarily

 

      that different.

 

                My conclusions, and I am sorry to have

 

      gone so long.  Celecoxib, we believe that based on

 

      the evidence we have at hand, that there is no

 

      apparent effect of risk at doses of 200 mg or less.

 

      Above 200 mg, we think that there is evidence of

 

      increased risk.

 

                For rofecoxib, we believe that there is

 

      evidence of increased risk at both the lower doses

 

      and the higher doses, and that risk begins early in

 

      therapy and is apparent during the first 30 days of

 

      use.

 

                With valdecoxib, there is a paucity of

 

      information, but the information we have at this

 

      time suggests that the risk is not increased at

 

      doses of 20 mg or less.

 

                As a class, non-coxib NSAIDs may increase

 

      the risk with differences between each of the

 

      NSAIDs.  I don't think we are going to be able to

 

      talk so much about class effects. In the end, it is

 

                                                                84

 

      going to have to be looking at individual drugs.

 

                The COX-2 hypothesis may be true, but if

 

      it is, we are still going to have to look at these

 

      other drugs in terms of their individual properties

 

      and what they do.

 

                Finally, naproxen is not

 

      cardio-protective.

 

                Thank you.

 

                (Applause.)

 

                DR. WOOD:  Thanks very much.  David, it

 

      will come as no surprise to you that every time

 

      practically I pick up a newspaper, I read about

 

      what you are not going to tell us.

 

                So, my question to you is what have you

 

      not told us that you think we should know, because

 

      I would like to make sure.  Lots of other people

 

      have shown up here without slides that they forgot,

 

      so I just want to be sure that if there is anything

 

      else we need to hear, we hear it.

 

                DR. GRAHAM:  Well, as far as the science

 

      goes, I think I presented the evidence that I am

 

      happy to be able to share with the committee that I

 

                                                                85

 

      thought it was important for the committee to have

 

      an opportunity to hear.

 

                The source of controversy surrounding my

 

      presentation related to the unpublished studies

 

      that I was going to be permitted to present or

 

      asked, actually asked to present the Ingenix

 

      results, the unpublished study from Merck, but that

 

      I was being told not to present the unpublished

 

      data from the California Medicaid study, and

 

      personally, I had great difficulty standing here

 

      before this committee as an investigator and as a

 

      scientist, as a physician, and telling you the

 

      information that I have, that I am allowed to talk

 

      about, and remaining silent on things that I know

 

      about that I am not allowed to talk to you about.

 

                Fortunately, Dr. Crawford exercised great

 

      leadership in making it possible for me to present

 

      that data, recognizing it's preliminary, but the

 

      methods that we used are identical to our Kaiser

 

      study for the California Medicaid, and for me, I

 

      think the big reservation is, is that it's an

 

      untested database, but I think that everything that

 

                                                                86

 

      could be done to develop the database and to do

 

      quality assurance and to work out the kinks has

 

      been done.

 

                If you look at the findings in the

 

      California Medicaid study and you compare them to

 

      the clinical trials data, and the anomalies and the

 

      questions that you were discussing yesterday about

 

      the clinical trials' data, you look back at the

 

      California Medicaid data, and you are going to see

 

      I think great consistency between the findings that

 

      might help explain and interpret some of the things

 

      that seemed questionable or uncertain yesterday.

 

                So, in any event, I have been able to

 

      present what I thought was important to present,

 

      and I am happy to have had that opportunity.

 

                DR. WOOD:  So, the answer is we have seen

 

      it all, is that right?

 

                DR. GRAHAM:  You have seen it all.

 

                DR. WOOD:  Okay, good.  Let me ask you a

 

      question. If you go back to your slide that showed

 

      the excess population risk, put that in proportion

 

      for us in terms of, say, the other drugs that have

 

                                                                87

 

      been withdrawn from the market.  I mean what sort

 

      of numbers would we be expected to see?

 

                DR. GRAHAM:  That is a great question.

 

      The typical drug that has come off the market in

 

      the United States, like the leading cause of drug

 

      withdrawals in the United States in the last 20

 

      years has probably been acute liver failure.

 

      Rezulin came off the market because of it,

 

      troglitazone, bromfenac, a number of other drugs.

 

                Acute liver failure in the general

 

      population has a background rate of about 1 per

 

      million per year.  We are talking about that is the

 

      rate of being struck by lightning, 1 per million

 

      per year, and these drugs were pulled off the

 

      market because it increased the risk of that.  It

 

      might increase the risk 5-fold, it might increase

 

      the risk 10-fold, it might increase the risk

 

      100-fold.  The fact is the background rate was 1 in

 

      a million and what that means is that the actual

 

      number of people affected is sort of measured in

 

      the tens and the hundreds for the liver failure

 

      that could be life-threatening.

 

                In this situation, and this is why the

 

      lower relative risk becomes so critical, we are

 

      talking about a serious event that has a very high

 

                                                                88

 

      background rate.  Heart attack is not a rare event,

 

      and as I pointed out before, there is a 1 in 50

 

      chance that the average American male age 65 to 74

 

      is going to have a heart attack this year, 1 in 50.

 

                That is an extraordinarily high risk.  You

 

      increase that risk 5-fold with a high dose.  That

 

      is what happened with VIGOR.  If I have got

 

      millions of people taking the high doses, and that

 

      is what we had in the United States, and I have

 

      increased the risk 5-fold, you are going to get

 

      numbers that balloon out like this.

 

                So, there is no comparison in terms of

 

      what the population impact is of the typical drug

 

      that has come off the market in the United States

 

      and what we are dealing with here, and that is

 

      because of the high background rate of the

 

      underlying event that we are talking about.
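
      The contrast drawn here comes down to one multiplication:
      excess cases equal the number of people exposed, times the
      background rate, times the relative risk minus one.  The
      sketch below uses the two background rates quoted in the
      testimony (1 per million per year and 1 in 50 per year).
      The exposed population of one million and the relative
      risks of 100 and 1.5 are hypothetical round numbers, and
      applying the 1-in-50 rate to everyone exposed is a
      simplification, since that rate was quoted for men aged 65
      to 74.

```python
def excess_cases_per_year(exposed_people, background_rate, relative_risk):
    """Extra events per year attributable to the drug among those exposed."""
    return exposed_people * background_rate * (relative_risk - 1)


EXPOSED_PEOPLE = 1_000_000  # hypothetical, same for both scenarios

# Rare event: background 1 per million per year, very large relative risk.
liver_failure_excess = excess_cases_per_year(EXPOSED_PEOPLE, 1 / 1_000_000, 100)

# Common event: background 1 in 50 per year, modest relative risk.
heart_attack_excess = excess_cases_per_year(EXPOSED_PEOPLE, 1 / 50, 1.5)

print(f"liver failure: about {liver_failure_excess:.0f} excess cases per year")
print(f"heart attack:  about {heart_attack_excess:.0f} excess cases per year")
```

      Even with a relative risk two orders of magnitude smaller,
      the common event produces far more excess cases, because
      the background rate is 20,000 times higher.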

 

                DR. WOOD:  So, this would produce many

 

      more cases from what I understand.

 

                DR. GRAHAM:  Many more.

 

                    Committee Questions to Speakers

 

                DR. WOOD:  From the committee, we have

 

      questions.  Let's start with Dr. Shafer.

 

                DR. SHAFER:  Dr. Graham, tomorrow we are

 

      going to be asked, as a committee, to consider the

 

                                                                89

 

      question about a class effect for the selective

 

      COX-2 antagonists and for the non-selective NSAIDs.

 

                One of the things that I am finding, that

 

      I am having trouble putting together here, is we

 

      have a lot of conflicting data, and for the COX-2

 

      antagonists we have a lot of data from randomized

 

      controlled trials.

 

                Certainly for the NSAIDs, we are going to

 

      have to go with a lot of these observational

 

      studies because we don't have a lot of data on the

 

      topic at hand from randomized controlled trials.

 

                As I look at this, if we come up with some

 

      sort of common warning as a class, and it applies

 

      to everything, we have, in fact, communicated no

 

      relevant information.  On the other hand, if we are

 

      going to come up with individual drug-specific

 

                                                                90

 

      recommendations, we are going to have to have very

 

      different evidentiary standards in some ways,

 

      because for some of these, we have very little

 

      information, as you pointed out, and yet your data,

 

      particularly the unpublished data from the Medi-Cal

 

      trial, and I appreciate that there is all the

 

      issues of not being peer reviewed and stuff, but we are

 

      all familiar with that process and know how it

 

      works.

 

                What can you tell us to guide us?  Should

 

      we try to go drug by drug specific?  How do we set

 

      our evidentiary standards when we talk about class

 

      effects where in some cases, we are just not going

 

      to have a lot of data here?

 

                DR. GRAHAM:  Right.  What you are going to

 

      be getting now, of course, is my opinion, not FDA's

 

      opinion. Probably if you were to talk to Bob Temple

 

      or John Jenkins, or anybody else, everybody is

 

      going to have a slightly different answer.

 

                What we are talking about now I think to some

 

      extent is philosophy, so with that preamble, first,

 

      I believe based on the evidence that there is a

 

                                                                91

 

      COX-2 effect and that that COX-2 effect is dose

 

      dependent, and that we see evidence of that with

 

      rofecoxib, with celecoxib, and with valdecoxib.

 

                The difference between rofecoxib and the

 

      other two coxibs on the market is that a safe dose

 

      for rofecoxib wasn't identified, the dose wasn't

 

      low enough.  That raises a question in my mind

 

      about what is an appropriate therapeutic index for

 

      a drug.

 

                I am giving you my opinion now, but when I

 

      listened to Dr. Cryer's presentation yesterday, the

 

      bottom line conclusion I came to at the end of that

 

      was there really doesn't appear to be a need for

 

      COX-2 selective NSAIDs based on what I heard

 

      yesterday.  There is probably other information out

 

      there why I am wrong, but that was the conclusion I

 

      came to.

 

                So, in any event, that is answer one.  I

 

      believe there is an effect and it's dose related,

 

      and with celecoxib and valdecoxib, I think we have

 

      evidence.  You said before we have a good

 

      evidentiary base based on clinical trials for the

 

                                                                92

 

      COX-2s.  I would challenge that in the sense of the

 

      survival curves and the things that I talked about

 

      there, that we have a very weak evidentiary base

 

      for things like protective, you know, is there a

 

      grace period for use, and also on the dose issue,

 

      we really don't have a great evidentiary base.  But

 

      that being said, you understand me.

 

                Now, for the non-coxib NSAIDs, my own view

 

      is that as an epidemiologist first, I try to report

 

      the phenomenon I observe and leave it to brighter

 

      minds to figure out why what I observed happens.

 

                You are asking me sort of what do I think

 

      is happening underneath it all.  I am attracted to

 

      the COX-2 hypothesis personally.  Dr. Gurkiepal

 

      Singh, my colleague and co-author in Medi-Cal, he

 

      has a different view on that, but I think that we

 

      can have these in vitro tests that say, oh, this is the

 

      COX-2 selectivity of this NSAID, you know, in a

 

      test tube.

 

                What happens in the human body could end

 

      up being surprisingly different.  We saw yesterday

 

      that the dynamic response of these differences,

 

                                                                93

 

      that the platelet effect is very quick, the

 

      thromboxane effect is a very quick effect, the

 

      prostacyclin effect seems to be a more gradual

 

      effect, that this creates very complex interactions

 

      that ibuprofen, that any of these drugs could, in

 

      the end, end up with a deficit, a prostacyclin

 

      deficit that results.

 

                I think Dr. FitzGerald showed that slide

 

      yesterday with the normal distribution of the time

 

      area under the curve and then this little sliver

 

      where they are not protected, and that may be the

 

      reason why, for these different drugs, that we end

 

      up with these different relative risks and these

 

      different odds ratios.

 

                In the end, for the non-selective NSAIDs,

 

      my own advice would be let's look to see are there

 

      somewhere in studies--it is going to be

 

      observational studies--in observational studies

 

      that we believe have been reasonably well done.

 

                By "well done," here, they have to be

 

      large.  The literature is full of really small

 

      studies.  I mean I could have presented Meloxicam

 

                                                                94

 

      studies, 5 patients, no risk.  Well, duh, you know,

 

      you have got a confidence interval that goes from

 

      zero to infinity.  They need to be large.  Look in

 

      a systematic way to identify what the body of

 

      evidence is.

 

                Can we identify bad actors?  I believe

 

      indomethacin, for example, is clearly a bad actor,

 

      and if people looking at the data concluded that,

 

      take appropriate action, weed the garden of the bad

 

      actors.

 

                Try to identify drugs that based on the

 

      evidence we have, appear to have less risk in the

 

      totality of their evidence, looking for consistency

 

      study to study to study, and then, in a rational

 

      way, suggest these are the drugs we think that the

 

      public should use, and these other drugs, well,

 

      then you have to decide do you want them on the

 

      market or not.

 

                I am not really going to comment on that,

 

      but I think that is the approach I would take.  I

 

      would be trying to sort of identify right off the

 

      bat the bad actors and let's get rid of them.

 

                Things that look like they may actually be

 

      safe, and when I say "safe" now, I mean that they

 

      don't appear to have cardiovascular risk, identify

 

                                                                95

 

      them and shift the market towards that, and then

 

      deal with the others.

 

                DR. WOOD:  Dr. Friedman.

 

                DR. FRIEDMAN:  Thank you.  Several

 

      comments.  First, as both Dr. Graham and Dr. Platt

 

      have mentioned, observational studies are

 

      essential, but they have a number of limitations,

 

      and because of those limitations, it is easy after

 

      the fact to critique away those whose results you

 

      don't much care for as we have seen.

 

                But a couple of other points.  One, can

 

      these particular drugs, their primary use, we are

 

      dealing with chronic conditions, conditions that

 

      last years, sometimes many years, and so the drugs

 

      are intended for use over those many years

 

      potentially.

 

                Yet, most of the clinical trials we heard

 

      reported yesterday are 12, 18 weeks, a few of them

 

      go longer.  You mentioned that one of the reasons

 

                                                                96

 

      we didn't see the problems early on may be numbers,

 

      and I agree that is potentially it, but the fact is

 

      we didn't see problems arise in the studies until

 

      14, 18 months.

 

                We often see analyses by patient years of

 

      exposure.  In this particular setting, I don't know

 

      whether patient years are always equal to patient

 

      years, and therefore, I guess I would say why

 

      aren't we doing more bigger, longer randomized

 

      clinical trials for these chronic conditions?

 

                DR. GRAHAM:  I am not speaking for the

 

      agency now.

 

                DR. WOOD:  We got that.  Don't say it each

 

      time.

 

                DR. GRAHAM:  Okay.  I think they are

 

      incredibly expensive and companies don't want to do

 

      them.  There is not an incentive for them to do

 

      them, and you would have to talk to the people from

 

      the new drug side of the house, but the fact is

 

      that they are not requiring them.

 

                So, that is a very legitimate question.

 

      You know, working as an epidemiologist, we try to

 

                                                                97

 

      make do with what is, and so we use the

 

      observational data.  You are going to get better

 

      quality data if you are able to do this, but just

 

      to give you a sense of the size of the studies that

 

      I think you would need to do, I mean you talked

 

      about before that you have the APPROVe study and we

 

      see no effect until 18 months, but there was study

 

      090 that was talked about briefly by Dr. Villalba

 

      yesterday.  It was a 6-week study at 12.5 mg, and

 

      it showed a difference, the suggestion of a

 

      cardiovascular risk within the 6-week study at the

 

      lowest dose.  Now, it's a small study, as well.

 

                But I am just saying that to say that I

 

      think the epidemiologic data, in my mind at least,

 

      answers the question about when the effect begins.

 

      The question is if you want to have--this is the

 

      philosophy--how much certainty do you need to make

 

      a decision.

 

                Right now, when it comes to efficacy, the

 

      effect, does the drug work, you are looking at the

 

      lower bound of the confidence interval, and you

 

      want to see is that different than 1, because if it

 

                                                                98

 

      is, then, I will conclude with 95 percent certainty

 

      or greater that the drug actually has an effect.

 

                When it comes to safety, you are doing the

 

      same thing.  You are looking at that lower bound.

 

      You want this 95 percent certainty that the drug is

 

      harmful.  You are presuming that the drug is safe

 

      rather than let's presume we want to do no harm to

 

      patients.

 

                Let's start off at the beginning assuming

 

      that the drug isn't safe, and we want to have a

 

      certain level of confidence about how bad this drug

 

      could be, and that is still tolerable to us.  We

 

      want to cap the risk.  It will be a completely

 

      different way of looking at studies for a safety

 

      perspective, one that actually gives a priority to

 

      safety and is maximally protective of patient

 

      safety, just as that high standard for efficacy is

 

      maximally protective of patient safety, because by

 

      keeping drugs off the market that don't work, I am

 

      protecting patients from unsafe drugs, and if I

 

      have pneumonia and I am given a drug that doesn't

 

      work, well, I get a harm from that.
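
      The two standards contrasted here can be stated as two
      different tests on the same confidence interval.  The
      sketch below is one reading of the argument, not a rule
      stated in the testimony: the current approach asks whether
      the lower bound exceeds 1, while the proposed approach asks
      whether the upper bound stays below a tolerable cap.  The
      cap values of 1.5 and 1.1 are hypothetical, and the
      interval is the Kaiser ibuprofen figure quoted earlier
      (0.96 to 1.17).

```python
def shown_harmful(ci_lower):
    """Current standard: harm is accepted only if the lower bound exceeds 1."""
    return ci_lower > 1.0


def risk_capped(ci_upper, tolerable_cap):
    """Proposed standard: the drug passes only if the upper bound is below the cap."""
    return ci_upper < tolerable_cap


ci_lower, ci_upper = 0.96, 1.17  # interval quoted earlier for ibuprofen

print("shown harmful:", shown_harmful(ci_lower))
print("capped at 1.5:", risk_capped(ci_upper, 1.5))  # hypothetical cap
print("capped at 1.1:", risk_capped(ci_upper, 1.1))  # hypothetical stricter cap
```

      The same interval can fail to show harm under the first
      test and still fail the second when the cap is strict,
      which is the shift in the burden of proof being argued
      for.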

 

                But that's philosophy, and I think it's an

 

      outcropping, it's a development, a natural

 

      extension of the development of clinical trials in

 

                                                                99

 

      the United States where the focus has always been

 

      on efficacy.

 

                DR. WOOD:  Let's try and keep both the

 

      questions and the answers reasonably short,

 

      otherwise, we will be here until after midnight.

 

                DR. GRAHAM:  I apologize.

 

                DR. WOOD:  That's okay.  Let's go on to

 

      Dr. Elashoff.

 

                DR. ELASHOFF:  First, I have one comment

 

      and then one question.  In terms of confounding,

 

      just because you put a lot of variables in some

 

      model doesn't necessarily mean that you have

 

      adequately removed the confounding effects even of

 

      those variables.

 

                The second has to do with Dr. Graham's

 

      slide 13, the excess population risk.  I note that

 

      the Ingenix data has been left out of the bottom

 

      category.

 

                DR. GRAHAM:  That's right, because for the

 

                                                               100

 

      high dose.

 

                DR. ELASHOFF:  Yes, but the negative sign

 

      needs to be on the slide, otherwise, it's a biased

 

      presentation.

 

                DR. GRAHAM:  Well enough.  I take that

 

      correction.  Okay, fair enough.

 

                DR. WOOD:  Dr. Bathon.

 

                DR. BATHON:  Yes.  As we weigh the

 

      risk-benefit ratio of these drugs, one

 

      consideration is that there are subgroups of

 

      patients in which the benefit might outweigh the

 

      risk possibly.

 

                With that in mind, it would be helpful for

 

      us who are not cardiologists or epidemiologists to

 

      be able to put the relative risks that we have been

 

      seeing over the past day or two in context with all

 

      the cardiovascular risk factors that exist.

 

                So, for example, if you were to take the

 

      presumed relative risk of rofecoxib of 1.5 to 2.0,

 

      at least at the higher dose, and put it into some

 

      context for us of the 20 to 40 cardiovascular risk

 

      factors that exist in a sort of rank order, where

 

                                                               101

 

      would you put the COX-2 drugs?

 

                DR. GRAHAM:  For the high dose it would

 

      be probably more significant than smoking or

 

      diabetes or hypertension, maybe more important than

 

      the combination of several of those factors in a

 

      patient.  For the lower dose, it is probably more

 

      than hypertension, a little less than diabetes, and

 

      a little less than smoking.

 

                I know, David, you know the cardiovascular

 

      risk factors much better than I do, and so does Dr.

 

      Hennekens, but that would be my ballpark on that.

 

                DR. WOOD:  Dr. Abramson.

 

                DR. ABRAMSON:  Yes.  I want to go back to

 

      the question Dr. Shafer asked about if these

 

      classes of drugs or this group of drugs could be if

 

      there was a hierarchy of risk, and you first

 

      answered that you thought the coxibs were more

 

      risky, but I would challenge you a bit simply on

 

      your own presentation.

 

                I would like you to discuss your data,

 

      because you then went on to talk about how

 

      indomethacin has a risk, meloxicam has a risk.

 

                                                               102

 

      Based on your data, the message that came through

 

      is that there was a dose response risk for

 

      cardiovascular outcomes, that we saw it within the

 

      coxibs, but we also saw it where the data were

 

      available in the non-selective NSAIDs.

 

                There are data that we have seen that

 

      ibuprofen might increase risk.  We didn't talk

 

      about the MacDonald and Wei paper that in

 

      cardiovascular discharge patients, people given

 

      ibuprofen had a higher mortality 2-fold.

 

                So, as the smoke clears, I am not sure

 

      that the simple answer that the coxibs were

 

      different was actually supported by your data, nor

 

      your ultimate explanation.  Can you defend that?

 

                DR. GRAHAM:  I think you are accurate.

 

      What I was saying was I was referring, I think, to

 

      the underlying COX-2 hypothesis and that it is

 

      clearer, I believe, and, well, maybe it's an

 

      overgeneralization, because we have the n that we

 

      are viewing is so small, that looking at rofecoxib

 

      as sort of the example where we can see very

 

      clearly the dose response at all the levels and its

 

                                                               103

 

      progression, and understanding its mechanism of

 

      action, and then seeing similar things with

 

      celecoxib and valdecoxib.

 

                I think what you are saying is fair.

 

      Maybe a better thing to say is, in the end, that

 

      you do need to look at it drug by drug.

 

                What I was saying, though, in that answer

 

      that I gave to Dr. Shafer, I was really talking

 

      more about sort of the COX-2 mechanism and the

 

      coxibs as being, in quotes, "COX-2 selective," but

 

      I think your observation is correct.

 

                DR. ABRAMSON:  Add to that, that although

 

      there is a hazard that we don't accomplish a lot by

 

      simply saying the class of NSAIDs may have risk, I

 

      think we have under-appreciated that over the last

 

      10 years.

 

                It is not that different from the

 

      mid-nineties recognizing that there was a class GI

 

      effect of these drugs, and that compared to

 

      placebo, whether it's hypertension or long-term

 

      potential adverse outcomes, this is something that

 

      doctors have to be aware of, even the simple thing

 

                                                               104

 

      of checking blood pressures when you put people on

 

      any nonsteroidal drug.

 

                So, I don't know that it is necessarily a

 

      bad outcome to call attention to this class effect

 

      until we get better information on each of these

 

      individual drugs.

 

                DR. WOOD:  Dr. Day.

 

                DR. DAY:  I have a comment about recall

 

      bias and reverse recall bias.  There is a huge

 

      research literature on how memory works both in the

 

      laboratory and in the every-day world, and there

 

      are two phenomena that have been very heavily

 

      studied that I think might be relevant here.

 

                One is called flashbulb memory, and the

 

      idea is when an emotional spectacular event

 

      happens, such as when you first learn that JFK had

 

      been shot, or the Challenger blew up, or the World

 

      Trade Center had been hit, it is as if the old-time

 

      flashbulb from an old-time flash camera went off

 

      and captured all the details, and you remember all

 

      of those details forever afterwards associated with

 

      the event that you might otherwise have just not

 

                                                               105

 

      even noticed or forgotten.

 

                So, there is a lot of research on

 

      flashbulb memory that shows many of those details

 

      are indeed correct, but some are notoriously false.

 

      For example, there are accounts of people who

 

      remember a certain event with great emotional

 

      aspects to it, and they remember listening to the

 

      World Series when so-and-so is pitching and it was

 

      the bottom of the 9th, da-da-da, all these details,

 

      and when you go back and check the evidence of what

 

      was going on, on that day and time, that particular

 

      game was not on.

 

                So, that is phenomenon number one, flashbulb

 

      memory, and the second is eyewitness testimony.

 

      How you ask a person a question will affect what

 

      answers you get.  So, if you have in the courtroom,

 

      someone who has witnessed a car accident, if the

 

      lawyer asks this witness, "Did you see the broken

 

      glass," then, the witness is more likely to say yes

 

      than if you ask, "Did you see any broken glass,"

 

      because the broken glass presumes that there was

 

      some, and so forth.

 

                So, I take your points seriously about

 

      potential recall bias and reverse recall bias, but

 

      we would have to look at both, whether there is an

 

                                                               106

 

      emotional component or not.  Those who have had an

 

      MI, for example, would have that most likely, but

 

      also how the questions are asked in these surveys,

 

      and it is not trivial how you ask people questions

 

      about were you taking any medications or were you

 

      taking medication X, and for how long, and what was

 

      the dosage, and so on.

 

                So, I don't think that these details are

 

      always published with the studies, and I would like

 

      to encourage people who ask people about their

 

      experiences with drugs, take a look at the memory

 

      literature for some of these points.

 

                DR. WOOD:  Dr. Gibofsky.

 

                DR. GIBOFSKY:  Dr. Graham, I am wondering

 

      if you separated out your populations based on the

 

      indication for which they were taking the drug.  I

 

      ask that because we heard yesterday, and it's well

 

      known, that rheumatoid arthritis is itself a risk

 

      factor for cardiovascular disease, and higher doses

 

                                                               107

 

      of coxibs, in particular celecoxib, are usually

 

      given to patients with rheumatoid arthritis as

 

      opposed to osteoarthritis.

 

                So, I am wondering if you look at that in

 

      your breakdown.

 

                DR. GRAHAM:  Several of the studies that I

 

      reviewed have looked at the indication, but in

 

      automated claims data, it is very difficult to

 

      sort of be sure does the patient have rheumatoid

 

      arthritis, and there are different algorithms one

 

      could use, but in general, what has been found in

 

      the studies where they have looked at that, that

 

      the prevalence of rheumatoid arthritis in the study

 

      populations has been low, very low, and that its

 

      impact on the results when they adjusted for it

 

      didn't materially affect things.

 

                Now, in the California Medicaid study, one

 

      difference in that study was that our base

 

      population was limited to patients who had

 

      diagnoses of osteoarthritis or rheumatoid

 

      arthritis.  Now, these are diagnoses, and so does

 

      that mean that they really had osteoarthritis or

 

                                                               108

 

      rheumatoid arthritis, I don't know, but what we did

 

      try to eliminate in that study at least were the

 

      people who might be using an NSAID for a muscle

 

      injury, a short-term complaint as opposed to a

 

      chronic illness.

 

                In none of those does the presence of

 

      rheumatoid arthritis seem to affect things, but

 

      again I think the prevalence is pretty low in all

 

      of these studies.

 

                DR. GIBOFSKY:  One quick question for Dr.

 

      Platt, if I might.  I need to understand the

 

      concept of survivor bias somewhat in that I think

 

      there is a difference between a patient who is

 

      drug-naive, then put on a drug, and then an event

 

      happens versus a patient who may have seen a drug,

 

      perhaps seen another drug after that, 3 or 4 agents

 

      of the class, and is then switched to another agent

 

      and something happens.

 

                I think we have talked about remote versus

 

      current, but there is also this issue of sequential

 

      effect, and I am wondering how you deal with that

 

      as a survivor, particularly because of the paper we

 

                                                               109

 

      saw a few weeks ago in the Archives suggesting that

 

      discontinuation of an NSAID may itself be a risk

 

      factor for a thrombotic event.

 

                DR. PLATT:  Your point is exactly right.

 

      I think that the concern about survivor bias is

 

      that if we think that some people are particularly

 

      susceptible, which is almost certainly the case,

 

      then, if we start the clock after a person has

 

      already been exposed to a drug or to one that has

 

      the same effect, then, it is very much less likely

 

      that those individuals will have a problem.

 

                That may be the explanation, for instance,

 

      for the reason that the literature was so badly

 

      wrong about postmenopausal estrogens and heart

 

      disease, that most of the epi studies started with

 

      prevalent users.
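
      The survivor-bias argument above can be made concrete with a
      small deterministic sketch.  It is an illustration only and is
      not drawn from any study discussed at the meeting.  It assumes a
      hypothetical population in which a minority of "susceptible"
      patients have a much higher monthly event probability while on
      drug, and it compares the risk ratio seen in new users with the
      one seen in patients who have already survived three years of
      use.  All numbers are invented.

```python
def next_year_risk_ratio(susceptible_fraction, months_already_on_drug,
                         base_hazard=0.001, susceptible_hazard=0.02):
    """Risk ratio (on drug vs. off drug) over the next 12 months, among
    users who were event-free for `months_already_on_drug` months.

    Hazards are hypothetical monthly event probabilities. Off drug,
    everyone is assumed to have `base_hazard`.
    """
    # Fraction of each subgroup still event-free when follow-up starts.
    s_left = susceptible_fraction * (1 - susceptible_hazard) ** months_already_on_drug
    r_left = (1 - susceptible_fraction) * (1 - base_hazard) ** months_already_on_drug
    frac_susceptible = s_left / (s_left + r_left)

    risk_susceptible = 1 - (1 - susceptible_hazard) ** 12
    risk_base = 1 - (1 - base_hazard) ** 12
    risk_on_drug = (frac_susceptible * risk_susceptible
                    + (1 - frac_susceptible) * risk_base)
    return risk_on_drug / risk_base

new_users = next_year_risk_ratio(0.10, months_already_on_drug=0)
prevalent_users = next_year_risk_ratio(0.10, months_already_on_drug=36)
```

      Because susceptible patients are removed early, starting the
      clock among prevalent users yields a smaller risk ratio than
      starting it at first exposure, even though the drug's effect on
      any individual is unchanged.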

 

                I think the majority of the studies that

 

      we were reviewing here, these were individuals who

 

      are known to have had at least a year of prior

 

      experience without exposure to the nonsteroidals.

 

                Your study in Kaiser I know was an

 

      inception cohort at least with regard to a year of

 

                                                               110

 

      prior history, but I am not aware that any studies

 

      have a longer drug-free prior interval than that.

 

                DR. WOOD:  Dr. O'Neil, do you want to

 

      comment particularly on this?

 

                DR. O'NEIL:  Yes, this is an important

 

      point and a lot of things have been covered in

 

      Richard's and David's presentation, but one thing I

 

      think that is relevant that Richard did not cover,

 

      that is, the value of a randomized trial, is the

 

      ascertainment and follow-up, and knowing the status

 

      of individuals in the sense of who goes off therapy

 

      and how long they stay on therapy.

 

                That is very critical relative to the time

 

      dependency of the risk.  It was mentioned, for

 

      example, the use in the observational sense of

 

      recent and remote and current use.  Those are all

 

      terms that are nice, but they don't get at the

 

      issue that we are trying to get at with regard to

 

      the clinical trials, and that is essentially when

 

      does time zero start for you.

 

                So, I think the appropriate question to

 

      ask is what is the duration of exposure since your

 

                                                               111

 

      initial exposure to the drug, because I think that

 

      is very relevant to the interpretation of the three

 

      clinical trials that we have, two of which are in

 

      placebo-control populations.

 

                There is a rofecoxib-naproxen control

 

      trial for one year, there is a placebo-control

 

      trial in polyp prevention for three years, and

 

      there is a placebo-control trial in Alzheimer's

 

      disease for four years, and the time dependency

 

      from time zero matters as you have seen in the

 

      plots.

 

                It is relevant to the excess risk

 

      calculation.  So, I would ask the committee, as

 

      well as I would ask David, of the observational

 

      studies that you have reported, how many of them

 

      are cohort studies, and how many of them are able

 

      to identify new initial use, and then track

 

      continued use for that individual, so that one

 

      could look at the relationship between the hazard

 

      rates and the hazard ratios that we are identifying

 

      in the randomized trials and match that to the odds

 

      ratios that are being reported in the observational

 

                                                               112

 

      studies.

 

                DR. GRAHAM:  On one of my initial slides,

 

      you can see what the cohort studies were, and in

 

      some of the nested case control studies, you are

 

      also able to get the time on drug.  Actually, in

 

      Wayne Ray's cohort study, most of these cohort

 

      studies include prevalent and incident users, so

 

      they will do what is called a "new user"

 

      subanalysis, which is to try to get to this issue

 

      of when does time zero begin.

 

                We addressed that problem in our study

 

      here by the inception cohort design in our base

 

      population, so that we can identify what time zero

 

      was for the cases.

 

                Now, none of those studies presented data

 

      in the form of a survival analysis, which I think

 

      in the end, that is what Dr. O'Neil would like to

 

      see.

 

                DR. O'NEIL:  No, my question is not so

 

      much in survival.  I don't believe, and again that

 

      is why I am asking you, I don't think any of those

 

      studies were designed or able to capture the

 

                                                               113

 

      question I am asking.

 

                In fact, if I am not mistaken, in the

 

      Wayne Ray study, he defined new use, but he did not

 

      define any time from new use, which is essentially

 

      critical to when those risks start.

 

                DR. GRAHAM:  That study isn't cited as one

 

      of the studies where we are able to derive that

 

      information.  This slide was a slide that I

 

      presented to show that from the epidemiologic

 

      literature, those studies where the investigators

 

      had identified when time zero began for rofecoxib

 

      use, and they didn't present the data as a survival

 

      analysis, but they identified when time zero began

 

      and then, in various ways, showed you either what

 

      the distribution of the cases were, so that you can

 

      see that it was impossible for the risk to have

 

      been delayed for 18 months, because nobody in the

 

      study used the drug for 18 months, or they parsed

 

      time out and looked at the first 30 days of use

 

      from time zero, and found the risks that they found

 

      down here.

 

                But you are right, those studies aren't

 

                                                               114

 

      designed that way, and we haven't had time in our

 

      Medicaid study to do these analyses yet, but we

 

      have the data to now do the cohort study and time

 

      to event, so we will have an opportunity actually

 

      within the data to actually compare and look to see

 

      exactly the question you are driving at.

 

                But I would say that from the published

 

      data, in each of these studies, time zero for

 

      rofecoxib was identified and in some way or

 

      another, information that I think could be useful

 

      to the committee in establishing when does risk

 

      begin was contained in those studies.

 

                DR. O'NEIL:  Well, the other point here,

 

      which is the value of clinical trials, and it was

 

      the question that was discussed yesterday with

 

      regard to the intent-to-treat analysis, and that is

 

      to say to analyze all outcomes once randomized to

 

      the trial regardless of whether you want to track

 

      the individual to 14 days post-exposure.

 

                You can't really maybe get access to this

 

      information in the observational studies.  That is

 

      a conjecture, but it's one or the other biases, and

 

                                                               115

 

      it was interesting to the comment, whether one

 

      would believe this or not, that discontinuation,

 

      discontinuation from an NSAID alone raises risk.

 

                If that were to be the case, that is a

 

      different analysis altogether.

 

                DR. GRAHAM:  In that actual paper, it

 

      could be that people were discontinuing the NSAIDs

 

      because they were having chest pain and it was

 

      being interpreted as dyspepsia or something, and

 

      then they go to have their infarct.

 

                I mean you are right about that, but this

 

      is the nature of how epidemiology is done, and I

 

      can't change it.  I didn't make the rules, I am

 

      only following them.  Nobody is arguing that

 

      clinical trials, if they could be large enough,

 

      that they would give all of us answers that we

 

      would have greater comfort trusting what they are

 

      saying.

 

                What I am proposing is that we don't have

 

      that kind of data in the clinical trials.  As large

 

      as the clinical trials are, for the questions that

 

      this committee is facing, you don't have the data

 

                                                               116

 

      you need, and what I presented is the epidemiologic

 

      data, and it is imperfect and it has its warts, and

 

      that is why I would emphasize looking at

 

      consistency and trying to sort of derive from that

 

      a general sense.

 

                I mean does it make pharmacologic sense

 

      that you would have an 18-month delay?  I mean I

 

      guess I suppose it depends on what you think the

 

      mechanism of action is for the underlying disease,

 

      but even in the clinical trials, study 090 was 6

 

      weeks long, 12.5 mg, and it had a cardiovascular

 

      effect.

 

                DR. WOOD:  I am happy to facilitate a

 

      discussion among the FDA, but I think we would

 

      rather hear from the committee right now.  Dr.

 

      Farrar, you are next.

 

                DR. FARRAR:  I think that the

 

      recommendations of the committee tomorrow are going

 

      to depend on the assessment of the overall risk and

 

      the overall benefit of this class of drugs.

 

                As a researcher and after all the data

 

      that has been presented, I am more than happy to

 

                                                               117

 

      accept the fact that there are serious risks even

 

      of death from taking NSAIDs.  In fact, though,

 

      there are serious risks in taking any medication at

 

      all.

 

                For some of the NSAIDs, it is

 

      cardiovascular risks, for some of them it is

 

      clearly GI bleeding.  As a doctor, though, who

 

      takes care of patients, I know that treating pain

 

      or not treating pain and not treating the

 

      disability of arthritis also has very serious risks

 

      even of death.

 

                Given the extensive work that you have

 

      done, on the risk of both the cardiovascular and

 

      the GI bleed, I wonder what level of risk is

 

      acceptable to you, and remembering that the only other

 

      drugs that are really available are analgesics or

 

      narcotics, and the only other drugs that are really

 

      available in terms of limiting inflammation are

 

      biologics or immunosuppressants, I wonder what drug

 

      is safe enough that you would recommend that I

 

      actually would be able to use it in patients to

 

      prevent some of their suffering.

 

                DR. GRAHAM:  Well, I am not going to give

 

      a product endorsement.  A couple of things, though.

 

                DR. WOOD:  Try and make it brief.

 

                                                               118

 

                DR. GRAHAM:  One, the benefits of the

 

      treatment for the traditional NSAIDs compared to

 

      the COX-2 selective NSAIDs with GI bleed, we have

 

      clinical trial evidence that suggests that there may

 

      be a difference, but here, to me, is an anomaly.

 

                Rofecoxib got the indication for being

 

      GI-protective, celecoxib didn't based on the

 

      clinical trials data you guys looked at yesterday.

 

                There are two published studies in the

 

      literature looking at what I would say is actual

 

      benefit.  There, they were looking at

 

      hospitalization for GI bleed--they didn't look at

 

      death from GI bleed, but I wish they had--but

 

      hospitalization for GI bleed, and what they found

 

      was, in both of these studies, that celecoxib was

 

      actually more beneficial, you know, lower rate of

 

      hospitalization for GI than rofecoxib.  So, that is

 

      the population, two large studies.

 

                You have got your clinical trials that

 

                                                               119

 

      would have said it should be the reverse.  So, I

 

      throw that out as one sort of conundrum.

 

                The second is that I don't think that the

 

      actual benefits of these drugs are understood well

 

      enough to sort of try to weigh these very well.

 

      The case fatality rate for myocardial infarction in

 

      the United States approaches 40 percent.  The case

 

      fatality rate for hospitalized GI bleeding is

 

      probably somewhere around 5 or 10, it is a much

 

      lower case fatality rate.

 

                Nobody that I have seen anywhere has sort

 

      of worked this out very well, so I would submit to

 

      you and to the committee that you actually know

 

      very little about the actual population benefit of

 

      any of these products.

 

                DR. WOOD:  I don't think we are going to

 

      get an answer to that question, so let's move on.

 

                Dr. Nissen.

 

                DR. NISSEN:  Let me briefly answer the

 

      earlier question about what does the hazard ratio

 

      of 1.5 to 2 mean. Before I came to the meeting, I

 

      made a point to look this up, because I thought it

 

                                                               120

 

      would be very relevant.

 

                It is equivalent to raising a cholesterol

 

      from 200 to 260, or taking up smoking.  Another way

 

      for the committee, I mean as a cardiologist I have

 

      to deal with this all the time, the most effective

 

      drugs we have for prevention of morbidity and

 

      mortality are statins, and they reduce risk about

 

      35 percent.

 

                So, a hazard ratio of 1.5 to 2 is really a

 

      very, very big effect when you are talking about

 

      the most common cause of mortality, and that is why

 

      this discussion is so important.

 

                Now, my question is this.  We are going to

 

      be asked to balance risk and benefit, and so the

 

      magnitude of the hazard ratio is very important to

 

      all of us, and I am trying to reconcile what we see

 

      in the randomized control trials with, let's take

 

      rofecoxib for a moment, where it looks like the

 

      hazard ratio in the randomized trials is in the

 

      range of 2, 3, 4, maybe even higher, and in the

 

      observational data it is significantly lower.

 

                I would like to propose a hypothesis to

 

                                                               121

 

      you and just ask you if you think this is right.

 

      In your observational data, you are looking at

 

      mostly short-term exposure, so you are looking at

 

      less than 12 months typically of exposure.

 

                It may well be that the hazard increases

 

      over time, so that by the time you get to 18

 

      months, you can actually see it in a much smaller

 

      randomized trial, and so it doesn't rule out the

 

      possibility that, in fact, both observations are

 

      right, that, in fact, there is an early hazard, but

 

      that early hazard has a smaller hazard ratio than

 

      the hazard at 18 months or 24 months or even 36

 

      months, and if we ever were to look out 5 years, it

 

      might still be increasing.

 

                Do you think that is a reasonable

 

      hypothesis?

 

                DR. GRAHAM:  I think more likely it is,

 

      that in your clinical trials, early on you don't

 

      have enough power to distinguish the risk.  The

 

      hazard is the same, but the lines are closer

 

      together, because we are closer to the origin.
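
      The point that a constant hazard ratio can be hard to detect
      early in a trial can be sketched as follows.  This is an
      illustration under stated assumptions, not an analysis of any
      trial discussed here: the arm size, placebo hazard, and hazard
      ratio are hypothetical, hazards are taken as constant, and the z
      statistic is a crude one based on expected event counts rather
      than a formal power calculation.

```python
import math

def cumulative_incidence(hazard_per_month, months):
    """Probability of an event by `months` under a constant hazard."""
    return 1.0 - math.exp(-hazard_per_month * months)

def crude_z(n_per_arm, placebo_hazard, hazard_ratio, months):
    """Crude z statistic for the log ratio of expected event counts."""
    placebo_events = n_per_arm * cumulative_incidence(placebo_hazard, months)
    drug_events = n_per_arm * cumulative_incidence(placebo_hazard * hazard_ratio, months)
    se = math.sqrt(1.0 / placebo_events + 1.0 / drug_events)
    return math.log(drug_events / placebo_events) / se

# Hypothetical trial: 1,300 patients per arm, hazard ratio fixed at 2 throughout.
z_early = crude_z(1300, 0.0006, 2.0, months=6)
z_late = crude_z(1300, 0.0006, 2.0, months=36)
```

      The hazard ratio is the same at both time points.  What changes
      is the number of accumulated events, so the early comparison
      falls short of conventional significance while the late one does
      not.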

 

                I think one other explanation for the

 

                                                               122

 

      lower risk ratios in observational studies, I would

 

      think is more likely due to misclassification of

 

      exposure and misclassification of outcome.  It is

 

      likely to be nondifferential, so it would tend to

 

      reduce the odds ratios and relative risks towards

 

      1.

 

                Exposure, because people are going to take

 

      it, a lot of these people are taking it on a prn

 

      kind of basis.  In a clinical trial, you have a

 

      greater certitude that they are actually taking it

 

      every day.  That introduces a lot of

 

      misclassification, so the a priori hypothesis going

 

      into an observational study, with misclassification

 

      going on, you are fighting an uphill battle to see

 

      an effect.
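
      The claim that nondifferential misclassification pulls odds
      ratios toward 1 can be checked with a short calculation.  The
      sketch below is an illustration only.  The true odds ratio,
      exposure prevalence, sensitivity, and specificity are
      hypothetical, and only exposure (not outcome) is misclassified,
      with the same error rates in cases and controls.

```python
def observed_odds_ratio(true_or, exposure_prevalence, sensitivity, specificity):
    """Odds ratio seen after nondifferential exposure misclassification."""
    p_ctrl = exposure_prevalence
    odds_case = true_or * p_ctrl / (1 - p_ctrl)
    p_case = odds_case / (1 + odds_case)

    def classified_exposed(p_true):
        # True positives plus false positives among the truly unexposed.
        return sensitivity * p_true + (1 - specificity) * (1 - p_true)

    q_case = classified_exposed(p_case)
    q_ctrl = classified_exposed(p_ctrl)
    return (q_case / (1 - q_case)) / (q_ctrl / (1 - q_ctrl))

# Hypothetical inputs: true OR of 2, 10% of controls exposed.
perfect = observed_odds_ratio(2.0, 0.10, sensitivity=1.0, specificity=1.0)
noisy = observed_odds_ratio(2.0, 0.10, sensitivity=0.70, specificity=0.95)
```

      With perfect classification the true odds ratio is recovered.
      With imperfect but equal error rates in both groups the observed
      value lies between 1 and the true value, which is the
      attenuation described above.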

 

                DR. WOOD:  We have got lots of people who

 

      want to ask questions.  I want to make sure that

 

      the people who are asking questions have questions

 

      they want to ask for clarification of the speakers

 

      who have spoken rather than just general points.

 

                Dr. D'Agostino.

 

                DR. D'AGOSTINO:  I have a couple of

 

                                                               123

 

      questions along the way here.  I have spent a good

 

      part of my career in the Framingham Heart Study,

 

      and it's an epidemiological study and a cohort

 

      study, and we take joy when somebody runs a

 

      controlled trial on hypotheses and then later on

 

      confirms it.

 

                The first question is I am concerned that

 

      even though you have gone through this careful

 

      analysis, your conclusions are no apparent effect,

 

      probably increased effect, probable increased risk.

 

      They really don't help us in the sense of pinning

 

      things down.  We have a couple of very strong I

 

      think good studies, the APPROVe study and the APC

 

      study as placebo-controlled trials.

 

                Tell us quickly where is the weight of how

 

      we should look at these two pieces, the controlled

 

      trials we have versus what you have produced.

 

                DR. WOOD:  Really quickly.

 

                DR. D'AGOSTINO:  Really quickly, it can be

 

      done quickly.

 

                DR. GRAHAM:  My belief is that for the

 

      controlled clinical trials, for the levels of risk

 

                                                               124

 

      that we are concerned about, that they do not have

 

      the statistical power early on to show risk

 

      differences.

 

                DR. D'AGOSTINO:  I think Bob O'Neill's

 

      comment is very important here.

 

                The other two points, and again I will

 

      make them quick, I am very concerned about the high

 

      dose effect you have, and I am really concerned

 

      about the MI and the number of cases.  I mean blood

 

      pressure, cholesterol, diabetes, smoking, this is

 

      what drives people to have heart attacks and what

 

      have you, and that is completely missing on your

 

      assessment of how many new cases, so I guess it is

 

      more of a comment that I am really concerned that

 

      that sheet needs sobering interpretation.

 

                DR. GRAHAM:  But it was based on the odds

 

      ratios and relative risks where those factors were

 

      adjusted for, so as well as they are adjusted for,

 

      that is what the projection represents, the excess

 

      after adjustment.

 

                DR. D'AGOSTINO:  Yes, but I mean the

 

      comment was made by you, throwing in the analysis

 

      doesn't necessarily adjust for them.

 

                The last one, you made a very nice point

 

      about the cardio-protective effect, and you tried

 

      to show that these uses, and what have you, somehow

 

      or other all have the same risk, and your

 

      interpretation that there must be some confounding

 

      going on, why doesn't that hold for all the studies

 

      you gave, why doesn't that hold for the Solomon

 

      study, which you thought was a great study, yet,

 

      this one result you don't like?

 

                DR. GRAHAM:  For what, the Kimmel study?

 

                DR. D'AGOSTINO:  Wasn't it the Solomon

 

      study that had the naproxen as the

 

      cardio-protective?

 

                DR. GRAHAM:  That is because the cardio

 

      protection was present when they were on the drug

 

      and when they weren't on the drug.

 

                DR. D'AGOSTINO:  I understand what you are

 

      saying, but if that's a problem, then, it means

 

      there is some confounding going on.

 

                DR. GRAHAM:  No, it's selection bias.

 

                DR. D'AGOSTINO:  Well, it's selection

 

      bias, but why isn't it for the whole study?  Why do

 

      you throw out a result you don't like and keep all

 

      the results you like?

 

                DR. GRAHAM:  No, that is not what I did.

 

      I pointed out a result where they showed the

 

      presence of the selection bias.  In other studies,

 

      the Ingenix study is the only other study that

 

      looked at this.  I don't have a slide of it.

 

                DR. D'AGOSTINO:  I don't know if it's a

 

      selection bias or misinterpretation of the data.

 

                DR. GRAHAM:  Well, to me it looks like

 

      selection bias.

 

                DR. WOOD:  Let's continue that

 

      conversation later.

 

                Dr. Morris.

 

                DR. MORRIS:  David, would you go to slide

 

      14.  That is the risk, the duration of use.  I

 

      think one of your points was that if you look at

 

      your study, tell me if I understand this right,

 

      that with the lower dose, that the median time to

 

      an AMI is sooner than with a higher dose, did I

 

      understand that right?

 

                DR. GRAHAM:  Yes.

 

                DR. MORRIS:  A month?

 

                DR. GRAHAM:  Had more cases, a greater

 

      proportion of our cases, but the other thing is

 

      remember, down here, we are talking about 18 cases

 

      or so.  The N here is small, the N here is like 58,

 

      and the N here is 10.  So, I wouldn't read too much

 

      into the difference.

 

                The more important point is that at the

 

      low dose, nobody was out there beyond 18 months, so

 

      all the action happened before 18 months, and the

 

      same for the others.  I see what you are saying.  I

 

      can only say that is what our data were.

 

                DR. MORRIS:  One interpretation is what

 

      you said earlier, that for this particular drug, we

 

      are talking about, as you said, no safe level.  I

 

      was wondering if that is the way you interpreted

 

      it, that because we are talking about Vioxx here,

 

      and there is no safe level, that something is going

 

      to happen sooner, or is it something with the

 

      populations are different.

 

                DR. GRAHAM:  The populations could be

 

      different, but I think, you know, you would expect

 

      the higher dose to have a shorter latency to onset

 

      than the lower dose, but the numbers are so small.

 

                DR. MORRIS:  Okay, it's a small number

 

      problem.

 

                DR. WOOD:  So, the answer is too small

 

      numbers at high dose.

 

                Dr. Boulware.

 

                DR. BOULWARE:  I just want to make sure I

 

      understand something that you had proposed in your

 

      excess population risk slide, if you would put that

 

      back up.

 

                As a rheumatologist, I use these drugs in

 

      a population much greater than what you have here

 

      with a 65 to 74 where the risk of an MI is fairly

 

      high in that group.

 

                Did you want us to believe that this

 

      excess risk that you are proposing would be

 

      extrapolated to other population groups, too?

 

                DR. GRAHAM:  Well, no.

 

                DR. BOULWARE:  Do you have any numbers

 

      that may demonstrate that?

 

                DR. GRAHAM:  Well, the answer to the

 

      second is no.  This was an example in conversation

 

      with people planning the talk, to try to help

 

      people connect with what it means.

 

                Cardiovascular risks go up.  I mean in the

 

      next age group higher, the risks are higher.  In

 

      the age groups lower, they are lower, but

 

      cardiovascular risk begins to increase in the 40s.

 

                DR. BOULWARE:  I understand, but it

 

      wouldn't be a linear type of thing.

 

                DR. GRAHAM:  No, the background risk isn't

 

      linear, the relative risks, though, are adjusted

 

      out.

 

                DR. BOULWARE:  Because one of the

 

      questions we will be faced with is are there

 

      subpopulations or groups that these may be safe in,

 

      and I just want to make sure I understand the

 

      relative risk in different age groups.

 

                DR. GRAHAM:  Nobody in any of the studies

 

      where they have looked at it has reported effect

 

      modification, which would be that the level of risk

 

      differs at different ages.
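
      The point about a constant relative risk applied to a
      background risk that rises with age can be sketched
      numerically.  This is a minimal illustration only: the
      baseline rates and the relative risk below are hypothetical
      values chosen for the arithmetic, not figures from any study
      discussed at the meeting.

```python
# Hypothetical illustration: with no effect modification, the same
# relative risk applies in every age group, but the absolute excess
# scales with the background (baseline) rate.

def excess_cases_per_1000(baseline_rate_per_1000, relative_risk):
    """Excess events per 1,000 person-years attributable to exposure."""
    return baseline_rate_per_1000 * (relative_risk - 1.0)

# Assumed background MI rates per 1,000 person-years (made-up numbers).
baseline = {"45-54": 3.0, "55-64": 7.0, "65-74": 12.0}

# Assumed relative risk, held constant across age groups.
rr = 1.5

excess = {age: excess_cases_per_1000(rate, rr) for age, rate in baseline.items()}

for age, value in excess.items():
    print(f"{age}: {value:.1f} excess cases per 1,000 person-years")
```

      Under these assumptions the relative risk is identical in
      each group, while the absolute excess grows with the
      background rate.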

 

                DR. BOULWARE:  One more question here.  I

 

      want to make sure I understand.  I think I heard a

 

      comment that says when the risk approaches

 

      2.0--maybe I just assumed that you said this--that

 

      it was an unacceptable level of risk.

 

                Is there ever a case where a drug may have

 

      a clinical benefit in which that risk is

 

      acceptable, because for the patients I see, not

 

      giving them any of these drugs will confer a great

 

      deal of risk on them, and physical impairment, and

 

      we have studies that show that the functional

 

      classification of rheumatoid arthritis patients

 

      carries with it a significant mortality as that

 

      class goes up?

 

                DR. WOOD:  I think that is a question for

 

      the committee to answer rather than Dr. Graham.

 

                Let's move on to Dr. Cryer.  Do you have a

 

      question?

 

                DR. CRYER:  I do.  The comment and

 

      question I have of Dr. Graham addresses an issue

 

      that I think is an important difference between the

 

      observational studies and the prospective studies,

 

      and this difference relates to assessment of drug

 

      compliance and missed doses, and I think it is

 

      critical as it relates to assessing drugs which

 

      potentially affect platelet function.

 

                A huge difference, as you know, between

 

      aspirin's effect and every other NSAID including

 

      the COX-2 inhibitors, is that with the non-aspirin

 

      NSAIDs, as soon as you remove the drugs, whatever

 

      potential effect they would have had on the

 

      platelet is immediately reversed.

 

                So, with naproxen specifically, my

 

      preconceived bias, which may be wrong, but my

 

      preconceived bias based upon everything I know

 

      about the pharmacology and the things that Dr.

 

      FitzGerald has reviewed for us, is that it should

 

      have some mild anti-platelet effects which would

 

      only be present when the drug is on board in the

 

      system.

 

                So, the specific question is, in the

 

      observational studies, recognizing that in clinical

 

      practice people miss doses of their NSAIDs, they

 

      are not taking their NSAIDs consistently, how do

 

      you account for the missed doses in the

 

      observational studies recognizing that this could

 

      potentially lead to a mitigation of whatever

 

      negative effect or positive effect that they may

 

      have?

 

                DR. GRAHAM:  It ends up being

 

      misclassification.  Generally, what that means is it

 

      will force the observed level of risk, the relative

 

      risk or the odds ratio closer to 1.  So, if we had

 

      an increased risk, it would make it lower, if we

 

      had a protective effect, it would sort of make it

 

      higher, closer to 1.
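
      The attenuation described here can be sketched with a small
      calculation.  This is a minimal illustration, assuming the
      exposure misclassification is non-differential (the same
      sensitivity and specificity for cases and controls); the
      2x2 counts, the sensitivity, and the specificity are
      hypothetical and are not taken from any study discussed.

```python
# Hypothetical illustration: non-differential exposure misclassification
# pulls the observed odds ratio toward 1, whether the true effect is
# harmful (OR > 1) or protective (OR < 1).

def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected recorded counts when exposure is measured imperfectly."""
    rec_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    rec_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return rec_exposed, rec_unexposed

def observed_odds_ratio(cases, controls, sensitivity, specificity):
    """Odds ratio computed from the misclassified (recorded) exposure."""
    a, b = misclassify(*cases, sensitivity, specificity)
    c, d = misclassify(*controls, sensitivity, specificity)
    return odds_ratio(a, b, c, d)

# Assumed measurement quality, identical for cases and controls.
SENS, SPEC = 0.8, 0.9

# Made-up true counts: (exposed, unexposed).
harm_cases, harm_controls = (200, 800), (100, 900)
prot_cases, prot_controls = (100, 900), (200, 800)

true_harm = odds_ratio(*harm_cases, *harm_controls)
obs_harm = observed_odds_ratio(harm_cases, harm_controls, SENS, SPEC)
true_prot = odds_ratio(*prot_cases, *prot_controls)
obs_prot = observed_odds_ratio(prot_cases, prot_controls, SENS, SPEC)

print(f"harmful:    true OR {true_harm:.2f}, observed OR {obs_harm:.2f}")
print(f"protective: true OR {true_prot:.2f}, observed OR {obs_prot:.2f}")
```

      In both made-up scenarios the observed odds ratio falls
      between the true value and 1.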

 

                DR. CRYER:  Right, we agree on that.  The

 

      specific question is, is there a way to actually

 

      recognize or to account for when people do not take

 

      their doses in the observational databases?

 

                DR. GRAHAM:  No, there isn't, so when you

 

      are studying, say, an increased risk, that is why I

 

      said if you find something, you have to realize you

 

      found it despite the misclassification.

 

                DR. WOOD:  Okay.  Dr. Domanski.

 

                DR. DOMANSKI:  I will save it for

 

      tomorrow.

 

                DR. WOOD:  Okay, great.  Dr. Furberg.

 

                DR. FURBERG:  No.

 

                DR. WOOD:  Okay, great.

 

                Dr. Temple, who does speak for the FDA.

 

                DR. TEMPLE:  I am just asking questions.

 

      A couple.  Actually, one point is it seems to me

 

      that since we expect that people are going to be

 

      getting one drug or another, comparisons with other

 

      NSAIDs seem like as good a comparison as we should

 

      make.  You might want to leave out indomethacin if

 

      you are worried about it.  That's one thing.

 

                I guess my main question, though, is

 

      everybody has paid appropriate lip service to the

 

      idea that very small differences are hard to

 

      interpret in epidemiology.

 

                People have said 1.5, 2.  Actually, I

 

      notice in one of his editorials, Dr. Furberg cited

 

      a paper of mine where I said anything less than 2

 

      really needs a lot of questions.  Jerry Cornfield,

 

      who sort of invented all this stuff, used to say 3.

 

                Well, we are talking about differences

 

      here that are 0.1 differences, not that they

 

      wouldn't be hugely important if they were true,

 

      that is absolutely true.  So, I guess I want to

 

      know what Richard and you make of all this, because

 

      the numbers are very small, and yet, just as an

 

      example, there is a very great consistency that you

 

      cite that celecoxib looks sort of okay, but you

 

      found one study where there is a little hint that

 

      maybe the higher dose is a problem, and since

 

      probably we all think dose response is likely, that

 

      looks good to you.

 

                DR. GRAHAM:  Two studies, there were 2.

 

                DR. TEMPLE:  Okay, 2.  The valdecoxib

 

      data, which shows nothing, doesn't look so good

 

      because we probably all believe that there is

 

      likely to be a class effect.

 

                What I am asking is, with numbers like

 

      this, how do you know what to do with them?  That

 

      seems very fundamental for the epidemiology.

 

                DR. WOOD:  But, Bob, there are 4

 

      randomized clinical trials here, and your comments

 

      don't apply to them, I assume.

 

                DR. TEMPLE:  No, they don't, although they

 

      are not perfectly consistent either.  But, no, I am

 

      asking, what do we make of differences of this

 

      magnitude with everybody having given lip service

 

      to the idea that small differences are hard to

 

      interpret, and yet we seem to be enthusiastically

 

      endorsing them, so I just want to know what Richard

 

      and David think about that.

 

                DR. GRAHAM:  Rich, do you want to go

 

      first?

 

                DR. PLATT:  I think we have to be cautious

 

      about how we interpret it, so I would say the

 

      finding of a relative risk of 3 in an epidemiologic

 

      study, as David found, is meaningful--

 

                DR. TEMPLE:  For high dose rofecoxib.

 

                DR. PLATT:  For high dose rofecoxib.

 

                DR. TEMPLE:  I would not dispute that at

 

      all.

 

                DR. PLATT:  It seems to me that in that

 

      context, that of a dose response effect, the

 

      information about lower doses gains weight by

 

      borrowing from that.  I think that is also worth

 

      keeping in mind when, in other studies that are

 

      working in that range that make us all nervous,

 

      there appears to be a dose response effect.

 

                It is the kind of consistency that makes

 

      the study, in my mind, be worth more attention.  I

 

      think there is something to be said for giving more

 

      weight to relatively small excess risks if they are

 

      seen in a number of different environments when we

 

      can't have good reason to think that there is a

 

      similar kind of bias that might be contributing

 

      to it.

 

                After that, I agree with you.  We are in

 

      relatively difficult terrain.  I think that it is

 

      not the same as no data, though.  I think we ought

 

      to distinguish between the situation in which we

 

      have no evidence from ones in which we have

 

      relatively weak evidence.

 

                We didn't talk at all, for instance, about

 

      the enormous number of spontaneous reports of

 

      myocardial infarction following exposure to

 

      nonsteroidals.  There are thousands and thousands

 

      of them.  In my mind, they don't contribute at all

 

      to the discussion, whereas, I think these need to

 

      be weighed in the mix when we don't have clinical

 

      trial information to depend on.

 

                DR. GRAHAM:  My answer is similar to his,

 

      but I think that what you are identifying is

 

      that we are hitting, or at least right now the

 

      frontier is, the limits of the available tools

 

      we have to define the levels of risk that we are

 

      talking about.

 

                We are talking about small levels of risk

 

      that turn out for this particular event to be

 

      enormously important at a population level.  If you

 

      are talking liver failure, we wouldn't be having

 

      this conversation.  For that reason, it becomes

 

      important and what I would say is sort of

 

      emphasizing what Rich said, is I would be looking

 

      for consistency across different studies, and if I

 

      found a number of studies, say, as with Indocin,

 

      for example, to me, that is more persuasive.

 

                If I found a number of studies that

 

      pointed to a particular set of NSAIDs that seems to

 

      have low risks, I would take comfort in that in the

 

      absence of perfect information.  I mean some light

 

      in a storm is probably better than no light in a

 

      storm.

 

                DR. TEMPLE:  I take it if the differences

 

      were at the level of 10 percent, 1.1 versus 1.2--

 

                DR. GRAHAM:  I am thinking more in a very

 

      qualitative sense of things that they seem to

 

      cluster around 1.  I mean 1.1 for ibuprofen, it

 

      could be that, for example, maybe naproxen increases

 

      the risk 3 percent in the real world, we are never

 

      going to figure that out, maybe ibuprofen increases

 

      it 10 percent or 15 percent, maybe we could figure

 

      that out, I don't know, but there is going to be a

 

      place where qualitatively, if we see enough studies

 

      kind of sort of pointing to the same place, you

 

      know, most of them, they are not all going to say

 

      the same thing, there is going to be these

 

      conflicts, just like we have in clinical trials

 

      data.

 

                But if most of the compass arrows are sort

 

      of pointing in the same direction for particular

 

      NSAIDs, I think those are the ones that at least

 

      that I sort of place on a suspect list.

 

                DR. TEMPLE:  So, very low hazards need at

 

      least multiple support before they are credible.

 

                DR. GRAHAM:  I think so, and I think that

 

      you want to try to encourage collecting that

 

      information sort of to test that out.

 

                DR. TEMPLE:  Alastair, could I take half a

 

      second to answer a question Larry raised before?

 

                DR. WOOD:  Sure, a second.

 

                DR. TEMPLE:  Well, it's a very good

 

      question, you know, if the drug is going to be used

 

      forever, why don't you study them forever.  The

 

      only thing I would point out here is that what sort

 

      of started people thinking was VIGOR, and VIGOR

 

      didn't take 3 years to show anything, it showed up

 

      in 9 months.

 

                So, what you have seen is for, say,

 

      lumiracoxib, a humongous study of about the same

 

      length, but, of course, they didn't know about

 

      APPROVe, did they, and whatever you think APPROVe

 

      means, whether Bob is right that it's late, or

 

      David is right that there weren't enough cases,

 

      people were pointing toward a study that by every

 

      reasonable thought, if you think platelets are

 

      involved, ought to be long enough to show things

 

      up.

 

                But then you form a new hypothesis once

 

      you have APPROVe, and you have to adapt it, and I

 

      think that goes on all the time.  It would not be I

 

      must say for most things my first thought unless

 

      you are looking for cancer that you need a 3-year

 

      study to find it, but maybe you learned that it

 

      does.

 

                Just for what it is worth as an example, you

 

      can't get an anti-arrhythmic drug approved in this

 

      country without showing that you don't alter

 

      survival unfavorably.  One result is there are

 

      hardly any being developed, but, you know, we had

 

      bad experiences, we didn't like the results of

 

      CAST, so you change.

 

                I think there is no doubt that things

 

      evolve and you have to expect that, and APPROVe,

 

      depending on what you think of it, changes the

 

      nature of what you expect.

 

                DR. GRAHAM:  Bob, just one point on that.

 

      I think if the APPROVe study had been 5 or 10 times

 

      larger than it was--I am talking in retrospect

 

      now--you would be able to answer with much greater

 

      confidence what is happening month 1 to 18.  I

 

      guess what I am saying is that you could also

 

      shorten the latency to identification of a problem

 

      if it turns out that the risk is early on.

 

                DR. TEMPLE:  David, I think that is

 

      entirely possible, and if it involves platelets, I

 

      would believe you, but if it involves a small,

 

      long-term increase in blood pressure, then, I am

 

      not so sure.

 

                DR. GRAHAM:  Right, but we saw yesterday--

 

                DR. TEMPLE:  We don't know.

 

                DR. GRAHAM:  We don't, but if it's

 

      prostacyclin, that effect could occur immediately.

 

                DR. TEMPLE:  Yes, but the blood pressure

 

      effect could be delayed.

 

                DR. WOOD:  Right.  So what, Bob, you are

 

      saying is that it is easy to be a Monday morning

 

      quarterback, but the data were not there before.

 

                DR. TEMPLE:  I would never be that rude.

 

                DR. WOOD:  I think you are right.

 

                Dr. Stemhagen.

 

                DR. STEMHAGEN:  I would like to clarify a

 

      couple things.  First, I am a little concerned in

 

      terms of the unpublished data.  I appreciate that

 

      we are able to get data very quickly, right at the

 

      minute that it is being generated, but none of us

 

      have had a chance to really review that, so I do

 

      have some concerns about the weight being put on this

 

      unpublished data when the rest of us haven't had a

 

      chance to look at it.

 

                I think there needs to be some

 

      clarification.  There was some discussion about the

 

      recall bias, and so on.  Certainly, there is a major

 

      concern about that in case-controlled studies, and

 

      we don't have the questionnaires, but there were a

 

      lot of sort of subanalyses done in the Kimmel

 

      study, about trying to look at whether recall bias

 

      is a problem, and I am not sure that you have

 

      highlighted that enough that looking at all those

 

      different things, there were really no differences

 

      found.

 

                Similarly, in the Watson study, it's a

 

      GPRD study, it is different than a lot of the large

 

      databases, the automated databases.

 

                There is a lot more personal involvement

 

      in terms of the data and the data collection and

 

      the adjudication of results, and I think it just

 

      needs to be clear that all of these studies are not

 

      the same in terms of a Medicare study where we

 

      can't go back and validate records.  A lot of them

 

      had a much more careful review, and I am just not

 

      sure that that was totally clear if you hadn't

 

      read each of the papers.

 

                I would like to just ask a question in

 

      terms of your definition of the inception cohort,