FOOD AND DRUG ADMINISTRATION















                            JOINT MEETING OF






                           ADVISORY COMMITTEE



                               VOLUME II











                      Thursday, February 17, 2005


                               8:00 a.m.








                          Hilton Gaithersburg

                           620 Perry Parkway

                         Gaithersburg, Maryland



                        P A R T I C I P A N T S


      Alastair J.J. Wood, M.D., Chair

      Kimberly Littleton Topper, M.S., Executive Secretary




      Allan Gibofsky, M.D., J.D., Chair

      Joan M. Bathon, M.D.

      Dennis W. Boulware, M.D.

      John J. Cush, M.D.

      Gary Stuart Hoffman, M.D.

      Norman T. Ilowite, M.D.

      Susan M. Manzi, M.D., M.P.H.





      Peter A. Gross, M.D., Chair

      Stephanie Y. Crawford, Ph.D., M.P.H.

      Ruth S. Day, Ph.D.

      Curt D. Furberg, M.D., Ph.D.

      Jacqueline S. Gardner, Ph.D., M.P.H.

      Eric S. Holmboe, M.D.

      Arthur A. Levin, M.P.H., Consumer Rep.

      Louis A. Morris, Ph.D.

      Richard Platt, M.D., M.Sc.

      Robyn S. Shapiro, J.D.

      Annette Stemhagen, Dr.PH., Industry Rep.




      Steven Abramson, M.D.

      Ralph B. D'Agostino, Ph.D.

      Robert H. Dworkin, Ph.D.

      Janet Elashoff, Ph.D.

      John T. Farrar, M.D.

      Leona M. Malone, L.C.S.W., Patient Rep.

      Thomas Fleming, Ph.D.

      Charles H. Hennekens, M.D.

      Steven Nissen, M.D.

      Emil Paganini, M.D., FACP, FRCP

      Steven L. Shafer, M.D.

      Alastair J.J. Wood, M.D. (Meeting Chair)







      Byron Cryer, M.D. (Speaker and Discussant)

      Milton Packer, M.D. (Speaker only)




      Richard O. Cannon, III, M.D.

      Michael J. Domanski, M.D.

      Lawrence Friedman, M.D.


      GUEST SPEAKERS (Non-Voting)


      Garret A. FitzGerald, M.D.

      Ernest Hawk, M.D., M.P.H.

      Bernard Levin, M.D.

      Constantine Lyketsos, M.S., M.H.S.

      FDA (CDER)


      Jonca Bull, M.D.

      David Graham, M.D., M.P.H.

      Brian Harvey, M.D.

      Sharon Hertz, M.D.

      John Jenkins, M.D., F.C.C.P.

      Sandy Kweder, M.D.

      Robert O'Neill, Ph.D.

      Joel Schiffenbauer, M.D.

      Paul Seligman, M.D.

      Robert Temple, M.D.

      Anne Trontell, M.D., M.P.H.

      Lourdes Villalba, M.D.

      James Witter, M.D., Ph.D.

      Steven Galson, M.D.

      Kimberly Littleton Topper, M.S., Executive Secretary




                            C O N T E N T S


      Call to Order:

                Alastair J.J. Wood, M.D., Chair


      Conflict of Interest Statement:

                Kimberly Littleton Topper, M.S.


      Interpretation of Observational Studies

      of Cardiovascular Risk of Non-steroidal Drugs

                Richard Platt, M.D., M.S.


      Review of Epidemiologic Studies on

      Cardiovascular Risk with Selected NSAIDs

                David Graham, M.D., M.P.H.


      Committee Questions to Speakers


                          Arcoxia (etoricoxib)

                      Merck Research Laboratories


      Sponsor Presentation

                Sean P. Curtis, M.D.


      FDA Presentation

                Joel Schiffenbauer, M.D.



                        Novartis Pharmaceuticals


      Sponsor Presentation


                Mathias Hukkelhoven, Ph.D.


      Gastrointestinal and Cardiovascular Safety

      of Lumiracoxib, Ibuprofen, and Naproxen

                Patrice Matchaba, M.D.


      Open Public Hearing


      FDA Presentation (Lumiracoxib)

                Lourdes Villalba, M.D.


      Committee Questions to Speakers


      Committee Discussion




                         P R O C E E D I N G S


                             Call to Order


                DR. WOOD:  Let's get started and welcome


      back to another day.  We are going to begin as on


      the agenda seeing we worked late last night.


                A couple of housekeeping things first.  As


      they say in the movie theater, please turn off your


      cell phones. We don't have the one that sort of,


      you know, spars you into space if you do that, the


      ejector seat, but then please don't answer your


      calls in here, so we don't have to hear the


      beginning of your conversation.


                Kimberly, are you going to read the


      conflict of interest?  Okay.  Go ahead.


                     Conflict of Interest Statement


                MS. TOPPER:  The following announcement


      addresses the issue of conflict of interest with


      respect to this meeting and is made as part of the


      record to preclude even the appearance of such.


                Based on the agenda, it has been


      determined that the topics of today's meeting are


      issues of broad applicability and there are no




      products being approved.  Unlike issues before a


      committee in which a particular product is


      discussed, issues of broader applicability involve


      many industrial sponsors and academic institutions.


      All special government employees have been screened


      for their financial interests as they may apply to


      the general topics at hand.


                To determine if any conflict of


      interest existed, the agency has reviewed the


      agenda and all relevant financial interests


      reported by the meeting participants. The Food and


      Drug Administration has granted general matter


      waivers to the special government employees


      participating in this meeting who require a waiver


      under Title 18, United States Code Section 208.


                A copy of the waiver statements may be


      obtained by submitting a written request to the


      agency's Freedom of Information Office, Room 12A-30


      of the Parklawn Building.


                Because general topics impact so many


      entities, it is not practical to recite all


      potential conflicts of interest as they apply to




      each member, consultant, and guest speaker.  FDA


      acknowledges that there may be potential conflicts


      of interest, but because of the general nature of


      the discussions before the committee, these


      potential conflicts are mitigated.


                With respect to FDA's invited industry


      representative, we would like to disclose that Dr.


      Annette Stemhagen is participating in this meeting


      as a non-voting industry representative acting on


      behalf of regulated industry.


                Dr. Stemhagen's role on this committee is


      to represent industry interests in general, and not


      any one particular company.  Dr. Stemhagen is vice


      president of Strategic Development Services for


      Covance Periapproval Services, Inc.


                In the event that the discussions involve


      any other products or firms not already on the


      agenda for which FDA participants have a financial


      interest, the participants' involvement and their


      exclusion will be noted for the record.


                With respect to all other participants, we


      ask in the interest of fairness that they address




      any current or previous financial involvement with


      any firm whose products they may wish to comment upon.




                Thank you.


                DR. WOOD:  Thank you.


                Let's go right to the first speaker, Dr.


      Platt, who is going to tell us about observational studies.




               Interpretation of Observational Studies of


               Cardiovascular Risk of Nonsteroidal Drugs


                       Richard Platt, M.D., M.S.


                DR. PLATT:  Thanks.  The framers of the


      meeting thought it would be useful at this point to


      have a discussion about observational studies to


      put us all on the same page.


                There was a view by some that the


      expertise around the table might be uneven and it


      would be worthwhile to have some discussion about


      some of the basics.  It is clear that that is not


      the case.


                I realize that a number of the people here


      have written a book and several of my teachers are




      here, so to that extent, I think we can either make


      this a quick discussion or use this as an opportunity


      for a real interactive discussion, because there


      are some hard questions here and no matter how we


      sort it out, we are going to be left with less


      in the way of firm answers than we would like.


                I also understand that there is a point of


      view that says that there are lies, damn lies, and


      observational studies, so part of what I think is


      worth doing is using this time maybe to take our


      temperature about whether and under what


      circumstances we can put weight on observational studies.




                We saw a version of this slide last night


      actually in the last presentation about why perform


      observational studies at all, because I subscribe


      to the general view that all things being equal, a


      clinical trial, a randomized trial is more


      credible, provides more information than an


      observational study.


                The problem is all things aren't always


      equal and so there are reasons to ask what we can




      learn from observational studies.


                I think the most important of them is no


      matter how well a clinical trial is designed, the


      individuals who are recruited and consented to a


      clinical trial are inherently going to be different


      from the actual population of users, and if we want


      to understand how an agent performs among real


      users in the way they actually use the drug, then,


      I think there is no escape but to look to


      observational studies.


                Additionally, observational data is by


      definition there, so when a pressing question


      arises, sometimes observational data is the first


      way we can get insight into the relationship


      between the drugs we care about and the outcomes.


                I think in that regard, these studies can


      often be thought of as helping us identify the


      areas in which it would be most fruitful to invest


      in full-blown randomized trials.  We will never


      live in a world where we are able to do all the


      randomized trials we care about.


                I know that Charlie Hennekens' landmark




      randomized trial of aspirin was preceded by, as I


      recollect, Charlie, a large number of observational


      trials that made you think that it was reasonable to


      do those randomized trials, so observational


      studies can be useful in that regard.


                Finally, when we are talking about trying


      to understand effects that are relatively unusual,


      we stress even the largest clinical trials.  We


      talked yesterday about the fact that the most


      recent drug approvals have used much larger


      populations in the NDA phase than had been studied


      in the old days, and yet they are still small


      compared to the numbers needed to parse out


      relatively small differences.
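
To put rough numbers on how quickly the required study size grows as the difference shrinks, here is a minimal sketch using the usual normal-approximation formula for comparing two proportions. The event rates are hypothetical and are not taken from any trial or NDA discussed at this meeting; the function name `n_per_group` is likewise only illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a difference
    between two event proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical event rates, chosen only for illustration.
large_gap = n_per_group(0.010, 0.020)  # event rate doubles
small_gap = n_per_group(0.010, 0.012)  # event rate rises by one fifth

print(f"rate doubles:       about {large_gap} per group")
print(f"rate rises by 20%:  about {small_gap} per group")
```

Under these assumed rates, detecting the smaller difference takes well over ten times as many subjects per group as detecting a doubling, which is the sense in which rare or modest effects stress even large trials.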


                There are a lot of different kinds of


      observational trials.  I have listed a few of the


      most common.  The ones between the lines here are


      the ones that are really the subject for discussion.




                Tom Fleming made the absolutely correct


      and somewhat counterintuitive point that it is


      often more difficult to do good observational




      studies of relatively common outcomes than rare


      ones, and because of that, the group of studies


      that I think at least are reasonable to consider


      for looking at relatively common outcomes are


      case-control studies, nested case-control studies


      and cohort studies.


                We have examples of each in the materials


      that have been handed to us.  The study by Kimmel


      is a pretty traditional case-control study.  The


      studies by Ray are cohort studies, as is the Aramis


      study.  The study by Dave Graham, the Solomon study


      are nested case-control studies.


                Just as a quick reminder, the


      distinguishing feature of cohort studies is the


      fact that the study population is defined on the


      basis of whether people are exposed to the drug or


      not, and then we look forward to what happens to


      them.  In that way, they are exactly comparable to


      clinical trials, with the big difference that the


      assignment to drug is not randomized.


                The strengths of those compared to


      case-control studies are you have a reasonable shot




      at the outset of selecting individuals who are


      representative of the group that you are trying to


      study, and if you organize the study properly, you


      have a reasonably good chance of getting unbiased


      exposure assessments.


                The weakness, particularly of


      observational cohort studies, is that just because


      individuals had the right drug exposure at the


      outset, they may change that.  You can deal with


      that with an intention-to-treat design, but you pay


      a price for that, and in observational studies,


      loss to followup is a big problem.


                We are particularly plagued by that


      because the large majority of the observational


      studies we are working in are ones that use


      administrative data from one sort of health plan or


      another, and individuals move in and out of health


      plans, so that it becomes difficult to follow them


      over time.


                Case-control studies, remember, are ones


      that start with individuals who have the outcome we


      care about, myocardial infarction or myocardial




      infarction and sudden death, and compare them to


      individuals who haven't had that experience.  Then


      you look back and ask what their drug exposures


      were.  The reasons for doing those studies are that


      they are, first of all, very efficient studies.


                You don't have to study thousands and


      thousands. You can study as many cases as you find


      and a reasonable number of controls, and you can


      look back and classify exposure however is most


      useful, and that is a very convenient and versatile


      feature of case-control studies.


                The big weaknesses are that it is very


      hard to assure oneself that the cases and the


      controls are really representative of the


      populations that you care about, and for


      conventional case-control studies, for instance,


      the study by Kimmel that we are going to look at,


      it takes a lot of work to be sure that people who


      know that they have already experienced an MI don't


      differentially report their exposure to the drugs


      that we care about.


                That can be for all sorts of reasons and




      it might not even be wrong, but the individual who


      has had an MI might just be thinking harder


      about whether he or she had been exposed to a drug


      that we care about.


                By the way, nested case-control studies,


      for instance, the study that David Graham did is a


      hybrid that really, in my view, draws many of the


      strengths from both designs, that is, because


      nested means the case-control study is nested in a


      defined population, so it has a lot of the


      strengths of cohort studies and some of the


      efficiencies of the case-control studies.


                The differences between the observational


      studies and randomized studies are pretty clear.


      Randomized trials have the tremendous advantage


      that there is lots more reason to expect the


      treated and untreated groups to be comparable to


      one another.


                There is a lot more opportunity to be sure


      that the outcome assessment and adherence to


      treatment are good or at least well known, and we


      have reviewed the difference for the observational studies.






                I think it is worth making the point that


      there are a substantial number of similarities


      between observational and randomized studies.  Just


      because we randomize individuals in randomized


      studies, it doesn't mean that the treated and


      untreated groups are comparable.


                We talked about a study yesterday that was


      a randomized trial where there was a substantial


      imbalance in important risk factors.  So, it is


      incumbent no matter what kind of study you do, I


      think to look for comparability, and both studies


      have as potential weaknesses that there are risks


      of false positive results and doing subgroup


      analyses and multiple comparisons increases that risk.




                We talked a fair amount about that


      yesterday, and both are at risk for false negative


      results.  That can be partly because the studies


      may not be powered well enough either because there


      is insufficient sample size or individuals aren't


      studied for a long enough duration to see the




      biological effects that we care about, or a


      vulnerable group just isn't included.


                That is a problem with both kinds of


      studies and I think all studies have to be


      evaluated on their own merits, so let's just step


      through the various places where observational


      studies might get into trouble or at least the


      things that need careful assessment when we look at


      these studies.


                The first is are we studying the right


      outcomes. It is essentially impossible in any of


      these observational studies to use the kind of


      rigorous adjudication that is a hallmark of the


      randomized study, so I think we are going to have


      to ask ourselves are these outcomes good enough.


                The several kinds of outcomes in the


      studies that we have been asked to look at are


      hospitalized MIs.  The case-control study by Kimmel


      uses survivors.  It had to use survivors because


      they were collecting the exposure information by


      interview after the individuals had left the


      hospital, so if we care about all MIs, then, that




      study isn't going to tell us what we want to know.


                Some of the studies use MI and


      out-of-hospital sudden death by linking to vital


      statistics records.  I think that is probably the


      closest we can get in observational studies to the


      intention-to-treat all outcome designs of the


      randomized trials, and some of the studies use


      composite designs.


                You have to ask are these outcomes


      measured appropriately.  Most of the studies that


      we are looking at use some form of automated


      medical record or claims data that have been, in my


      view, reasonably well validated.  That is, there is


      a moderate literature showing that claims data are


      not so bad for studying acute myocardial


      infarction. They have sensitivities in the 90s and


      positive predictive values in the 90s.


                So, they are not perfect and I think we


      will have to ask as we review the studies, can the


      amount of uncertainty that we know exists in those


      account for the effects that we see, or could they


      obliterate effects that we would like to see and




      which are there.
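
One way to get a feel for that question is a small sketch of how imperfect outcome coding could distort a relative risk. It assumes the coding errors are the same in exposed and unexposed people (nondifferential misclassification), and the risks, sensitivity, and specificity below are hypothetical values picked so that sensitivity and positive predictive value both land in the 90s. They are not figures from the validation literature.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Risk that claims data would record, given imperfect outcome coding."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


def observed_relative_risk(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Relative risk as it would appear after the same coding errors
    are applied to both exposure groups."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


def positive_predictive_value(true_risk, sensitivity, specificity):
    """Share of recorded events that are real events."""
    return sensitivity * true_risk / observed_risk(true_risk, sensitivity, specificity)


# Hypothetical inputs: true risks of 2% and 1%, so the true relative risk is 2.
true_rr = 0.02 / 0.01
rr_seen = observed_relative_risk(0.02, 0.01, sensitivity=0.93, specificity=0.999)
ppv = positive_predictive_value(0.01, sensitivity=0.93, specificity=0.999)

print(f"true RR {true_rr:.2f}, observed RR {rr_seen:.2f}, PPV {ppv:.2f}")
```

With these assumed inputs the observed relative risk is pulled somewhat toward 1 but remains close to the true value, which is consistent with the view that this amount of outcome error alone is unlikely to create or erase a sizable effect.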


                My sense is that that is probably not a


      sufficient explanation to dismiss the studies that


      we are looking at. The issue of bias is one that I


      think always has to live as a sub-text, but quite


      frankly, in the studies that do outcomes in the way


      we have been describing, I don't think that is a


      serious problem.


                For cohort studies, we have to ask are we


      studying the right population, and here I think we


      really do have to stop and ask carefully.  One is


      are these people selected from the population under


      study.  I think in most of these examples, they are


      reasonably representative, that is, a study of the


      people of Ontario or members of a large health plan.




                I think that the data systems that are


      used to identify the individuals in the cohort are


      good enough to give us reasonable belief that we


      are identifying either all the people or a


      representative sample of them.


                I think there is a fair question of




      whether they are representative of the larger


      population.  We could ask are health plan members


      systematically different from the general


      population of individuals who are taking these drugs?




                The range of studies we have include


      health plan members.  I think that there is


      reasonable information that they probably are


      representative, at least with respect to the drug


      myocardial infarction outcomes that are studied.


      Studies in Medicare and population-based studies,


      such as those in Canada, I think also give us


      reason to think that they are representative.


                But there is an important consideration


      about whether there are issues about the way


      clinicians practice in those settings that might


      have a serious impact on selecting individuals.  In


      particular, to the extent that formularies are


      restrictive of, say, newer or more expensive drugs


      like the COX-2 inhibitors, I think we have to


      ask very carefully whether the factors that would


      influence the prescribing of one class of drugs




      over another are likely to seriously impact the risk


      of these outcomes.


                Additionally, if there are cost


      differentials for these drugs, it may be that there


      is some form of self-selection that causes


      individuals who are sicker to receive these drugs,


      and I think that it is incumbent on us to expect


      that to be a problem in every one of these


      observational studies and to ask how well do these


      studies do in adjusting for that.  I will circle


      back to that in a moment.


                I think we have to be concerned about


      whether we are studying people who have had prior


      NSAID exposure, in which case we would be worried


      about survivor biases, of finding the individuals


      who are relatively immune to these problems.


                Finally, there are study design issues


      about whether there are restrictions of eligibility


      that might importantly color the data.  For


      instance, at least one of the studies we are


      looking at requires individuals to have received at


      least two dispensings of a nonsteroidal agent in




      order to be eligible.


                That means that you have to live long


      enough to have two dispensings, so it certainly


      doesn't tell us anything about the early effects of


      these drugs, and it might in an important way color


      the results with regard to later exposure.


                There is an important question which is


      not unique to the observational studies, which is


      who are the right comparators.  We had a number of


      discussions about that yesterday.  I think that all


      the issues that we discussed with regard to the


      clinical trials are applicable here.  In


      particular, there is a lot of reason to want to


      compare to other nonsteroidal users because that


      gives the best chance of having a group that is


      similar with regard to underlying disease status


      and presumably risk of myocardial infarction.


                Similarly, it is possible to say that if


      you really care about COX-2 selective agents, you


      should compare one COX-2 selective agent to another.




                That leaves us in the uncomfortable




      situation of not knowing what is the risk compared


      to no use at all, so we have some comparisons that


      do look at non-users or at least remote users, and


      that has its strengths.  It has the big weakness,


      of course, of putting us at risk of making


      comparisons against groups that are unrelated.


                So, we are really talking here mostly


      about a study like the Kimmel study, not the nested


      case-control study.  The other kinds of concerns


      that raise red flags are the real concern about


      losing cases who make the group who are studied




                I would point out to you, for instance,


      that in the Kimmel study, only half of the MI


      survivors who were identified were actually


      interviewed and therefore part of the formal




                We already talked about the fact that


      since that study was limited to MI survivors, that


      restricts us to a less serious set of outcomes.


                The other problem that really bedevils


      conventional case-control studies is knowing




      whether the group of people who are selected as


      comparators are really comparable.


                I think that is one of the reasons that


      there is so much interest in doing nested


      case-control studies, because at the end of the day it


      is really extremely difficult to satisfy oneself


      that controls really are appropriate.


                Much of what we need to be concerned about


      in these studies is understanding exposures.  Part


      of the issue is understanding how to characterize


      exposure.  This is both a strength and a weakness


      of these studies.


                You will remember I made the point at the


      outset that if we want to understand how drugs work


      in actual practice, that we have to do


      observational studies.  On the other hand, that


      means we have to find a reasonable way to


      characterize these drugs.


                We talked yesterday I think about all the


      important issues of understanding whether we had to


      look at absolute dose or cumulative effects or


      whether the effects start early or whether they




      start late.


                I think that the best of the studies that


      we are looking at tackle a number of these issues.


      I will mention in a minute some of the ways that


      these studies have gone about that.


                I think in terms of ascertaining exposure,


      it is probably reasonable to put the most reliance


      on the studies that use administrative databases of


      pharmacy dispensing, but I will just make the point


      that we have to be clear that these studies are


      done in situations where we have reason to expect


      that the administrative databases are correct.


                I think all the studies we are reviewing


      are ones where the investigators were careful to


      know that the individuals really had a drug benefit


      that was operating at the moment, that would likely


      find the prescription drug exposures that we care


      about, but as a general proposition, you can't


      assume that that is the case.


                Most health plans have some kind of


      restrictions on benefits that might lead


      individuals to change their benefit status, so




      there would be periods of time when we might know


      that they had an MI, and we might not know what


      their drug exposure is at the moment.


                I will return to a point that we touched


      on yesterday, which is that although almost all of


      the studies that we are talking about report their


      results as relative risks, a 2-fold increase in


      risk, a 70 percent decrease in risk, what we


      really care about is the absolute difference in risk.




                So, that is not different between


      observational studies and randomized studies, but I


      think it is really a critical piece of our thinking


      about the problem that we are dealing with.
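
As a rough illustration of why the absolute difference matters, the sketch below applies the same relative risk to two different baseline risks. The baseline risks are hypothetical and are not drawn from any of the studies under review.

```python
def absolute_risk_difference(baseline_risk, relative_risk):
    """Excess risk per person over the follow-up period."""
    return baseline_risk * (relative_risk - 1)


def excess_events_per_1000(baseline_risk, relative_risk):
    """Excess events expected among 1,000 exposed people."""
    return 1000 * absolute_risk_difference(baseline_risk, relative_risk)


# Hypothetical baseline risks; the relative risk of 2 is the same in both groups.
low_risk_patients = excess_events_per_1000(baseline_risk=0.002, relative_risk=2.0)
high_risk_patients = excess_events_per_1000(baseline_risk=0.030, relative_risk=2.0)

print(f"low-risk group:  {low_risk_patients:.0f} extra events per 1,000")
print(f"high-risk group: {high_risk_patients:.0f} extra events per 1,000")
```

A 2-fold relative risk therefore corresponds to about 2 extra events per 1,000 people in the assumed low-risk group and about 30 per 1,000 in the assumed high-risk group.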


                The second thing that is just worth


      recalling is that when we talk about a 95 percent


      confidence interval, that our expectation about


      where the true value lies is not uniformly


      distributed over that interval.


                Our best guess about where the true value


      lies is around the point estimate, and if that


      point estimate is wrong, the large majority of the




      uncertainty is pretty close to that point estimate,


      so that it is particularly not helpful, in my view,


      to pay enormous attention to p values.


                The difference between a p value of 0.05,


      as shown here, and a p value of 0.01 and a p value


      of 0.13 is not all that enormous in terms of the


      biological impact.
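
To see how close those three p values are on the underlying scale, here is a small sketch that converts a two-sided p value into the number of standard errors separating the estimate from no effect. It assumes a normally distributed test statistic; the function name is illustrative.

```python
from statistics import NormalDist


def z_for_two_sided_p(p_value):
    """Standard-normal z statistic corresponding to a two-sided p value."""
    return NormalDist().inv_cdf(1 - p_value / 2)


# The three p values mentioned in the discussion.
z_scores = {p: round(z_for_two_sided_p(p), 2) for p in (0.01, 0.05, 0.13)}

for p, z in z_scores.items():
    print(f"p = {p:<4}  ->  estimate is about {z} standard errors from no effect")
```

Under that assumption the three results sit roughly 2.6, 2.0, and 1.5 standard errors from the null, so the gap between "significant" and "not significant" is a matter of degree.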


                I think one of the things that is a


      particular concern that we need to pay attention to


      in these studies is the fact that it is easy to


      look at a lot of different comparisons, and to the


      extent that we do that, we are going to have to


      just be careful to know that the strength of any


      one comparison is weaker than it appears to be.


                For instance, this is a quote from one of


      the studies that we are looking at.  We undertook


      an observational study examining the association


      between rofecoxib, celecoxib, other nonsteroidals


      and myocardial infarction.


                Well, there is no primary hypothesis


      there, and the results for all of the


      nonsteroidals, they are all interesting to look




      at, they are all associated with p values.  Those p


      values are all relatively too extreme given the


      fact that there are so many comparisons.
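
A minimal sketch of why many comparisons weaken any single p value: if every drug truly had no effect and the tests were independent (an assumption made only to keep the arithmetic simple), the chance of at least one nominally significant result grows quickly with the number of comparisons. The Bonferroni threshold shown is one common correction, not the method used by any study discussed here.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison p value cutoff that keeps the overall rate near alpha."""
    return alpha / n_comparisons


rates = {k: familywise_error_rate(k) for k in (1, 5, 10)}

for k, rate in rates.items():
    print(f"{k:>2} comparisons: {rate:.0%} chance of a spurious finding, "
          f"Bonferroni cutoff {bonferroni_threshold(k):.3f}")
```

With ten independent comparisons at the 0.05 level, the chance of at least one spurious finding is about 40 percent under these assumptions.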


                It is a problem for randomized trials.  We


      talked about subgroup analyses.  It is important to


      do those studies, those subgroup analyses, but


      absent having specified a principal hypothesis at


      the outset, I think that we have difficulties in


      knowing how much weight to put on any particular




                We talked a lot about confounding.  That


      is one of the most important concerns in randomized


      trials.  I know you all know what confounding is.


      It wasn't obvious to me when I was making these


      slides that everyone knew that, but the example, so


      that we have it in mind is if what we know is drug


      A versus drug B, and MI or no MI, and we don't take


      into account important confounders, we can get


      importantly incorrect results.


                So, here is an example of an aggregate


      analysis with a relative risk of 1.5 among 2,000


      people who are exposed to two drugs.  If you break




      it apart and see that in the high-risk group, drug


      A accounted for 80 percent of the exposure, and in


      the low-risk group, drug B accounted for 80 percent


      of the exposure, you see that in each of those two


      categories, the high-risk group and the low-risk


      group, that, in fact, there is no association


      between drug and outcome, but you have to take them


      apart to do that.
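
      [Editor's illustration, not part of the speaker's remarks.  The slide's
      actual counts are not in the transcript, so the numbers below are
      hypothetical: 1,000 people per drug, 1,000 per risk group, and event
      risks of 10 percent (high risk) and 5 percent (low risk), chosen only
      because they reproduce a crude relative risk of 1.5 with the 80/20
      exposure split described.]

```python
from fractions import Fraction

# Hypothetical (events, people) by risk stratum and drug.
STRATA = {
    "high_risk": {"A": (80, 800), "B": (20, 200)},  # drug A is 80% of exposure
    "low_risk": {"A": (10, 200), "B": (40, 800)},   # drug B is 80% of exposure
}


def relative_risk(exposed, reference):
    """Ratio of event risks; each argument is an (events, people) pair."""
    return Fraction(*exposed) / Fraction(*reference)


def pooled(drug):
    """Collapse the strata, as an aggregate analysis would."""
    events = sum(s[drug][0] for s in STRATA.values())
    people = sum(s[drug][1] for s in STRATA.values())
    return events, people


crude_rr = relative_risk(pooled("A"), pooled("B"))
stratum_rr = {name: relative_risk(s["A"], s["B"]) for name, s in STRATA.items()}

if __name__ == "__main__":
    print("crude:", crude_rr)
    print("by stratum:", stratum_rr)
```

      [In this sketch the aggregate comparison gives 1.5 while each stratum
      gives exactly 1, because drug A is concentrated among the people who
      were at higher risk to begin with.]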


                Well, the good news is if you know what


      the confounders are, and you have measured them


      accurately, it is possible to adjust for them, and


      all of the studies we are looking at do a pretty


      good job of adjusting for the confounders that we know


      about, so I guess one of the questions is how well


      do they do at identifying the important




                I would say not bad on a lot of that.


      That is, if you take, for example, the Graham study


      or the studies that Wayne Ray did in Tennessee


      Medicaid, there are a number of strengths.  I will


      sort of stop and back up on the things that make


      these look like relatively more credible studies in




      the scheme of the factors that we care about.


                They are inception cohorts of nonsteroidal


      users, that is, they are individuals who had to


      have been members of the health plan for at least a


      year before they received their nonsteroidal.


                There was a lot of information about their


      underlying medical status that was available to the


      investigators using both claims data and medical


      record data to ascertain cardiovascular disease


      along a number of dimensions, utilization of


      procedures like surgery or angioplasty or


      diagnostic procedures that are intended to find


      cardiovascular disease, hospitalizations, emergency


      room visits, and a substantial amount of


      information about the medications that these


      individuals took that was related to or plausibly


      related to cardiovascular risk factors.


                Those large number of factors were used to


      create separate risk models using only the


      unexposed, and then to use those risk models to


      create risk indexes for the individuals to use as


      an adjuster for underlying cardiovascular risk.


                Is it perfect?  No.  Is it pretty good?


      It seems to me that it meets the sniff test of


      saying that it has a reasonable chance of




      identifying important confounding.


                Unfortunately, there are a number of


      important confounders for which health care systems


      typically don't have good data, like smoking, OTC


      NSAID use, obesity, family history, and those are


      typically much more problematic.


                Some of these studies have worked pretty


      hard to try to either deal with it or understand


      whether it could be an important problem.  One of


      the handouts we had, for instance, was the study by


      Schneeweiss and colleagues who looked back at one


      of the studies by Solomon that was performed in the


      Medicare data set, and asked how important could


      these unmeasured confounders be.


                They actually had access to information


      from the Medicare Beneficiary Survey that asked


      representative Medicare beneficiaries detailed


      questions about many of the things that we would


      care about.  They weren't the people who were




      involved in that case-control study, but if you


      assume that the beneficiary survey members were


      representative and they gave plausible answers, it


      is possible to extrapolate back to the source


      population, and the take-home message from that


      work, the answer didn't change very much, which is


      really what we want to know, not sort of the


      absolute difference, but whether those unmeasured


      confounders are important enough that they could


      cause a difference.


                I think we still have to be concerned at


      the end of the day, we still have to be concerned


      about residual confounding as a potentially


      important problem.


                One way I think that we can draw relative


      assurance from that work of adjusting for


      confounding is to ask how much did the estimate of


      risk change between the unadjusted and the adjusted




                I think there is a world of difference


      between an unadjusted result of 10 and an adjusted


      result of 1.5, and having an unadjusted result of




      1.6 and an adjusted result of 1.5.  The former, I


      think the reasonable assumption is we arguably


      haven't been able to deal with confounding in a way


      that would let us believe that 1.5 means something.
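
      [Editor's illustration, not part of the speaker's remarks.  A small
      Python sketch of the "how much did the estimate move" check, using the
      two pairs of numbers mentioned.  Measuring the shift as a simple
      fraction of the unadjusted estimate is an assumption made here; other
      scales, such as the log scale, are also used.]

```python
def relative_change(unadjusted: float, adjusted: float) -> float:
    """Fraction of the unadjusted estimate that adjustment removed."""
    return (unadjusted - adjusted) / unadjusted


large_shift = relative_change(10.0, 1.5)  # unadjusted 10, adjusted 1.5
small_shift = relative_change(1.6, 1.5)   # unadjusted 1.6, adjusted 1.5

if __name__ == "__main__":
    print(f"10 -> 1.5 : {large_shift:.0%} of the estimate removed")
    print(f"1.6 -> 1.5: {small_shift:.0%} of the estimate removed")
```

      [A large shift signals heavy measured confounding, which makes sizable
      unmeasured confounding more plausible; a small shift is the more
      reassuring case.]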


                I think there is a much stronger case to


      be made when adjusting for important confounders


      that we know about doesn't change the risk estimate


      very much, that that is a relatively more credible




                Having said that, I think that


      observational studies are best at finding relative


      risks that are more than 2.  I think that I would


      pay some attention to relative risks of 1.5.  I get


      very nervous about adjusted relative risks of 1.2.


                That doesn't mean that they are not right


      and I don't ignore them, but if we ask is that for


      sure the answer, my response to that is I am just


      less certain about that.


                I think we are always left at the end,


      while we spend a lot of time thinking about and


      adjusting for confounding, and I think we can do a


      pretty good job of that, it is much harder to




      adjust for misclassification, and it is essentially


      impossible to adjust for bias.


                So, I think one of the things we have to


      ask about is are there plausible sources of


      misclassification and bias, and if there are, in


      which direction do they work and would they


      seriously change our interpretation.


                We talked about the fact that absolute


      differences are the important ones that we care


      about.  We have already started to look at data


      that talks about person level risk and population


      level risk, so beyond saying that at the end of the


      day, I think these are the answers that we really


      need to talk about, not about relative risk.


                Personally, I think that we need two kinds


      of answers.  One is what is the information that


      patients and their physicians need to have to make


      decisions for them personally about whether to


      accept certain kinds of treatments in exchange for


      certain kinds of anticipated benefits.


                I think there is a population level


      concern that we have to have that emerges from the




      same set of analyses, but takes on a different




                So, you will be pleased to know that I am


      wrapping it up now, and I would say that both the


      cohort and nested case-control designs, which are


      the bulk of the observational studies that we are


      looking at, are relatively strong ones and I think


      deserve the committee's real attention.


                I am sorry that not every one of these


      studies prespecified a primary hypothesis that we


      can attend to, but we should whenever possible do


      that.  Even though we don't find important effects


      in some of these studies, I think it is important


      to recognize that they don't exclude one.


                As I have said, I am least certain about


      attaching great weight to relatively small excess


      risks even understanding that when they are


      extrapolated to a large population, they could


      account for very important public health problems.


                Finally, I would say that the things that


      support the studies' conclusions are the fact that


      when we do subgroup analyses and look for




      dose-response effects, that they strengthen the


      cause-effect relationship, and I think that there


      is reason to look for consistency across studies.


                I take the point that was made yesterday


      that it is possible that a dozen studies of


      naproxen could all have the same underlying bias


      that shifts the point estimate in the same


      direction, but it is not so clear to me what that


      bias is.


                So, I think that we would have to have a


      reasonable idea of what might explain consistent


      differences across studies and ask if they are of


      sufficient magnitude to explain that.  As I say, I


      am not clear that there are those kinds of biases.


                I think we have to be cautious about the


      fact that residual confounding bias and


      misclassification are all issues with these


      studies.  So, I think that while they add to our


      discussion, they have to be considered in light of


      the fact that they are imperfect vehicles.






                DR. WOOD:  Thanks very much.


                Let's just go straight on to the next


      speaker and then we will take questions for Dr.




      Platt after David Graham's talk.


                The next speaker is Dr. David Graham from


      the FDA.


                   Review of Epidemiologic Studies on


                Cardiovascular Risk with Selected NSAIDs


                       David Graham, M.D., M.P.H.


                DR. GRAHAM:  Good morning.  Today, I will


      give a review of epidemiologic studies and


      cardiovascular risk with selected NSAIDs.  I will


      be evaluating epidemiologic data from the published


      literature plus two currently unpublished studies


      that I have evaluated.


                My focus will be on providing estimates of


      risk of acute myocardial infarction in the setting


      of the use of COX-2 selective NSAIDs or naproxen,


      although I will have some comments in light of


      yesterday's discussion about other NSAIDs on those,


      as well.


                The methodology was to do a PubMed search




      by specific NSAIDs and then cross-check the


      citations in those articles to see if there are


      other articles I had missed.


                I would also like to take this moment to


      thank Dr. Crawford for his leadership in making it


      possible for me to present some of our preliminary


      data from a study in California Medicaid, which Dr.


      Gurkiepal Singh from Stanford and I recently




                Before I get into the substance of my


      talk, I just want to comment a little bit on excess


      cases and projecting to the national population


      what was the impact of rofecoxib use, and I am


      doing this for two reasons - one, because it has


      been a source of controversy and concern.  We cite


      a number in a paper that I and others have


      published from Kaiser Permanente in which we made


      an estimate of the impact of rofecoxib use.


                Tomorrow, FDA will present its estimation


      of the number harmed by rofecoxib, modeling


      randomized clinical trial survival curves.  A


      couple of things I would like the Committee just to




      be aware of when they see that data tomorrow.  It


      assumes a grace period at the beginning of use that


      is based on the VIGOR study and the APPROVe, 6-week


      grace period in which there is no difference in MI


      or increased risk of MI, and the first six weeks of


      high-dose use with the first 18 months of low-dose


      use of rofecoxib.


                As I will show later in my talk, I believe


      that this is unreliable due to low statistical


      power early on, because we are only talking about


      in each of these studies a handful of cases early


      on in the study.  Two or three cases of MI and wide


      confidence intervals, you could have divergence of


      the curves very early.


                The epi studies, however, that I will


      present will show that there are 3- to 50-fold


      more events to work with, more statistical power,


      and it suggests a different outcome.


                The second is that the patients


      enrolled in randomized clinical trials are


      generally healthier than patients in the real


      world.  So, if you are going to model what is the




      number of people who have been harmed in the


      population, you have got to assume what is the


      background rate that you are modeling off of.


                If you use a background rate from healthy


      people to model what is happening in the population


      of people who really aren't so healthy, who have a


      higher background rate, you will underestimate the


      actual population impact.
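
      [Editor's illustration, not part of the speaker's remarks.  A minimal
      Python sketch of how the assumed background rate drives a projected
      count of excess cases.  Every number is hypothetical, and the sketch
      assumes the same relative risk applies at both background rates.]

```python
def excess_cases(person_years: float, background_rate: float, relative_risk: float) -> float:
    """Excess events = exposure time x background rate x (relative risk - 1)."""
    return person_years * background_rate * (relative_risk - 1.0)


PERSON_YEARS = 1_000_000   # hypothetical exposure in the population
RELATIVE_RISK = 1.5        # hypothetical
TRIAL_RATE = 0.005         # hypothetical events per person-year among trial participants
REAL_WORLD_RATE = 0.010    # hypothetical higher rate among routine-care patients

from_trial_rate = excess_cases(PERSON_YEARS, TRIAL_RATE, RELATIVE_RISK)
from_real_world_rate = excess_cases(PERSON_YEARS, REAL_WORLD_RATE, RELATIVE_RISK)

if __name__ == "__main__":
    print("using trial background rate:     ", from_trial_rate)
    print("using real-world background rate:", from_real_world_rate)
```

      [Because the projection is proportional to the background rate,
      halving that rate halves the projected excess, which is the
      underestimation described above.]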


                So, in any event, now on to the substance


      of my talk.


                The next three slides provide a very dense


      overview of the major features of each of the


      epidemiologic studies that I reviewed.  I am


      looking at COX-2 usage in acute myocardial




                You can see that they are grouped in


      several groups.  The top three studies I consider


      from an epidemiologic perspective to be stronger


      studies to have been done better.  In terms of the


      things that Dr. Platt just talked about, I thought


      that these studies were the stronger studies.


                The next two studies from the published




      literature I thought were less strong, and I will


      describe why.  Finally, I have separated out these


      last two studies, one submitted by Merck to the


      FDA, performed by Ingenix, and the other, the


      Medi-Cal study that Dr. Gurkiepal Singh and I have


      recently completed, as unpublished studies, so they


      are separated out from the group.


                You can see we are talking about different


      source populations, and so if we can see


      consistency of results across different


      populations, different age groups, and different


      study designs, I think that that adds support to


      the notion that there is a real effect.


                If we begin to see that there is a lack of


      consistency across the studies, then, many of the


      things that Dr. Platt talked about before need to


      be considered sort of at the individual level of the


      studies, so what might explain why one study shows


      something and another one doesn't.


                This next slide shows the case definitions


      and the number of cases that we were working with


      to come up with the relative risk estimates that I




      will show you.


                All of the studies began with hospitalized


      acute myocardial infarction.  Several of the


      studies were able to link members of their base


      cohorts to death certificate data to identify


      sudden cardiac deaths, as well.  So, those are the


      ones that have the +Sudden Cardiac Death.


                The asterisk next to the Kimmel study is


      to remind me and to remind you that the Kimmel


      study was based on nonfatal MIs only.  By their


      design, they had to interview their cases in


      person, so the patient had to survive their


      myocardial infarction to be interviewed.  So, there


      are those differences in study design.


                In the end, what is very important in an


      epidemiologic study in dealing with this issue I


      think in particular, is what is the statistical


      power of the study, and that is driven primarily by


      the number of events in the exposed group that we


      have to deal with.


                So, in this column here, you will see the


      total number of cases of myocardial infarction that




      were identified in each of the studies.  The


      asterisk next to the Ingenix study 628 is to remind


      me that in that study, they identified about 1,700


      MIs in total, but they excluded 1,100 of the MIs


      because they occurred in people who weren't exposed


      to an NSAID at the time of the myocardial


      infarction.  So, as a result, they left them out,


      because in the previous slide, when we look at the


      reference group, most of these studies used either


      non-use or remote use as the comparator.  The


      Ingenix study used active treatment with either


      diclofenac or ibuprofen.


                I would like to say one thing about


      reference groups.  Dr. Platt brought it up before.


      In this issue, I don't believe that there is a


      single best or optimal reference group.  What you


      really want to do is get as close as you can to a


      placebo group that has been randomized and has all


      the risk factors of the people who are getting the




                In the observational world we can't get


      there, and so at the end of the day, if you want to




      do a study, you are in a sense forced to pick among


      the least evil of that you think, and then it has


      to do with how you define things.


                So, non-users, for example, could be


      viewed as being close to the placebo group, they


      are not getting the drug.  The problem is people


      who don't use drugs tend to be healthier than


      people who do use drugs, so that raises a host or




                Yes, we can try to adjust for confounding


      and the like, but you are still left with that


      concern that they may be, in some way that we can't


      measure, different from the people who get the




                In the study I did, and in several other


      studies that people have done, we opted to use


      people who had been treated with NSAIDs in the


      past, but weren't currently taking an NSAID at the


      time of the event or the study, the reasoning there


      that whatever the selection factors are that lead


      to a patient getting an NSAID, that some of those


      selection factors are there in people who




      previously received NSAIDs.


                That is still not a perfect group, though,


      because you could argue that patients who are no


      longer taking NSAIDs might be healthier than people


      who are currently taking NSAIDs.


                Finally, the problem that is posed by


      using an active comparator.  If you have an active


      comparator, and I am comparing another drug to an


      active comparator, and I see a difference, I don't


      know what it means.  I need some place to anchor


      the result, and for that reason, although none of


      them are perfect, I believe that the non-use and


      the remote use analyses at least give us a way of


      pegging results, and if we want to compare one drug


      to another drug, if we had that common reference


      point, at least it allows us to accomplish that.


                The one other thing I would like to point


      out about the number of cases is that for


      rofecoxib, especially at the high doses of


      rofecoxib, most of these studies had relatively few


      exposed cases.  The exception is the California


      Medicaid study where we had 157 exposed cases to




      the higher dose of rofecoxib.


                Now, this is a very busy slide and I won't


      spend a lot of time going over it, but I will be


      happy to answer questions later.


                Basically, before we heard there are


      unmeasured risk factors in automated databases that


      frequently can't be accounted for, aspirin use and


      smoking are among the most common.  So, you can see


      here that most of these studies, that information


      isn't obtainable.


                Kimmel was able to get both because they


      interviewed the patients, the cases and the


      controls.  In the Medi-Cal study, it turns out that


      aspirin is reimbursed, and so we have a handle on


      it there.


                In the Graham study, a survey of controls


      was done to see what these unmeasured factors might


      look like in the source population.  The Solomon


      study did the same thing, relying on the Medicare


      Beneficiary Survey that Dr. Platt talked about




                Important limitations I think that need to




      be highlighted are that in the Mamdani study, they


      excluded patients who had less than 30 days of


      NSAID use, so the survivor bias Dr. Platt talked


      about before, in my view, is a big concern with this


      study, and for that reason I ranked it in sort of


      that category of low quality studies.


                In the Kimmel study, as Dr. Platt also


      mentioned, there was low participation rate.


      Basically, half of the cases and half of the


      controls who were approached volunteered to be in the


      study.  More importantly I think in that study, and


      it's unfortunate, is that there was what I would


      refer to as the potential for, in quote "reverse


      recall bias."


                Normally, with recall bias, we think oh, I


      have had a heart attack, I am going to remember


      more efficiently what happened to me immediately


      before the heart attack compared to some control


      where I say to the control what were you doing four


      months ago on this particular day.


                That is the classic recall bias.  This


      situation I think had what I would describe as




      reverse recall bias.  They interviewed the people


      who had heart attacks within four months of getting


      out of the hospital - what happened to you the day


      and the week before you had your heart attack four


      months ago.


                For the controls, they call them on the


      phone and they say what happened to you yesterday


      and the week before, so it is actually the reverse.


      The controls actually would have better recall of


      what they were actually doing than the cases


      potentially, and we will see how this is reflected


      in some of the results.


                Finally, with the Medi-Cal study, I think


      the single greatest concern for the committee in


      considering these data is (a) that it is preliminary


      data, and (b) that this is a new database for


      research purposes.


                For that reason, I am just including a


      slide to orient people to that.  The other


      databases are ones that have been used before.


      This is a database that only in the last two years


      has come online to be sort of a quality sufficient




      to begin contemplating doing studies.


                Its strengths are that it is very large,


      it captures aspirin use, it doesn't censor people


      by age.  It combines Medicare coverage when you go


      over the age of 65 with the prescription benefits


      of Medicaid, so you get the drugs and the outcomes.


                Matching has been done to multiple cause


      of death tape, so that we have death data in this


      database up through 2002.  We didn't include it in


      the data I will show today because we really want


      the information up through 2004.


                Once people get into Medicaid or Medicare,


      they don't tend to drop out.  The limitations are


      that we can't get medical records, and that is


      something to understand, and that it is a very


      complicated database.  Dr. Singh from Stanford who


      is the principal investigator for our Medi-Cal


      work, and who has worked to bring this database


      online, spent two years putting things together and


      working out the kinks in it before contemplating


      doing research with it, so at least you understand


      the limitations of that.


                There is always the concern about


      unmeasured risk factors and Dr. Platt talked about


      that.  I want to review for you very briefly some




      of the evidence from the published literature where


      efforts were made to look at what unmeasured


      confounding looked like and did it differ across


      NSAID type.


                In our study using Kaiser Permanente data,


      we did a survey, a random survey of a random sample


      of controls, and we looked at aspirin use, smoking,


      and over-the-counter NSAID use.  You can see by


      NSAID that there really were not significant or


      substantial differences in the distribution of


      these risk factors.


                So, if they don't vary in the control


      group, they can't really confound that observation


      that you see very much.


                In the Solomon study, these are the data


      from the beneficiary survey.  Dr. Platt already


      mentioned a further analysis of these data that


      showed that the actual impact of all these


      unmeasured confounders on the measure of the




      relative risk at the end was measured in the


      hundredths of an odds ratio, so if the odds ratio


      was 1.34, adjusting for these things and projecting


      it out would change it to maybe 1.35 or 1.33.  We


      are talking about minuscule differences, not


      qualitatively important differences.


                Finally, in the Kimmel study, they also,


      through their interview, were able to see that for


      most of these factors, there was similarity across


      NSAID groups except for current smoking where the


      rofecoxib group had much lower current smoking than


      any of the other NSAID groups, but for past


      smoking, it was more than the other NSAID groups or


      the remote groups, and if you added these two


      together, the rofecoxib was very similar to these,


      but the celecoxib group had more smoking.


                My own conclusion from this is that yes,


      it is possible that some of these unmeasured risk


      factors could be influencing the results.  I don't


      think that there is strong evidence that there is a


      systemic bias that would sort of lead to


      interfering with trusting the results and thinking




      that these factors are confounding the observations


      that we see.


                So, first, I will talk about rofecoxib,


      then I will talk about celecoxib, then I will talk


      about valdecoxib in terms of epidemiologic data.


                These studies on the left, with their


      reference groups, are the ones that looked at


      myocardial infarction with rofecoxib.  What I have


      shown is for all doses and where it was present


      less than or equal to 25 milligrams and over 25


      milligrams, what the fully adjusted odds ratio and


      95 percent confidence intervals were.


                These studies varied in the extent of


      adjustment that they did.  The Ray and the Graham


      studies each adjusted for about 30 cardiovascular


      risk factors.  The Solomon study was a somewhat


      smaller number, Mamdani was a somewhat smaller


      number.  Kimmel, they adjusted for somewhere in the


      20s, the Ingenix study somewhere in the 20s, the


      Medi-Cal study adjusted for about 40 cardiovascular


      risk factors.


                What you can see is when you look across




      the All Doses is that, in general, the point


      estimates were elevated and for many the 95 percent


      confidence intervals excluded 1.


                More importantly, though, is looking at


      the low dose and the high dose data because we know


      from the clinical trials data, and we would suspect


      it on just pharmacologic grounds, that if there is


      an association that it might be worse with the


      higher dose than with the lower.


                So, four studies provide us estimates at


      the low and the high doses, the Wayne Ray study and


      our study from Kaiser Permanente, and then the


      two unpublished studies, one from Ingenix and the


      other from California Medicaid.


                We see there that in three of the four


      studies, there is an elevation in the point


      estimate.  In the Graham study, it included one.


      When we look over 25 mg, we see greater consistency


      although in the Ingenix study, there is this


      paradoxical finding of sort of basically a neutral


      relative risk.  I don't have an explanation for why


      that happened, but it makes me concerned to some




      extent about what was going on in that study,


      because it is a result that goes in a very


      unexpected direction.


                What I would like to point out, because I


      will come back to it again, is that when we are


      dealing with drug safety, and the goal now is what


      risk can I exclude, if my job is--now I am not


      talking about efficacy anymore, what I am talking


      about is safety--if my job is to protect the public


      from harm, what risk can I exclude based on the


      data that I have, I believe it is much more


      relevant to look at the upper bound of the


      confidence interval than the lower bound.


                What traditionally happens is we look at


      the lower bound of the confidence interval and we


      say if it includes one, there isn't a problem, but


      the biggest reason, as Dr. Platt showed in his


      previous slide, for a wide distribution and a wide


      confidence interval in your study, is that the


      study doesn't have enough statistical power to get


      you a narrow enough confidence interval to say that


      you have the 95 percent certainty that you want.


                So, if your mission is above all else I


      want to do no harm, that I want to protect patients


      from harm, then, based on the data you have, I




      would submit that the upper bound of the confidence


      interval provides greater assurance to patients,


      and then if you are going to compare a benefit to a


      drug, that you might want to consider that benefit


      against that upper bound of the confidence


      interval, because that is compatible with the data.


      In any event, that is my view, and not the FDA's.
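
      [Editor's illustration, not part of the speaker's remarks.  A minimal
      Python sketch of how the number of events controls the width of a 95
      percent confidence interval, using the standard log-scale interval for
      a relative risk.  All counts are hypothetical, and the studies
      discussed report adjusted odds ratios rather than this crude
      calculation.]

```python
import math


def relative_risk_ci(exposed_events, exposed_total, reference_events, reference_total, z=1.96):
    """Crude relative risk with a log-scale (Katz) confidence interval."""
    rr = (exposed_events / exposed_total) / (reference_events / reference_total)
    se_log = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / reference_events - 1 / reference_total
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Same underlying risks, ten times the sample size in the second study.
small_study = relative_risk_ci(6, 1_000, 3, 1_000)
large_study = relative_risk_ci(60, 10_000, 30, 10_000)

if __name__ == "__main__":
    for label, (rr, lo, hi) in (("small", small_study), ("large", large_study)):
        print(f"{label}: RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

      [In this sketch both studies have the same point estimate.  The small
      study's interval includes 1, yet its upper bound is far higher than
      the large study's, so it cannot rule out a substantial risk.]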


                This is a slide from California Medicaid.


      It is preliminary data and I wanted to present it


      to you, because what it shows is a dose-response to


      rofecoxib from 12.5 mg up to and through 50 mg.


                You can see that we have very wide


      confidence intervals for some of them, and that is


      a reflection of the limited number of cases, but I


      want to point your attention to the very narrow


      confidence intervals in the 12 to 25 mg and in the


      25 to 50 mg, just to point out that in the previous


      slide here, where we are talking about what are


      these point estimates, that now you can see what we




      have done is we have fleshed them out a little bit




                Another comparison that I think is


      important to consider, certainly it was for us,


      when we did our study in Kaiser Permanente, was at


      the time there were two COX-2 selective inhibitors


      on the market, celecoxib and rofecoxib.


                The VIGOR study raised a question about


      high-dose rofecoxib.  Our question as researchers


      and public health scientists was, well, let's


      suppose that rofecoxib increases the risk of


      myocardial infarction.


                We don't know that it does, but let's


      suppose that it does, what about celecoxib, because


      it actually had a larger share of the market, and


      if it turned out that these drugs have a benefit,


      and that benefit is worthwhile, then, it would make


      more sense from a practical perspective to use the


      drug that had a better safety profile.


                So, to us, it was very natural to want to


      compare rofecoxib to celecoxib, and so several of


      the epidemiologic studies felt similarly and in




      their design they included that analysis, and some


      of them it was, as Dr. Platt said, part of a we are


      going to make comparisons of everything against




                The Solomon study, for example, did that.


      They did not state in that study what their prior


      hypothesis was. In our study, we did state it.  I


      mean yes, in a sense we had multiple comparisons,


      but we were interested in two different things.  We


      were interested in rofecoxib versus remote use, and


      we were interested in rofecoxib versus celecoxib,


      but we thought it beforehand and we planned that




                But in any event, what we see is, when you


      look at the all dose analysis, in all of the


      published studies, rofecoxib increased the risk


      compared to celecoxib.  When we look at low-dose


      rofecoxib, we see the increased risk.  When we look


      at the high doses of rofecoxib to celecoxib, again,


      we see the same pattern.


                Dr. Platt, in his talk before, talked


      about relative risks, risk differences, individual




      risk, and population risk.  The next two slides are


      intended to address this at the level of the


      individual and at the level of population.


                What I have done on this slide--and these


      slides now, no one should interpret this as meaning


      this is what actually happened in the


      population--the next slide is going to have numbers


      on it that are for illustrative purposes only, to


      help the committee understand what does a relative


      risk of 1.3 translate into at the individual level


      and at the level of population.


                Your typical COX-2 user is somebody in


      their 60s who has several other health problems, so


      I went to the National Center for Health Statistics


      and got the myocardial infarction rate for 65- to


      74-year-old men in the United States.  That rate


      turns out to be 1 per 50 per year.


                What I did is I took that as the


      background rate and I said if I have an individual


      using this drug with that background rate and then


      I applied to that person the relative risks or odds


      ratios found in these studies that are shown in the




      previous slides, what would the excess risk to the


      person be, sort of what would that risk difference


      translate to for the individual.


                For example, in the Ray study, if you


      remember, for 25 mg or less, the odds ratio was


      1.02.  Basically, it doesn't change.  If we based


      it on the point estimate, that 0.02 would translate


      to 1 out of 2,500 in a year increased risk of heart




                Another way to view that number is that it


      is the number needed to harm.  If I treated


      2,500 65- to 74-year-old men for a year with


      rofecoxib, and the odds ratio was the 1.02 that Ray found,


      treating 2,500 patients would produce 1 extra heart




                Now, with the other studies that found


      higher estimates for the lower doses of rofecoxib,


      you can see that the number needed to harm ranges


      from about 90 to 200.  That is saying for every 90


      people to every 200 people I treat with low-dose


      rofecoxib, I would generate 1 other case.
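
The number-needed-to-harm arithmetic described here can be sketched in a few lines of Python. This is an illustration only, not the speaker's actual calculation: the function names are hypothetical, the background rate is the 1-per-50-per-year figure quoted for 65- to 74-year-old men, and the relative risk of 1.02 is the Ray point estimate cited in the talk. The sketch also assumes the odds ratio can be treated as a relative risk, as the talk does.

```python
def excess_risk(background_rate, relative_risk):
    """Annual excess risk for one person: background rate times (RR - 1)."""
    return background_rate * (relative_risk - 1.0)


def number_needed_to_harm(background_rate, relative_risk):
    """People treated for a year to produce one extra case (1 / excess risk)."""
    excess = excess_risk(background_rate, relative_risk)
    if excess <= 0:
        raise ValueError("no excess risk; number needed to harm is undefined")
    return 1.0 / excess


# Background rate quoted in the talk: 1 myocardial infarction per 50 men per year.
BACKGROUND_RATE = 1 / 50

# Ray point estimate for 25 mg or less (odds ratio treated as a relative risk).
print(round(number_needed_to_harm(BACKGROUND_RATE, 1.02)))  # 2500
```

Other point estimates, or the upper bound of a confidence interval, can be passed in place of 1.02 to see how quickly the number needed to harm falls as the relative risk rises.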


                For high doses, because the relative risks




      were higher, the number needed to harm becomes




                I have also shown it based on the upper


      bound of 95 percent confidence interval to show you


      that based on the data we have at hand, these are


      the excess risks that are consistent with the data,


      and from a public policy perspective, from a public


      health perspective, that is what I react to, and


      when I want to see a benefit and say does benefit


      exceed the risks, well, I want to know what is a


      real benefit in the population in terms of reduced


      hospitalization, lives saved, and does that benefit


      exceed what I can say is possibly the risk of these




                At the population level, now we have gone


      from an individual.  Remember in the Wayne Ray


      study we said it is 1 out of 2,500.  Well, that


      would translate to 400 additional cases of heart


      attack if we treated a million men who were 65 to


      74 years old, and we treated them with rofecoxib


      low dose for a year.


                With the others, you can see that those




      relative risks that might not look so impressive,


      that 1.23, that 1.30, that 1.4, that it projects


      out to a substantial number when you multiply it by


      the large number of people who use these products.
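
The step from individual risk to population burden can be sketched the same way. This is a minimal illustration under the talk's stated assumptions (a background rate of 1 per 50 per year and one million men treated for a year); the function name is hypothetical, and, as the speaker stresses, the numbers are illustrative and do not describe what actually happened in the population.

```python
def excess_cases(population, background_rate, relative_risk):
    """Extra cases per year in a treated population: N * rate * (RR - 1)."""
    return population * background_rate * (relative_risk - 1.0)


POPULATION = 1_000_000      # men aged 65 to 74 treated for one year
BACKGROUND_RATE = 1 / 50    # myocardial infarctions per person per year

# Relative risks mentioned in the talk; results are rounded to whole cases.
for rr in (1.02, 1.23, 1.30, 1.40):
    print(rr, round(excess_cases(POPULATION, BACKGROUND_RATE, rr)))
```

For the relative risk of 1.02 this reproduces the 400 additional cases mentioned in the talk.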


                For high doses it ends up being even


      greater, and then if we focus on the upper bound of


      the confidence interval, we again see that the


      numbers are larger still.  This very high number in


      our study was the result of our having low


      statistical power in addressing the high dose




                One other question that I think is


      important to consider is when does the risk of


      myocardial infarction with rofecoxib kick in.  Now,


      we have seen data yesterday presented by both FDA


      and by Merck of various survival curves.


                We saw the VIGOR curve that showed the


      separation after about 6 weeks with an overall


      relative risk of about 5.  We saw, for the APPROVe


      study, this close overlapping line at about 18


      months, and then they diverge with an overall


      composite hazard ratio of about 2.


                I would submit to the committee that the


      reason for the failure of these studies to show


      divergence of the line shortly after the drugs are




      used is low statistical power, that they just


      don't have enough events to show it, and as a


      result, you can interpret because of the low


      statistical power you basically--how to describe


      it--you presume that there is nothing there, and


      you err on the side of the drug rather than erring


      on the side of what could the risk be to the




                If you really want to know what is going


      on in the population, then, you want to reduce the


      uncertainty.  The more uncertainty you have, if you


      act basically on the lower bound of that confidence


      interval, which is what you are doing when you are


      saying the risk doesn't begin until 18 months, you


      are basically saying that the absence of evidence


      is evidence of absence.


                I would say that in safety, what it is, is


      you just don't have enough power.


                Looking at the epidemiologic studies, I




      think that we have evidence to suggest that the


      risk begins much earlier.  I will point it out, and


      you guys and women can consider it for yourselves.


                In the Graham study, when we looked at low


      dose and high doses of rofecoxib, 50 percent of our


      cases at the low dose and at the high dose had used


      at the time--remember these are inception cohorts,


      so these people, their total use, this was 1.8


      months, this was 2.7 months--50 percent of our


      cases occurred within 2 to 3 months of starting the




                That is a lot of power and that really


      speaks against the notion that the risk is


      backloaded, you know, it is for the low dose, that


      the risk doesn't happen until after 18 months.


      Nobody in our study was on rofecoxib for more than


      about 15 months.  I think that was the longest


      duration of use we had in our study.


                Now, in the Solomon study, they looked at


      the low dose and the high dose, and they presented


      data in several ways.  One is that they grouped


      things in 1 to 90 days, and what they showed was




      that for both the low dose and the high dose, there


      was evidence of risk early on.


                The Kimmel study, for all its


      deficiencies, most of it was low dose rofecoxib,


      and almost all the patients used it for less than


      12 months.  So, their finding on rofecoxib, if


      anything, would also speak to that the low dose


      effect kicks in long before 18 months.


                Finally, the Solomon and the Ingenix study


      looked at the first 30 days of use of these


      products, and both of them found elevated odds


      ratios of 4 for cardiovascular risk in the first 30




                Now, in both of these studies, they didn't


      separate it out by low dose and high dose, so this


      is a composite, but in both studies, about 85


      percent of the use to 90 percent of the use was low




                So, basically, what I am concluding from


      this slide is that risk of myocardial infarction


      with rofecoxib begins when rofecoxib use begins,


      and that the inability to separate out those curves




      is based on the fact that if you were to count the


      actual number of events in the VIGOR study in the


      first 6 weeks, we are probably talking about 3 or 4


      events, and if you look at the confidence


      intervals, you are going to see they are wide.
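
The point about wide confidence intervals with a handful of events can be illustrated with a standard large-sample approximation for a rate ratio. This is a sketch only: the event counts below are hypothetical and are not taken from VIGOR or any other study, the function name is invented for illustration, and equal person-time in both arms is assumed so that the ratio of event counts stands in for the rate ratio.

```python
import math


def rate_ratio_ci(events_exposed, events_unexposed, z=1.96):
    """Approximate 95% CI for a rate ratio, assuming equal person-time per arm.

    Uses the usual log-scale approximation: SE(log RR) = sqrt(1/a + 1/b).
    """
    ratio = events_exposed / events_unexposed
    se_log = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: the same twofold ratio with very few and with many events.
for exposed, unexposed in ((4, 2), (400, 200)):
    ratio, lower, upper = rate_ratio_ci(exposed, unexposed)
    print(f"{exposed} vs {unexposed}: ratio {ratio:.2f}, "
          f"95% CI {lower:.2f} to {upper:.2f}")
```

With only a few events the interval includes 1 even though the point estimate is doubled; with many events the same ratio gives an interval that excludes 1.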


                For the APPROVe study, the same thing


      holds, that you have too few events.  The whole


      study had 45 events, and I don't recall how many of


      those were on rofecoxib and how many of those were


      on placebo, but when you think about it, compare


      that and then look at the epidemiologic studies,


      and look at the number of cases that were in the


      epidemiologic studies, and for all their problems,


      and we can talk about those, they suggest there is


      a big discordance, and I think the answer, the


      reason is absence of statistical power in the


      clinical trials.


                In the epidemiologic literature, this has


      been recognized, and people have written papers


      saying that when you are trying to summarize the


      overall risk from a survival study, and you want to


      look at specific time periods, that you are better




      off taking the overall risk estimate for the entire


      study than focusing on a small segment at a time


      because of this issue of low statistical power, so


      I didn't invent this.


                Now, switch over to celecoxib.  There are


      a number of studies that have been done to look at


      celecoxib risk.  What I have tried to do here is


      plot out for you the relative risk or the odds


      ratio, the author of the study, and then the point


      estimates and the 95 percent confidence intervals.


                What you will see basically is that for


      most of these studies, there is no evidence of a


      protective or an injurious effect except for the


      Kimmel study that found a substantial protective




                Remember the Kimmel study and what I


      believe is this reverse recall bias, as well as the


      low participation rate, and I personally discount


      that study.  The committee can decide for


      themselves what they want to do.


                What about celecoxib lower dose versus


      higher dose?  Well, unfortunately, the only place




      where this is looked at is in the two


      unpublished studies. We have the Ingenix study and


      we have the Medi-Cal study.


                What I would focus your attention on are


      the low dose and high dose, the low dose and the


      high dose.  What we see is in both studies,


      evidence of a dose response.  Now, the 95 percent


      confidence interval in the Ingenix study includes


      1, but the point estimate is pretty elevated.  That


      is 1.18 or so at 400 mg.


                In the Medi-Cal study, we go from 1.01 up


      to about 1.24.  Here, you can see the 95 percent


      confidence intervals.


                What I would conclude from this, although


      they are unpublished studies, is that there is


      evidence of a dose response and that the higher doses of


      celecoxib do confer an increased risk of myocardial




                I should point out that in the Medi-Cal


      study, the methodology that we used in that study


      is the exact methodology that we used in our Kaiser


      Permanente study that Dr. Platt before was gracious




      enough to say is one of the better done studies.


                There are no published studies on


      valdecoxib, so what do we do?  Well, preliminary


      data from Medi-Cal, we had 54 exposed cases and we


      found a point estimate of 0.99.  Now, this was


      mostly 10 and 20 mg use.  I think that out of all


      the patients that we had in the study, there were 2


      or 3 who had 40 mg valdecoxib use.


                In Medi-Cal, they only reimburse for the


      10-mg tablet, and they do this in an effort to try


      to discourage people having larger dose tablets and


      then taking more of it.


                So, this is all the epidemiologic


      information that I am aware of, that I have had an


      opportunity to review on valdecoxib.


                I will now move to naproxen.  The issue of


      naproxen is important for several reasons.  One,


      with the VIGOR study, the medical community was


      confronted with the hypothesis that naproxen was


      the single greatest and most effective


      cardio-protectant in the history of mankind, that


      it was far better than aspirin.


                We heard yesterday that aspirin reduces


      cardiovascular risk about 20 to 25 percent.


      Naproxen, if we were going to believe the VIGOR




      results, would have to reduce the risk of


      cardiovascular events by about 80 to 85 percent.
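
The arithmetic behind that figure can be sketched as follows. This is an illustration under a stated assumption, not a reanalysis: it takes the relative risk of about 5 quoted earlier in the talk for the VIGOR curve, and assumes, as the naproxen hypothesis does, that rofecoxib itself is neutral, so that the whole difference is credited to naproxen. The function name is hypothetical.

```python
def implied_risk_reduction(relative_risk_vs_comparator):
    """Risk reduction the comparator must deliver if the study drug is neutral.

    If the study drug shows relative risk RR against the comparator and is
    assumed to have no effect itself, the comparator's relative risk against
    no treatment is 1 / RR, so its implied risk reduction is 1 - 1 / RR.
    """
    return 1.0 - 1.0 / relative_risk_vs_comparator


# Relative risk of about 5, as quoted earlier in the talk for VIGOR.
print(f"{implied_risk_reduction(5.0):.0%}")  # 80%
```

A reduction of that size is several times the 20 to 25 percent quoted for aspirin, which is the comparison the speaker is drawing.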


                So, this stimulated a lot of research.


      Here, I have summarized in the same fashion as I


      did for the rofecoxib studies, the various studies


      that have been done. Again, I have separated them


      out by the studies that I think are better done,


      the studies that have more significant limitations,


      and then the two unpublished studies.


                I point out the Rahme study to say that


      the only reason the Rahme study is listed among


      this group of suboptimal studies is that its


      reference group was other NSAIDs, primarily


      ibuprofen, because ibuprofen was the predominant


      other NSAID used in Quebec during the study.


                Again, we have the various outcomes that


      were done.  What I would point out is that you can see


      the number of cases that we had to work with in


      these various studies, and I would point out that




      for the Solomon study, they had about 240 MI cases


      that they studied overall, but as you will see in a


      few minutes, that exposure could occur anytime in


      the past 6 months, so they don't say in the paper


      how many people were actually on naproxen at the


      time they had their event, so I can't put down a


      list of how many people were currently exposed.


                The Watson study is the only study that


      used a composite outcome.  It included myocardial


      infarction, stroke, subarachnoid hemorrhage, and


      subdural hematoma.  Why subarachnoid hemorrhage and


      subdural hematomas are in there is beyond me.  In


      any event, 26 cases of that composite outcome and a


      much smaller number of actual myocardial


      infarctions.  So, that is why that asterisk is




                With the Ingenix study, the asterisk next


      to the 179 is that this included both prevalent and


      incident cases, and the best studies, the best


      results come if you base it on incident cases only


      or incident use only as opposed to prevalent use,


      because prevalent use can have survivor bias. But




      in any event, in the Ingenix study, they had a


      number of different analyses, and they didn't


      always use their full number of cases.


                There are important limitations to note.


      I think the thing to focus on is to realize, one, there is


      no perfect study, we have talked about that before,


      and, two, that among all the limitations listed


      here, I think the most important one to note was in


      the Watson study, was this composite outcome which


      really just makes it very difficult from an


      epidemiologic perspective to study things.


                Myocardial infarction is very well


      validated in claims data, and Dr. Platt has already


      gone over that with you.  Stroke is notoriously


      difficult to work with in claims data, and subdural


      hematomas most commonly occur because as people get


      older, their brains shrink.  They bump their heads


      and then they get a little bleeding on the surface


      of the brain.  What that has to do with myocardial


      infarction risk, which is what we are really


      concerned about today, is beyond me.


                I have got two slides on the results. 




      This slide shows the studies that found no


      protective effect.  There are four studies that


      found a protective effect, and I am saving them for


      a separate slide, because I want to look at those




                What you can see from the majority of


      these studies, and I would point out that the


      studies that were the best done studies in the top


      tier, they are on this slide, that all of them sort


      of suggest that there is no cardio-protective


      effect of naproxen.  Several of the studies point


      to the possibility of a small increased risk with




                But we have four studies of positive


      results, and we will probably all remember the


      Archives of Internal Medicine publishing three of


      the articles in the same issue with an accompanying


      editorial that stated the issue is solved, naproxen


      is cardio-protective.


                I want to look at those studies and just


      describe to you my view of them.  The top three


      studies were the ones that were--well, no, not the




      Kimmel study--Rahme, Solomon, and Watson were the


      Archive studies.


                In the Rahme study done in Quebec, they


      compared current naproxen use versus other NSAIDs.


      That other NSAID was, by and large, ibuprofen, and


      they found a protective effect. Well, if ibuprofen


      increases the risk of myocardial infarction, let's


      just say that it does, and naproxen doesn't,


      naproxen could look like it's protective compared


      to ibuprofen, but not be protective really.


                The data presented in that paper, if we


      re-analyzed it versus non-use, we get an odds ratio


      of 1.28, statistically significant.  Now, this is


      not adjusted.  It is not possible from the data


      there for me to adjust this result, but based on


      what is in the paper, when you compared the


      unadjusted to the adjusted point estimates, they


      don't change very much, and what that suggests to


      me is that this effect, this 1.28, is probably not


      far off the mark.


                That would then make it comparable to the


      analyses I showed on the previous slide, that all




      of these studies use non-use or remote use, so then


      it would add a fourth study to an elevated point


      estimate for naproxen.


                Now, the Kimmel study, we have already


      talked about low participation rate and this


      reverse recall bias, and a small number of NSAID


      cases.  In fact, they don't even tell us in the


      paper how many cases they had.


                We move on to the Solomon study.  This was


      the result that was reported in the paper and was


      picked up by the press, a 16 percent reduction in


      heart attack risk with naproxen.  The problem, in


      my view, was that their definition of exposure in


      the study was any use of naproxen in the past 6


      months, which means that if I took naproxen 6


      months ago and stopped it, I could be included in


      this study as being exposed to naproxen.


                So, the question is then, you know, how do


      we interpret the study.  Well, Solomon was good


      enough to present data by current use and in recent


      use, and recent use included people who stopped


      their naproxen.  Their naproxen prescription's days'




      supply ran out between 1 day and 60 days before the


      MI or the index date for their controls, and remote


      users, their NSAID use, their naproxen use ended


      from 61 days to 180 days prior to the event.
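
The exposure windows just described can be written out as a small classifier. This is a sketch of the categories as the speaker describes them, not code from the Solomon study: the function name and the "unexposed" label are hypothetical, and "current" use is assumed here to mean that the days' supply still covered the index date.

```python
def classify_exposure(days_since_supply_ended):
    """Assign an exposure category from days between end of supply and index date.

    Zero or negative values mean the supply still covered the index date
    (assumed definition of current use).
    """
    if days_since_supply_ended <= 0:
        return "current"
    if days_since_supply_ended <= 60:
        return "recent"      # supply ran out 1 to 60 days before the index date
    if days_since_supply_ended <= 180:
        return "remote"      # supply ran out 61 to 180 days before the index date
    return "unexposed"       # outside the 6-month exposure definition


for days in (0, 30, 61, 180, 181):
    print(days, classify_exposure(days))
```

Under the study's overall definition, the first three categories would all count as exposed to naproxen, which is why the speaker compares the results across them.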


                So, let's look at what those results are


      then, and what we see is they are identical.  So,


      unless the committee is prepared to believe that


      naproxen confers lifetime immunity to


      cardiovascular disease, I think we have to conclude


      from these data that what we really have here is


      selection bias, and it is not the fault of the


      investigator. Dr. Platt talked about before that


      there are some things you can't adjust for.  You


      can't adjust for bias.  What you can try to do is


      identify bias, and if you identify it, then at


      least you know what you are dealing with.


                Here, I think we have what is classic


      selection bias.  It is not naproxen that protects


      you against myocardial infarction, it is some other


      factor that in this health plan, that they used to


      study this drug, the patients who were being


      treated with naproxen happened to have lower




      cardiovascular risk.


                I can't explain why that happened.  Dr.


      Solomon probably can't explain why it happened, but


      it's not due to naproxen.


                Finally, the Watson study.  This study was


      sponsored by Merck, and it was authored by Merck


      investigators.  The result that was published as


      being the basis for the conclusion was this top


      result, a 39 percent reduction in cardiovascular




                First, I just want to remind everybody,


      composite outcome here, subarachnoid hemorrhage,


      subdural hematoma, stroke, as well as heart attack,


      26 events total, much smaller number of heart




                For this event, you can see the


      checkmarks.  These are the various variables that


      they adjusted for in the study.  The way they


      handled cardiovascular risk, if you read the paper,


      I would have to say that it doesn't measure up to


      the standards that were set by Dr. Wayne Ray.


                We modeled our study in Kaiser and in




      Medi-Cal on Dr. Wayne Ray.  I think that he has


      set the standard for how one needs to go about


      adjusting for cardiovascular risk. It is not enough


      to rely on diagnoses.  You have to use the


      medications, because medications are much more


      accurate predictors of disease than diagnoses in


      these administrative claims data.


                In any event, they didn't adjust for


      cardiovascular risk, and they didn't adjust for


      smoking although they had that data.  Then, they


      present later on another analysis that now includes


      cardiovascular risk and it is no longer, in quotes,


      "statistically significant," and then they include


      smoking, and again it is not statistically




                My conclusion on the Watson study was that


      (1) they have got a composite outcome that, in my


      view, isn't very informative towards the question


      of myocardial infarction; (2) that it is very small


      numbers; (3) that a variety of approaches were used


      in the analysis that inadequately account for the


      risk factors that could confound the result, so I




      have discounted that, as well.


                So, a conclusion when I look at these, in


      quotes, "4 positive studies," I conclude that none


      of them provide credible evidence of a protective




                In light of yesterday's discussion in the


      afternoon about other NSAIDs and what might explain


      the differences between, let's say, celecoxib and rofecoxib


      studies, the rofecoxib studies used naproxen as a


      background, a comparator, the celecoxib studies


      used ibuprofen or diclofenac.


                Dr. FitzGerald is talking and saying,


      well, you know, all of these drugs could increase


      the risk because what is happening, you know,


      biochemically, with the balance of prostacyclin,


      could be influenced by these different drugs in


      ways that aren't immediately obvious or detectable


      in a clinical trial.


                I thought I would just share some of that


      information on other NSAIDs with the committee,


      recognizing a couple things that no single study is


      definitive and what you want to look for I think is




      consistency across studies, but as far as


      randomized trials go, I would like just to mention


      that they are generally too small, with too few events,


      and you are not going to get the answers that you


      need from them unless you make these clinical


      trials substantially larger than anything people


      have contemplated up to now.


                So, from our California Medicaid study, it


      is all preliminary and it has not been published,


      for ibuprofen we found a small but statistically


      significant increased risk. For indomethacin we


      found a risk of 1.7.  I would like to say on


      indomethacin that we found an increased risk with


      indomethacin in our Kaiser Permanente study.  It


      was 1.3 and it was highly statistically




                In at least two other studies that I


      reviewed in preparation for this advisory meeting,


      indomethacin is noted to have an increased risk of


      myocardial infarction.


                It is not commented on in the text because


      that wasn't a primary analysis, but what I am




      talking to you about now is consistency, and I


      would submit to the committee that indomethacin is


      a lot of smoke, there is a lot of smoke for




                In our study, in our Kaiser study, for


      example, we did not think in advance to look at


      indomethacin separately. I mean we knew we were


      going to look at it, but it wasn't a primary


      hypothesis.  We didn't adjust for gout.  I mean


      everyone knows that indomethacin gets used in gout.


      Gout increases the risk of cardiovascular disease.


                Well, in the Medi-Cal study, we adjusted


      for gout. Yes, gout increases the risk of


      myocardial infarction.  It didn't change the odds


      ratio here.


                I think this next finding, Meloxicam, is


      important.  Meloxicam is now the number one selling


      branded NSAID in the country.  With the removal


      from the market of rofecoxib, the medical


      community, shying away from the coxibs, are moving


      to other drugs that they perceive would have the


      advantages of COX-2 selectivity without the bad rep




      that coxibs appear to be acquiring.


                So, you now have a shift in the


      marketplace to Meloxicam.  There have been articles


      in the Wall Street Journal and the New York Times


      on this.  The company recently raised the price on


      the tablets.


                In any event, we are presenting these data


      just to say that we found an increased risk.  It is


      one study, but I think it is the only study.  We


      looked at this in Kaiser.  Meloxicam is almost not


      used in Kaiser, so we couldn't study it.


                In our California Medicaid study, we only


      looked at drugs that had more than 50 currently


      exposed cases.  Nabumetone came out in this study


      as not showing a whiff of a problem.  Sulindac,


      there was an increased risk.


                Regarding ibuprofen, in our Kaiser study,


      we found an increased risk of 1.06, which sounds really


      trivial.  It wasn't statistically significant, but


      the confidence intervals were pretty narrow.  It


      was 0.96 to 1.17.


                My concern is, as Dr. Platt talked about,




      you know, above 2 you feel really comfortable,


      above 1.5, you can believe it, below that you begin


      to get really edgy.  The problem is most of the


      risks that we are probably facing, if it turns out


      that the non-coxib NSAIDs increase the risk of


      cardiovascular disease, that is where the risk


      level is going to be, and that is what we are going


      to have to contend with, because it has tremendous


      effects on the population.


                Finally, dose response.  This slide shows


      for diclofenac.  This is from California Medicaid.


      What we wanted to do was show evidence of dose


      response, consistency in the data.  Remember we


      pointed out diclofenac before.  Diclofenac in this


      study overall did not have an increased risk, but


      at the high doses there is a suggestion of a dose




                I will skip that.  This slide was to say


      that depending on your reference point, you can get


      different results, if I use an active comparator


      versus remote, and this is showing the three NSAIDs


      from California Medicaid compared to non-coxib




      NSAIDs, and you can see the rofecoxib is different


      than them, and the other two aren't necessarily


      that different.


                My conclusions, and I am sorry to have


      gone so long.  Celecoxib, we believe that based on


      the evidence we have at hand, that there is no


      apparent effect of risk at doses of 200 mg or less.


      Above 200 mg, we think that there is evidence of


      increased risk.


                For rofecoxib, we believe that there is


      evidence of increased risk at both the lower doses


      and the higher doses, and that risk begins early in


      therapy and is apparent during the first 30 days of




                With valdecoxib, there is a paucity of


      information, but the information we have at this


      time suggests that the risk is not increased at


      doses of 20 mg or less.


                As a class, non-coxib NSAIDs may increase


      the risk with differences between each of the


      NSAIDs.  I don't think we are going to be able to


      talk so much about class effects. In the end, it is




      going to have to be looking at individual drugs.


                The COX-2 hypothesis may be true, but if


      it is, we are still going to have to look at these


      other drugs in terms of their individual properties


      and what they do.


                Finally, naproxen is not




                Thank you.




                DR. WOOD:  Thanks very much.  David, it


      will come as no surprise to you that every time


      practically I pick up a newspaper, I read about


      what you are not going to tell us.


                So, my question to you is what have you


      not told us that you think we should know, because


      I would like to make sure.  Lots of other people


      have shown up here without slides that they forgot,


      so I just want to be sure that if there is anything


      else we need to hear, we hear it.


                DR. GRAHAM:  Well, as far as the science


      goes, I think I presented the evidence that I am


      happy to be able to share with the committee that I




      thought it was important for the committee to have


      an opportunity to hear.


                The source of controversy surrounding my


      presentation related to the unpublished studies


      that I was going to be permitted to present or


      asked, actually asked to present the Ingenix


      results, the unpublished study from Merck, but that


      I was being told not to present the unpublished


      data from the California Medicaid study, and


      personally, I had great difficulty standing here


      before this committee as an investigator and as a


      scientist, as a physician, and telling you the


      information that I have, that I am allowed to talk


      about, and remaining silent on things that I know


      about that I am not allowed to talk to you about.


                Fortunately, Dr. Crawford exercised great


      leadership in making it possible for me to present


      that data, recognizing it's preliminary, but the


      methods that we used are identical to our Kaiser


      study for the California Medicaid, and for me, I


      think the big reservation is, is that it's an


      untested database, but I think that everything that




      could be done to develop the database and to do


      quality assurance and to work out the kinks has


      been done.


                If you look at the findings in the


      California Medicaid study and you compare them to


      the clinical trials data, and the anomalies and the


      questions that you were discussing yesterday about


      the clinical trials data, you look back at the


      California Medicaid data, and you are going to see


      I think great consistency between the findings that


      might help explain and interpret some of the things


      that seemed questionable or uncertain yesterday.


                So, in any event, I have been able to


      present what I thought was important to present,


      and I am happy to have had that opportunity.


                DR. WOOD:  So, the answer is we have seen


      it all, is that right?


                DR. GRAHAM:  You have seen it all.


                DR. WOOD:  Okay, good.  Let me ask you a


      question. If you go back to your slide that showed


      the excess population risk, put that in proportion


      for us in terms of, say, the other drugs that have




      been withdrawn from the market.  I mean what sort


      of numbers would we be expected to see?


                DR. GRAHAM:  That is a great question.


      The typical drug that has come off the market in


      the United States, like the leading cause of drug


      withdrawals in the United States in the last 20


      years has probably been acute liver failure.


      Rezulin came off the market because of it,


      troglitazone, bromfenac, a number of other drugs.


                Acute liver failure in the general


      population has a background rate of about 1 per


      million per year.  We are talking about that is the


      rate of being struck by lightning, 1 per million


      per year, and these drugs were pulled off the


      market because it increased the risk of that.  It


      might increase the risk 5-fold, it might increase


      the risk 10-fold, it might increase the risk


      100-fold.  The fact is the background rate was 1 in


      a million and what that means is that the actual


      number of people affected is sort of measured in


      the tens and the hundreds for the liver failure


      that could be life-threatening.


                In this situation, and this is why the


      lower relative risk becomes so critical, we are


      talking about a serious event that has a very high




      background rate.  Heart attack is not a rare event,


      and as I pointed out before, there is a 1 in 50


      chance that the average American male age 65 to 74


      is going to have a heart attack this year, 1 in 50.


                That is an extraordinarily high risk.  You


      increase that risk 5-fold with a high dose.  That


      is what happened with VIGOR.  If I have got


      millions of people taking the high doses, and that


      is what we had in the United States, and I have


      increased the risk 5-fold, you are going to get


      numbers that balloon out like this.


                So, there is no comparison in terms of


      what the population impact is of the typical drug


      that has come off the market in the United States


      and what we are dealing with here, and that is


      because of the high background rate of the


      underlying event that we are talking about.
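
                The population-impact argument above reduces to simple
      arithmetic: excess cases per year are roughly the background rate
      times (relative risk minus 1) times the number of people exposed.
      The sketch below uses only the rates quoted in the answer (1 per
      million per year for acute liver failure, 1 in 50 per year for
      heart attack in men aged 65 to 74, and the 5-fold, 10-fold, and
      100-fold increases).  The exposed population of one million and
      the function name `excess_cases` are illustrative assumptions, not
      figures from the presentation, and applying the 1-in-50 rate to
      every exposed person is a deliberate simplification.

```python
def excess_cases(background_rate, relative_risk, exposed_people):
    """Excess events per year attributable to the exposure.

    background_rate: events per person per year without the drug
    relative_risk:   fold increase in risk among the exposed
    exposed_people:  number of people taking the drug (assumed)
    """
    return background_rate * (relative_risk - 1.0) * exposed_people


# ASSUMPTION: one million exposed people, chosen only for illustration.
EXPOSED = 1_000_000

# Rates quoted in the testimony.
LIVER_FAILURE_RATE = 1 / 1_000_000  # acute liver failure, per person-year
HEART_ATTACK_RATE = 1 / 50          # men aged 65 to 74, per person-year

# Rare event: even a 100-fold increase yields a small absolute number.
liver_excess = {
    fold: excess_cases(LIVER_FAILURE_RATE, fold, EXPOSED)
    for fold in (5, 10, 100)
}

# Common event: a 5-fold increase yields a very large absolute number.
heart_attack_excess = excess_cases(HEART_ATTACK_RATE, 5, EXPOSED)

for fold, cases in liver_excess.items():
    print(f"liver failure, {fold}-fold: {round(cases)} excess cases per year")
print(f"heart attack, 5-fold: {round(heart_attack_excess)} excess cases per year")
```

                Under these assumptions the liver-failure scenarios yield
      roughly 4, 9, and 99 excess cases per year, while the heart-attack
      scenario yields roughly 80,000, which is the contrast the answer
      describes.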


                DR. WOOD:  So, this would produce many


      more cases from what I understand.


                DR. GRAHAM:  Many more.


                    Committee Questions to Speakers


                DR. WOOD:  From the committee, we have


      questions.  Let's start with Dr. Shafer.


                DR. SHAFER:  Dr. Graham, tomorrow we are


      going to be asked, as a committee, to consider the




      question about a class effect for the selective


      COX-2 antagonists and for the non-selective NSAIDs.


                One of the things that I am finding, that


      I am having trouble putting together here, is we


      have a lot of conflicting data, and for the COX-2


      antagonists we have a lot of data from randomized


      controlled trials.


                Certainly for the NSAIDs, we are going to


      have to go with a lot of these observational


      studies because we don't have a lot of data on the


      topic at hand from randomized controlled trials.


                As I look at this, if we come up with some


      sort of common warning as a class, and it applies


      to everything, we have, in fact, communicated no


      relevant information.  On the other hand, if we are


      going to come up with individual drug-specific




      recommendations, we are going to have to have very


      different evidentiary standards in some ways,


      because for some of these, we have very little


      information, as you pointed out, and yet your data,


      particularly the unpublished data from the Medi-Cal


      trial, and I appreciate that there is all the


      issues of not being previewed and stuff, but we are


      all familiar with that process and know how it




                What can you tell us to guide us?  Should


      we try to go drug by drug specific?  How do we set


      our evidentiary standards when we talk about class


      effects where in some cases, we are just not going


      to have a lot of data here?


                DR. GRAHAM:  Right.  What you are going to


      be getting now, of course, is my opinion, not FDA's


      opinion. Probably if you were to talk to Bob Temple


      or John Jenkins, or anybody else, everybody is


      going to have a slightly different answer.


                What we are talking about now I think to some


      extent is philosophy, so with that preamble, first,


      I believe based on the evidence that there is a




      COX-2 effect and that that COX-2 effect is dose


      dependent, and that we see evidence of that with


      rofecoxib, with celecoxib, and with valdecoxib.


                The difference between rofecoxib and the


      other two coxibs on the market is that a safe dose


      for rofecoxib wasn't identified, the dose wasn't


      low enough.  That raises a question in my mind


      about what is an appropriate therapeutic index for


      a drug.


                I am giving you my opinion now, but when I


      listened to Dr. Cryer's presentation yesterday, the


      bottom line conclusion I came to at the end of that


      was there really doesn't appear to be a need for


      COX-2 selective NSAIDs based on what I heard


      yesterday.  There is probably other information out


      there why I am wrong, but that was the conclusion I


      came to.


                So, in any event, that is answer one.  I


      believe there is an effect and it's dose related,


      and with celecoxib and valdecoxib, I think we have


      evidence.  You said before we have a good


      evidentiary base based on clinical trials for the




      COX-2s.  I would challenge that in the sense of the


      survival curves and the things that I talked about


      there, that we have a very weak evidentiary base


      for things like protective, you know, is there a


      grace period for use, and also on the dose issue,


      we really don't have a great evidentiary base.  But


      that being said, you understand me.


                Now, for the non-coxib NSAIDs, my own view


      is that as an epidemiologist first, I try to report


      the phenomenon I observe and leave it to brighter


      minds to figure out why what I observed happens.


                You are asking me sort of what do I think


      is happening underneath it all.  I am attracted to


      the COX-2 hypothesis personally.  Dr. Gurkirpal


      Singh, my colleague and co-author in Medi-Cal, he


      has a different view on that, but I think that we


      can have these in vitro tests that say, oh, this is the


      COX-2 selectivity of this NSAID, you know, in a


      test tube.


                What happens in the human body could end


      up being surprisingly different.  We saw yesterday


      that the dynamic response of these differences,




      that the platelet effect is very quick, the


      thromboxane effect is a very quick effect, the


      prostacyclin effect seems to be a more gradual


      effect, that this creates very complex interactions


      that ibuprofen, that any of these drugs could, in


      the end, end up with a deficit, a prostacyclin


      deficit that results.


                I think Dr. FitzGerald showed that slide


      yesterday with the normal distribution of the time


      area under the curve and then this little sliver


      where they are not protected, and that may be the


      reason why, for these different drugs, that we end


      up with these different relative risks and these


      different odds ratios.


                In the end, for the non-selective NSAIDs,


      my own advice would be let's look to see are there


      somewhere in studies--it is going to be


      observational studies--in observational studies


      that we believe have been reasonably well done.


                By "well done," here, they have to be


      large.  The literature is full of really small


      studies.  I mean I could have presented Meloxicam




      studies, 5 patients, no risk.  Well, duh, you know,


      you have got a confidence interval that goes from


      zero to infinity.  They need to be large.  Look in


      a systematic way to identify what the body of


      evidence is.


                Can we identify bad actors?  I believe


      indomethacin, for example, is clearly a bad actor,


      and if people looking at the data concluded that,


      take appropriate action, weed the garden of the bad actors.




                Try to identify drugs that based on the


      evidence we have, appear to be less risk in the


      totality of their evidence, looking for consistency


      study to study to study, and then, in a rational


      way, suggest these are the drugs we think that the


      public should use, and these other drugs, well,


      then you have to decide do you want them on the


      market or not.


                I am not really going to comment on that,


      but I think that is the approach I would take.  I


      would be trying to sort of identify right off the


      bat the bad actors and let's get rid of them.


                Things that look like they may actually be


      safe, and when I say "safe" now, I mean that they


      don't appear to have cardiovascular risk, identify




      them and shift the market towards that, and then


      deal with the others.


                DR. WOOD:  Dr. Friedman.


                DR. FRIEDMAN:  Thank you.  Several


      comments.  First, as both Dr. Graham and Dr. Platt


      have mentioned, observational studies are


      essential, but they have a number of limitations,


      and because of those limitations, it is easy after


      the fact to critique away those whose results you


      don't much care for as we have seen.


                But a couple of other points.  One, can


      these particular drugs, their primary use, we are


      dealing with chronic conditions, conditions that


      last years, sometimes many years, and so the drugs


      are intended for use over those many years




                Yet, most of the clinical trials we heard


      reported yesterday are 12, 18 weeks, a few of them


      go longer.  You mentioned that one of the reasons




      we didn't see the problems early on may be numbers,


      and I agree that is potentially it, but the fact is


      we didn't see problems arise in the studies until


      14, 18 months.


                We often see analyses by patient years of


      exposure.  In this particular setting, I don't know


      whether patient years are always equal to patient


      years, and therefore, I guess I would say why


      aren't we doing more bigger, longer randomized


      clinical trials for these chronic conditions?


                DR. GRAHAM:  I am not speaking for the


      agency now.


                DR. WOOD:  We got that.  Don't say it each time.




                DR. GRAHAM:  Okay.  I think they are


      incredibly expensive and companies don't want to do


      them.  There is not an incentive for them to do


      them, and you would have to talk to the people from


      the new drug side of the house, but the fact is


      that they are not requiring them.


                So, that is a very legitimate question.


      You know, working as an epidemiologist, we try to




      make do with what is, and so we use the


      observational data.  You are going to get better


      quality data if you are able to do this, but just


      to give you a sense of the size of the studies that


      I think you would need to do, I mean you talked


      about before that you have the APPROVe study and we


      see no effect until 18 months, but there was study


      090 that was talked about briefly by Dr. Villalba


      yesterday.  It was a 6-week study at 12.5 mg, and


      it showed a difference, the suggestion of a


      cardiovascular risk within the 6-week study at the


      lowest dose.  Now, it's a small study, as well.


                But I am just saying that to say that I


      think the epidemiologic data, in my mind at least,


      answers the question about when the effect begins.


      The question is if you want to have--this is the


      philosophy--how much certainty do you need to make


      a decision.


                Right now, when it comes to efficacy, the


      effect, does the drug work, you are looking at the


      lower bound of the confidence interval, and you


      want to see is that different than 1, because if it




      is, then, I will conclude with 95 percent certainty


      or greater that the drug actually has an effect.


                When it comes to safety, you are doing the


      same thing.  You are looking at that lower bound.


      You want this 95 percent certainty that the drug is


      harmful.  You are presuming that the drug is safe


      rather than let's presume we want to do no harm to




                Let's start off at the beginning assuming


      that the drug isn't safe, and we want to have a


      certain level of confidence about how bad this drug


      could be, and that is still tolerable to us.  We


      want to cap the risk.  It will be a completely


      different way of looking at studies for a safety


      perspective, one that actually gives a priority to


      safety and is maximally protective of patient


      safety, just as that high standard for efficacy is


      maximally protective of patient safety, because by


      keeping drugs off the market that don't work, I am


      protecting patients from unsafe drugs, and if I


      have pneumonia and I am given a drug that doesn't


      work, well, I get a harm from that.


                But that's philosophy, and I think it's an


      outcropping, it's a development, a natural


      extension of the development of clinical trials in




      the United States where the focus has always been


      on efficacy.
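
                The two decision rules contrasted above can be written
      side by side.  In this sketch the interval is the ibuprofen
      estimate quoted earlier in the session (1.06, 95 percent
      confidence interval 0.96 to 1.17); the tolerable-risk caps of 1.25
      and 1.10 and the function names are hypothetical, chosen only to
      show how the conclusion can change.

```python
def harm_demonstrated(ci_lower):
    """Presume-safe rule: act only if the lower bound excludes 1."""
    return ci_lower > 1.0


def risk_capped(ci_upper, cap):
    """Presume-unsafe rule: accept only if the upper bound is at or
    below a tolerable relative risk chosen in advance."""
    return ci_upper <= cap


# Ibuprofen estimate quoted in the session: 1.06 (0.96 to 1.17).
estimate, lower, upper = 1.06, 0.96, 1.17

# HYPOTHETICAL caps on tolerable relative risk.
LENIENT_CAP = 1.25
STRICT_CAP = 1.10

print("harm demonstrated:", harm_demonstrated(lower))
print("risk capped at 1.25:", risk_capped(upper, LENIENT_CAP))
print("risk capped at 1.10:", risk_capped(upper, STRICT_CAP))
```

                The same interval fails to demonstrate harm under the
      first rule, passes the second rule with the lenient cap, and
      fails it with the strict cap, so the choice of presumption and
      cap, not the data alone, drives the decision.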


                DR. WOOD:  Let's try and keep both the


      questions and the answers reasonably short,


      otherwise, we will be here until after midnight.


                DR. GRAHAM:  I apologize.


                DR. WOOD:  That's okay.  Let's go on to


      Dr. Elashoff.


                DR. ELASHOFF:  First, I have one comment


      and then one question.  In terms of confounding,


      just because you put a lot of variables in some


      model doesn't necessarily mean that you have


      adequately removed the confounding effects even of


      those variables.


                The second has to do with Dr. Graham's


      slide 13, the excess population risk.  I note that


      the Ingenix data has been left out of the bottom




                DR. GRAHAM:  That's right, because for the




      high dose.


                DR. ELASHOFF:  Yes, but the negative sign


      needs to be on the slide, otherwise, it's a biased




                DR. GRAHAM:  Well enough.  I take that


      correction. Okay, fair enough.


                DR. WOOD:  Dr. Bathon.


                DR. BATHON:  Yes.  As we weigh the


      risk-benefit ratio of these drugs, one


      consideration is that there are subgroups of


      patients in which the benefit might outweigh the


      risk possibly.


                With that in mind, it would be helpful for


      us who are not cardiologists or epidemiologists to


      be able to put the relative risks that we have been


      seeing over the past day or two in context with all


      the cardiovascular risk factors that exist.


                So, for example, if you were to take the


      presumed relative risk of rofecoxib of 1.5 to 2.0,


      at least at the higher dose, and put it into some


      context for us of the 20 to 40 cardiovascular risk


      factors that exist in a sort of rank order, where




      would you put the COX-2 drugs?


                DR. GRAHAM:  For the high dose it would


      be probably more significant than smoking or


      diabetes or hypertension, maybe more important than


      the combination of several of those factors in a


      patient.  For the lower dose, it is probably more


      than hypertension, a little less than diabetes, and


      a little less than smoking.


                I know, David, you know the cardiovascular


      risk factors much better than I do, and so does Dr.


      Hennekens, but that would be my ballpark on that.


                DR. WOOD:  Dr. Abramson.


                DR. ABRAMSON:  Yes.  I want to go back to


      the question Dr. Shafer asked about if these


      classes of drugs or this group of drugs could be if


      there was a hierarchy of risk, and you first


      answered that you thought the coxibs were more


      risky, but I would challenge you a bit simply on


      your own presentation.


                I would like you to discuss your data,


      because you then went on to talk about how


      indomethacin has a risk, Meloxicam has a risk. 




      Based on your data, the message that came through


      is that there was a dose response risk for


      cardiovascular outcomes, that we saw it within the


      coxibs, but we also saw it where the data were


      available in the non-selective NSAIDs.


                There are data that we have seen that


      ibuprofen might increase risk.  We didn't talk


      about the MacDonald and Wei paper that in


      cardiovascular discharge patients, people given


      ibuprofen had a higher mortality 2-fold.


                So, as the smoke clears, I am not sure


      that the simple answer that the coxibs were


      different was actually supported by your data, nor


      your ultimate explanation.  Can you defend that?


                DR. GRAHAM:  I think you are accurate.


      What I was saying was I was referring, I think, to


      the underlying COX-2 hypothesis and that it is


      clearer, I believe, and, well, maybe it's an


      overgeneralization, because we have the n that we


      are viewing is so small, that looking at rofecoxib


      as sort of the example where we can see very


      clearly the dose response at all the levels and its




      progression, and understanding its mechanism of


      action, and then seeing similar things with


      celecoxib and valdecoxib.


                I think what you are saying is fair.


      Maybe a better thing to say is, in the end, that


      you do need to look at it drug by drug.


                What I was saying, though, in that answer


      that I gave to Dr. Shafer, I was really talking


      more about sort of the COX-2 mechanism and the


      coxibs as being, in quotes, "COX-2 selective," but


      I think your observation is correct.


                DR. ABRAMSON:  Add to that, that although


      there is a hazard that we don't accomplish a lot by


      simply saying the class of NSAIDs may have risk, I


      think we have under-appreciated that over the last


      10 years.


                It is not that different from the


      mid-nineties recognizing that there was a class GI


      effect of these drugs, and that compared to


      placebo, whether it's hypertension or long-term


      potential adverse outcomes, this is something that


      doctors have to be aware of, even the simple thing




      of checking blood pressures when you put people on


      any nonsteroidal drug.


                So, I don't know that it is necessarily a


      bad outcome to call attention to this class effect


      until we get better information on each of these


      individual drugs.


                DR. WOOD:  Dr. Day.


                DR. DAY:  I have a comment about recall


      bias and reverse recall bias.  There is a huge


      research literature on how memory works both in the


      laboratory and in the every-day world, and there


      are two phenomena that have been very heavily


      studied that I think might be relevant here.


                One is called flashbulb memory, and the


      idea is when an emotional spectacular event


      happens, such as when you first learn that JFK had


      been shot, or the Challenger blew up, or the World


      Trade Center had been hit, it is as if the old-time


      flashbulb from an old-time flash camera went off


      and captured all the details, and you remember all


      of those details forever afterwards associated with


      the event that you might otherwise have just not




      even noticed or forgotten.


                So, there is a lot of research on


      flashbulb memory that shows many of those details


      are indeed correct, but some are notoriously false.


      For example, there are accounts of people who


      remember a certain event with great emotional


      aspects to it, and they remember listening to the


      World Series when so-and-so is pitching and it was


      the bottom of the 9th, da-da-da, all these details,


      and when you go back and check the evidence of what


      was going on, on that day and time, that particular


      game was not on.


                So, that is phenomenon number one, flashbulb


      memory, and the second is eyewitness testimony.


      How you ask a person a question will affect what


      answers you get.  So, if you have in the courtroom,


      someone who has witnessed a car accident, if the


      lawyer asks this witness, "Did you see the broken


      glass," then, the witness is more likely to say yes


      than if you ask, "Did you see any broken glass,"


      because the broken glass presumes that there was


      some, and so forth.


                So, I take your points seriously about


      potential recall bias and reverse recall bias, but


      we would have to look at both, whether there is an




      emotional component or not.  Those who have had an


      MI, for example, would have that most likely, but


      also how the questions are asked in these surveys,


      and it is not trivial how you ask people questions


      about were you taking any medications or were you


      taking medication X, and for how long, and what was


      the dosage, and so on.


                So, I don't think that these details are


      always published with the studies, and I would like


      to encourage people who ask people about their


      experiences with drugs, take a look at the memory


      literature for some of these points.


                DR. WOOD:  Dr. Gibofsky.


                DR. GIBOFSKY:  Dr. Graham, I am wondering


      if you separated out your populations based on the


      indication for which they were taking the drug.  I


      ask that because we heard yesterday, and it's well


      known, that rheumatoid arthritis is itself a risk


      factor for cardiovascular disease, and higher doses




      of coxibs, in particular celecoxib, are usually


      given to patients with rheumatoid arthritis as


      opposed to osteoarthritis.


                So, I am wondering if you look at that in


      your breakdown.


                DR. GRAHAM:  Several of the studies that I


      reviewed have looked at the indication, but in


      automated claims data, it is very difficult to be


      sort of sure does the patient have rheumatoid


      arthritis, and there are different algorithms one


      could use, but in general, what has been found in


      the studies where they have looked at that, that


      the prevalence of rheumatoid arthritis in the study


      populations has been low, very low, and that its


      impact on the results when they adjusted for it


      didn't materially affect things.


                Now, in the California Medicaid study, one


      difference in that study was that our base


      population was limited to patients who had


      diagnoses of osteoarthritis or rheumatoid


      arthritis.  Now, these are diagnoses, and so does


      that mean that they really had osteoarthritis or




      rheumatoid arthritis, I don't know, but what we did


      try to eliminate in that study at least were the


      people who might be using an NSAID for a muscle


      injury, a short-term complaint as opposed to a


      chronic illness.


                In none of those does the presence of


      rheumatoid arthritis seem to affect things, but


      again I think the prevalence is pretty low in all


      of these studies.


                DR. GIBOFSKY:  One quick question for Dr.


      Platt, if I might.  I need to understand the


      concept of survivor bias somewhat in that I think


      there is a difference between a patient who is


      drug-naive, then put on a drug, and then an event


      happens versus a patient who may have seen a drug,


      perhaps seen another drug after that, 3 or 4 agents


      of the class, and is then switched to another agent


      and something happens.


                I think we have talked about remote versus


      current, but there is also this issue of sequential


      effect, and I am wondering how you deal with that


      as a survivor, particularly because of the paper we




      saw a few weeks ago in the Archives suggesting that


      discontinuation of an NSAID may itself be a risk


      factor for a thrombotic event.


                DR. PLATT:  Your point is exactly right.


      I think that the concern about survivor bias is


      that if we think that some people are particularly


      susceptible, which is almost certainly the case,


      then, if we start the clock after a person has


      already been exposed to a drug or to one that has


      the same effect, then, it is very much less likely


      that those individuals will have a problem.


                That may be the explanation, for instance,


      for the reason that the literature was so badly


      wrong about postmenopausal estrogens and heart


      disease, that most of the epi studies started with


      prevalent users.


                I think the majority of the studies that


      we were reviewing here, these were individuals who


      are known to have had at least a year of prior


      experience without exposure to the nonsteroidals.


                Your study in Kaiser I know was an


      inception cohort at least with regard to a year of




      prior history, but I am not aware that any studies


      have a longer drug-free prior interval than that.


                DR. WOOD:  Dr. O'Neil, do you want to


      comment particularly on this?


                DR. O'NEIL:  Yes, this is an important


      point and a lot of things have been covered in


      Richard's and David's presentation, but one thing I


      think that is relevant that Richard did not cover,


      that is, the value of a randomized trial, is the


      ascertainment and follow-up, and knowing the status


      of individuals in the sense of who goes off therapy


      and how long they stay on therapy.


                That is very critical relative to the time


      dependency of the risk.  It was mentioned, for


      example, the use in the observational sense of


      recent and remote and current use.  Those are all


      terms that are nice, but they don't get at the


      issue that we are trying to get at with regard to


      the clinical trials, and that is essentially when


      does time zero start for you.


                So, I think the appropriate question to


      ask is what is the duration of exposure since your




      initial exposure to the drug, because I think that


      is very relevant to the interpretation of the three


      clinical trials that we have, two of which are in


      placebo-control populations.


                There is a rofecoxib-naproxen control


      trial for one year, there is a placebo-control


      trial in polyp prevention for three years, and


      there is a placebo-control trial in Alzheimer's


      disease for four years, and the time dependency


      from time zero matters as you have seen in the




                It is relevant to the excess risk


      calculation.  So, I would ask the committee, as


      well as I would ask David, of the observational


      studies that you have reported, how many of them


      are cohort studies, and how many of them are able


      to identify new initial use, and then track


      continued use for that individual, so that one


      could look at the relationship between the hazard


      rates and the hazard ratios that we are identifying


      in the randomized trials and match that to the odds


      ratios that are being reported in the observational






                DR. GRAHAM:  On one of my initial slides,


      you can see what the cohort studies were, and in


      some of the nested case control studies, you are


      also able to get the time on drug.  Actually, in


      Wayne Ray's cohort study, most of these cohort


      studies include prevalent and incident users, so


      they will do what is called a "new user"


      subanalysis, which is to try to get to this issue


      of when does time zero begin.


                We addressed that problem in our study


      here by the inception cohort design in our base


      population, so that we can identify what time zero


      was for the cases.


                Now, none of those studies presented data


      in the form of a survival analysis, which I think


      in the end, that is what Dr. O'Neil would like to




                DR. O'NEIL:  No, my question is not so


      much in survival.  I don't believe, and again that


      is why I am asking you, I don't think any of those


      studies were designed or able to capture the




      question I am asking.


                In fact, if I am not mistaken, in the


      Wayne Ray study, he defined new use, but he did not


      define any time from new use, which is essentially


      critical to when those risks start.


                DR. GRAHAM:  That study isn't cited as one


      of the studies where we are able to derive that


      information.  This slide was a slide that I


      presented to show that from the epidemiologic


      literature, those studies where the investigators


      had identified when time zero began for rofecoxib


      use, and they didn't present the data as a survival


      analysis, but they identified when time zero began


      and then, in various ways, showed you either what


      the distribution of the cases were, so that you can


      see that it was impossible for the risk to have


      been delayed for 18 months, because nobody in the


      study used the drug for 18 months, or they parsed


      time out and looked at the first 30 days of use


      from time zero, and found the risks that they found


      down here.


                But you are right, those studies aren't




      designed that way, and we haven't had time in our


      Medicaid study to do these analyses yet, but we


      have the data to now do the cohort study and time


      to event, so we will have an opportunity actually


      within the data to actually compare and look to see


      exactly the question you are driving at.


                But I would say that from the published


      data, in each of these studies, time zero for


      rofecoxib was identified and in some way or


      another, information that I think could be useful


      to the committee in establishing when does risk


      begin was contained in those studies.


                DR. O'NEIL:  Well, the other point here,


      which is the value of clinical trials, and it was


      the question that was discussed yesterday with


      regard to the intent-to-treat analysis, and that is


      to say to analyze all outcomes once randomized to


      the trial regardless of whether you want to track


      the individual to 14 days post-exposure.


                You can't really maybe get access to this


      information in the observational studies.  That is


      a conjecture, but it's one or the other biases, and




      it was interesting to the comment, whether one


      would believe this or not, that discontinuation,


      discontinuation from an NSAID alone raises risk.


                If that were to be the case, that is a


      different analysis altogether.


                DR. GRAHAM:  In that actual paper, it


      could be that people were discontinuing the NSAIDs


      because they were having chest pain and it was


      being interpreted as dyspepsia or something, and


      then they go to have their infarct.


                I mean you are right about that, but this


      is the nature of how epidemiology is done, and I


      can't change it.  I didn't make the rules, I am


      only following them.  Nobody is arguing that


      clinical trials, if they could be large enough,


      that they would give all of us answers that we


      would have greater comfort trusting what they are




                What I am proposing is that we don't have


      that kind of data in the clinical trials.  As large


      as the clinical trials are, for the questions that


      this committee is facing, you don't have the data




      you need, and what I presented is the epidemiologic


      data, and it is imperfect and it has its warts, and


      that is why I would emphasize looking at


      consistency and trying to sort of derive from that


      a general sense.


                I mean does it make pharmacologic sense


      that you would have an 18-month delay?  I mean I


      guess I suppose it depends on what you think the


      mechanism of action is for the underlying disease,


      but even in the clinical trials, study 090 was 6


      weeks long, 12.5 mg, and it had a cardiovascular




                DR. WOOD:  I am happy to facilitate a


      discussion among the FDA, but I think we would


      rather hear from the committee right now.  Dr.


      Farrar, you are next.


                DR. FARRAR:  I think that the


      recommendations of the committee tomorrow are going


      to depend on the assessment of the overall risk and


      the overall benefit of this class of drugs.


                As a researcher and after all the data


      that has been presented, I am more than happy to




      accept the fact that there are serious risks even


      of death from taking NSAIDs.  In fact, though,


      there are serious risks in taking any medication at




                For some of the NSAIDs, it is


      cardiovascular risks, for some of them it is


      clearly GI bleeding.  As a doctor, though, who


      takes care of patients, I know that treating pain


      or not treating pain and not treating the


      disability of arthritis also has very serious risks


      even of death.


                Given the extensive work that you have


      done, on the risk of both the cardiovascular and


      the GI bleed, I wonder what level of risk is


      acceptable to you, and remembering that the only other


      drugs that are really available are analgesics or


      narcotics, and the only other drugs that are really


      available in terms of limiting inflammation are


      biologics or immunosuppressants, I wonder what drug


      is safe enough that you would recommend that I


      actually would be able to use it in patients to


      prevent some of their suffering.


                DR. GRAHAM:  Well, I am not going to give


      a product endorsement.  A couple of things, though.


                DR. WOOD:  Try and make it brief.




                DR. GRAHAM:  One, the benefits of the


      treatment for the traditional NSAIDs compared to


      the COX-2 selective NSAIDs with GI bleed, we have


      clinical trial evidence that suggests that there may


      be a difference, but here, to me, is an anomaly.


                Rofecoxib got the indication for being


      GI-protective, celecoxib didn't based on the


      clinical trials data you guys looked at yesterday.


                There are two published studies in the


      literature looking at what I would say is actual


      benefit.  There, they were looking at


      hospitalization for GI bleed--they didn't look at


      death from GI bleed, but I wish they had--but


      hospitalization for GI bleed, and what they found


      was, in both of these studies, that celecoxib was


      actually more beneficial, you know, lower rate of


      hospitalization for GI than rofecoxib.  So, that is


      the population, two large studies.


                You have got your clinical trials that




      would have said it should be the reverse.  So, I


      throw that out as one sort of conundrum.


                The second is that I don't think that the


      actual benefits of these drugs are understood well


      enough to sort of try to weigh these very well.


      The case fatality rate for myocardial infarction in


      the United States approaches 40 percent.  The case


      fatality rate for hospitalized GI bleeding is


      probably somewhere around 5 or 10, it is a much


      lower case fatality rate.
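
Note: the comparison of case fatality rates above can be turned into simple arithmetic. This is a sketch under stated assumptions. The case fatality figures are the ones quoted by the speaker (about 40 percent for myocardial infarction, 5 to 10 percent for hospitalized GI bleeding). The event counts and function names are hypothetical.

```python
MI_CASE_FATALITY = 0.40  # figure quoted by the speaker


def net_deaths(excess_mi, gi_bleeds_prevented, gi_case_fatality):
    """Deaths caused by excess MIs minus deaths avoided by preventing
    hospitalized GI bleeds. Positive means a net increase in deaths."""
    return excess_mi * MI_CASE_FATALITY - gi_bleeds_prevented * gi_case_fatality


def break_even_gi_bleeds_per_mi(gi_case_fatality):
    """GI bleeds that must be prevented per excess MI for deaths to balance."""
    return MI_CASE_FATALITY / gi_case_fatality


# Hypothetical scenario: 100 excess MIs and 100 GI bleeds prevented,
# evaluated at both ends of the quoted 5 to 10 percent range.
for gi_cfr in (0.05, 0.10):
    print(gi_cfr, net_deaths(100, 100, gi_cfr), break_even_gi_bleeds_per_mi(gi_cfr))
```

Under these assumptions, between 4 and 8 hospitalized GI bleeds would have to be prevented for every excess myocardial infarction before deaths balance out. Nonfatal harm is not counted in this sketch.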


                Nobody that I have seen anywhere has sort


      of worked this out very well, so I would submit to


      you and to the committee that you actually know


      very little about the actual population benefit of


      any of these products.


                DR. WOOD:  I don't think we are going to


      get an answer to that question, so let's move on.


                Dr. Nissen.


                DR. NISSEN:  Let me briefly answer the


      earlier question about what does the hazard ratio


      of 1.5 to 2 mean. Before I came to the meeting, I


      made a point to look this up, because I thought it




      would be very relevant.


                It is equivalent to raising a cholesterol


      from 200 to 260, or taking up smoking.  Another way


      for the committee, I mean as a cardiologist I have


      to deal with this all the time, the most effective


      drugs we have for prevention of morbidity and


      mortality are statins, and they reduce risk about


      35 percent.


                So, a hazard ratio of 1.5 to 2 is really a


      very, very big effect when you are talking about


      the most common cause of mortality, and that is why


      this discussion is so important.


                Now, my question is this.  We are going to


      be asked to balance risk and benefit, and so the


      magnitude of the hazard ratio is very important to


      all of us, and I am trying to reconcile what we see


      in the randomized control trials with, let's take


      rofecoxib for a moment, where it looks like the


      hazard ratio in the randomized trials is in the


      range of 2, 3, 4, maybe even higher, and in the


      observational data it is significantly lower.


                I would like to propose a hypothesis to




      you and just ask you if you think this is right.


      In your observational data, you are looking at


      mostly short-term exposure, so you are looking at


      less than 12 months typically of exposure.


                It may well be that the hazard increases


      over time, so that by the time you get to 18


      months, you can actually see it in a much smaller


      randomized trial, and so it doesn't rule out the


      possibility that, in fact, both observations are


      right, that, in fact, there is an early hazard, but


      that early hazard has a smaller hazard ratio than


      the hazard at 18 months or 24 months or even 36


      months, and if we ever were to look out 5 years, it


      might still be increasing.


                Do you think that is a reasonable




                DR. GRAHAM:  I think more likely it is,


      that in your clinical trials, early on you don't


      have enough power to distinguish the risk.  The


      hazard is the same, but the lines are closer


      together, because we are closer to the origin.
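
Note: the point that a constant hazard ratio yields curves that sit close together near the origin can be illustrated as follows. This is a minimal sketch assuming a constant (exponential) hazard in each arm. The baseline hazard of 1 percent per year, the hazard ratio of 2, and the 1,000 patients per arm are hypothetical.

```python
import math


def cumulative_incidence(annual_hazard, years):
    """Probability of an event by `years` under a constant hazard."""
    return 1 - math.exp(-annual_hazard * years)


def expected_events(n_per_arm, annual_hazard, hazard_ratio, years):
    """Expected event counts (control, treated) with a constant hazard ratio."""
    control = n_per_arm * cumulative_incidence(annual_hazard, years)
    treated = n_per_arm * cumulative_incidence(annual_hazard * hazard_ratio, years)
    return control, treated


for months in (3, 18, 36):
    control, treated = expected_events(1000, 0.01, 2.0, months / 12)
    print(f"{months:>2} months: control {control:5.1f}, treated {treated:5.1f}, "
          f"gap {treated - control:5.1f}")
```

The ratio of events stays near 2 throughout, but the absolute gap between arms is only a few events at 3 months and grows roughly in proportion to follow-up time. That is one reason an early difference can be hard to detect even when the hazard ratio does not change.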


                I think one other explanation for the




      lower risk ratios in observational studies, I would


      think is more likely due to misclassification of


      exposure and misclassification of outcome.  It is


      likely to be nondifferential, so it would tend to


      reduce the odds ratios and relative risks towards




                Exposure, because people are going to take


      it, a lot of these people are taking it on a prn


      kind of basis.  In a clinical trial, you have a


      greater certitude that they are actually taking it


      every day.  That introduces a lot of


      misclassification, so the a priori hypothesis going


      into an observational study, with misclassification


      going on, you are fighting an uphill battle to see


      an effect.


                DR. WOOD:  We have got lots of people who


      want to ask questions.  I want to make sure that


      the people who are asking questions have questions


      they want to ask for clarification of the speakers


      who have spoken rather than just general points.


                Dr. D'Agostino.


                DR. D'AGOSTINO:  I have a couple of




      questions along the way here.  I have spent a good


      part of my career in the Framingham Heart Study,


      and it's an epidemiological study and a cohort


      study, and we take joy when somebody runs a


      controlled trial on hypotheses and then later on


      confirms it.


                The first question is I am concerned that


      even though you have gone through this careful


      analysis, your conclusions are no apparent effect,


      probably increased effect, probable increased risk.


      They really don't help us in the sense of pinning


      things down.  We have a couple of very strong I


      think good studies, the APPROVe study and the APC


      study as placebo-controlled trials.


                Tell us quickly where is the weight of how


      we should look at these two pieces, the controlled


      trials we have versus what you have produced.


                DR. WOOD:  Really quickly.


                DR. D'AGOSTINO:  Really quickly, it can be


      done quickly.


                DR. GRAHAM:  My belief is that for the


      controlled clinical trials, for the levels of risk




      that we are concerned about, that they do not have


      the statistical power early on to show risk




                DR. D'AGOSTINO:  I think Bob O'Neil's


      comment is very important here.


                The other two points, and again I will


      make them quick, I am very concerned about the high


      dose effect you have, and I am really concerned


      about the MI and the number of cases.  I mean blood


      pressure, cholesterol, diabetes, smoking, this is


      what drives people to have heart attacks and what


      have you, and that is completely missing on your


      assessment of how many new cases, so I guess it is


      more of a comment that I am really concerned that


      that sheet needs sobering interpretation.


                DR. GRAHAM:  But it was based on the odds


      ratios and relative risks where those factors were


      adjusted for, so as well as they are adjusted for,


      that is what the projection represents, the excess


      after adjustment.


                DR. D'AGOSTINO:  Yes, but I mean the


      comment was made by you, throwing in the analysis




      doesn't necessarily adjust for them.


                The last one, you made a very nice point


      about the cardio-protective effect, and you tried


      to show that these uses, and what have you, somehow


      or other all have the same risk, and your


      interpretation that there must be some confounding


      going on, why doesn't that hold for all the studies


      you gave, why doesn't that hold for the Solomon


      study, which you thought was a great study, yet,


      this one result you don't like?


                DR. GRAHAM:  For what, the Kimmel study?


                DR. D'AGOSTINO:  Wasn't it the Solomon


      study that had the naproxen as the




                DR. GRAHAM:  That is because the cardio


      protection was present when they were on the drug


      and when they weren't on the drug.


                DR. D'AGOSTINO:  I understand what you are


      saying, but if that's a problem, then, it means


      there is some confounding going on.


                DR. GRAHAM:  No, it's selection bias.


                DR. D'AGOSTINO:  Well, it's selection




      bias, but why isn't it for the whole study?  Why do


      you throw out a result you don't like and keep all


      the results you like?


                DR. GRAHAM:  No, that is not what I did.


      I pointed out a result where they showed the


      presence of the selection bias.  In other studies,


      the Ingenix study is the only other study that


      looked at this.  I don't have a slide of it.


                DR. D'AGOSTINO:  I don't know if it's a


      selection bias or misinterpretation of the data.


                DR. GRAHAM:  Well, to me it looks like


      selection bias.


                DR. WOOD:  Let's continue that


      conversation later.


                Dr. Morris.


                DR. MORRIS:  David, would you go to slide


      14.  That is the risk, the duration of use.  I


      think one of your points was that if you look at


      your study, tell me if I understand this right,


      that with the lower dose, that the median time to


      an AMI is sooner than with a higher dose, did I


      understand that right?


                DR. GRAHAM:  Yes.


                DR. MORRIS:  A month?


                DR. GRAHAM:  Had more cases, a greater




      proportion of our cases, but the other thing is


      remember, down here, we are talking about 18 cases


      or so.  The N here is small, the N here is like 58,


      and the N here is 10.  So, I wouldn't read too much


      into the difference.


                The more important point is that at the


      low dose, nobody was out there beyond 18 months, so


      all the action happened before 18 months, and the


      same for the others.  I see what you are saying.  I


      can only say that is what our data were.


                DR. MORRIS:  One interpretation is what


      you said earlier, that for this particular drug, we


      are talking about, as you said, no safe level.  I


      was wondering if that is the way you interpreted


      it, that because we are talking about Vioxx here,


      and there is no safe level, that something is going


      to happen sooner, or is it something with the


      populations are different.


                DR. GRAHAM:  The populations could be




      different, but I think, you know, you would expect


      the higher dose to have a shorter latency to onset


      than the lower dose, but the numbers are so small.


                DR. MORRIS:  Okay, it's a small number




                DR. WOOD:  So, the answer is too small


      numbers at high dose.


                Dr. Boulware.


                DR. BOULWARE:  I just want to make sure I


      understand something that you had proposed in your


      excess population risk slide, if you would put that


      back up.


                As a rheumatologist, I use these drugs in


      a population much greater than what you have here


      with a 65 to 74 where the risk of an MI is fairly


      high in that group.


                Did you want us to believe that this


      excess risk that you are proposing would be


      extrapolated to other population groups, too?


                DR. GRAHAM:  Well, no.


                DR. BOULWARE:  Do you have any numbers


      that may demonstrate that?


                DR. GRAHAM:  Well, the answer to the


      second is no. This was an example in conversation


      with people planning the talk, to try to help




      people connect with what it means.


                Cardiovascular risks go up.  I mean in the


      next age group higher, the risks are higher.  In


      the age groups lower, they are lower, but


      cardiovascular risk begins to increase in the 40s.


                DR. BOULWARE:  I understand, but it


      wouldn't be a linear type of thing.


                DR. GRAHAM:  No, the background risk isn't


      linear, the relative risks, though, are adjusted




                DR. BOULWARE:  Because one of the


      questions we will be faced with is are there


      subpopulations or groups that these may be safe in,


      and I just want to make sure I understand the


      relative risk in different age groups.


                DR. GRAHAM:  Nobody in any of the studies


      where they have looked at it have reported effect


      modification, which would be that the level of risk


      differs at different ages.
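
Note: if the relative risk is the same at every age (no effect modification) while the background risk differs, the excess absolute risk scales with the background rate. A minimal sketch follows. The baseline rates, the age bands other than 65-74, and the relative risk of 1.5 are hypothetical and are not taken from the slides.

```python
def excess_cases_per_1000(baseline_rate_per_1000, relative_risk):
    """Excess events per 1,000 person-years attributable to exposure,
    assuming the same relative risk applies at every baseline rate."""
    return baseline_rate_per_1000 * (relative_risk - 1)


# Hypothetical background MI rates per 1,000 person-years by age group.
baseline = {"45-54": 2.0, "55-64": 5.0, "65-74": 10.0}
excess = {age: excess_cases_per_1000(rate, 1.5) for age, rate in baseline.items()}
for age, value in excess.items():
    print(f"{age}: {value:.1f} excess cases per 1,000 person-years")
```

With one fixed relative risk, the excess is five times larger in the oldest hypothetical group than in the youngest, purely because the background rate is five times higher.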


                DR. BOULWARE:  One more question here.  I


      want to make sure I understand.  I think I heard a


      comment that says when the risk approaches


      2.0--maybe I just assumed that you said this--that


      it was an unacceptable level of risk.


                Is there ever a case where a drug may have




      a clinical benefit in which that risk is


      acceptable, because for the patients I see, not


      giving them any of these drugs will confer a great


      deal of risk on them, and physical impairment, and


      we have studies that show that the functional


      classification of rheumatoid arthritis patients


      carries with it a significant mortality as that


      class goes up?


                DR. WOOD:  I think that is a question for


      the committee to answer rather than Dr. Graham.


                Let's move on to Dr. Cryer.  Do you have a




                DR. CRYER:  I do.  The comment and


      question I have of Dr. Graham addresses an issue


      that I think is an important difference between the


      observational studies and the prospective studies,




      and this difference relates to assessment of drug


      compliance and missed doses, and I think it is


      critical as it relates to assessing drugs which


      potentially affect platelet function.


                A huge difference, as you know, between


      aspirin's effect and every other NSAID including


      the COX-2 inhibitors, is that with the non-aspirin


      NSAIDs, as soon as you remove the drugs, whatever


      potential effect they would have had on the


      platelet are immediately reversed.


                So, with naproxen specifically, my


      preconceived bias, which may be wrong, but my


      preconceived bias based upon everything I know


      about the pharmacology and the things that Dr.


      FitzGerald has reviewed for us, is that it should


      have some mild anti-platelet effects which would


      only be present when the drug is on board in the




                So, the specific question is, in the


      observational studies, recognizing that in clinical


      practice people miss doses of their NSAIDs, they


      are not taking their NSAIDs consistently, how do




      you account for the missed doses in the


      observational studies recognizing that this could


      potentially lead to a mitigation of whatever


      negative effect or positive effect that they may




                DR. GRAHAM:  It ends up being


      misclassification. Generally, what that means is it


      will force the observed level of risk, the relative


      risk or the odds ratio closer to 1.  So, if we had


      an increased risk, it would make it lower, if we


      had a protective effect, it would sort of make it


      higher, closer to 1.
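
Note: the pull toward 1 described above can be sketched with a simple mixing model. This assumes that some fraction of the person-time recorded as exposed was in fact unexposed (missed doses, as-needed use) and that the comparison group is classified correctly. The function name and the 30 percent figure are hypothetical.

```python
def observed_relative_risk(true_rr, fraction_not_taking):
    """Relative risk seen in the data when `fraction_not_taking` of the
    person-time labeled 'exposed' actually carries the unexposed risk."""
    return (1 - fraction_not_taking) * true_rr + fraction_not_taking


# A harmful effect looks smaller, and a protective effect looks weaker.
print(round(observed_relative_risk(2.0, 0.3), 2))
print(round(observed_relative_risk(0.5, 0.3), 2))
```

In this model a true relative risk of 2.0 is observed as about 1.7, and a true protective relative risk of 0.5 is observed as about 0.65. Both move toward 1, which matches the speaker's description.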


                DR. CRYER:  Right, we agree on that.  The


      specific question is, is there a way to actually


      recognize or to account for when people do not take


      their doses in the observational databases?


                DR. GRAHAM:  No, there isn't, so when you


      are studying, say, an increased risk, that is why I


      said if you find something, you have to realize you


      found it despite the misclassification.


                DR. WOOD:  Okay.  Dr. Domanski.


                DR. DOMANSKI:  I will save it for






                DR. WOOD:  Okay, great.  Dr. Furberg.


                DR. FURBERG:  No.


                DR. WOOD:  Okay, great.


                Dr. Temple, who does speak for the FDA.


                DR. TEMPLE:  I am just asking questions.


      A couple.  Actually, one point is it seems to me


      that since we expect that people are going to be


      getting one drug or another, comparisons with other


      NSAIDs seems like as good a comparison as we should


      make.  You might want to leave out indomethacin if


      you are worried about it.  That's one thing.


                I guess my main question, though, is


      everybody has paid appropriate lip service to the


      idea that very small differences are hard to


      interpret in epidemiology.


                People have said 1.5, 2.  Actually, I


      notice in one of his editorials, Dr. Furberg cited


      a paper of mine where I said anything less than 2


      really needs a lot of questions.  Jerry Cornfield,


      who sort of invented all this stuff, used to say 3.


                Well, we are talking about differences




      here that are 0.1 differences, not that they


      wouldn't be hugely important if they were true,


      that is absolutely true.  So, I guess I want to


      know what Richard and you make of all this, because


      the numbers are very small, and yet, just as an


      example, there is a very great consistency that you


      cite that celecoxib looks sort of okay, but you


      found one study where there is a little hint that


      maybe the higher dose is a problem, and since


      probably we all think dose response is likely, that


      looks good to you.


                DR. GRAHAM:  Two studies, there were 2.


                DR. TEMPLE:  Okay, 2.  The valdecoxib


      data, which shows nothing, doesn't look so good


      because we probably all believe that there is


      likely to be a class effect.


                What I am asking is, with numbers like


      this, how do you know what to do with them?  That


      seems very fundamental for the epidemiology.


                DR. WOOD:  But, Bob, there are 4


      randomized clinical trials here, and your comments


      don't apply to them, I assume.


                DR. TEMPLE:  No, they don't, although they


      are not perfectly consistent either.  But, no, I am


      asking, what do we make of differences of this




      magnitude with everybody having given lip service


      to the idea that small differences are hard to


      interpret, and yet we seem to be enthusiastically


      endorsing them, so I just want to know what Richard


      and David think about that.


                DR. GRAHAM:  Rich, do you want to go




                DR. PLATT:  I think we have to be cautious


      about how we interpret it, so I would say the


      finding of a relative risk of 3 in an epidemiologic


      study, as David found, is meaningful--


                DR. TEMPLE:  For high dose rofecoxib.


                DR. PLATT:  For high dose rofecoxib.


                DR. TEMPLE:  I would not dispute that at




                DR. PLATT:  It seems to me that in that


      context, that a dose response effect, that the


      information about lower doses gains weight by


      borrowing from that.  I think that is also worth




      keeping in mind when, in other studies that are


      working in that range that make us all nervous,


      there appears to be a dose response effect.


                It is the kind of consistency that makes


      the study, in my mind, be worth more attention.  I


      think there is something to be said for giving more


      weight to relatively small excess risks if they are


      seen in a number of different environments when we


      can't have good reason to think that there is a


      similar kind of biases that might be contributing


      to it.


                After that, I agree with you.  We are in


      relatively difficult terrain.  I think that it is


      not the same as no data, though.  I think we ought


      to distinguish between the situation in which we


      have no evidence from ones in which we have


      relatively weak evidence.


                We didn't talk at all, for instance, about


      the enormous number of spontaneous reports of


      myocardial infarction following exposure to


      nonsteroidals.  There are thousands and thousands


      of them.  In my mind, they don't contribute at all




      to the discussion, whereas, I think these need to


      be weighed in the mix when we don't have clinical


      trial information to depend on.


                DR. GRAHAM:  My answer is similar to his,


      but I think that what you are identifying is, is


      that we are hitting or at least right now the


      frontier is the limits of what the available tools


      we have to define the levels of risk that we are


      talking about.


                We are talking about small levels of risk


      that turn out for this particular event to be


      enormously important in a population level.  If you


      are talking liver failure, we wouldn't be having


      this conversation.  For that reason, it becomes


      important and what I would say is sort of


      emphasizing what Rich said, is I would be looking


      for consistency across different studies, and if I


      found a number of studies, say, as with Indocin,


      for example, to me, that is more persuasive.


                If I found a number of studies that


      pointed to a particular set of NSAIDs that seems to


      have low risks, I would take comfort in that in the




      absence of perfect information.  I mean some light


      in a storm is probably better than no light in a




                DR. TEMPLE:  I take it if the differences


      were at the level of 10 percent, 1.1 versus 1.2--


                DR. GRAHAM:  I am thinking more in a very


      qualitative sense of things that they seem to


      cluster around 1.  I mean 1.1 for ibuprofen, it


      could be that, for example, maybe naproxen increases


      the risk 3 percent in the real world, we are never


      going to figure that out, maybe ibuprofen increases


      it 10 percent or 15 percent, maybe we could figure


      that out, I don't know, but there is going to be a


      place where qualitatively, if we see enough studies


      kind of sort of pointing to the same place, you


      know, most of them, they are not all going to say


      the same thing, there is going to be these


      conflicts, just like we have in clinical trials




                But if most of the compass arrows are sort


      of pointing in the same direction for particular


      NSAIDs, I think those are the ones that at least




      that I sort of place on a suspect list.


                DR. TEMPLE:  So, very low hazards need at


      least multiple support before they are credible.


                DR. GRAHAM:  I think so, and I think that


      you want to try to encourage to collect that


      information sort of to test that out.


                DR. TEMPLE:  Alastair, could I take half a


      second to answer a question Larry raised before?


                DR. WOOD:  Sure, a second.


                DR. TEMPLE:  Well, it's a very good


      question, you know, if the drug is going to be used


      forever, why don't you study them forever.  The


      only thing I would point out here is that what sort


      of started people thinking was VIGOR, and VIGOR


      didn't take 3 years to show anything, it showed up


      in 9 months.


                So, what you have seen is for, say,


      lumiracoxib, a humongous study of about the same


      length, but, of course, they didn't know about


      APPROVe, did they, and whatever you think APPROVe


      means, whether Bob is right that it's late, or


      David is right that there weren't enough cases,




      people were pointing toward a study that by every


      reasonable thought, if you think platelets are


      involved, ought to be long enough to show things




                But then you form a new hypothesis once


      you have APPROVe, and you have to adapt it, and I


      think that goes on all the time.  It would not be, I


      must say, for most things my first thought unless


      you are looking for cancer that you need a 3-year


      study to find it, but maybe you learned that it




                Just for what it is worth as an example, you


      can't get an anti-arrhythmic drug approved in this


      country without showing that you don't alter


      survival unfavorably.  One result is there are


      hardly any being developed, but, you know, we had


      bad experiences, we didn't like the results of


      CAST, so you change.


                I think there is no doubt that things


      evolve and you have to expect that, and APPROVe,


      depending on what you think of it, changes the


      nature of what you expect.


                DR. GRAHAM:  Bob, just one point on that.


      I think if the APPROVe study had been 5 or 10 times


      larger than it was--I am talking about retrospect




      now--you would be able to answer with much greater


      confidence what is happening month 1 to 18.  I


      guess what I am saying is that you could also


      shorten the latency to identification of a problem


      if it turns out that the risk is early on.


                DR. TEMPLE:  David, I think that is


      entirely possible, and if it involves platelets, I


      would believe you, but if it involves a small,


      long-term increase in blood pressure, then, I am


      not so sure.


                DR. GRAHAM:  Right, but we saw yesterday--


                DR. TEMPLE:  We don't know.


                DR. GRAHAM:  We don't, but if it's


      prostacyclin, that effect could occur immediately.


                DR. TEMPLE:  Yes, but the blood pressure


      effect could be delayed.


                DR. WOOD:  Right.  So what, Bob, you are


      saying is that it is easy to be a Monday morning


      quarterback, but the data were not there before.


                DR. TEMPLE:  I would never be that rude.


                DR. WOOD:  I think you are right.


                Dr. Stemhagen.


                DR. STEMHAGEN:  I would like to clarify a


      couple things.  First, I am a little concerned in


      terms of the unpublished data.  I appreciate that




      we are able to get data very quickly, right at the


      minute that it is being generated, but none of us


      have had a chance to really review that, so I do


      have some concerns about the weight putting on this


      unpublished data when the rest of us haven't had a


      chance to look at it.


                I think there needs to be some


      clarification. There was some discussion about the


      recall bias, and so on. Certainly, there is a major


      concern about that in case-controlled studies, and


      we don't have the questionnaires, but there were a


      lot of sort of subanalyses done in the Kimmel


      study, about trying to look at whether recall bias


      is a problem, and I am not sure that you have


      highlighted that enough that looking at all those


      different things, there were really no differences






                Similarly, in the Watson study, it's a


      GPRD study, it is different than a lot of the large


      databases, the automated databases.


                There is a lot more personal involvement


      in terms of the data and the data collection and


      the adjudication of results, and I think it just


      needs to be clear that all of these studies are not


      the same in terms of a Medicare study where we


      can't go back and validate records.  A lot of them


      had a much more careful review, and I am just not


      sure that that was totally clear and if you hadn't


      read each of the papers.


                I would like to just ask a question in


      terms of your definition of the inception cohort,