DEPARTMENT OF HEALTH AND HUMAN SERVICES
FOOD AND DRUG ADMINISTRATION
CENTER FOR DRUG EVALUATION AND RESEARCH
JOINT MEETING OF
THE ARTHRITIS ADVISORY COMMITTEE AND
THE DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE
P A R T I C I P A N T S
Alastair J.J. Wood, M.D., Chair
Kimberly Littleton Topper, M.S., Executive Secretary
ARTHRITIS ADVISORY COMMITTEE MEMBERS
Allan Gibofsky, M.D., J.D., Chair
Joan M. Bathon, M.D.
Dennis W. Boulware, M.D.
John J. Cush, M.D.
Gary Stuart Hoffman, M.D.
Norman T. Ilowite, M.D.
Susan M. Manzi, M.D., M.P.H.
DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE
Peter A. Gross, M.D., Chair
Stephanie Y. Crawford, Ph.D., M.P.H.
Ruth S. Day, Ph.D.
Curt D. Furberg, M.D., Ph.D.
Jacqueline S. Gardner, Ph.D., M.P.H.
Eric S. Holmboe, M.D.
Arthur A. Levin, M.P.H., Consumer Rep.
Louis A. Morris, Ph.D.
Richard Platt, M.D., M.Sc.
Robyn S. Shapiro, J.D.
Annette Stemhagen, Dr.PH., Industry Rep.
FDA CONSULTANTS (VOTING)
Steven Abramson, M.D.
Ralph B. D'Agostino, Ph.D.
Robert H. Dworkin, Ph.D.
Janet Elashoff, Ph.D.
John T. Farrar, M.D.
Leona M. Malone, L.C.S.W., Patient Rep.
Thomas Fleming, Ph.D.
Charles H. Hennekens, M.D.
Steven Nissen, M.D.
Emil Paganini, M.D., FACP, FRCP
Steven L. Shafer, M.D.
Alastair J.J. Wood, M.D. (Meeting Chair)
FDA CONSULTANTS (NON-VOTING)
Byron Cryer, M.D. (Speaker and Discussant)
Milton Packer, M.D. (Speaker only)
NIH PARTICIPANTS (VOTING)
Richard O. Cannon, III, M.D.
Michael J. Domanski, M.D.
GUEST SPEAKERS (Non-Voting)
Garret A. FitzGerald, M.D.
Ernest Hawk, M.D., M.P.H.
Bernard Levin, M.D.
Constantine Lyketsos, M.S., M.H.S.
Jonca Bull, M.D.
David Graham, M.D., M.P.H.
Brian Harvey, M.D.
Sharon Hertz, M.D.
John Jenkins, M.D., F.C.C.P.
Sandy Kweder, M.D.
Robert O'Neill, Ph.D.
Joel Schiffenbauer, M.D.
Paul Seligman, M.D.
Robert Temple, M.D.
Anne Trontell, M.D., M.P.H.
Lourdes Villalba, M.D.
James Witter, M.D., Ph.D.
Steven Galson, M.D.
Kimberly Littleton Topper, M.S., Executive Secretary
C O N T E N T S
Call to Order:
Alastair J.J. Wood, M.D., Chair
Conflict of Interest Statement:
Kimberly Littleton Topper, M.S.
Interpretation of Observational Studies
of Cardiovascular Risk of Non-steroidal Drugs
Richard Platt, M.D., M.S.
Review of Epidemiologic Studies on
Cardiovascular Risk with Selected NSAIDs
David Graham, M.D., M.P.H.
Committee Questions to Speakers
Merck Research Laboratories
Sean P. Curtis, M.D.
Joel Schiffenbauer, M.D.
Mathias Hukkelhoven, Ph.D.
Gastrointestinal and Cardiovascular Safety
of Lumiracoxib, Ibuprofen, and Naproxen
Patrice Matchaba, M.D.
Open Public Hearing
FDA Presentation (Lumiracoxib)
Committee Questions to Speakers
Committee Discussion
P R O C E E D I N G S
Call to Order
DR. WOOD: Let's get started and welcome
back to another day. We are going to begin as on
the agenda seeing we worked late last night.
A couple of housekeeping things first. As
they say in the movie theater, please turn off your
cell phones. We don't have the one that sort of,
you know, spars you into space if you do that, the
ejector seat, but then please don't answer your
calls in here, so we don't have to hear the
beginning of your conversation.
Kimberly, are you going to read the
conflict of interest? Okay. Go ahead.
Conflict of Interest Statement
MS. TOPPER: The following announcement
addresses the issue of conflict of interest with
respect to this meeting and is made as part of the
record to preclude even the appearance of such.
Based on the agenda, it has been
determined that the topics of today's meeting are
issues of broad applicability and there are no
products being approved. Unlike issues before a
committee in which a particular product is
discussed, issues of broader applicability involve
many industrial sponsors and academic institutions.
All special government employees have been screened
for their financial interests as they may apply to
the general topics at hand.
To determine if any conflict of
interest existed, the agency has reviewed the
agenda and all relevant financial interests
reported by the meeting participants. The Food and
Drug Administration has granted general matter
waivers to the special government employees
participating in this meeting who require a waiver
under Title 18, United States Code Section 208.
A copy of the waiver statements may be
obtained by submitting a written request to the
agency's Freedom of Information Office, Room 12A-30.
Because general topics impact so many
entities, it is not practical to recite all
potential conflicts of interest as they apply to
each member, consultant, and guest speaker. FDA
acknowledges that there may be potential conflicts
of interest, but because of the general nature of
the discussions before the committee, these
potential conflicts are mitigated.
With respect to FDA's invited industry
representative, we would like to disclose that Dr.
Annette Stemhagen is participating in this meeting
as a non-voting industry representative acting on
behalf of regulated industry.
Dr. Stemhagen's role on this committee is
to represent industry interests in general, and not
any one particular company. Dr. Stemhagen is vice
president of Strategic Development Services for
Covance Periapproval Services, Inc.
In the event that the discussions involve
any other products or firms not already on the
agenda for which FDA participants have a financial
interest, the participants involved and their
exclusion will be noted for the record.
With respect to all other participants, we
ask in the interest of fairness that they address
any current or previous financial involvement with
any firm whose products they may wish to comment
upon.
DR. WOOD: Thank you.
Let's go right to the first speaker, Dr.
Platt, who is going to tell us about observational
studies.
Interpretation of Observational Studies of
Cardiovascular Risk of Nonsteroidal Drugs
Richard Platt, M.D., M.S.
DR. PLATT: Thanks. The framers of the
meeting thought it would be useful at this point to
have a discussion about observational studies to
put us all on the same page.
There was a view by some that the
expertise around the table might be uneven and it
would be worthwhile to have some discussion about
some of the basics. It is clear that that is not
I realize that a number of the people here
have written a book and several of my
here, so to that extent, I think we can either make
this a quick discussion or use this as an opportunity
for a real interactive discussion, because there
are some hard questions here and no matter how we
sort it out, we are going to be left with less
in the way of firm answers than we would like.
I also understand that there is a point of
view that says that there are lies, damn lies, and
observational studies, so part of what I think is
worth doing is using this time maybe to take our
temperature about whether and under what
circumstances we can put weight on observational
studies.
We saw a version of this slide last night
actually in the last presentation about why perform
observational studies at all, because I subscribe
to the general view that all things being equal, a
clinical trial, a randomized trial is more
credible, provides more information than an
observational study.
The problem is all things aren't always
equal and so there are reasons to ask what we can
learn from observational studies.
I think the most important of them is no
matter how well a clinical trial is designed, the
individuals who are recruited and consented to a
clinical trial are inherently going to be different
from the actual population of users, and if we want
to understand how an agent performs among real
users in the way they actually use the drug, then,
I think there is no escape but to look to
Additionally, observational data is by
definition there, so when a pressing question
arises, sometimes observational data is the first
way we can get insight into the relationship
between the drugs we care about and the exposures.
I think in that regard, these studies can
often be thought of as helping us identify the
areas in which it would be most fruitful to invest
in full-blown randomized trials. We will never
live in a world where we are able to do all the
randomized trials we care about.
I know that Charlie Hennekens'
randomized trial of aspirin was preceded by, as I
recollect, Charlie, a large number of observational
trials that made you think that it was reasonable to
do those randomized trials, so observational
studies can be useful in that regard.
Finally, when we are talking about trying
to understand effects that are relatively unusual,
we stress even the largest clinical trials. We
talked yesterday about the fact that the most
recent drug approvals have used much larger
populations in the NDA phase than had been studied
in the old days, and yet they are still small
compared to the numbers needed to parse out
relatively small differences.
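The point about numbers can be made concrete with a rough sample-size sketch. This is an illustration only, not something shown at the meeting: it assumes a two-sided comparison of two proportions under the usual normal approximation, and the 1 percent baseline event risk, the 80 percent power, and the function name `n_per_arm` are all hypothetical choices.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_risk, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to detect a given relative risk.

    Uses the normal-approximation formula for comparing two proportions.
    All inputs are illustrative assumptions, not figures from any study.
    """
    p_treated = p_control * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_control) ** 2)


# Hypothetical 1 percent event risk in the comparison group.
for rr in (2.0, 1.5, 1.2):
    print(f"relative risk {rr}: about {n_per_arm(0.01, rr)} subjects per arm")
```

Under these assumptions the required size grows steeply as the relative risk to be detected approaches 1, which is the sense in which even large trials are stressed by small differences.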
There are a lot of different kinds of
observational trials. I have listed a few of the
most common. The ones between the lines here are
the ones that are really the subject for discussion
Tom Fleming made the absolutely correct
and somewhat counterintuitive point that it is
often more difficult to do good
studies of relatively common outcomes than rare
ones, and because of that, the group of studies
that I think at least are reasonable to consider
for looking at relatively common outcomes are
case-control studies, nested case-control studies
and cohort studies.
We have examples of each in the materials
that have been handed to us. The study by Kimmel
is a pretty traditional case-control study. The
studies by Ray are cohort studies, as is the Aramis
study. The study by Dave Graham, the Solomon study
are nested case-control studies.
Just as a quick reminder, the
distinguishing feature of cohort studies is the
fact that the study population is defined on the
basis of whether people are exposed to the drug or
not, and then we look forward to what happens to
them. In that way, they are exactly comparable to
clinical trials, with the big difference that the
assignment to drug is not randomized.
The strengths of those compared to
case-control studies are you have a
at the outset of selecting individuals who are
representative of the group that you are trying to
study, and if you organize the study properly, you
have a reasonably good chance of getting unbiased
The weaknesses, particularly of
observational cohort studies is that just because
individuals had the right drug exposure at the
outset, they may change that. You can deal with
that with an intention-to-treat design, but you pay
a price for that, and in observational studies,
loss to followup is a big problem.
We are particularly plagued by that
because the large majority of the observational
studies we are working in are ones that use
administrative data from one sort of health plan or
another, and individuals move in and out of health
plans, so that it becomes difficult to follow them
Case-control studies, remember, are ones
that start with individuals who have the outcome we
care about, myocardial infarction or myocardial
infarction and sudden death, and compare them to
individuals who haven't had that experience. Then
you look back and ask what their drug exposures
are. The reasons for doing those studies are that
they are, first of all, very efficient studies.
You don't have to study thousands and
thousands. You can study as many cases as you find
and a reasonable number of controls, and you can
look back and classify exposure however is most
useful, and that is a very convenient and versatile
feature of case-control studies.
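A minimal sketch of the calculation a case-control design supports is the exposure odds ratio from a two-by-two table. The counts below are hypothetical and are not taken from Kimmel or any other study under discussion; the function name `odds_ratio` is likewise just an illustrative label.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio: odds of exposure among cases over odds among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical study: 500 MI cases and 2,000 controls, classified by
# whether they were exposed to the drug of interest before the index date.
cases_exposed, cases_unexposed = 60, 440
controls_exposed, controls_unexposed = 160, 1840

estimate = odds_ratio(cases_exposed, cases_unexposed,
                      controls_exposed, controls_unexposed)
print(f"odds ratio: {estimate:.2f}")
```

Only the cases and a sample of controls are needed, which is where the efficiency comes from; the estimate is only as good as the comparability of those controls.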
The big weaknesses are that it is very
hard to assure oneself that the cases and the
controls are really representative of the
populations that you care about, and for
conventional case-control studies, for instance,
the study by Kimmel that we are going to look at,
it takes a lot of work to be sure that people who
know that they have already experienced an MI don't
differentially report their exposure to the drugs
that we care about.
That can be for all sorts of reasons;
it might not even be wrong, but the individual who
has had an MI might be just thinking harder
about whether he or she had been exposed to a drug
that we care about.
By the way, nested case-control studies,
for instance, the study that David Graham did is a
hybrid that really, in my view, draws many of the
strengths from both designs, that is, because
nested means the case-control study is nested in a
defined population, so it has a lot of the
strengths of cohort studies and some of the
efficiencies of the case-control studies.
The differences between the observational
studies and randomized studies are pretty clear.
Randomized trials have the tremendous advantage
that there is lots more reason to expect the
treated and untreated groups to be comparable to
There is a lot more opportunity to be sure
that the outcome assessment and adherence to
treatment are good or at least well known, and we
have reviewed the difference for the
I think it is worth making the point that
there are a substantial number of similarities
between observational and randomized studies. Just
because we randomize individuals in randomized
studies, it doesn't mean that the treated and
untreated groups are comparable.
We talked about a study yesterday that was
a randomized trial where there was a substantial
imbalance in important risk factors. So, it is
incumbent no matter what kind of study you do, I
think to look for comparability, and both studies
have as potential weaknesses that there are risks
of false positive results and doing subgroup
analyses and multiple comparisons increases that
risk.
We talked a fair amount about that
yesterday, and both are at risk for false negative
results. That can be partly because the studies
may not be powered well enough either because there
is insufficient sample size or individuals aren't
studied for a long enough duration to see
biological effects that we care about, or a
vulnerable group just isn't included.
That is a problem with both kinds of
studies and I think all studies have to be
evaluated on their own merits, so let's just step
through the various places where observational
studies might get into trouble or at least the
things that need careful assessment when we look at
The first is are we studying the right
outcomes. It is essentially impossible in any of
these observational studies to use the kind of
rigorous adjudication that is a hallmark of the
randomized study, so I think we are going to have
to ask ourselves are these outcomes good enough.
The several kinds of outcomes in the
studies that we have been asked to look at are
hospitalized MIs. The case-control study by Kimmel
uses survivors. It had to use survivors because
they were collecting the exposure information by
interview after the individuals had left the
hospital, so if we care about all MIs, that
study isn't going to tell us what we want to know.
Some of the studies use MI and
out-of-hospital sudden death by linking to vital
statistics records. I think that is probably the
closest we can get in observational studies to the
intention-to-treat all outcome designs of the
randomized trials, and some of the studies use
You have to ask are these outcomes
measured appropriately. Most of the studies that
we are looking at use some form of automated
medical record or claims data that have been, in my
view, reasonably well validated. That is, there is
a moderate literature showing that claims data are
not so bad for studying acute myocardial
infarction. They have sensitivities in the 90s and
positive predictive values in the 90s.
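One way to reason about how much imperfect outcome measurement matters is a small sketch of nondifferential misclassification. It assumes the same sensitivity and specificity in both drug groups; the true risks, the 90 percent sensitivity, and the 99.9 percent specificity are hypothetical values chosen for illustration, and the function names are not from any cited study.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Risk as recorded in claims data: true positives plus false positives."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def positive_predictive_value(true_risk, sensitivity, specificity):
    """Share of recorded events that are real events."""
    return sensitivity * true_risk / observed_risk(true_risk, sensitivity, specificity)


# Hypothetical true risks giving a true relative risk of 2.0.
risk_unexposed, risk_exposed = 0.01, 0.02
sens, spec = 0.90, 0.999

rr_observed = (observed_risk(risk_exposed, sens, spec)
               / observed_risk(risk_unexposed, sens, spec))
print(f"observed relative risk: {rr_observed:.2f}")
print(f"PPV among unexposed: {positive_predictive_value(risk_unexposed, sens, spec):.2f}")
```

In this sketch the false positives pull the observed relative risk toward 1, while imperfect sensitivity alone (with perfect specificity) leaves the ratio unchanged.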
So, they are not perfect and I think we
will have to ask as we review the studies, can the
amount of uncertainty that we know exists in those
account for the effects that we see, or could they
obliterate effects that we would like to
which aren't there.
My sense is that that is probably not a
sufficient explanation to dismiss the studies that
we are looking at. The issue of bias is one that I
think always has to live as a sub-text, but quite
frankly, in the studies that do outcomes in the way
we have been describing, I don't think that is a
For cohort studies, we have to ask are we
studying the right population, and here I think we
really do have to stop and ask carefully. One is
are these people selected from the population under
study. I think in most of these examples, they are
reasonably representative, that is, a study of the
I think that the data systems that are
used to identify the individuals in the cohort are
good enough to give us reasonable belief that we
are identifying either all the people or a
representative sample of them.
I think there is a fair
whether they are representative of the larger
population. We could ask are health plan members
systematically different from the general
population of individuals who are taking these
The range of studies we have include
health plan members. I think that there is
reasonable information that they probably are
representative, at least with respect to the drug
myocardial infarction outcomes that are studied.
Studies in Medicare and population-based studies,
such as those in
reason to think that they are representative.
But there is an important consideration
about whether there are issues about the way
clinicians practice in those settings that might
have a serious impact on selecting individuals. In
particular, to the extent that formularies are
restrictive of, say, newer or more expensive drugs
like the COX-2 inhibitors, but I think we have to
ask very carefully whether the factors that would
influence the prescribing of one class of drugs
over another are likely to seriously impact the risk
of these outcomes.
Additionally, if there are cost
differentials for these drugs, it may be that there
is some form of self-selection that causes
individuals who are sicker to receive these drugs,
and I think that it is incumbent on us to expect
that to be a problem in every one of these
observational studies and to ask how well do these
studies do in adjusting for that. I will circle
back to that in a moment.
I think we have to be concerned about
whether we are studying people who have had prior
NSAID exposure, in which case we would be worried
about survivor biases, of finding the individuals
who are relatively immune to these problems.
Finally, there are study design issues
about whether there are restrictions of eligibility
that might importantly color the data. For
instance, at least one of the studies we are
looking at requires individuals to have received at
least two dispensings of a nonsteroidal in
order to be eligible.
That means that you have to live long
enough to have two dispensings, so it certainly
doesn't tell us anything about the early effects of
these drugs, and it might in an important way color
the results with regard to later exposure.
There is an important question which is
not unique to the observational studies, which is
who are the right comparators. We had a number of
discussions about that yesterday. I think that all
the issues that we discuss with regard to the
clinical trials are applicable here. In
particular, there is a lot of reason to want to
compare to other nonsteroidal users because that
gives the best chance of having a group that is
similar with regard to underlying disease status
and presumably risk of myocardial infarction.
Similarly, it is possible to say that if
you really care about COX-2 selective agents, you
should compare one COX-2 selective agent to
another.
That leaves us in the uncomfortable
situation of not knowing what is the risk compared
to no use at all, so we have some comparisons that
do look at non-users or at least remote users, and
that has its strengths. It has the big weakness,
of course, of putting us at risk of making
comparisons against groups that are unrelated.
So, we are really talking here mostly
about a study like the Kimmel study, not the nested
case-control study. The other kinds of concerns
that raise red flags are the real concern about
losing cases who make the group who are studied
I would point out to you, for instance,
that in the Kimmel study, only half of the MI
survivors who were identified were actually
interviewed and therefore part of the formal
We already talked about the fact that
since that study was limited to MI survivors, that
restricts us to a less serious set of outcomes.
The other problem that really bedevils
conventional case-control studies is
whether the group of people who are selected as
comparators are really comparable.
I think that is one of the reasons that
there is so much interest in doing nested case
control studies, because at the end of the day it
is really extremely difficult to satisfy oneself
that controls really are appropriate.
Much of what we need to be concerned about
in these studies is understanding exposures. Part
of the issue is understanding how to characterize
exposure. This is both a strength and a weakness
of these studies.
You will remember I made the point at the
outset that if we want to understand how drugs work
in actual practice, that we have to do
observational studies. On the other hand, that
means we have to find a reasonable way to
characterize these drugs.
We talked yesterday I think about all the
important issues of understanding whether we had to
look at absolute dose or cumulative effects or
whether the effects start early or
I think that the best of the studies that
we are looking at tackle a number of these issues.
I will mention in a minute some of the ways that
these studies have gone about that.
I think in terms of ascertaining exposure,
it is probably reasonable to put the most reliance
on the studies that use administrative databases of
pharmacy dispensing, but I will just make the point
that we have to be clear that these studies are
done in situations where we have reason to expect
that the administrative databases are correct.
I think all the studies we are reviewing
are ones where the investigators were careful to
know that the individuals really had a drug benefit
that was operating at the moment, that would likely
find the prescription drug exposures that we care
about, but as a general proposition, you can't
assume that that is the case.
Most health plans have some kind of
restrictions on benefits that might lead
individuals to change their benefit
there would be periods of time when we might know
that they had an MI, and we might not know what
their drug exposure is at the moment.
I will return to a point that we touched
on yesterday, which is that although almost all of
the studies that we are talking about report their
results as relative risks, a 2-fold increase in
risk, a 70 percent decrease in risk. What we
really care about is the absolute difference in
risk.
So, that is not different between
observational studies and randomized studies, but I
think it is really a critical piece of our thinking
about the problem that we are dealing with.
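The distinction between relative and absolute risk can be sketched with simple arithmetic. The baseline rates below are hypothetical, expressed as events per 1,000 people over some fixed period, and the function name `absolute_risk_summary` is only an illustrative label.

```python
def absolute_risk_summary(baseline_per_1000, relative_risk):
    """Return (excess events per 1,000 exposed, number needed to harm)."""
    exposed_per_1000 = baseline_per_1000 * relative_risk
    excess_per_1000 = exposed_per_1000 - baseline_per_1000
    nnh = 1000 / excess_per_1000 if excess_per_1000 > 0 else float("inf")
    return excess_per_1000, nnh


# The same 2-fold relative risk applied to three hypothetical baseline rates.
for baseline in (1, 5, 20):
    excess, nnh = absolute_risk_summary(baseline, 2.0)
    print(f"baseline {baseline}/1000: {excess:.0f} excess events per 1000, "
          f"one extra event per {nnh:.0f} people exposed")
```

The same relative risk translates into very different numbers of excess events depending on the underlying rate in the people who take the drug.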
The second thing that is just worth
recalling is that when we talk about a 95 percent
confidence interval, that our expectation about
where the true value lies is not uniformly
distributed over that interval.
Our best guess about where the true value
lies is around the point estimate, and if that
point estimate is wrong, the large majority of the
uncertainty is pretty close to that point estimate,
so that it is particularly not helpful, in my view,
to pay enormous attention to p values.
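A small sketch can show why values inside a confidence interval are not equally plausible and why nearby p values differ less than they seem to. It assumes the estimate is approximately normally distributed on its analysis scale; the function names are illustrative, and nothing here is taken from the slide being described.

```python
from math import exp
from statistics import NormalDist

std_normal = NormalDist()


def z_for_two_sided_p(p_value):
    """Number of standard errors from the null that yields this two-sided p value."""
    return std_normal.inv_cdf(1 - p_value / 2)


def relative_support(z):
    """Normal density z standard errors from the estimate, relative to its peak."""
    return exp(-z * z / 2)


# Compare three p values by how far the estimate sits from the null.
for p in (0.01, 0.05, 0.13):
    z = z_for_two_sided_p(p)
    print(f"p = {p}: estimate is {z:.2f} standard errors from the null")

# Support for a value at the edge of a 95 percent interval versus the center.
edge = relative_support(z_for_two_sided_p(0.05))
print(f"support at the 95 percent limit relative to the point estimate: {edge:.2f}")
```

Under this approximation, a true value at the limit of the 95 percent interval has only a small fraction of the support of a value at the point estimate.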
The difference between a p value of 0.05,
as shown here, and a p value of 0.01 and a p value
of 0.13 is not all that enormous in terms of the
I think one of the things that is a
particular concern that we need to pay attention to
in these studies is the fact that it is easy to
look at a lot of different comparisons, and to the
extent that we do that, we are going to have to
just be careful to know that the strength of any
one comparison is weaker than it appears to be.
For instance, this is a quote from one of
the studies that we are looking at. We undertook
an observational study examining the association
between rofecoxib, celecoxib, other nonsteroidals
and myocardial infarction.
Well, there is no primary hypothesis
there, and the results for all of the
nonsteroidals. They are all interesting to look
at, they are all associated with p values. Those p
values are all relatively too extreme given the
fact that there are so many comparisons.
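The effect of looking at many comparisons can be sketched as follows. The calculation assumes the comparisons are independent, which real drug comparisons within one study generally are not, so the figures are a rough guide only; the Bonferroni threshold is one common correction named here for illustration, not a method attributed to any of the studies.

```python
def familywise_error(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_threshold(alpha, comparisons):
    """Per-comparison p value threshold that keeps the overall error near alpha."""
    return alpha / comparisons


for k in (1, 5, 10, 20):
    print(f"{k} comparisons: chance of a false positive "
          f"{familywise_error(0.05, k):.2f}, "
          f"Bonferroni threshold {bonferroni_threshold(0.05, k):.4f}")
```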
It is a problem for randomized trials. We
talked about subgroup analyses. It is important to
do those studies, those subgroup analyses, but
absent having specified a principal hypothesis at
the outset, I think that we have difficulties in
knowing how much weight to put on any particular
We talked a lot about confounding. That
is one of the most important concerns in randomized
trials. I know you all know what confounding is.
It wasn't obvious to me when I was making these
slides that everyone knew that, but the example, so
that we have it in mind is if what we know is drug
A versus drug B, and MI or no MI, and we don't take
into account important confounders, we can get
importantly incorrect results.
So, here is an example of an aggregate
analysis with a relative risk of 1.5 among 2,000
people who are exposed to two drugs. If you break
it apart and see that in the high-risk group, drug
A accounted for 80 percent of the exposure, and in
the low-risk group, drug B accounted for 80 percent
of the exposure, you see that in each of those two
categories, the high-risk group and the low-risk
group, that, in fact, there is no association
between drug and outcome, but you have to take them
apart to do that.
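The arithmetic of an example like this can be sketched in a few lines. The slide's actual counts are not in the transcript, so the event counts below are hypothetical, chosen only so that 2,000 people, an 80/20 split of the drugs across risk groups, and an aggregate relative risk of 1.5 all hold; the Mantel-Haenszel summary is shown as one standard way of combining strata, not as the method used on the slide.

```python
# (events, people) for each drug within each risk stratum; hypothetical counts.
strata = {
    "high risk": {"A": (80, 800), "B": (20, 200)},
    "low risk": {"A": (10, 200), "B": (40, 800)},
}


def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk on drug A divided by risk on drug B."""
    return (events_a / n_a) / (events_b / n_b)


def crude_risk_ratio(table):
    """Relative risk ignoring the strata (the aggregate analysis)."""
    events_a = sum(s["A"][0] for s in table.values())
    n_a = sum(s["A"][1] for s in table.values())
    events_b = sum(s["B"][0] for s in table.values())
    n_b = sum(s["B"][1] for s in table.values())
    return risk_ratio(events_a, n_a, events_b, n_b)


def mantel_haenszel_risk_ratio(table):
    """Stratum-adjusted relative risk (Mantel-Haenszel weighting)."""
    numerator = denominator = 0.0
    for s in table.values():
        (events_a, n_a), (events_b, n_b) = s["A"], s["B"]
        total = n_a + n_b
        numerator += events_a * n_b / total
        denominator += events_b * n_a / total
    return numerator / denominator


print(f"aggregate relative risk: {crude_risk_ratio(strata):.2f}")
for name, s in strata.items():
    print(f"{name}: relative risk {risk_ratio(*s['A'], *s['B']):.2f}")
print(f"adjusted relative risk: {mantel_haenszel_risk_ratio(strata):.2f}")
```

With these counts the aggregate comparison suggests an excess on drug A, while within each risk group the two drugs carry the same risk.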
Well, the good news is if you know what
the confounders are, and you have measured them
accurately, it is possible to adjust for them, and
all of the studies we are looking at do a pretty
good job of adjusting for the confounders that we know
about, so I guess one of the questions is how well
do they do at identifying the important
confounders.
I would say not bad on a lot of that.
That is, if you take, for example, the Graham study
or the studies that Wayne Ray did in
Medicaid, there are a number of strengths. I will
sort of stop and back up on the things that make
these look like relatively more credible in
the scheme of the factors that we care about.
They are inception cohorts of nonsteroidal
users, that is, they are individuals who had to
have been members of the health plan for at least a
year before they received their nonsteroidal.
There was a lot of information about their
underlying medical status that was available to the
investigators using both claims data and medical
record data to ascertain cardiovascular disease
along a number of dimensions, utilization of
procedures like surgery or angioplasty or
diagnostic procedures that are intended to find
cardiovascular disease, hospitalizations, emergency
room visits, and a substantial amount of
information about the medications that these
individuals took that was related to or plausibly
related to cardiovascular risk factors.
That large number of factors was used to
create separate risk models using only the
unexposed, and then to use those risk models to
create risk indexes for the individuals to use as
an adjuster for underlying cardiovascular risk.
Is it perfect? No. Is it pretty good?
It seems to me that it meets the sniff test of
saying that it has a reasonable chance of
identifying important confounding.
Unfortunately, there are a number of
important confounders for which health care systems
typically don't have good data, like smoking, OTC
NSAID use, obesity, family history, and those are
typically much more problematic.
Some of these studies have worked pretty
hard to try to either deal with it or understand
whether it could be an important problem. One of
the handouts we had, for instance, was the study by
Schneeweiss and colleagues who looked back at one
of the studies by Solomon that was performed in the
Medicare data set, and asked how important could
these unmeasured confounders be.
They actually had access to information
from the Medicare Beneficiary Survey that asked
representative Medicare beneficiaries detailed
questions about many of the things that we would
care about. They weren't the people who were
involved in that case-control study, but if you
assume that the beneficiary survey members were
representative and they gave plausible answers, it
is possible to extrapolate back to the source
population, and the take-home message from that
work is that the answer didn't change very much, which is
really what we want to know, not sort of the
absolute difference, but whether those unmeasured
confounders are important enough that they could
cause a difference.
I think we still have to be concerned at
the end of the day, we still have to be concerned
about residual confounding as a potentially
One way I think that we can draw relative
assurance from that work of adjusting for
confounding is to ask how much did the estimate of
risk change between the unadjusted and the adjusted
I think there is a world of difference
between an unadjusted result of 10 and an adjusted
result of 1.5, and having an unadjusted result of
1.6 and an adjusted result of 1.5. The former, I
think the reasonable assumption is we arguably
haven't been able to deal with confounding in a way
that would let us believe that 1.5 means something.
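The contrast between those two situations can be put in numbers with a simple change-in-estimate calculation. This is only a sketch of one way to express how much adjustment moved the result; the function name is illustrative and the two pairs of values are the ones quoted in the remarks.

```python
def percent_change_from_adjustment(unadjusted, adjusted):
    """How far the unadjusted estimate sits from the adjusted one, in percent."""
    return 100 * (unadjusted - adjusted) / adjusted


for unadjusted, adjusted in ((10.0, 1.5), (1.6, 1.5)):
    change = percent_change_from_adjustment(unadjusted, adjusted)
    print(f"unadjusted {unadjusted}, adjusted {adjusted}: "
          f"adjustment moved the estimate by {change:.0f} percent")
```

A large shift signals that measured confounders were doing most of the work, which raises the concern that unmeasured ones could account for what remains.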
I think there is a much stronger case to
be made when adjusting for important confounders
that we know about doesn't change the risk estimate
very much, that that is a relatively more credible
Having said that, I think that
observational studies are best at finding relative
risks that are more than 2. I think that I would
pay some attention to relative risks of 1.5. I get
very nervous about adjusted relative risks of 1.2.
That doesn't mean that they are not right
and I don't ignore them, but if we ask is that for
sure the answer, my response to that is I am just
less certain about that.
I think we are always left at the end,
while we spend a lot of time thinking about and
adjusting for confounding, and I think we can do a
pretty good job of that, it is much harder to
adjust for misclassification, and it is essentially
impossible to adjust for bias.
So, I think one of the things we have to
ask about is are there plausible sources of
misclassification and bias, and if there are, in
which direction do they work and would they
seriously change our interpretation.
We talked about the fact that absolute
differences are the important ones that we care
about. We have already started to look at data
that talks about person level risk and population
level risk, so beyond saying that at the end of the
day, I think these are the answers that we really
need to talk about, not about relative risk.
Personally, I think that we need two kinds
of answers. One is what is the information that
patients and their physicians need to have to make
decisions for them personally about whether to
accept certain kinds of treatments in exchange for
certain kinds of anticipated benefits.
I think there is a population level
concern that we have to have that emerges from the
same set of analyses, but takes on a different
So, you will be pleased to know that I am
wrapping it up now, and I would say that both the
cohort and nested case-control designs, which are
the bulk of the observational studies that we are
looking at, are relatively strong ones and I think
deserve the committee's real attention.
I am sorry that not every one of these
studies prespecified a primary hypothesis that we
can attend to, but we should whenever possible do
that. Even though we don't find important effects
in some of these studies, I think it is important
to recognize that they don't exclude one.
As I have said, I am least certain about
attaching great weight to relatively small excess
risks even understanding that when they are
extrapolated to a large population, they could
account for very important public health problems.
Finally, I would say that the things that
support the studies' conclusions are the fact that
when we do subgroup analyses and look for
dose-response effects, that they strengthen the
cause-effect relationship, and I think that there
is reason to look for consistency across studies.
I take the point that was made yesterday
that it is possible that a dozen studies of
naproxen could all have the same underlying bias
that shift the point estimate in the same
direction, but it is not so clear to me what that
So, I think that we would have to have a
reasonable idea of what might explain consistent
differences across studies and ask if they are of
sufficient magnitude to explain that. As I say, I
am not clear that there are those kinds of biases.
I think we have to be cautious about the
fact that residual confounding, bias, and
misclassification are all issues with these
studies. So, I think that while they add to our
discussion, they have to be considered in light of
the fact that they are imperfect vehicles.
DR. WOOD: Thanks very much.
Let's just go straight on to the next
speaker and then we will take questions for Dr.
Platt after David Graham's talk.
The next speaker is Dr. David Graham from
Review of Epidemiologic Studies on
Cardiovascular Risk with Selected NSAIDs
David Graham, M.D., M.P.H.
DR. GRAHAM: Good morning. Today, I will
give a review of epidemiologic studies and
cardiovascular risk with selected NSAIDs. I will
be evaluating epidemiologic data from the published
literature plus two currently unpublished studies
that I have evaluated.
My focus will be on providing estimates of
risk of acute myocardial infarction in the setting
of the use of COX-2 selective NSAIDs or naproxen,
although I will have some comments in light of
yesterday's discussion about other NSAIDs on those,
The methodology was to do a
by specific NSAIDs and then cross-check the
citations in those articles to see if there are
other articles I had missed.
I would also like to take this moment to
thank Dr. Crawford for his leadership in making it
possible for me to present some of our preliminary
data from a study in California Medicaid, which Dr.
Gurkiepal Singh from Stanford and I recently completed.
Before I get into the substance of my
talk, I just want to comment a little bit on excess
cases and projecting to the national population
what was the impact of rofecoxib use, and I am
doing this for two reasons - one, because it has
been a source of controversy and concern. We cite
a number in a paper that I and others have
published from Kaiser Permanente in which we made
an estimate of the impact of rofecoxib use.
Tomorrow, FDA will present its estimation
of the number harmed by rofecoxib, modeling
randomized clinical trial survival curves. A
couple of things I would like the Committee just to
be aware of when they see that data tomorrow. It
assumes a grace period at the beginning of use that
is based on the VIGOR and APPROVe studies: a
grace period in which there is no difference in MI
or increased risk of MI in the first six weeks of
high-dose use or the first 18 months of low-dose
use of rofecoxib.
As I will show later in my talk, I believe
that this is unreliable due to low statistical
power early on, because we are only talking about
in each of these studies a handful of cases early
on in the study. Two or three cases of MI and wide
confidence intervals, you could have divergence of
the curves very early.
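
To illustrate the point about low statistical power, the following is a minimal sketch, not taken from the presentation, of how the width of a 95 percent confidence interval for a ratio of event counts depends on the number of events. It assumes equal person-time in both groups and uses the usual large-sample approximation for the standard error of the log ratio, which is crude for very small counts. The event counts and the function name rate_ratio_ci are hypothetical.

```python
import math


def rate_ratio_ci(events_exposed, events_comparator, z=1.96):
    """Approximate 95% CI for a ratio of event counts.

    Assumes equal person-time in both groups (an assumption for this
    sketch) and the large-sample standard error sqrt(1/a + 1/b) on the
    log scale.
    """
    ratio = events_exposed / events_comparator
    se_log = math.sqrt(1 / events_exposed + 1 / events_comparator)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Hypothetical counts: the same twofold ratio, observed with a handful
# of events versus with 50-fold more events.
few = rate_ratio_ci(4, 2)
many = rate_ratio_ci(200, 100)

print("few events:  ratio %.2f, CI %.2f to %.2f" % few)
print("many events: ratio %.2f, CI %.2f to %.2f" % many)
```

With a handful of events the interval spans 1 even though the point estimate is doubled, while the same ratio with 50 times as many events gives an interval that excludes 1.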
The epi studies, however, that I will
present will show that there are 3- to 50-fold
more events to work with, more statistical power,
and it suggests a different outcome.
The second is that the patients
enrolled in randomized clinical trials are
generally healthier than patients in the real world.
So, if you are going to model what is the
number of people who have been harmed in the
population, you have got to assume what is the
background rate that you are modeling off of.
If you use a background rate from healthy
people to model what is happening in the population
of people who really aren't so healthy, who have a
higher background rate, you will underestimate the
actual population impact.
So, in any event, now on to the substance
of my talk.
The next three slides provide a very dense
overview of the major features of each of the
epidemiologic studies that I reviewed. I am
looking at COX-2 usage in acute myocardial infarction.
You can see that they are grouped in
several groups. The top three studies I consider
from an epidemiologic perspective to be stronger
studies to have been done better. In terms of the
things that Dr. Platt just talked about, I thought
that these studies were the stronger studies.
The next two studies from the
literature I thought were less strong, and I will
describe why. Finally, I have separated out these
last two studies, one submitted by Merck to the
FDA, performed by Ingenix, and the other, the
Medi-Cal study that Dr. Gurkiepal Singh and I have
recently completed, are unpublished studies, so they
are separated out from the group.
You can see we are talking about different
source populations, and so if we can see
consistency of results across different
populations, different age groups, and different
study designs, I think that that adds support to
the notion that there is a real effect.
If we begin to see that there is a lack of
consistency across the studies, then, many of the
things that Dr. Platt talked about before need to
be considered sort of at the individual level of the
studies, so what might explain why one study shows
something and another one doesn't.
This next slide shows the case definitions
and the number of cases that we were working with
to come up with the relative risk estimates that I
will show you.
All of the studies began with hospitalized
acute myocardial infarction. Several of the
studies were able to link members of their base
cohorts to death certificate data to identify
sudden cardiac deaths, as well. So, those are the
ones that have the +Sudden Cardiac Death.
The asterisk next to the Kimmel study is
to remind me and to remind you that the Kimmel
study was based on nonfatal MIs only. By their
design, they had to interview their cases in
person, so the patient had to survive their
myocardial infarction to be interviewed. So, there
are those differences in study design.
In the end, what is very important in an
epidemiologic study in dealing with this issue I
think in particular, is what is the statistical
power of the study, and that is driven primarily by
the number of events in the exposed group that we
have to deal with.
So, in this column here, you will see the
total number of cases of myocardial infarction that
were identified in each of the studies. The
asterisk next to the Ingenix study 628 is to remind
me that in that study, they identified about 1,700
MIs in total, but they excluded 1,100 of the MIs
because they occurred in people who weren't exposed
to an NSAID at the time of the myocardial
infarction. So, as a result, they left them out,
because in the previous slide, when we look at the
reference group, most of these studies used either
non-use or remote use as the comparator. The
Ingenix study used active treatment with either
diclofenac or ibuprofen.
I would like to say one thing about
reference groups. Dr. Platt brought it up before.
In this issue, I don't believe that there is a
single best or optimal reference group. What you
really want to do is get as close as you can to a
placebo group that has been randomized and has all
the risk factors of the people who are getting the drug.
In the observational world we can't get
there, and so at the end of the day, if you want to
do a study, you are in a sense forced to pick what
you think is the least evil, and then it has
to do with how you define things.
So, non-users, for example, could be
viewed as being close to the placebo group, they
are not getting the drug. The problem is people
who don't use drugs tend to be healthier than
people who do use drugs, so that raises a host of
Yes, we can try to adjust for confounding
and the like, but you are still left with that
concern that they may be, in some way that we can't
measure, different from the people who get the drug.
In the study I did, and in several other
studies that people have done, we opted to use
people who had been treated with NSAIDs in the
past, but weren't currently taking an NSAID at the
time of the event or the study, the reasoning there being
that whatever the selection factors are that lead
to a patient getting an NSAID, that some of those
selection factors are there in people who
previously received NSAIDs.
That is still not a perfect group, though,
because you could argue that patients who are no
longer taking NSAIDs might be healthier than people
who are currently taking NSAIDs.
Finally, the problem that is posed by
using an active comparator. If you have an active
comparator, and I am comparing another drug to an
active comparator, and I see a difference, I don't
know what it means. I need some place to anchor
the result, and for that reason, although none of
them are perfect, I believe that the non-use and
the remote use analyses at least give us a way of
pegging results, and if we want to compare one drug
to another drug, if we had that common reference
point, at least it allows us to accomplish that.
The one other thing I would like to point
out about the number of cases is that for
rofecoxib, especially at the high doses of
rofecoxib, most of these studies had relatively few
exposed cases. The exception is the
Medicaid study where we had 157 exposed cases at
the higher dose of rofecoxib.
Now, this is a very busy slide and I won't
spend a lot of time going over it, but I will be
happy to answer questions later.
Basically, we heard before that there are
unmeasured risk factors in automated databases that
frequently can't be accounted for, aspirin use and
smoking are among the most common. So, you can see
here that most of these studies, that information
Kimmel was able to get both because they
interviewed the patients, the cases and the
controls. In the Medi-Cal study, it turns out that
aspirin is reimbursed, and so we have a handle on
In the Graham study, a survey of controls
was done to see what these unmeasured factors might
look like in the source population. The Solomon
study did the same thing, relying on the Medicare
Beneficiary Survey that Dr. Platt talked about
Important limitations I think that need to
be highlighted are that in the Mamdani study, they
excluded patients who had less than 30 days of
NSAID use, so the survivor bias Dr. Platt talked
about before, in my view, is a big concern with this
study, and for that reason I ranked it in sort of
that category of low quality studies.
In the Kimmel study, as Dr. Platt also
mentioned, there was a low participation rate.
Basically, half of the cases and half of the
controls who were approached volunteered to be in the
study. More importantly I think in that study, and
it's unfortunate, is that there was what I would
refer to as the potential for, in quotes, "reverse recall bias."
Normally, with recall bias, we think oh, I
have had a heart attack, I am going to remember
more efficiently what happened to me immediately
before the heart attack compared to some control
where I say to the control what were you doing four
months ago on this particular day.
That is the classic recall bias. This
situation I think had what I would call
reverse recall bias. They interviewed the people
who had heart attacks within four months of getting
out of the hospital - what happened to you the day
and the week before you had your heart attack four months ago?
For the controls, they call them on the
phone and they say what happened to you yesterday
and the week before, so it is actually the reverse.
The controls actually would have better recall of
what they were actually doing than the cases
potentially, and we will see how this is reflected
in some of the results.
Finally, with the Medi-Cal study, I think
the single greatest concern for the committee in
considering these data is (a) that it is preliminary
data, and (b) that this is a new database for
For that reason, I am just including a
slide to orient people to that. The other
databases are ones that have been used before.
This is a database that only in the last two years
has come online to be sort of a quality
to begin contemplating doing studies.
Its strengths are that it is very large,
it captures aspirin use, it doesn't censor people
by age. It combines Medicare coverage when you go
over the age of 65 with the prescription benefits
of Medicaid, so you get the drugs and the outcomes.
Matching has been done to multiple cause
of death tape, so that we have death data in this
database up through 2002. We didn't include it in
the data I will show today because we really want
the information up through 2004.
Once people get into Medicaid or Medicare,
they don't tend to drop out. The limitations are
that we can't get medical records, and that is
something to understand, and that it is a very
complicated database. Dr. Singh from Stanford who
is the principal investigator for our Medi-Cal
work, and who has worked to bring this database
online, spent two years putting things together and
working out the kinks in it before contemplating
doing research with it, so at least you understand
the limitations of that.
There is always the concern about
unmeasured risk factors and Dr. Platt talked about
I want to review for you very briefly some
of the evidence from the published literature where
efforts were made to look at what unmeasured
confounding looked like and did it differ across
In our study using Kaiser Permanente data,
we did a survey of a random sample
of controls, and we looked at aspirin use, smoking,
and over-the-counter NSAID use. You can see by
NSAID that there really were not significant or
substantial differences in the distribution of
these risk factors.
So, if they don't vary in the control
group, they can't really confound that observation
that you see very much.
In the Solomon study, these are the data
from the beneficiary survey. Dr. Platt already
mentioned a further analysis of these data that
showed that the actual impact of all these
unmeasured confounders on the measure of relative
risk at the end was measured in the
hundredths of an odds ratio, so if the odds ratio
was 1.34, adjusting for these things and projecting
it out would change it to maybe 1.35 or 1.33. We
are talking about minuscule differences, not
qualitatively important differences.
Finally, in the Kimmel study, they also,
through their interview, were able to see that for
most of these factors, there was similarity across
NSAID groups except for current smoking where the
rofecoxib group had much lower current smoking than
any of the other NSAID groups, but for past
smoking, it was more than the other NSAID groups or
the remote groups, and if you added these two
together, the rofecoxib was very similar to these,
but the celecoxib group had more smoking.
My own conclusion from this is that yes,
it is possible that some of these unmeasured risk
factors could be influencing the results. I don't
think that there is strong evidence that there is a
systemic bias that would sort of lead to
interfering with trusting the results and
that these factors are confounding the observations
that we see.
So, first, I will talk about rofecoxib,
then I will talk about celecoxib, then I will talk
about valdecoxib in terms of epidemiologic data.
These studies on the left, with their
reference groups, are the ones that looked at
myocardial infarction with rofecoxib. What I have
shown is for all doses and where it was present
less than or equal to 25 milligrams and over 25
milligrams, what the fully adjusted odds ratio and
95 percent confidence intervals were.
These studies varied in the extent of
adjustment that they did. The Ray and the Graham
studies each adjusted for about 30 cardiovascular
risk factors. The Solomon study was a somewhat
smaller number, Mamdani was a somewhat smaller
number. Kimmel, they adjusted for somewhere in the
20s, the Ingenix study somewhere in the 20s, the
Medi-Cal study adjusted for about 40 cardiovascular risk factors.
What you can see is when you look at
the All Doses is that, in general, the point
estimates were elevated and for many the 95 percent
confidence intervals excluded 1.
More importantly, though, is looking at
the low dose and the high dose data because we know
from the clinical trials data, and we would suspect
it on just pharmacologic grounds, that if there is
an association that it might be worse with the
higher dose than with the lower.
So, four studies provide us estimates at
the low and the high doses, the Wayne Ray study and
our study from Kaiser Permanente, and then the
two unpublished studies, one from Ingenix and the
other from California Medicaid.
We see there that in three of the four
studies, there is an elevation in the point
estimate. In the Graham study, it included one.
When we look over 25 mg, we see greater consistency
although in the Ingenix study, there is this
paradoxical finding of sort of basically a neutral
relative risk. I don't have an explanation for why
that happened, but it makes me concerned to some
extent about what was going on in that study,
because it is a result that goes in a very
What I would like to point out, because I
will come back to it again, is that when we are
dealing with drug safety, and the goal now is what
risk can I exclude, if my job is--now I am not
talking about efficacy anymore, what I am talking
about is safety--if my job is to protect the public
from harm, what risk can I exclude based on the
data that I have, I believe it is much more
relevant to look at the upper bound of the
confidence interval than the lower bound.
What traditionally happens is we look at
the lower bound of the confidence interval and we
say if it includes one, there isn't a problem, but
the biggest reason, as Dr. Platt showed in his
previous slide, for a wide distribution and a wide
confidence interval in your study, is that the
study doesn't have enough statistical power to get
you a narrow enough confidence interval to say that
you have the 95 percent certainty that you want.
So, if your mission is above all else I
want to do no harm, that I want to protect patients
from harm, then, based on the data you have, I
would submit that the upper bound of the confidence
interval provides greater assurance to patients,
and then if you are going to compare a benefit to a
drug, that you might want to consider that benefit
against that upper bound of the confidence
interval, because that is compatible with the data.
In any event, that is my view, and not the FDA's.
This is a slide from California Medicaid.
It is preliminary data and I wanted to present it
to you, because what it shows is a dose-response to
rofecoxib from 12.5 mg up to and through 50 mg.
You can see that we have very wide
confidence intervals for some of them, and that is
a reflection of the limited number of cases, but I
want to point your attention to the very narrow
confidence intervals in the 12 to 25 mg and in the
25 to 50 mg, just to point out that in the previous
slide here, where we are talking about what are
these point estimates, that now you can see what we
have done is we have fleshed them out a little bit
Another comparison that I think is
important to consider, certainly it was for us,
when we did our study in Kaiser Permanente, was at
the time there were two COX-2 selective inhibitors
on the market, celecoxib and rofecoxib.
The VIGOR study raised a question about
high-dose rofecoxib. Our question as researchers,
and public health scientists, was, well, let's
suppose that rofecoxib increases the risk of
We don't know that it does, but let's
suppose that it does, what about celecoxib, because
it actually had a larger share of the market, and
if it turned out that these drugs have a benefit,
and that benefit is worthwhile, then, it would make
more sense from a practical perspective to use the
drug that had a better safety profile.
So, to us, it was very natural to want to
compare rofecoxib to celecoxib, and so several of
the epidemiologic studies felt similarly, and in
their design they included that analysis, and in some
of them it was, as Dr. Platt said, part of a we are
going to make comparisons of everything against
The Solomon study, for example, did that.
They did not state in that study what their prior
hypothesis was. In our study, we did state it. I
mean yes, in a sense we had multiple comparisons,
but we were interested in two different things. We
were interested in rofecoxib versus remote use, and
we were interested in rofecoxib versus celecoxib,
but we thought of it beforehand and we planned that
But in any event, what we see is, when you
look at the all dose analysis, in all of the
published studies, rofecoxib increased the risk
compared to celecoxib. When we looked at low dose
rofecoxib, we see the increased risk. When we look
at the high doses of rofecoxib to celecoxib, again,
we see the same pattern.
Dr. Platt, in his talk before, talked
about relative risks, risk differences, individual
risk, and population risk. The next two slides are
intended to address this at the level of the
individual and at the level of population.
What I have done on this slide--and these
slides now, no one should interpret this as meaning
this is what actually happened in the
population--the next slide is going to have numbers
on it that are for illustrative purposes only, to
help the committee understand what does a relative
risk of 1.3 translate into at the individual level
and at the level of population.
Your typical COX-2 user is somebody in
their 60s who has several other health problems, so
I went to the National Center for Health Statistics
and got the myocardial infarction rate for 65- to
74-year-old men in the United States. That rate
turns out to be 1 per 50 per year.
What I did is I took that as the
background rate and I said if I have an individual
using this drug with that background rate and then
I applied to that person the relative risks or odds
ratios found in these studies that are shown in the
previous slides, what would the excess risk to the
person be, sort of what would that risk difference
translate to for the individual.
For example, in the Ray study, if you
remember, for 25 mg or less, the odds ratio was
1.02. Basically, it doesn't change. If we based
it on the point estimate, that 0.02 would translate
to 1 out of 2,500 in a year increased risk of heart attack.
Another way to view that number is that it
is the number needed to harm. If I treated
2,500 65- to 74-year-old men for a year with
rofecoxib, and the rate was 1.02 that Ray found,
treating 2,500 patients would produce 1 extra heart attack.
Now, with the other studies that found
higher estimates for the lower doses of rofecoxib,
you can see that the number needed to harm ranges
from about 90 to 200. That is saying for every 90
people to every 200 people I treat with low-dose
rofecoxib, I would generate 1 other case.
For high doses, because the relative risks
were higher, the number needed to harm becomes smaller.
I have also shown it based on the upper
bound of 95 percent confidence interval to show you
that based on the data we have at hand, these are
the excess risks that are consistent with the data,
and from a public policy perspective, from a public
health perspective, that is what I react to, and
when I want to see a benefit and say does benefit
exceed the risks, well, I want to know what is a
real benefit in the population in terms of reduced
hospitalization, lives saved, and does that benefit
exceed what I can say is possibly the risk of these
At the population level, now we have gone
from an individual. Remember in the Wayne Ray
study we said it is 1 out of 2,500. Well, that
would translate to 400 additional cases of heart
attack if we treated a million men who were 65 to
74 years old, and we treated them with rofecoxib
low dose for a year.
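
The arithmetic behind these illustrative figures can be reproduced with a minimal sketch, assuming, as the speaker does, a background rate of 1 per 50 per year and treating the odds ratio of 1.02 as a relative risk. The function name excess_risk is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def excess_risk(background_rate, relative_risk):
    """Excess annual risk attributable to exposure: rate * (RR - 1)."""
    return background_rate * (relative_risk - 1)


background = Fraction(1, 50)       # 1 MI per 50 men aged 65 to 74 per year
relative_risk = Fraction(102, 100)  # odds ratio of 1.02, used as a relative risk

excess = excess_risk(background, relative_risk)
number_needed_to_harm = 1 / excess
extra_cases_per_million = excess * 1_000_000

print("excess risk per person-year:", excess)
print("number needed to harm:", number_needed_to_harm)
print("extra cases per million treated for a year:", extra_cases_per_million)
```

The same function applies to any other relative risk or upper confidence bound, given a chosen background rate.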
With the others, you can see that
relative risks that might not look so impressive,
that 1.23, that 1.30, that 1.4, that it projects
out to a substantial number when you multiply it by
the large number of people who use these products.
For high doses it ends up being even
greater, and then if we focus on the upper bound of
the confidence interval, we again see that the
numbers are larger still. This very high number in
our study was the result of our having low
statistical power in addressing the high dose
One other question that I think is
important to consider is when does the risk of
myocardial infarction with rofecoxib kick in. Now,
we have seen data yesterday presented by both FDA
and by Merck of various survival curves.
We saw the VIGOR curve that showed the
separation after about 6 weeks with an overall
relative risk of about 5. We saw, for the APPROVe
study, this close overlapping line at about 18
months, and then they diverge with an overall
composite hazard ratio of about 2.
I would submit to the committee that the
reason for the failure of these studies to show
divergence of the line shortly after the
used are low statistical power, that they just
don't have enough events to show it, and as a
result, you can interpret because of the low
statistical power you basically--how to describe
it--you presume that there is nothing there, and
you err on the side of the drug rather than erring
on the side of what could the risk be to the
If you really want to know what is going
on in the population, then, you want to reduce the
uncertainty. The more uncertainty you have, if you
act basically on the lower bound of that confidence
interval, which is what you are doing when you are
saying the risk doesn't begin until 18 months, you
are basically saying that the absence of evidence
is evidence of absence.
I would say that in safety, what it is, is
you just don't have enough power.
Looking at the epidemiologic studies, I
think that we have evidence to suggest that the
risk begins much earlier. I will point it out, and
you guys and women can consider it for yourselves.
In the Graham study, when we looked at low
dose and high doses of rofecoxib, 50 percent of our
cases at the low dose and at the high dose had used
at the time--remember these are inception cohorts,
so these people, their total use, this was 1.8
months, this was 2.7 months--50 percent of our
cases occurred within 2 to 3 months of starting the drug.
That is a lot of power and that really
speaks against the notion that the risk is
backloaded, you know, it is for the low dose, that
the risk doesn't happen until after 18 months.
Nobody in our study was on rofecoxib for more than
about 15 months. I think that was the longest
duration of use we had in our study.
Now, in the Solomon study, they looked at
the low dose and the high dose, and they presented
data in several ways. One is that they grouped
things in 1 to 90 days, and what they found was
that for both the low dose and the high dose, there
was evidence of risk early on.
The Kimmel study, for all its
deficiencies, most of it was low dose rofecoxib,
and almost all the patients used it for less than
12 months. So, their finding on rofecoxib, if
anything, would also speak to that the low dose
effect kicks in long before 18 months.
Finally, the Solomon and the Ingenix study
looked at the first 30 days of use of these
products, and both of them found elevated odds
ratios of 4 for cardiovascular risk in the first 30 days.
Now, in both of these studies, they didn't
separate it out by low dose and high dose, so this
is a composite, but in both studies, about 85
percent of the use to 90 percent of the use was low dose.
So, basically, what I am concluding from
this slide is that risk of myocardial infarction
with rofecoxib begins when rofecoxib use begins,
and that the inability to separate out
is based on the fact that if you were to count the
actual number of events in the VIGOR study in the
first 6 weeks, we are probably talking about 3 or 4
events, and if you look at the confidence
intervals, you are going to see they are wide.
For the APPROVe study, the same thing
holds, that you have too few events. The whole
study had 45 events, and I don't recall how many of
those were on rofecoxib and how many of those were
on placebo, but when you think about it, compare
that and then look at the epidemiologic studies,
and look at the number of cases that were in the
epidemiologic studies, and for all their problems,
and we can talk about those, they suggest there is
a big discordance, and I think the answer, the
reason is absence of statistical power in the
In the epidemiologic literature, this has
been recognized, and people have written papers
saying that when you are trying to summarize the
overall risk from a survival study, and you want to
look at specific time periods, that you are better
off taking the overall risk estimate for the entire
study than focusing on a small segment at a time
because of this issue of low statistical power, so
I didn't invent this.
Now, switch over to celecoxib. There are
a number of studies that have been done to look at
celecoxib risk. What I have tried to do here is
plot out for you the relative risk or the odds
ratio, the author of the study, and then the point
estimates and the 95 percent confidence intervals.
What you will see basically is that for
most of these studies, there is no evidence of a
protective or an injurious effect except for the
Kimmel study that found a substantial protective effect.
Remember the Kimmel study and what I
believe is this reverse recall bias, as well as the
low participation rate, and I personally discount
that study. The committee can decide for
themselves what they want to do.
What about celecoxib lower dose versus
higher dose? Well, unfortunately, the only place
where this is looked at is in the two
unpublished studies. We have the Ingenix study and
we have the Medi-Cal study.
What I would focus your attention on are
the low dose and high dose, the low dose and the
high dose. What we see is in both studies,
evidence of a dose response. Now, the 95 percent
confidence interval in the Ingenix study includes
1, but the point estimate is pretty elevated. That
is 1.18 or so at 400 mg.
In the Medi-Cal study, we go from 1.01 up
to about 1.24. Here, you can see the 95 percent
What I would conclude from this, although
they are unpublished studies, is that there is
evidence of a dose response and that the higher doses of
celecoxib do confer an increased risk of myocardial infarction.
I should point out that in the Medi-Cal
study, the methodology that we used in that study
is the exact methodology that we used in our Kaiser
Permanente study that Dr. Platt before was kind
enough to say is one of the better done studies.
There are no published studies on
valdecoxib, so what do we do? Well, preliminary
data from Medi-Cal, we had 54 exposed cases and we
found a point estimate of 0.99. Now, this was
mostly 10 and 20 mg use. I think that out of all
the patients that we had in the study, there were 2
or 3 who had 40 mg valdecoxib use.
In Medi-Cal, they only reimburse for the
10-mg tablet, and they do this in an effort to try
to discourage people having larger dose tablets and
then taking more of it.
So, this is all the epidemiologic
information that I am aware of, that I have had an
opportunity to review on valdecoxib.
I will now move to naproxen. The issue of
naproxen is important for several reasons. One,
with the VIGOR study, the medical community was
confronted with the hypothesis that naproxen was
the single greatest and most effective
cardio-protectant in the history of mankind, that
it was far better than aspirin.
We heard yesterday that aspirin reduces
cardiovascular risk about 20 to 25 percent.
Naproxen, if we were going to believe the
results, would have to reduce the risk of
cardiovascular events by about 80 to 85 percent.
So, this stimulated a lot of research.
Here, I have summarized in the same fashion as I
did for the rofecoxib studies, the various studies
that have been done. Again, I have separated them
out by the studies that I think are better done,
the studies that have more significant limitations,
and then the two unpublished studies.
I point out the Rahme study to say that
the only reason the Rahme study is listed among
this group of suboptimal studies is that its
reference group was other NSAIDs, primarily
ibuprofen, because ibuprofen was the predominant
other NSAID used in Quebec during the study.
Again, we have the various outcomes that
were done. What I would point out is that you can see
the number of cases that we had to work with in
these various studies, and I would point out that
for the Solomon study, they had about 240 MI cases
that they studied overall, but as you will see in a
few minutes, that exposure could occur anytime in
the past 6 months, so they don't see in the paper
how many people were actually on naproxen at the
time they had their event, so I can't put down a
list of how many people were currently exposed.
The Watson study is the only study that
used a composite outcome. It included myocardial
infarction, stroke, subarachnoid hemorrhage, and
subdural hematoma. Why subarachnoid hemorrhage and
subdural hematomas are in there is beyond me. In
any event, 26 cases of that composite outcome and a
much smaller number of actual myocardial
infarctions. So, that is why that asterisk is
With the Ingenix study, the asterisk next
to the 179 is that this included both prevalent and
incident cases, and the best studies, the best
results come if you base it on incident cases only
or incident use only as opposed to prevalent use,
because prevalent use can have survivor
in any event, in the Ingenix study, they had a
number of different analyses, and they didn't
always use their full number of cases.
There are important limitations to note.
I think the one to focus on is to realize, one, there is
no perfect study, we have talked about that before,
and, two, that among all the limitations listed
here, I think the most important one to note was in
the Watson study, was this composite outcome which
really just makes it very difficult from an
epidemiologic perspective to study things.
Myocardial infarction is very well
validated in claims data, and Dr. Platt has already
gone over that with you. Stroke is notoriously
difficult to work with in claims data, and subdural
hematomas most commonly occur because as people get
older, their brains shrink. They bump their heads
and then they get a little bleeding on the surface
of the brain. What that has to do with myocardial
infarction risk, which is what we are really
concerned about today, is beyond me.
I have got two slides on the
This slide shows the studies that found no
protective effect. There are four studies that
found a protective effect, and I am saving them for
a separate slide, because I want to look at those
What you can see from the majority of
these studies, and I would point out that the
studies that were the best done studies in the top
tier, they are on this slide, that all of them sort
of suggest that there is no cardio-protective
effect of naproxen. Several of the studies point
to the possibility of a small increased risk with
But we have four studies of positive
results, and we will probably all remember the
Archives of Internal Medicine publishing three of
the articles in the same issue with an accompanying
editorial that stated the issue is solved, naproxen
I want to look at those studies and just
describe to you my view of them. The top three
studies were the ones that were--well, no, not the
Kimmel study--Rahme, Solomon, and Watson were the
In the Rahme study done in Quebec, they
compared current naproxen use versus other NSAIDs.
That other NSAID was, by and large, ibuprofen, and
they found a protective effect. Well, if ibuprofen
increases the risk of myocardial infarction, let's
just say that it does, and naproxen doesn't,
naproxen could look like it's protective compared
to ibuprofen, but not be protective really.
The data presented in that paper, if we
re-analyzed it versus non-use, we get an odds ratio
of 1.28, statistically significant. Now, this is
not adjusted. It is not possible from the data
there for me to adjust this result, but based on
what is in the paper, when you compared the
unadjusted to the adjusted point estimates, they
don't change very much, and what that suggests to
me is that this effect, this 1.28 is probably not
far off the mark.
That would then make it comparable to the
analyses I showed on the previous slide,
of these slides use non-use or remote use, so then
it would add a fourth study to an elevated point
estimate for naproxen.
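A minimal sketch of how an unadjusted odds ratio and its 95 percent confidence interval are computed from a two-by-two table is given below. The counts are placeholders chosen only for illustration; they are not the counts from the Rahme paper, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    z = 1.96 gives an approximate 95 percent interval.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Placeholder counts (assumption, not data from any study discussed here).
or_, lo, hi = odds_ratio_ci(120, 880, 100, 900)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
# The result is "statistically significant" in the usual sense only if
# the interval excludes 1.
```

An estimate of this kind is unadjusted: it takes no account of confounders, which is the caveat attached to the re-analysis above.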
Now, the Kimmel study, we have already
talked about low participation rate and this
reverse recall bias, and a small number of NSAID
cases. In fact, they don't even tell us in the
paper how many cases they had.
We move on to the Solomon study. This was
the result that was reported in the paper and was
picked up by the press, a 16 percent reduction in
heart attack risk with naproxen. The problem, in
my view, was that their definition of exposure in
the study was any use of naproxen in the past 6
months, which means that if I took naproxen 6
months ago and stopped it, I could be included in
this study as being exposed to naproxen.
So, the question is then, you know, how do
we interpret the study. Well, Solomon was good
enough to present data by current use and in recent
use, and recent use included people who stopped
their naproxen. Their naproxen prescriptions day
supply ran out between 1 day and 60 days before the
MI or the index date for their controls, and remote
users, their NSAID use, their naproxen use ended
from 61 days to 180 days prior to the event.
So, let's look at what those results are
then, and what we see is they are identical. So,
unless the committee is prepared to believe that
naproxen confers lifetime immunity to
cardiovascular disease, I think we have to conclude
from these data that what we really have here is
selection bias, and it is not the fault of the
investigator. Dr. Platt talked about before that
there are some things you can't adjust for. You
can't adjust for bias. What you can try to do is
identify bias, and if you identify it, then at
least you know what you are dealing with.
Here, I think we have what is classic
selection bias. It is not naproxen that protects
you against myocardial infarction, it is some other
factor that in this health plan, that they used to
study this drug, the patients who were being
treated with naproxen happened to have
I can't explain why that happened. Dr.
Solomon probably can't explain why it happened, but
it's not due to naproxen.
Finally, the Watson study. This study was
sponsored by Merck, and it was authored by Merck
investigators. The result that was published as
being the basis for the conclusion was this top
result, a 39 percent reduction in cardiovascular
First, I just want to remind everybody,
composite outcome here, subarachnoid hemorrhage,
subdural hematoma, stroke, as well as heart attack,
26 events total, much smaller number of heart
For this event, you can see the
checkmarks. These are the various variables that
they adjusted for in the study. The way they
handled cardiovascular risk, if you read the paper,
I would have to say that it doesn't measure up to
the standards that were set by Dr. Wayne Ray.
We modeled our study in Kaiser
Medi-Cal, and Dr. Wayne Ray, I think that he has
set the standard for how one needs to go about
adjusting for cardiovascular risk. It is not enough
to rely on diagnoses. You have to use the
medications, because medications are much more
accurate predictors of disease than diagnoses in
these administrative claims data.
In any event, they didn't adjust for
cardiovascular risk, and they didn't adjust for
smoking although they had that data. Then, they
present later on another analysis that now includes
cardiovascular risk and it is no longer, in quotes,
"statistically significant," and then they include
smoking, and again it is not statistically
My conclusion on the Watson study was that
(1) they have got a composite outcome that, in my
view, isn't very informative towards the question
of myocardial infarction; (2) that it is very small
numbers; (3) that a variety of approaches were used
in the analysis that inadequately account for the
risk factors that could confound the result, so I
have discounted that, as well.
So, a conclusion when I look at these, in
quotes, "4 positive studies," I conclude that none
of them provide credible evidence of a protective
In light of yesterday's discussion in the
afternoon about other NSAIDs and what might explain
the differences, let's say, celecoxib and rofecoxib
studies, the rofecoxib studies used naproxen as a
background, a comparator, the celecoxib studies
using ibuprofen or diclofenac.
Dr. FitzGerald is talking and saying,
well, you know, all of these drugs could increase
the risk because what is happening, you know,
biochemically, with the balance of prostacyclin,
could be influenced by these different drugs in
ways that aren't immediately obvious or detectable
in a clinical trial.
I thought I would just share some of that
information on other NSAIDs with the committee,
recognizing a couple things that no single study is
definitive and what you want to look for I think is
consistency across studies, but as far as
randomized trials go, I would like just to mention
that they are generally too small, too few events,
and you are not going to get the answers that you
need from them unless you make these clinical
trials substantially larger than anything people
have contemplated up to now.
So, from our California Medicaid study, it
is all preliminary and it has not been published,
for ibuprofen we found a small but statistically
significant increased risk. For indomethacin we
found a risk of 1.7. I would like to say on
indomethacin that we found an increased risk with
indomethacin in our Kaiser Permanente study. It
was 1.3 and it was highly statistically
In at least two other studies that I
reviewed in preparation for this advisory meeting,
indomethacin is noted to have an increased risk of
It is not commented on in the text because
that wasn't a primary analysis, but what I am
talking to you about now is consistency, and I
would submit to the committee that indomethacin is
a lot of smoke, there is a lot of smoke for
In our study, in our Kaiser study, for
example, we did not think in advance to look at
indomethacin separately. I mean we knew we were
going to look at it, but it wasn't a primary
hypothesis. We didn't adjust for gout. I mean
everyone knows that indomethacin gets used in gout.
Gout increases the risk of cardiovascular disease.
Well, in the Medi-Cal study, we adjusted
for gout. Yes, gout increases the risk of
myocardial infarction. It didn't change the odds
I think this next finding, Meloxicam, is
important. Meloxicam is now the number one selling
branded NSAID in the country. With the removal
from the market of rofecoxib, the medical
community, shying away from the coxibs, are moving
to other drugs that they perceive would have the
advantages of COX-2 selectivity without the bad rep
that coxibs appear to be acquiring.
So, you now have a shift in the
marketplace to Meloxicam. There have been articles
in the Wall Street Journal and the New York Times
on this. The company recently raised the price on
In any event, we are presenting these data
just to say that we found an increased risk. It is
one study, but I think it is the only study. We
looked at this in Kaiser. Meloxicam is almost not
used in Kaiser, so we couldn't study it.
In our California Medicaid study, we only
looked at drugs that had more than 50 currently
exposed cases. Nabumetone came out in this study
as not showing a whiff of a problem. Sulindac,
there was an increased risk.
Regarding ibuprofen, in our Kaiser study,
we found an increased risk of 1.06, which sounds really
trivial. It wasn't statistically significant, but
the confidence intervals were pretty narrow. It
was 0.96 to 1.17.
My concern is, as Dr. Platt said,
you know, above 2 you feel really comfortable,
above 1.5, you can believe it, below that you begin
to get really edgy. The problem is most of the
risks that we are probably facing, if it turns out
that the non-coxib NSAIDs increase the risk of
cardiovascular disease, that is where the risk
level is going to be, and that is what we are going
to have to contend with, because it has tremendous
effects on the population.
Finally, dose response. This slide shows
for diclofenac. This is from California Medicaid.
What we wanted to do was show evidence of dose
response, consistency in the data. Remember we
pointed out diclofenac before. Diclofenac in this
study overall did not have an increased risk, but
at the high doses there is a suggestion of a dose
I will skip that. This slide was to say
that depending on your reference point, you can get
different results, if I use an active comparator
versus remote, and this is showing the three NSAIDs
from California Medicaid compared to
NSAIDs, and you can see the rofecoxib is different
than them, and the other two aren't necessarily
My conclusions, and I am sorry to have
gone so long. Celecoxib, we believe that based on
the evidence we have at hand, that there is no
apparent effect of risk at doses of 200 mg or less.
Above 200 mg, we think that there is evidence of
For rofecoxib, we believe that there is
evidence of increased risk at both the lower doses
and the higher doses, and that risk begins early in
therapy and is apparent during the first 30 days of
With valdecoxib, there is a paucity of
information, but the information we have at this
time suggests that the risk is not increased at
doses of 20 mg or less.
As a class, non-coxib NSAIDs may increase
the risk with differences between each of the
NSAIDs. I don't think we are going to be able to
talk so much about class effects. In the end, it is
going to have to be looking at individual drugs.
The COX-2 hypothesis may be true, but if
it is, we are still going to have to look at these
other drugs in terms of their individual properties
and what they do.
Finally, naproxen is not
DR. WOOD: Thanks very much. David, it
will come as no surprise to you that every time
practically I pick up a newspaper, I read about
what you are not going to tell us.
So, my question to you is what have you
not told us that you think we should know, because
I would like to make sure. Lots of other people
have shown up here without slides that they forgot,
so I just want to be sure that if there is anything
else we need to hear, we hear it.
DR. GRAHAM: Well, as far as the science
goes, I think I presented the evidence that I am
happy to be able to share with the committee that I
thought it was important for the committee to have
an opportunity to hear.
The source of controversy surrounding my
presentation related to the unpublished studies
that I was going to be permitted to present or
asked, actually asked to present the Ingenix
results, the unpublished study from Merck, but that
I was being told not to present the unpublished
data from the California Medicaid study, and
personally, I had great difficulty standing here
before this committee as an investigator and as a
scientist, as a physician, and telling you the
information that I have, that I am allowed to talk
about, and remaining silent on things that I know
about that I am not allowed to talk to you about.
Fortunately, Dr. Crawford exercised great
leadership in making it possible for me to present
that data, recognizing it's preliminary, but the
methods that we used are identical to our Kaiser
study for the California Medicaid, and for me, I
think the big reservation is, is that it's an
untested database, but I think that
could be done to develop the database and to do
quality assurance and to work out the kinks has
If you look at the findings in the
California Medicaid study and you compare them to
the clinical trials data, and the anomalies and the
questions that you were discussing yesterday about
the clinical trials' data, you look back at the
California Medicaid data, and you are going to see
I think great consistency between the findings that
might help explain and interpret some of the things
that seemed questionable or uncertain yesterday.
So, in any event, I have been able to
present what I thought was important to present,
and I am happy to have had that opportunity.
DR. WOOD: So, the answer is we have seen
it all, is that right?
DR. GRAHAM: You have seen it all.
DR. WOOD: Okay, good. Let me ask you a
question. If you go back to your slide that showed
the excess population risk, put that in proportion
for us in terms of, say, the other drugs that have
been withdrawn from the market. I mean what sort
of numbers would we be expected to see?
DR. GRAHAM: That is a great question.
The typical drug that has come off the market in
the United States, like the leading cause of drug
withdrawals in the United States in the last 20
years has probably been acute liver failure.
Rezulin came off the market because of it,
troglitazone, bromfenac, a number of other drugs.
Acute liver failure in the general
population has a background rate of about 1 per
million per year. We are talking about that is the
rate of being struck by lightning, 1 per million
per year, and these drugs were pulled off the
market because it increased the risk of that. It
might increase the risk 5-fold, it might increase
the risk 10-fold, it might increase the risk
100-fold. The fact is the background rate was 1 in
a million and what that means is that the actual
number of people affected is sort of measured in
the tens and the hundreds for the liver failure
that could be life-threatening.
In this situation, and this is why the
lower relative risk becomes so critical, we are
talking about a serious event that has a
background rate. Heart attack is not a rare event,
and as I pointed out before, there is a 1 in 50
chance that the average American male age 65 to 74
is going to have a heart attack this year, 1 in 50.
That is an extraordinarily high risk. You
increase that risk 5-fold with a high dose. That
is what happened with VIGOR. If I have got
millions of people taking the high doses, and that
is what we had in the United States, and I have
increased the risk 5-fold, you are going to get
numbers that balloon out like this.
So, there is no comparison in terms of
what the population impact is of the typical drug
that has come off the market in the United States
and what we are dealing with here, and that is
because of the high background rate of the
underlying event that we are talking about.
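The contrast can be made concrete with a rough calculation of excess cases per year: exposed population times background rate times (relative risk minus 1). The background rates and fold increases below are the ones quoted in this answer; the exposed population of one million people is an assumption chosen only to put both scenarios on the same scale, and the function name is hypothetical.

```python
def excess_cases_per_year(exposed_people: float,
                          background_rate_per_year: float,
                          relative_risk: float) -> float:
    """Cases per year attributable to the drug among exposed people.

    Expected cases without the drug = exposed * background rate.
    Expected cases with the drug    = exposed * background rate * RR.
    The difference is the excess.
    """
    return exposed_people * background_rate_per_year * (relative_risk - 1.0)


EXPOSED = 1_000_000  # assumption: same exposed population in both scenarios

# Acute liver failure: about 1 per million per year, risk raised 100-fold.
liver = excess_cases_per_year(EXPOSED, 1 / 1_000_000, 100)

# Heart attack: about 1 in 50 per year (men aged 65 to 74), risk raised 5-fold.
heart = excess_cases_per_year(EXPOSED, 1 / 50, 5)

print(f"Excess acute liver failure cases: {liver:,.0f}")  # about 99
print(f"Excess heart attacks:             {heart:,.0f}")  # about 80,000
```

Even a 100-fold increase on a background of 1 per million yields excess cases in the tens to hundreds, while a much smaller relative risk on a background of 1 in 50 yields tens of thousands.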
DR. WOOD: So, this would produce many
more cases from what I understand.
DR. GRAHAM: Many more.
Committee Questions to Speakers
DR. WOOD: From the committee, we have
questions. Let's start with Dr. Shafer.
DR. SHAFER: Dr. Graham, tomorrow we are
going to be asked, as a committee, to answer a
question about a class effect for the selective
COX-2 antagonists and for the non-selective NSAIDs.
One of the things that I am finding, that
I am having trouble putting together here, is we
have a lot of conflicting data, and for the COX-2
antagonists we have a lot of data from randomized
Certainly for the NSAIDs, we are going to
have to go with a lot of these observational
studies because we don't have a lot of data on the
topic at hand from randomized controlled trials.
As I look at this, if we come up with some
sort of common warning as a class, and it applies
to everything, we have, in fact, communicated no
relevant information. On the other hand, if we are
going to come up with individual
recommendations, we are going to have to have very
different evidentiary standards in some ways,
because for some of these, we have very little
information, as you pointed out, and yet your data,
particularly the unpublished data from the Medi-Cal
trial, and I appreciate that there is all the
issues of not being previewed and stuff, but we are
all familiar with that process and know how it
What can you tell us to guide us? Should
we try to go drug by drug specific? How do we set
our evidentiary standards when we talk about class
effects where in some cases, we are just not going
to have a lot of data here?
DR. GRAHAM: Right. What you are going to
be getting now, of course, is my opinion, not FDA's
opinion. Probably if you were to talk to Bob Temple
or John Jenkins, or anybody else, everybody is
going to have a slightly different answer.
What we are talking about now I think to some
extent is philosophy, so with that preamble, first,
I believe based on the evidence that there is a
COX-2 effect and that that COX-2 effect is dose
dependent, and that we see evidence of that with
rofecoxib, with celecoxib, and with valdecoxib.
The difference between rofecoxib and the
other two coxibs on the market is that a safe dose
for rofecoxib wasn't identified, the dose wasn't
low enough. That raises a question in my mind
about what is an appropriate therapeutic index for
I am giving you my opinion now, but when I
listened to Dr. Cryer's presentation yesterday, the
bottom line conclusion I came to at the end of that
was there really doesn't appear to be a need for
COX-2 selective NSAIDs based on what I heard
yesterday. There is probably other information out
there why I am wrong, but that was the conclusion I
So, in any event, that is answer one. I
believe there is an effect and it's dose related,
and with celecoxib and valdecoxib, I think we have
evidence. You said before we have a good
evidentiary base based on clinical trials for the
COX-2s. I would challenge that in the sense of the
survival curves and the things that I talked about
there, that we have a very weak evidentiary base
for things like protective, you know, is there a
grace period for use, and also on the dose issue,
we really don't have a great evidentiary base. But
that being said, you understand me.
Now, for the non-coxib NSAIDs, my own view
is that as an epidemiologist first, I try to report
the phenomenon I observe and leave it to brighter
minds to figure out why what I observed happens.
You are asking me sort of what do I think
is happening underneath it all. I am attracted to
the COX-2 hypothesis personally. Dr. Gurkiepal
Singh, my colleague and co-author in Medi-Cal, he
has a different view on that, but I think that we
can these in vitro tests that say, oh, this is the
COX-2 selectivity of this NSAID, you know, in a
What happens in the human body could end
up being surprisingly different. We saw yesterday
that the dynamic response of these
that the platelet effect is very quick, the
thromboxane effect is a very quick effect, the
prostacyclin effect seems to be a more gradual
effect, that this creates very complex interactions
that ibuprofen, that any of these drugs could, in
the end, end up with a deficit, a prostacyclin
deficit that results.
I think Dr. FitzGerald showed that slide
yesterday with the normal distribution of the time
area under the curve and then this little sliver
where they are not protected, and that may be the
reason why, for these different drugs, that we end
up with these different relative risks and these
different odds ratios.
In the end, for the non-selective NSAIDs,
my own advice would be let's look to see are there
somewhere in studies--it is going to be
observational studies--in observational studies
that we believe have been reasonably well done.
By "well done," here, they have to be
large. The literature is full of really small
I mean I could have presented Meloxicam
studies, 5 patients, no risk. Well, da, you know,
you have got a confidence interval that goes from
zero to infinity. They need to be large. Look in
a systematic way to identify what the body of
Can we identify bad actors? I believe
indomethacin, for example, is clearly a bad actor,
and if people looking at the data concluded that,
take appropriate action, weed the garden of the bad
Try to identify drugs that based on the
evidence we have, appear to be less risk in the
totality of their evidence, looking for consistency
study to study to study, and then, in a rational
way, suggest these are the drugs we think that the
public should use, and these other drugs, well,
then you have to decide do you want them on the
market or not.
I am not really going to comment on that,
but I think that is the approach I would take. I
would be trying to sort of identify right off the
bat the bad actors and let's get rid of them.
Things that look like they may actually be
safe, and when I say "safe" now, I mean that they
don't appear to have cardiovascular risk,
them and shift the market towards that, and then
deal with the others.
DR. WOOD: Dr. Friedman.
DR. FRIEDMAN: Thank you. Several
comments. First, as both Dr. Graham and Dr. Platt
have mentioned, observational studies are
essential, but they have a number of limitations,
and because of those limitations, it is easy after
the fact to critique away those whose results you
don't much care for as we have seen.
But a couple of other points. One, can
these particular drugs, their primary use, we are
dealing with chronic conditions, conditions that
last years, sometimes many years, and so the drugs
are intended for use over those many years
Yet, most of the clinical trials we heard
reported yesterday are 12, 18 weeks, a few of them
go longer. You mentioned that one of the reasons
we didn't see the problems early on may be numbers,
and I agree that is potentially it, but the fact is
we didn't see problems arise in the studies until
14, 18 months.
We often see analyses by patient years of
exposure. In this particular setting, I don't know
whether patient years are always equal to patient
years, and therefore, I guess I would say why
aren't we doing more bigger, longer randomized
clinical trials for these chronic conditions?
DR. GRAHAM: I am not speaking for the
DR. WOOD: We got that. Don't say it each
DR. GRAHAM: Okay. I think they are
incredibly expensive and companies don't want to do
them. There is not an incentive for them to do
them, and you would have to talk to the people from
the new drug side of the house, but the fact is
that they are not requiring them.
So, that is a very legitimate question.
You know, working as an epidemiologist, we try to
make do with what is, and so we use the
observational data. You are going to get better
quality data if you are able to do this, but just
to give you a sense of the size of the studies that
I think you would need to do, I mean you talked
about before that you have the APPROVe study and we
see no effect until 18 months, but there was study
090 that was talked about briefly by Dr. Villalba
yesterday. It was a 6-week study at 12.5 mg, and
it showed a difference, the suggestion of a
cardiovascular risk within the 6-week study at the
lowest dose. Now, it's a small study, as well.
But I am just saying that to say that I
think the epidemiologic data, in my mind at least,
answers the question about when the effect begins.
The question is if you want to have--this is the
philosophy--how much certainty do you need to make
Right now, when it comes to efficacy, the
effect, does the drug work, you are looking at the
lower bound of the confidence interval, and you
want to see is that different than 1, because if it
is, then, I will conclude with 95 percent certainty
or greater that the drug actually has an effect.
When it comes to safety, you are doing the
same thing. You are looking at that lower bound.
You want this 95 percent certainty that the drug is
harmful. You are presuming that the drug is safe
rather than let's presume we want to do no harm to
Let's start off at the beginning assuming
that the drug isn't safe, and we want to have a
certain level of confidence about how bad this drug
could be, and that is still tolerable to us. We
want to cap the risk. It will be a completely
different way of looking at studies for a safety
perspective, one that actually gives a priority to
safety and is maximally protective of patient
safety, just as that high standard for efficacy is
maximally protective of patient safety, because by
keeping drugs off the market that don't work, I am
protecting patients from unsafe drugs, and if I
have pneumonia and I am given a drug that doesn't
work, well, I get a harm from that.
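The two ways of reading a confidence interval described here can be sketched as two decision rules. This is only an illustration: the function names and the cap values are assumptions, and the interval used is the 0.96 to 1.17 quoted earlier in this testimony for ibuprofen in the Kaiser study.

```python
def harm_demonstrated(ci_lower: float, null_value: float = 1.0) -> bool:
    """Conventional rule: presume the drug is safe and conclude harm
    only when the lower confidence bound is above the null value."""
    return ci_lower > null_value


def risk_capped(ci_upper: float, tolerable_cap: float) -> bool:
    """Alternative rule: presume the drug is not safe and accept it only
    when the upper confidence bound is at or below a tolerable cap."""
    return ci_upper <= tolerable_cap


# Interval quoted earlier for ibuprofen in the Kaiser study.
lower, upper = 0.96, 1.17

print(harm_demonstrated(lower))              # False: harm is not "proven"
print(risk_capped(upper, tolerable_cap=1.5))  # True under a cap of 1.5 (assumed)
print(risk_capped(upper, tolerable_cap=1.1))  # False under a cap of 1.1 (assumed)
```

The same interval passes the conventional rule but can fail the capped rule, depending on how much excess risk is judged tolerable.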
But that's philosophy, and I think it's an
outcropping, it's a development, a natural
extension of the development of clinical trials in
the United States where the focus has always been
DR. WOOD: Let's try and keep both the
questions and the answers reasonably short,
otherwise, we will be here until after midnight.
DR. GRAHAM: I apologize.
DR. WOOD: That's okay. Let's go on to
DR. ELASHOFF: First, I have one comment
and then one question. In terms of confounding,
just because you put a lot of variables in some
model doesn't necessarily mean that you have
adequately removed the confounding effects even of
The second has to do with Dr. Graham's
slide 13, the excess population risk. I note that
the Ingenix data has been left out of the bottom
DR. GRAHAM: That's right, because for the
DR. ELASHOFF: Yes, but the negative sign
needs to be on the slide, otherwise, it's a biased
DR. GRAHAM: Well enough. I take that
correction. Okay, fair enough.
DR. WOOD: Dr. Bathon.
DR. BATHON: Yes. As we weigh the
risk-benefit ratio of these drugs, one
consideration is that there are subgroups of
patients in which the benefit might outweigh the
With that in mind, it would be helpful for
us who are not cardiologists or epidemiologists to
be able to put the relative risks that we have been
seeing over the past day or two in context with all
the cardiovascular risk factors that exist.
So, for example, if you were to take the
presumed relative risk of rofecoxib of 1.5 to 2.0,
at least at the higher dose, and put it into some
context for us of the 20 to 40 cardiovascular risk
factors that exist in a sort of rank order, where
would you put the COX-2 drugs?
DR. GRAHAM: For the high dose it would
be probably more significant than smoking or
diabetes or hypertension, maybe more important than
the combination of several of those factors in a
patient. For the lower dose, it is probably more
than hypertension, a little less than diabetes, and
a little less than smoking.
I know, David, you know the cardiovascular
risk factors much better than I do, and so does Dr.
Hennekens, but that would be my ballpark on that.
DR. WOOD: Dr. Abramson.
DR. ABRAMSON: Yes. I want to go back to
the question Dr. Shafer asked about if these
classes of drugs or this group of drugs could be if
there was a hierarchy of risk, and you first
answered that you thought the coxibs were more
risky, but I would challenge you a bit simply on
your own presentation.
I would like you to discuss your data,
because you then went on to talk about how
indomethacin has a risk, Meloxicam has a risk.
Based on your data, the message that came through
is that there was a dose response risk for
cardiovascular outcomes, that we saw it within the
coxibs, but we also saw it where the data were
available in the non-selective NSAIDs.
There are data that we have seen that
ibuprofen might increase risk. We didn't talk
about the McDonald and Way paper that in
cardiovascular discharge patients, people given
ibuprofen had a higher mortality 2-fold.
So, as the smoke clears, I am not sure
that the simple answer that the coxibs were
different was actually supported by your data, nor
your ultimate explanation. Can you defend that?
DR. GRAHAM: I think you are accurate.
What I was saying was I was referring, I think, to
the underlying COX-2 hypothesis and that it is
clearer, I believe, and, well, maybe it's an
overgeneralization, because we have the n that we
are viewing is so small, that looking at rofecoxib
as sort of the example where we can see very
clearly the dose response at all the levels and its
progression, and understanding its mechanism of
action, and then seeing similar things with
celecoxib and valdecoxib.
I think what you are saying is fair.
Maybe a better thing to say is, in the end, that
you do need to look at it drug by drug.
What I was saying, though, in that answer
that I gave to Dr. Shafer, I was really talking
more about sort of the COX-2 mechanism and the
coxibs as being, in quotes, "COX-2 selective," but
I think your observation is correct.
DR. ABRAMSON: Add to that, that although
there is a hazard that we don't accomplish a lot by
simply saying the class of NSAIDs may have risk, I
think we have under-appreciated that over the last
It is not that different from the
mid-nineties recognizing that there was a class GI
effect of these drugs, and that compared to
placebo, whether it's hypertension or long-term
potential adverse outcomes, this is something that
doctors have to be aware of, even the
of checking blood pressures when you put people on
any nonsteroidal drug.
So, I don't know that it is necessarily a
bad outcome to call attention to this class effect
until we get better information on each of these
DR. WOOD: Dr. Day.
DR. DAY: I have a comment about recall
bias and reverse recall bias. There is a huge
research literature on how memory works both in the
laboratory and in the every-day world, and there
are two phenomena that have been very heavily
studied that I think might be relevant here.
One is called flashbulb memory, and the
idea is when an emotional spectacular event
happens, such as when you first learn that JFK had
been shot, or the Challenger blew up, or the World
Trade Center had been hit, it is as if the old-time
flashbulb from an old-time flash camera went off
and captured all the details, and you remember all
of those details forever afterwards associated with
the event that you might otherwise not have
even noticed or forgotten.
So, there is a lot of research on
flashbulb memory that shows many of those details
are indeed correct, but some are notoriously false.
For example, there are accounts of people who
remember a certain event with great emotional
aspects to it, and they remember listening to the
World Series when so-and-so is pitching and it was
the bottom of the 9th, da-da-da, all these details,
and when you go back and check the evidence of what
was going on, on that day and time, that particular
game was not on.
So, that phenomenon number one, flashbulb
memory, and the second is eyewitness testimony.
How you ask a person a question will affect what
answers you get. So, if you have in the courtroom,
someone who has witnessed a car accident, if the
lawyer asks this witness, "Did you see the broken
glass," then, the witness is more likely to say yes
than if you ask, "Did you see any broken glass,"
because the broken glass presumes that there was
some, and so forth.
So, I take your points seriously about
potential recall bias and reverse recall bias, but
we would have to look at both, whether there is an
emotional component or not. Those who have had an
MI, for example, would have that most likely, but
also how the questions are asked in these surveys,
and it is not trivial how you ask people questions
about were you taking any medications or were you
taking medication X, and for how long, and what was
the dosage, and so on.
So, I don't think that these details are
always published with the studies, and I would like
to encourage people who ask people about their
experiences with drugs to take a look at the memory
literature for some of these points.
DR. WOOD: Dr. Gibofsky.
DR. GIBOFSKY: Dr. Graham, I am wondering
if you separated out your populations based on the
indication for which they were taking the drug. I
ask that because we heard yesterday, and it's well
known, that rheumatoid arthritis is itself a risk
factor for cardiovascular disease, and
of coxibs, in particular celecoxib, are usually
given to patients with rheumatoid arthritis as
opposed to osteoarthritis.
So, I am wondering if you look at that in
DR. GRAHAM: Several of the studies that I
reviewed have looked at the indication, but in
automated claims data, it is very difficult to be
sort of sure whether the patient has rheumatoid
arthritis, and there are different algorithms one
could use, but in general, what has been found in
the studies where they have looked at that, that
the prevalence of rheumatoid arthritis in the study
populations has been low, very low, and that its
impact on the results when they adjusted for it
didn't materially affect things.
Now, in the California Medicaid study, one
difference in that study was that our base
population was limited to patients who had
diagnoses of osteoarthritis or rheumatoid
arthritis. Now, these are diagnoses, and so does
that mean that they really had rheumatoid
arthritis? I don't know, but what we did try to
eliminate in that study at least were the people
who might be using an NSAID for a muscle
injury, a short-term complaint as opposed to a
In none of those does the presence of
rheumatoid arthritis seem to affect things, but
again I think the prevalence is pretty low in all
of these studies.
DR. GIBOFSKY: One quick question for Dr.
Platt, if I might. I need to understand the
concept of survivor bias somewhat in that I think
there is a difference between a patient who is
drug-naive, then put on a drug, and then an event
happens versus a patient who may have seen a drug,
perhaps seen another drug after that, 3 or 4 agents
of the class, and is then switched to another agent
and something happens.
I think we have talked about remote versus
current, but there is also this issue of sequential
effect, and I am wondering how you deal with that
as a survivor, particularly because of the paper we
saw a few weeks ago in the Archives suggesting that
discontinuation of an NSAID may itself be a risk
factor for a thrombotic event.
DR. PLATT: Your point is exactly right.
I think that the concern about survivor bias is
that if we think that some people are particularly
susceptible, which is almost certainly the case,
then, if we start the clock after a person has
already been exposed to a drug or to one that has
the same effect, then, it is very much less likely
that those individuals will have a problem.
That may be the explanation, for instance,
for the reason that the literature was so badly
wrong about postmenopausal estrogens and heart
disease, that most of the epi studies started with
I think the majority of the studies that
we were reviewing here, these were individuals who
are known to have had at least a year of prior
experience without exposure to the nonsteroidals.
Your study in Kaiser I know was an
inception cohort at least with regard to a year of
prior history, but I am not aware that any studies
have a longer drug-free prior interval than that.
DR. WOOD: Dr. O'Neil, do you want to
comment particularly on this?
DR. O'NEIL: Yes, this is an important
point and a lot of things have been covered in
Richard's and David's presentation, but one thing I
think that is relevant that Richard did not cover,
that is, the value of a randomized trial, is the
ascertainment and follow-up, and knowing the status
of individuals in the sense of who goes off therapy
and how long they stay on therapy.
That is very critical relative to the time
dependency of the risk. It was mentioned, for
example, the use in the observational sense of
recent and remote and current use. Those are all
terms that are nice, but they don't get at the
issue that we are trying to get at with regard to
the clinical trials, and that is essentially when
does time zero start for you.
So, I think the appropriate question to
ask is what is the duration of exposure
initial exposure to the drug, because I think that
is very relevant to the interpretation of the three
clinical trials that we have, two of which are in
There is a rofecoxib-naproxen control
trial for one year, there is a placebo-control
trial in polyp prevention for three years, and
there is a placebo-control trial in Alzheimer's
disease for four years, and the time dependency
from time zero matters as you have seen in the
It is relevant to the excess risk
calculation. So, I would ask the committee, as
well as I would ask David, of the observational
studies that you have reported, how many of them
are cohort studies, and how many of them are able
to identify new initial use, and then track
continued use for that individual, so that one
could look at the relationship between the hazard
rates and the hazard ratios that we are identifying
in the randomized trials and match that to the odds
ratios that are being reported in the
DR. GRAHAM: On one of my initial slides,
you can see what the cohort studies were, and in
some of the nested case control studies, you are
also able to get the time on drug. Actually, in
Wayne Ray's cohort study, most of these cohort
studies include prevalent and incident users, so
they will do what is called a "new user"
subanalysis, which is to try to get to this issue
of when does time zero begin.
We addressed that problem in our study
here by the inception cohort design in our base
population, so that we can identify what time zero
was for the cases.
Now, none of those studies presented data
in the form of a survival analysis, which I think
in the end, that is what Dr. O'Neil would like to
DR. O'NEIL: No, my question is not so
much in survival. I don't believe, and again that
is why I am asking you, I don't think any of those
studies were designed or able to capture the
question I am asking.
In fact, if I am not mistaken, in the
Wayne Ray study, he defined new use, but he did not
define any time from new use, which is essentially
critical to when those risks start.
DR. GRAHAM: That study isn't cited as one
of the studies where we are able to derive that
information. This slide was a slide that I
presented to show that from the epidemiologic
literature, those studies where the investigators
had identified when time zero began for rofecoxib
use, and they didn't present the data as a survival
analysis, but they identified when time zero began
and then, in various ways, showed you either what
the distribution of the cases was, so that you can
see that it was impossible for the risk to have
been delayed for 18 months, because nobody in the
study used the drug for 18 months, or they parsed
time out and looked at the first 30 days of use
from time zero, and found the risks that they found
But you are right, those
designed that way, and we haven't had time in our
Medicaid study to do these analyses yet, but we
have the data to now do the cohort study and time
to event, so we will have an opportunity actually
within the data to actually compare and look to see
exactly the question you are driving at.
But I would say that from the published
data, in each of these studies, time zero for
rofecoxib was identified and in some way or
another, information that I think could be useful
to the committee in establishing when does risk
begin was contained in those studies.
DR. O'NEIL: Well, the other point here,
which is the value of clinical trials, and it was
the question that was discussed yesterday with
regard to the intent-to-treat analysis, and that is
to say to analyze all outcomes once randomized to
the trial regardless of whether you want to track
the individual to 14 days post-exposure.
You can't really maybe get access to this
information in the observational studies. That is
a conjecture, but it's one or the other
it was interesting to the comment, whether one
would believe this or not, that discontinuation,
discontinuation from an NSAID alone raises risk.
If that were to be the case, that is a
different analysis altogether.
DR. GRAHAM: In that actual paper, it
could be that people were discontinuing the NSAIDs
because they were having chest pain and it was
being interpreted as dyspepsia or something, and
then they go to have their infarct.
I mean you are right about that, but this
is the nature of how epidemiology is done, and I
can't change it. I didn't make the rules, I am
only following them. Nobody is arguing that
clinical trials, if they could be large enough,
that they would give all of us answers that we
would have greater comfort trusting what they are
What I am proposing is that we don't have
that kind of data in the clinical trials. As large
as the clinical trials are, for the questions that
this committee is facing, you don't have
you need, and what I presented is the epidemiologic
data, and it is imperfect and it has its warts, and
that is why I would emphasize looking at
consistency and trying to sort of derive from that
a general sense.
I mean does it make pharmacologic sense
that you would have an 18-month delay? I mean I
guess I suppose it depends on what you think the
mechanism of action is for the underlying disease,
but even in the clinical trials, study 090 was 6
weeks long, 12.5 mg, and it had a cardiovascular
DR. WOOD: I am happy to facilitate a
discussion among the FDA, but I think we would
rather hear from the committee right now. Dr.
Farrar, you are next.
DR. FARRAR: I think that the
recommendations of the committee tomorrow are going
to depend on the assessment of the overall risk and
the overall benefit of this class of drugs.
As a researcher and after all the data
that has been presented, I am more than willing to
accept the fact that there are serious risks even
of death from taking NSAIDs. In fact, though,
there are serious risks in taking any medication at
For some of the NSAIDs, it is
cardiovascular risks, for some of them it is
clearly GI bleeding. As a doctor, though, who
takes care of patients, I know that treating pain
or not treating pain and not treating the
disability of arthritis also has very serious risks
even of death.
Given the extensive work that you have
done, on the risk of both the cardiovascular and
the GI bleed, I wonder what level of risk is
acceptable to you, and remembering that the only other
drugs that are really available are analgesics or
narcotics, and the only other drugs that are really
available in terms of limiting inflammation are
biologics or immunosuppressants, I wonder what drug
is safe enough that you would recommend that I
actually would be able to use it in patients to
prevent some of their suffering.
DR. GRAHAM: Well, I am not going to give
a product endorsement. A couple of things, though.
DR. WOOD: Try and make it brief.
DR. GRAHAM: One, the benefits of the
treatment for the traditional NSAIDs compared to
the COX-2 selective NSAIDs with GI bleed, we have
clinical trial evidence that suggests that there may
be a difference, but here, to me, is an anomaly.
Rofecoxib got the indication for being
GI-protective, celecoxib didn't based on the
clinical trials data you guys looked at yesterday.
There are two published studies in the
literature looking at what I would say is actual
benefit. There, they were looking at
hospitalization for GI bleed--they didn't look at
death from GI bleed, but I wish they had--but
hospitalization for GI bleed, and what they found
was, in both of these studies, that celecoxib was
actually more beneficial, you know, lower rate of
hospitalization for GI than rofecoxib. So, that is
the population, two large studies.
You have got your clinical
would have said it should be the reverse. So, I
throw that out as one sort of conundrum.
The second is that I don't think that the
actual benefits of these drugs are understood well
enough to sort of try to weigh these very well.
The case fatality rate for myocardial infarction in
the United States approaches 40 percent. The case
fatality rate for hospitalized GI bleeding is
probably somewhere around 5 or 10, it is a much
lower case fatality rate.
Nobody that I have seen anywhere has sort
of worked this out very well, so I would submit to
you and to the committee that you actually know
very little about the actual population benefit of
any of these products.
DR. WOOD: I don't think we are going to
get an answer to that question, so let's move on.
DR. NISSEN: Let me briefly answer the
earlier question about what does the hazard ratio
of 1.5 to 2 mean. Before I came to the meeting, I
made a point to look this up, because I thought it
would be very relevant.
It is equivalent to raising a cholesterol
from 200 to 260, or taking up smoking. Another way
for the committee, I mean as a cardiologist I have
to deal with this all the time, the most effective
drugs we have for prevention of morbidity and
mortality are statins, and they reduce risk about
So, a hazard ratio of 1.5 to 2 is really a
very, very big effect when you are talking about
the most common cause of mortality, and that is why
this discussion is so important.
Now, my question is this. We are going to
be asked to balance risk and benefit, and so the
magnitude of the hazard ratio is very important to
all of us, and I am trying to reconcile what we see
in the randomized control trials with, let's take
rofecoxib for a moment, where it looks like the
hazard ratio in the randomized trials is in the
range of 2, 3, 4, maybe even higher, and in the
observational data it is significantly lower.
I would like to propose a
you and just ask you if you think this is right.
In your observational data, you are looking at
mostly short-term exposure, so you are looking at
less than 12 months typically of exposure.
It may well be that the hazard increases
over time, so that by the time you get to 18
months, you can actually see it in a much smaller
randomized trial, and so it doesn't rule out the
possibility that, in fact, both observations are
right, that, in fact, there is an early hazard, but
that early hazard has a smaller hazard ratio than
the hazard at 18 months or 24 months or even 36
months, and if we ever were to look out 5 years, it
might still be increasing.
Do you think that is a reasonable
DR. GRAHAM: I think more likely it is,
that in your clinical trials, early on you don't
have enough power to distinguish the risk. The
hazard is the same, but the lines are closer
together, because we are closer to the origin.
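[Editorial illustration, not part of the proceedings.] The point that a constant hazard ratio still produces few distinguishing events early in follow-up can be sketched with a simple exponential model. This is only a sketch under stated assumptions: the constant hazard, the 1 percent per year baseline hazard, the hazard ratio of 2, and the 1,000 patients per arm are hypothetical values chosen for illustration and are not taken from any study discussed here.

```python
import math


def cumulative_incidence(hazard_per_year, years):
    """Fraction with an event by `years` under a constant (exponential) hazard."""
    return 1.0 - math.exp(-hazard_per_year * years)


def excess_events(n_per_arm, baseline_hazard, hazard_ratio, years):
    """Expected extra events in the treated arm versus control by `years`."""
    control = n_per_arm * cumulative_incidence(baseline_hazard, years)
    treated = n_per_arm * cumulative_incidence(baseline_hazard * hazard_ratio, years)
    return treated - control


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 patients per arm, 1% per year baseline
    # hazard, and a hazard ratio of 2 held constant for all of follow-up.
    for months in (3, 6, 12, 18, 36):
        extra = excess_events(1000, 0.01, 2.0, months / 12)
        print(f"{months:2d} months: about {extra:.1f} excess events")
```

Under these assumptions the hazard ratio never changes, yet the expected gap between the arms is only a handful of events in the first months and grows with follow-up, which is one way to read "the lines are closer together" near the origin.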
I think one other explanation for the
lower risk ratios in observational studies, I would
think is more likely due to misclassification of
exposure and misclassification of outcome. It is
likely to be nondifferential, so it would tend to
reduce the odds ratios and relative risks towards
Exposure, because people are going to take
it, a lot of these people are taking it on a prn
kind of basis. In a clinical trial, you have a
greater certitude that they are actually taking it
every day. That introduces a lot of
misclassification, so the a priori hypothesis going
into an observational study, with misclassification
going on, you are fighting an uphill battle to see
DR. WOOD: We have got lots of people who
want to ask questions. I want to make sure that
the people who are asking questions have questions
they want to ask for clarification of the speakers
who have spoken rather than just general points.
DR. D'AGOSTINO: I have a couple of
questions along the way here. I have spent a good
part of my career in the Framingham Heart Study,
and it's an epidemiological study and a cohort
study, and we take joy when somebody runs a
controlled trial on hypotheses and then later on
The first question is I am concerned that
even though you have gone through this careful
analysis, your conclusions are no apparent effect,
probably increased effect, probable increased risk.
They really don't help us in the sense of pinning
things down. We have a couple of very strong I
think good studies, the APPROVe study and the APC
study as placebo-controlled trials.
Tell us quickly where is the weight of how
we should look at these two pieces, the controlled
trials we have versus what you have produced.
DR. WOOD: Really quickly.
DR. D'AGOSTINO: Really quickly, it can be
DR. GRAHAM: My belief is that for the
controlled clinical trials, for the levels of risk
that we are concerned about, that they do not have
the statistical power early on to show risk
DR. D'AGOSTINO: I think Bob O'Neil's
comment is very important here.
The other two points, and again I will
make them quick, I am very concerned about the high
dose effect you have, and I am really concerned
about the MI and the number of cases. I mean blood
pressure, cholesterol, diabetes, smoking, this is
what drives people to have heart attacks and what
have you, and that is completely missing on your
assessment of how many new cases, so I guess it is
more of a comment that I am really concerned that
that sheet needs sobering interpretation.
DR. GRAHAM: But it was based on the odds
ratios and relative risks where those factors were
adjusted for, so as well as they are adjusted for,
that is what the projection represents, the excess
DR. D'AGOSTINO: Yes, but I mean the
comment was made by you, throwing in the
doesn't necessarily adjust for them.
The last one, you made a very nice point
about the cardio-protective effect, and you tried
to show that these uses, and what have you, somehow
or other all have the same risk, and your
interpretation that there must be some confounding
going on, why doesn't that hold for all the studies
you gave, why doesn't that hold for the Solomon
study, which you thought was a great study, yet,
this one result you don't like?
DR. GRAHAM: For what, the Kimmel study?
DR. D'AGOSTINO: Wasn't it the Solomon
study that had the naproxen as the
DR. GRAHAM: That is because the cardio
protection was present when they were on the drug
and when they weren't on the drug.
DR. D'AGOSTINO: I understand what you are
saying, but if that's a problem, then, it means
there is some confounding going on.
DR. GRAHAM: No, it's selection bias.
DR. D'AGOSTINO: Well, it's selection
bias, but why isn't it for the whole study? Why do
you throw out a result you don't like and keep all
the results you like?
DR. GRAHAM: No, that is not what I did.
I pointed out a result where they showed the
presence of the selection bias. In other studies,
the Ingenix study is the only other study that
looked at this. I don't have a slide of it.
DR. D'AGOSTINO: I don't know if it's a
selection bias or misinterpretation of the data.
DR. GRAHAM: Well, to me it looks like
DR. WOOD: Let's continue that
DR. MORRIS: David, would you go to slide
14. That is the risk, the duration of use. I
think one of your points was that if you look at
your study, tell me if I understand this right,
that with the lower dose, that the median time to
an AMI is sooner than with a higher dose, did I
understand that right?
DR. GRAHAM: Yes.
DR. MORRIS: A month?
DR. GRAHAM: Had more cases, a greater
proportion of our cases, but the other thing is
remember, down here, we are talking about 18 cases
or so. The N here is small, the N here is like 58,
and the N here is 10. So, I wouldn't read too much
into the difference.
The more important point is that at the
low dose, nobody was out there beyond 18 months, so
all the action happened before 18 months, and the
same for the others. I see what you are saying. I
can only say that is what our data were.
DR. MORRIS: One interpretation is what
you said earlier, that for this particular drug, we
are talking about, as you said, no safe level. I
was wondering if that is the way you interpreted
it, that because we are talking about Vioxx here,
and there is no safe level, that something is going
to happen sooner, or is it something with the
populations are different.
DR. GRAHAM: The populations could be
different, but I think, you know, you would expect
the higher dose to have a shorter latency to onset
than the lower dose, but the numbers are so small.
DR. MORRIS: Okay, it's a small number
DR. WOOD: So, the answer is too small
numbers at high dose.
DR. BOULWARE: I just want to make sure I
understand something that you had proposed in your
excess population risk slide, if you would put that
As a rheumatologist, I use these drugs in
a population much greater than what you have here
with a 65 to 74 where the risk of an MI is fairly
high in that group.
Did you want us to believe that this
excess risk that you are proposing would be
extrapolated to other population groups, too?
DR. GRAHAM: Well, no.
DR. BOULWARE: Do you have any numbers
that may demonstrate that?
DR. GRAHAM: Well, the answer to the
second is no. This was an example in conversation
with people planning the talk, to try to help
people connect with what it means.
Cardiovascular risks go up. I mean in the
next age group higher, the risks are higher. In
the age groups lower, they are lower, but
cardiovascular risk begins to increase in the 40s.
DR. BOULWARE: I understand, but it
wouldn't be a linear type of thing.
DR. GRAHAM: No, the background risk isn't
linear, the relative risks, though, are adjusted
DR. BOULWARE: Because one of the
questions we will be faced with is are there
subpopulations or groups that these may be safe in,
and I just want to make sure I understand the
relative risk in different age groups.
DR. GRAHAM: Nobody in any of the studies
where they have looked at it has reported effect
modification, which would be that the level of risk
differs at different ages.
DR. BOULWARE: One more question here. I
want to make sure I understand. I think I heard a
comment that says when the risk approaches
2.0--maybe I just assumed that you said this--that
it was an unacceptable level of risk.
Is there ever a case where a drug may have
a clinical benefit in which that risk is
acceptable, because for the patients I see, not
giving them any of these drugs will confer a great
deal of risk on them, and physical impairment, and
we have studies that show that the functional
classification of rheumatoid arthritis patients
carries with it a significant mortality as that
class goes up?
DR. WOOD: I think that is a question for
the committee to answer rather than Dr. Graham.
Let's move on to Dr. Cryer. Do you have a
DR. CRYER: I do. The comment and
question I have of Dr. Graham addresses an issue
that I think is an important difference between the
observational studies and the prospective
and this difference relates to assessment of drug
compliance and missed doses, and I think it is
critical as it relates to assessing drugs which
potentially affect platelet function.
A huge difference, as you know, between
aspirin's effect and every other NSAID including
the COX-2 inhibitors, is that with the non-aspirin
NSAIDs, as soon as you remove the drugs, whatever
potential effect they would have had on the
platelet are immediately reversed.
So, with naproxen specifically, my
preconceived bias, which may be wrong, but my
preconceived bias based upon everything I know
about the pharmacology and the things that Dr.
FitzGerald has reviewed for us, is that it should
have some mild anti-platelet effects which would
only be present when the drug is on board in the
So, the specific question is, in the
observational studies, recognizing that in clinical
practice people miss doses of their NSAIDs, they
are not taking their NSAIDs consistently,
you account for the missed doses in the
observational studies recognizing that this could
potentially lead to a mitigation of whatever
negative effect or positive effect that they may
DR. GRAHAM: It ends up being
misclassification. Generally, what that means is it
will force the observed level of risk, the relative
risk or the odds ratio, closer to 1. So, if we had
an increased risk, it would make it lower, if we
had a protective effect, it would sort of make it
higher, closer to 1.
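[Editorial illustration, not part of the proceedings.] The attenuation described here, where misclassified exposure pulls an observed ratio toward 1 from either side, can be sketched numerically. This is a minimal sketch, assuming the misclassification is nondifferential (the same sensitivity and specificity regardless of outcome) and that sensitivity plus specificity exceeds 1. The function name and every input value are hypothetical, not figures from the studies under discussion.

```python
def observed_risk_ratio(true_rr, baseline_risk, exposed_fraction,
                        sensitivity, specificity):
    """Risk ratio observed when exposure is misclassified nondifferentially.

    sensitivity: probability a truly exposed person is recorded as exposed.
    specificity: probability a truly unexposed person is recorded as unexposed.
    """
    risk_exposed = true_rr * baseline_risk
    risk_unexposed = baseline_risk
    p = exposed_fraction

    # Population shares by (true status, recorded status).
    exp_as_exp = p * sensitivity
    unexp_as_exp = (1 - p) * (1 - specificity)
    exp_as_unexp = p * (1 - sensitivity)
    unexp_as_unexp = (1 - p) * specificity

    # Each recorded group is a mixture of truly exposed and truly unexposed.
    risk_recorded_exp = (
        (exp_as_exp * risk_exposed + unexp_as_exp * risk_unexposed)
        / (exp_as_exp + unexp_as_exp)
    )
    risk_recorded_unexp = (
        (exp_as_unexp * risk_exposed + unexp_as_unexp * risk_unexposed)
        / (exp_as_unexp + unexp_as_unexp)
    )
    return risk_recorded_exp / risk_recorded_unexp


if __name__ == "__main__":
    # Hypothetical: 20% truly exposed, 1% baseline risk,
    # 70% sensitivity and 90% specificity of the exposure record.
    for true_rr in (2.0, 0.5):
        seen = observed_risk_ratio(true_rr, 0.01, 0.2, 0.7, 0.9)
        print(f"true ratio {true_rr}: observed ratio about {seen:.2f}")
```

In this sketch a true increase is observed as a smaller increase and a true protective effect as a weaker one, so both move toward 1. It does not model missed doses directly; it treats them as one source of recorded exposure that does not match actual exposure.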
DR. CRYER: Right, we agree on that. The
specific question is, is there a way to actually
recognize or to account for when people do not take
their doses in the observational databases?
DR. GRAHAM: No, there isn't, so when you
are studying, say, an increased risk, that is why I
said if you find something, you have to realize you
found it despite the misclassification.
DR. WOOD: Okay. Dr. Domanski.
DR. DOMANSKI: I will save it for
DR. WOOD: Okay, great. Dr. Furberg.
DR. FURBERG: No.
DR. WOOD: Okay, great.
Dr. Temple, who does speak for the FDA.
DR. TEMPLE: I am just asking questions.
A couple. Actually, one point is it seems to me
that since we expect that people are going to be
getting one drug or another, comparisons with other
NSAIDs seems like as good a comparison as we should
make. You might want to leave out indomethacin if
you are worried about it. That's one thing.
I guess my main question, though, is
everybody has paid appropriate lip service to the
idea that very small differences are hard to
interpret in epidemiology.
People have said 1.5, 2. Actually, I
notice in one of his editorials, Dr. Furberg cited
a paper of mine where I said anything less than 2
really needs a lot of questions. Jerry Cornfield,
who sort of invented all this stuff, used to say 3.
Well, we are talking about
here that are 0.1 differences, not that they
wouldn't be hugely important if they were true,
that is absolutely true. So, I guess I want to
know what Richard and you make of all this, because
the numbers are very small, and yet, just as an
example, there is a very great consistency that you
cite that celecoxib looks sort of okay, but you
found one study where there is a little hint that
maybe the higher dose is a problem, and since
probably we all think dose response is likely, that
looks good to you.
DR. GRAHAM: Two studies, there were 2.
DR. TEMPLE: Okay, 2. The valdecoxib
data, which shows nothing, doesn't look so good
because we probably all believe that there is
likely to be a class effect.
What I am asking is, with numbers like
this, how do you know what to do with them? That
seems very fundamental for the epidemiology.
DR. WOOD: But, Bob, there are 4
randomized clinical trials here, and your comments
don't apply to them, I assume.
DR. TEMPLE: No, they don't, although they
are not perfectly consistent either. But, no, I am
asking, what do we make of differences of
magnitude with everybody having given lip service
to the idea that small differences are hard to
interpret, and yet we seem to be enthusiastically
endorsing them, so I just want to know what Richard
and David think about that.
DR. GRAHAM: Rich, do you want to go
DR. PLATT: I think we have to be cautious
about how we interpret it, so I would say the
finding of a relative risk of 3 in an epidemiologic
study, as David found, is meaningful--
DR. TEMPLE: For high dose rofecoxib.
DR. PLATT: For high dose rofecoxib.
DR. TEMPLE: I would not dispute that at
DR. PLATT: It seems to me that in that
context, that a dose response effect, that the
information about lower doses gains weight by
borrowing from that. I think that is also worth
keeping in mind when, in other studies that are
working in that range that make us all nervous,
there appears to be a dose response effect.
It is the kind of consistency that makes
the study, in my mind, be worth more attention. I
think there is something to be said for giving more
weight to relatively small excess risks if they are
seen in a number of different environments when we
can't have good reason to think that there is a
similar kind of biases that might be contributing
After that, I agree with you. We are in
relatively difficult terrain. I think that it is
not the same as no data, though. I think we ought
to distinguish between the situation in which we
have no evidence from ones in which we have
relatively weak evidence.
We didn't talk at all, for instance, about
the enormous number of spontaneous reports of
myocardial infarction following exposure to
nonsteroidals. There are thousands and thousands
In my mind, they don't contribute at all
to the discussion, whereas, I think these need to
be weighed in the mix when we don't have clinical
trial information to depend on.
DR. GRAHAM: My answer is similar to his,
but I think that what you are identifying is, is
that we are hitting or at least right now the
frontier is the limits of what the available tools
we have to define the levels of risk that we are
We are talking about small levels of risk
that turn out for this particular event to be
enormously important in a population level. If you
are talking liver failure, we wouldn't be having
this conversation. For that reason, it becomes
important and what I would say is sort of
emphasizing what Rich said, is I would be looking
for consistency across different studies, and if I
found a number of studies, say, as with Indocin,
for example, to me, that is more persuasive.
If I found a number of studies that
pointed to a particular set of NSAIDs that seems to
have low risks, I would take comfort in that in the
absence of perfect information. I mean some light
in a storm is probably better than no light in a
DR. TEMPLE: I take it if the differences
were at the level of 10 percent, 1.1 versus 1.2--
DR. GRAHAM: I am thinking more in a very
qualitative sense of things that they seem to
cluster around 1. I mean 1.1 for ibuprofen, it
could be that, for example, maybe naproxen increases
the risk 3 percent in the real world, we are never
going to figure that out, maybe ibuprofen increases
it 10 percent or 15 percent, maybe we could figure
that out, I don't know, but there is going to be a
place where qualitatively, if we see enough studies
kind of sort of pointing to the same place, you
know, most of them, they are not all going to say
the same thing, there is going to be these
conflicts, just like we have in clinical trials
But if most of the compass arrows are sort
of pointing in the same direction for particular
NSAIDs, I think those are the ones that
I sort of place on a suspect list.
DR. TEMPLE: So, very low hazards need at
least multiple support before they are credible.
DR. GRAHAM: I think so, and I think that
you want to try to encourage to collect that
information sort of to test that out.
DR. TEMPLE: Alastair, could I take half a
second to answer a question Larry raised before?
DR. WOOD: Sure, a second.
DR. TEMPLE: Well, it's a very good
question, you know, if the drug is going to be used
forever, why don't you study them forever. The
only thing I would point out here is that what sort
of started people thinking was VIGOR, and VIGOR
didn't take 3 years to show anything, it showed up
in 9 months.
So, what you have seen is for, say,
lumiracoxib, a humongous study of about the same
length, but, of course, they didn't know about
APPROVe, did they, and whatever you think APPROVe
means, whether Bob is right that it's late, or
David is right that there weren't enough
people were pointing toward a study that by every
reasonable thought, if you think platelets are
involved, ought to be long enough to show things
But then you form a new hypothesis once
you have APPROVe, and you have to adapt it, and I
think that goes on all the time. It would not be, I
must say, for most things my first thought unless
you are looking for cancer that you need a 3-year
study to find it, but maybe you learned that it
Just for what it is worth as an example, you
can't get an anti-arrhythmic drug approved in this
country without showing that you don't alter
survival unfavorably. One result is there are
hardly any being developed, but, you know, we had
bad experiences, we didn't like the results of
CAST, so you change.
I think there is no doubt that things
evolve and you have to expect that, and APPROVe,
depending on what you think of it, changes the
nature of what you expect.
DR. GRAHAM: Bob, just one point on that.
I think if the APPROVe study had been 5 or 10 times
larger than it was--I am talking about
now--you would be able to answer with much greater
confidence what is happening month 1 to 18. I
guess what I am saying is that you could also
shorten the latency to identification of a problem
if it turns out that the risk is early on.
DR. TEMPLE: David, I think that is
entirely possible, and if it involves platelets, I
would believe you, but if it involves a small,
long-term increase in blood pressure, then, I am
not so sure.
DR. GRAHAM: Right, but we saw yesterday--
DR. TEMPLE: We don't know.
DR. GRAHAM: We don't, but if it's
prostacyclin, that effect could occur immediately.
DR. TEMPLE: Yes, but the blood pressure
effect could be delayed.
DR. WOOD: Right. So what, Bob, you are
saying is that it is easy to be a Monday morning
quarterback, but the data were not there before.
DR. TEMPLE: I would never be that rude.
DR. WOOD: I think you are right.
DR. STEMHAGEN: I would like to clarify a
couple things. First, I am a little concerned in
terms of the unpublished data. I appreciate that
we are able to get data very quickly, right at the
minute that it is being generated, but none of us
have had a chance to really review that, so I do
have some concerns about the weight being put on this
unpublished data when the rest of us haven't had a
chance to look at it.
I think there needs to be some
clarification. There was some discussion about the
recall bias, and so on. Certainly, there is a major
concern about that in case-controlled studies, and
we don't have the questionnaires, but there were a
lot of sort of subanalyses done in the Kimmel
study, about trying to look at whether recall bias
is a problem, and I am not sure that you have
highlighted that enough that looking at all those
different things, there were really no
Similarly, in the Watson study, it's a
GPRD study, it is different than a lot of the large
databases, the automated databases.
There is a lot more personal involvement
in terms of the data and the data collection and
the adjudication of results, and I think it just
needs to be clear that all of these studies are not
the same in terms of a Medicare study where we
can't go back and validate records. A lot of them
had a much more careful review, and I am just not
sure that that was totally clear and if you hadn't
read each of the papers.
I would like to just ask a question in
terms of your definition of the inception cohort,