DEPARTMENT OF HEALTH AND HUMAN SERVICES
FOOD AND DRUG ADMINISTRATION
CENTER FOR DRUG EVALUATION AND RESEARCH
JOINT MEETING OF
THE ARTHRITIS ADVISORY COMMITTEE AND
THE DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE
VOLUME II
Hilton
P A R T I C I P A N T S
Alastair J.J. Wood, M.D., Chair
Kimberly Littleton Topper, M.S., Executive Secretary
ARTHRITIS ADVISORY COMMITTEE MEMBERS
Allan Gibofsky, M.D., J.D., Chair
Joan M. Bathon, M.D.
Dennis W. Boulware, M.D.
John J. Cush, M.D.
Gary Stuart Hoffman, M.D.
Norman T. Ilowite, M.D.
Susan M. Manzi, M.D., M.P.H.
DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE MEMBERS
Peter A. Gross, M.D., Chair
Stephanie Y. Crawford, Ph.D., M.P.H.
Ruth S. Day, Ph.D.
Curt D. Furberg, M.D., Ph.D.
Jacqueline S. Gardner, Ph.D., M.P.H.
Eric S. Holmboe, M.D.
Arthur A. Levin, M.P.H., Consumer Rep.
Louis A. Morris, Ph.D.
Richard Platt, M.D., M.Sc.
Robyn S. Shapiro, J.D.
Annette Stemhagen, Dr.PH., Industry Rep.
FDA CONSULTANTS (VOTING)
Steven Abramson, M.D.
Ralph B. D'Agostino, Ph.D.
Robert H. Dworkin, Ph.D.
Janet Elashoff, Ph.D.
John T. Farrar, M.D.
Leona M. Malone, L.C.S.W., Patient Rep.
Thomas Fleming, Ph.D.
Charles H. Hennekens, M.D.
Steven Nissen, M.D.
Emil Paganini, M.D., FACP, FRCP
Steven L. Shafer, M.D.
Alastair J.J. Wood, M.D. (Meeting Chair)
P A R T I C I P A N T S (Continued)
FDA CONSULTANTS (NON-VOTING)
Byron Cryer, M.D. (Speaker and Discussant)
Milton Packer, M.D. (Speaker only)
NIH PARTICIPANTS (VOTING)
Richard O. Cannon, III, M.D.
Michael J. Domanski, M.D.
GUEST SPEAKERS (Non-Voting)
Garret A. FitzGerald, M.D.
Ernest Hawk, M.D., M.P.H.
Bernard Levin, M.D.
Constantine Lyketsos, M.S., M.H.S.
FDA (CDER)
Jonca Bull, M.D.
David Graham, M.D., M.P.H.
Brian Harvey, M.D.
Sharon Hertz, M.D.
John Jenkins, M.D., F.C.C.P.
Sandy Kweder, M.D.
Robert O'Neil, Ph.D.
Joel Schiffenbauer, M.D.
Paul Seligman, M.D.
Robert Temple, M.D.
Anne Trontell, M.D., M.P.H.
Lourdes Villalba, M.D.
James Witter, M.D., Ph.D.
Steven Galson, M.D.
Kimberly Littleton Topper, M.S., Executive Secretary
C O N T E N T S
Call to Order: Alastair J.J. Wood, M.D., Chair
Conflict of Interest Statement: Kimberly Littleton Topper, M.S.
Interpretation of Observational Studies of Cardiovascular Risk of Non-steroidal Drugs
Richard Platt, M.D., M.S.
Review of Epidemiologic Studies on Cardiovascular Risk with Selected NSAIDs
David Graham, M.D., M.P.H.
Committee Questions to Speakers
Arcoxia (etoricoxib)
Merck Research Laboratories
Sponsor Presentation
Sean P. Curtis, M.D.
FDA Presentation
Joel Schiffenbauer, M.D.
Lumiracoxib
Novartis Pharmaceuticals
Sponsor Presentation
Introduction
Mathias Hukkelhoven, Ph.D.
Gastrointestinal and Cardiovascular Safety of Lumiracoxib, Ibuprofen, and Naproxen
Patrice Matchaba, M.D.
Open Public Hearing
FDA Presentation (Lumiracoxib)
Committee Questions to Speakers
Committee Discussion
P R O C E E D I N G S
Call to Order
DR. WOOD: Let's get started and welcome back to another day. We are going to begin as on the agenda, seeing that we worked late last night.
A couple of housekeeping things first. As they say in the movie theater, please turn off your cell phones. We don't have the one that sort of, you know, sends you into space if you do that, the ejector seat, but please don't answer your calls in here, so we don't have to hear the beginning of your conversation.
Kimberly, are you going to read the conflict of interest? Okay. Go ahead.
Conflict of Interest Statement
MS. TOPPER: The following announcement addresses the issue of conflict of interest with respect to this meeting and is made as part of the record to preclude even the appearance of such.
Based on the agenda, it has been determined that the topics of today's meeting are issues of broad applicability and there are no products being approved. Unlike issues before a committee in which a particular product is discussed, issues of broader applicability involve many industrial sponsors and academic institutions. All special government employees have been screened for their financial interests as they may apply to the general topics at hand.
To determine if any conflicts of interest existed, the agency has reviewed the agenda and all relevant financial interests reported by the meeting participants. The Food and Drug Administration has granted general matter waivers to the special government employees participating in this meeting who require a waiver under Title 18, United States Code Section 208.
A copy of the waiver statements may be obtained by submitting a written request to the agency's Freedom of Information Office, Room 12A-30 of the
Because general topics impact so many entities, it is not practical to recite all potential conflicts of interest as they apply to each member, consultant, and guest speaker. FDA acknowledges that there may be potential conflicts of interest, but because of the general nature of the discussions before the committee, these potential conflicts are mitigated.
With respect to FDA's invited industry representative, we would like to disclose that Dr. Annette Stemhagen is participating in this meeting as a non-voting industry representative acting on behalf of regulated industry.
Dr. Stemhagen's role on this committee is to represent industry interests in general, and not any one particular company. Dr. Stemhagen is vice president of Strategic Development Services for Covance Periapproval Services, Inc.
In the event that the discussions involve any other products or firms not already on the agenda for which FDA participants have a financial interest, the participants involved and their exclusion will be noted for the record.
With respect to all other participants, we ask in the interest of fairness that they address any current or previous financial involvement with any firm whose products they may wish to comment upon.
Thank you.
DR. WOOD: Thank you.
Let's go right to the first speaker, Dr. Platt, who is going to tell us about observational studies.
Interpretation of Observational Studies of Cardiovascular Risk of Nonsteroidal Drugs
Richard Platt, M.D., M.S.
DR. PLATT: Thanks.
The framers of the meeting thought it would be useful at this point to have a discussion about observational studies to put us all on the same page.
There was a view by some that the expertise around the table might be uneven and it would be worthwhile to have some discussion about some of the basics. It is clear that that is not the case.
I realize that a number of the people here have written a book and several of my teachers are here, so to that extent, I think we can either make this a quick discussion or use this as an opportunity for a real interactive discussion, because there are some hard questions here and no matter how we sort them out, we are going to be left with less in the way of firm answers than we would like.
I also understand that there is a point of view that says that there are lies, damn lies, and observational studies, so part of what I think is worth doing is using this time maybe to take our temperature about whether and under what circumstances we can put weight on observational studies.
We saw a version of this slide last night, actually, in the last presentation, about why perform observational studies at all, because I subscribe to the general view that, all things being equal, a clinical trial, a randomized trial, is more credible and provides more information than an observational study.
The problem is all things aren't always equal and so there are reasons to ask what we can learn from observational studies.
I think the most important of them is that no matter how well a clinical trial is designed, the individuals who are recruited and consented to a clinical trial are inherently going to be different from the actual population of users, and if we want to understand how an agent performs among real users in the way they actually use the drug, then I think there is no escape but to look to observational studies.
Additionally, observational data is by definition there, so when a pressing question arises, sometimes observational data is the first way we can get insight into the relationship between the drugs we care about and the outcomes.
I think in that regard, these studies can often be thought of as helping us identify the areas in which it would be most fruitful to invest in full-blown randomized trials. We will never live in a world where we are able to do all the randomized trials we care about.
I know that Charlie Hennekens' landmark randomized trial of aspirin was preceded by, as I recollect, Charlie, a large number of observational studies that made you think it was reasonable to do those randomized trials, so observational studies can be useful in that regard.
Finally, when we are talking about trying to understand effects that are relatively unusual, we stress even the largest clinical trials. We talked yesterday about the fact that the most recent drug approvals have used much larger populations in the NDA phase than had been studied in the old days, and yet they are still small compared to the numbers needed to parse out relatively small differences.
There are a lot of different kinds of observational studies. I have listed a few of the most common. The ones between the lines here are the ones that are really the subject for discussion here.
Tom Fleming made the absolutely correct and somewhat counterintuitive point that it is often more difficult to do good observational studies of relatively common outcomes than rare ones, and because of that, the group of studies that I think at least are reasonable to consider for looking at relatively common outcomes are case-control studies, nested case-control studies, and cohort studies.
We have examples of each in the materials that have been handed to us. The study by Kimmel is a pretty traditional case-control study. The studies by Ray are cohort studies, as is the Aramis study. The study by Dave Graham and the Solomon study are nested case-control studies.
Just as a quick reminder, the distinguishing feature of cohort studies is the fact that the study population is defined on the basis of whether people are exposed to the drug or not, and then we look forward to what happens to them.
In that way, they are exactly comparable to clinical trials, with the big difference that the assignment to drug is not randomized.
The strengths of those compared to case-control studies are that you have a reasonable shot at the outset of selecting individuals who are representative of the group that you are trying to study, and if you organize the study properly, you have a reasonably good chance of getting unbiased exposure assessments.
The weakness, particularly of observational cohort studies, is that just because individuals had the right drug exposure at the outset, they may change that. You can deal with that with an intention-to-treat design, but you pay a price for that, and in observational studies, loss to followup is a big problem.
We are particularly plagued by that because the large majority of the observational studies we are working in are ones that use administrative data from one sort of health plan or another, and individuals move in and out of health plans, so that it becomes difficult to follow them over time.
Case-control studies, remember, are ones that start with individuals who have the outcome we care about, myocardial infarction or myocardial infarction and sudden death, and compare them to individuals who haven't had that experience; then you look back and ask what their drug exposures were. The reasons for doing those studies are that they are, first of all, very efficient studies.
You don't have to study thousands and thousands. You can study as many cases as you find and a reasonable number of controls, and you can look back and classify exposure however is most useful, and that is a very convenient and versatile feature of case-control studies.
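A minimal sketch of the case-control arithmetic described above, assuming a simple unmatched design: the odds of exposure among cases are compared with the odds of exposure among controls, and for an outcome as uncommon as MI the odds ratio approximates the relative risk. The function name and all counts are hypothetical, not taken from the Kimmel study or any other study discussed here. The Woolf log-scale interval is only one of several standard choices.

```python
import math

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval from a case-control 2x2 table."""
    or_ = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of log(OR): square root of the sum of reciprocal cell counts.
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical: 200 MI cases (60 exposed) and 800 controls (200 exposed).
or_, lo, hi = odds_ratio(60, 140, 200, 600)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

These 1,000 hypothetical subjects are all the analysis needs, whatever the size of the population the cases came from. That is the efficiency the speaker describes.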
The big weaknesses are that it is very hard to assure oneself that the cases and the controls are really representative of the populations that you care about, and for conventional case-control studies, for instance, the study by Kimmel that we are going to look at, it takes a lot of work to be sure that people who know that they have already experienced an MI don't differentially report their exposure to the drugs that we care about.
That can be for all sorts of reasons and it might not even be wrong, but the individual who has had an MI might just be thinking harder about whether he or she had been exposed to a drug that we care about.
By the way, nested case-control studies, for instance, the study that David Graham did, are a hybrid that really, in my view, draws many of the strengths from both designs; that is, because nested means the case-control study is nested in a defined population, it has a lot of the strengths of cohort studies and some of the efficiencies of the case-control studies.
The differences between the observational studies and randomized studies are pretty clear. Randomized trials have the tremendous advantage that there is lots more reason to expect the treated and untreated groups to be comparable to one another.
There is a lot more opportunity to be sure that the outcome assessment and adherence to treatment are good or at least well known, and we have reviewed the difference for the observational studies.
I think it is worth making the point that there are a substantial number of similarities between observational and randomized studies. Just because we randomize individuals in randomized studies, it doesn't mean that the treated and untreated groups are comparable.
We talked about a study yesterday that was a randomized trial where there was a substantial imbalance in important risk factors. So, it is incumbent, no matter what kind of study you do, I think, to look for comparability, and both kinds of studies have as potential weaknesses that there are risks of false positive results, and doing subgroup analyses and multiple comparisons increases that risk.
We talked a fair amount about that yesterday, and both are at risk for false negative results.
That can be partly because the studies may not be powered well enough, either because there is insufficient sample size or individuals aren't studied for a long enough duration to see the biological effects that we care about, or a vulnerable group just isn't included.
That is a problem with both kinds of studies, and I think all studies have to be evaluated on their own merits, so let's just step through the various places where observational studies might get into trouble, or at least the things that need careful assessment when we look at these studies.
The first is, are we studying the right outcomes? It is essentially impossible in any of these observational studies to use the kind of rigorous adjudication that is a hallmark of the randomized study, so I think we are going to have to ask ourselves, are these outcomes good enough?
Among the several kinds of outcomes in the studies that we have been asked to look at are hospitalized MIs. The case-control study by Kimmel uses survivors. It had to use survivors because they were collecting the exposure information by interview after the individuals had left the hospital, so if we care about all MIs, then that study isn't going to tell us what we want to know.
Some of the studies use MI and out-of-hospital sudden death by linking to vital statistics records. I think that is probably the closest we can get in observational studies to the intention-to-treat, all-outcome designs of the randomized trials, and some of the studies use composite designs.
You have to ask, are these outcomes measured appropriately? Most of the studies that we are looking at use some form of automated medical record or claims data that have been, in my view, reasonably well validated. That is, there is a moderate literature showing that claims data are not so bad for studying acute myocardial infarction. They have sensitivities in the 90s and positive predictive values in the 90s. So, they are not perfect, and I think we will have to ask as we review the studies: can the amount of uncertainty that we know exists in those data account for the effects that we see, or could it obliterate effects that we would like to see and which aren't there?
My sense is that that is probably not a sufficient explanation to dismiss the studies that we are looking at. The issue of bias is one that I think always has to live as a sub-text, but quite frankly, in the studies that do outcomes in the way we have been describing, I don't think that is a serious problem.
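The sketch below illustrates whether imperfect outcome coding could explain or erase an effect. It assumes claims-based MI ascertainment with 90 percent sensitivity and a hypothetical 99.9 percent specificity, applied equally to exposed and unexposed people. The specificity was chosen so that the positive predictive value lands near 0.90, in line with the figures in the 90s quoted above. All risks and parameter names are hypothetical. Under these assumptions a true relative risk of 2.0 is observed as about 1.9, so nondifferential misclassification of this kind pulls the estimate toward the null rather than creating an effect. With perfect specificity, missed cases alone would leave the ratio unchanged.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Risk of a *recorded* MI when coding misses some true MIs and flags some non-MIs."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

def ppv(true_risk, sensitivity, specificity):
    """Share of recorded MIs that are true MIs."""
    return sensitivity * true_risk / observed_risk(true_risk, sensitivity, specificity)

# Hypothetical accuracy, identical in both groups (nondifferential misclassification).
SENS, SPEC = 0.90, 0.999
# Hypothetical true risks over the follow-up period: true RR = 2.0.
true_exposed, true_unexposed = 0.02, 0.01

observed_rr = observed_risk(true_exposed, SENS, SPEC) / observed_risk(true_unexposed, SENS, SPEC)
print(f"true RR = 2.00, observed RR = {observed_rr:.2f}")
print(f"PPV among unexposed = {ppv(true_unexposed, SENS, SPEC):.2f}")
```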
For cohort studies, we have to ask, are we studying the right population? Here I think we really do have to stop and ask carefully. One question is, are these people selected from the population under study?
I think in most of these examples, they are reasonably representative, that is, a study of the people of plan.
I think that the data systems that are used to identify the individuals in the cohort are good enough to give us reasonable belief that we are identifying either all the people or a representative sample of them.
I think there is a fair question of whether they are representative of the larger population. We could ask, are health plan members systematically different from the general population of individuals who are taking these medications?
The range of studies we have includes health plan members. I think that there is reasonable information that they probably are representative, at least with respect to the drug and myocardial infarction outcomes that are studied. Studies in Medicare and population-based studies, such as those in reason to think that they are representative.
But there is an important consideration about whether there are issues in the way clinicians practice in those settings that might have a serious impact on selecting individuals, in particular, to the extent that formularies are restrictive of, say, newer or more expensive drugs like the COX-2 inhibitors. I think we have to ask very carefully whether the factors that would influence the prescribing of one class of drugs over another are likely to seriously impact the risk of these outcomes.
Additionally, if there are cost differentials for these drugs, it may be that there is some form of self-selection that causes individuals who are sicker to receive these drugs, and I think that it is incumbent on us to expect that to be a problem in every one of these observational studies and to ask how well these studies do in adjusting for that. I will circle back to that in a moment.
I think we have to be concerned about whether we are studying people who have had prior NSAID exposure, in which case we would be worried about survivor bias, that is, finding the individuals who are relatively immune to these problems.
Finally, there are study design issues about whether there are restrictions of eligibility that might importantly color the data. For instance, at least one of the studies we are looking at requires individuals to have received at least two dispensings of a nonsteroidal agent in order to be eligible.
That means that you have to live long enough to have two dispensings, so it certainly doesn't tell us anything about the early effects of these drugs, and it might in an important way color the results with regard to later exposure.
There is an important question, which is not unique to the observational studies: who are the right comparators? We had a number of discussions about that yesterday. I think that all the issues that we discussed with regard to the clinical trials are applicable here. In particular, there is a lot of reason to want to compare to other nonsteroidal users, because that gives the best chance of having a group that is similar with regard to underlying disease status and presumably risk of myocardial infarction.
Similarly, it is possible to say that if you really care about COX-2 selective agents, you should compare one COX-2 selective agent to another.
That leaves us in the uncomfortable situation of not knowing what the risk is compared to no use at all, so we have some comparisons that do look at non-users or at least remote users, and that has its strengths. It has the big weakness, of course, of putting us at risk of making comparisons against groups that are unrelated.
So, we are really talking here mostly about a study like the Kimmel study, not the nested case-control study. The other concern that raises red flags is losing cases in a way that makes the group that is studied unrepresentative.
I would point out to you, for instance, that in the Kimmel study, only half of the MI survivors who were identified were actually interviewed and therefore part of the formal analysis.
We already talked about the fact that since that study was limited to MI survivors, that restricts us to a less serious set of outcomes.
The other problem that really bedevils conventional case-control studies is knowing whether the group of people who are selected as comparators are really comparable.
I think that is one of the reasons that there is so much interest in doing nested case-control studies, because at the end of the day it is really extremely difficult to satisfy oneself that controls really are appropriate.
Much of what we need to be concerned about in these studies is understanding exposures. Part of the issue is understanding how to characterize exposure.
This is both a strength and a weakness of these studies.
You will remember I made the point at the outset that if we want to understand how drugs work in actual practice, we have to do observational studies. On the other hand, that means we have to find a reasonable way to characterize these drugs.
We talked yesterday, I think, about all the important issues of understanding whether we had to look at absolute dose or cumulative effects, or whether the effects start early or whether they start late.
I think that the best of the studies that we are looking at tackle a number of these issues. I will mention in a minute some of the ways that these studies have gone about that.
I think in terms of ascertaining exposure, it is probably reasonable to put the most reliance on the studies that use administrative databases of pharmacy dispensing, but I will just make the point that we have to be clear that these studies are done in situations where we have reason to expect that the administrative databases are correct.
I think all the studies we are reviewing are ones where the investigators were careful to know that the individuals really had a drug benefit that was operating at the moment, that would likely find the prescription drug exposures that we care about, but as a general proposition, you can't assume that that is the case.
Most health plans have some kind of restrictions on benefits that might lead individuals to change their benefit status, so there would be periods of time when we might know that they had an MI, but we might not know what their drug exposure is at the moment.
I will return to a point that we touched on yesterday, which is that although almost all of the studies that we are talking about report their results as relative risks, a 2-fold increase in risk, a 70 percent decrease in risk, what we really care about is the absolute difference in risk.
So, that is not different between observational studies and randomized studies, but I think it is really a critical piece of our thinking about the problem that we are dealing with.
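Here is a small worked example of why the absolute difference matters more than the relative risk. The same hypothetical 2-fold relative risk implies very different absolute harm depending on the baseline risk. The baseline risks and function names are illustrative assumptions over an unspecified follow-up period, not figures from any of the studies.

```python
def absolute_excess(baseline_risk, relative_risk):
    """Excess risk per person implied by a relative risk applied to a baseline risk."""
    return baseline_risk * (relative_risk - 1)

def number_needed_to_harm(baseline_risk, relative_risk):
    """People treated per one additional event (inverse of the excess risk)."""
    return 1 / absolute_excess(baseline_risk, relative_risk)

# Same hypothetical 2-fold relative risk, two hypothetical baseline risks.
for baseline in (0.01, 0.001):
    excess = absolute_excess(baseline, 2.0)
    nnh = number_needed_to_harm(baseline, 2.0)
    print(f"baseline {baseline:.3f}: excess risk {excess:.3f}, about 1 extra event per {nnh:.0f} treated")
```

With a 1 percent baseline risk, a doubling adds roughly one event per 100 people treated. With a 0.1 percent baseline, it adds roughly one per 1,000.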
The second thing that is just worth recalling is that when we talk about a 95 percent confidence interval, our expectation about where the true value lies is not uniformly distributed over that interval.
Our best guess about where the true value lies is around the point estimate, and if that point estimate is wrong, the large majority of the uncertainty is pretty close to that point estimate, so that it is particularly not helpful, in my view, to pay enormous attention to p values.
The difference between a p value of 0.05, as shown here, and a p value of 0.01 and a p value of 0.13 is not all that enormous in terms of the biological impact.
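This sketch illustrates the point about confidence intervals and p values. It assumes the usual normal approximation on the log relative-risk scale and reads that curve, loosely, as describing where the true value is likely to lie. Strictly, that reading corresponds to a Bayesian posterior under a flat prior. The code computes how much of a 95 percent interval's probability sits in the central half of the interval, and the z statistics that correspond to the three p values mentioned. It uses only Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# A two-sided 95% interval spans +/- z95 standard errors around the point estimate.
z95 = std_normal.inv_cdf(0.975)
mass_95 = std_normal.cdf(z95) - std_normal.cdf(-z95)
# Probability within the central half of that interval (+/- z95/2 standard errors).
mass_central_half = std_normal.cdf(z95 / 2) - std_normal.cdf(-z95 / 2)
print(f"share of the interval's probability in its central half: {mass_central_half / mass_95:.2f}")

# z statistics corresponding to the two-sided p values mentioned in the talk.
for p in (0.01, 0.05, 0.13):
    z = std_normal.inv_cdf(1 - p / 2)
    print(f"p = {p:.2f} -> |z| = {z:.2f}")
```

Roughly 70 percent of the interval's probability falls in its central half. P values of 0.01, 0.05, and 0.13 correspond to z statistics of about 2.6, 2.0, and 1.5, which is the sense in which they are not far apart.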
I think one of the things that is a particular concern that we need to pay attention to in these studies is the fact that it is easy to look at a lot of different comparisons, and to the extent that we do that, we are going to have to just be careful to know that the strength of any one comparison is weaker than it appears to be.
For instance, this is a quote from one of the studies that we are looking at: "We undertook an observational study examining the association between rofecoxib, celecoxib, other nonsteroidals and myocardial infarction."
Well, there is no primary hypothesis there, and there are results for all of the nonsteroidals. They are all interesting to look at, and they are all associated with p values. Those p values are all relatively too extreme, given the fact that there are so many comparisons.
It is a problem for randomized trials. We talked about subgroup analyses. It is important to do those studies, those subgroup analyses, but absent having specified a principal hypothesis at the outset, I think that we have difficulties in knowing how much weight to put on any particular one.
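The sketch below makes the multiple-comparisons concern concrete. It computes the chance of at least one nominally significant result when no true effect exists, assuming independent tests at the 0.05 level, along with a Bonferroni adjustment. Independence is a simplifying assumption, since comparisons drawn from one database are usually correlated. Bonferroni is one conservative correction among several, not a method the studies themselves are known to have used. The function names are hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one p < alpha among k independent tests when no effect exists."""
    return 1 - (1 - alpha) ** k

def bonferroni(p, k):
    """Bonferroni-adjusted p value for one of k comparisons (capped at 1)."""
    return min(1.0, p * k)

for k in (1, 5, 10):
    print(f"{k:2d} comparisons: P(at least one false positive) = {familywise_error(k):.2f}")

# Hypothetical nominal p = 0.02 for one drug among 5 drug comparisons.
print(bonferroni(0.02, 5))
```

With five independent comparisons, the chance of at least one false positive is already about 23 percent.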
We talked a lot about confounding. That is one of the most important concerns in nonrandomized studies.
I know you all know what confounding is. It wasn't obvious to me when I was making these slides that everyone knew that, but here is the example, so that we have it in mind: if what we know is drug A versus drug B, and MI or no MI, and we don't take into account important confounders, we can get importantly incorrect results.
So, here is an example of an aggregate analysis with a relative risk of 1.5 among 2,000 people who are exposed to two drugs. If you break it apart and see that in the high-risk group, drug A accounted for 80 percent of the exposure, and in the low-risk group, drug B accounted for 80 percent of the exposure, you see that in each of those two categories, the high-risk group and the low-risk group, there is, in fact, no association between drug and outcome, but you have to take them apart to do that.
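The slide's actual counts are not in the transcript, so this sketch uses hypothetical counts chosen to reproduce its structure. The 2,000 people are split evenly into a high-risk group (10 percent MI risk) and a low-risk group (5 percent). Drug A is taken by 80 percent of the high-risk group and drug B by 80 percent of the low-risk group. The crude relative risk comes out at 1.5, the stratum-specific relative risks are 1.0, and a Mantel-Haenszel summary, one standard stratified estimator, recovers 1.0. The studies under discussion used their own adjustment methods; this only illustrates the principle.

```python
# Each stratum: (MIs on drug A, people on A, MIs on drug B, people on B). Hypothetical.
strata = {
    "high-risk": (80, 800, 20, 200),  # 10% risk on either drug
    "low-risk": (10, 200, 40, 800),   # 5% risk on either drug
}

def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk on drug A divided by risk on drug B."""
    return (events_a * n_b) / (events_b * n_a)

# Crude (aggregate) analysis: collapse the strata, ignoring the confounder.
totals = [sum(s[i] for s in strata.values()) for i in range(4)]
crude_rr = risk_ratio(*totals)

# Stratum-specific risk ratios.
stratum_rr = {name: risk_ratio(*s) for name, s in strata.items()}

# Mantel-Haenszel summary risk ratio, adjusting for risk group.
num = sum(a * n_b / (n_a + n_b) for a, n_a, b, n_b in strata.values())
den = sum(b * n_a / (n_a + n_b) for a, n_a, b, n_b in strata.values())
mh_rr = num / den

print(f"crude RR = {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"{name} RR = {rr:.2f}")
print(f"Mantel-Haenszel RR = {mh_rr:.2f}")
```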
Well, the good news is that if you know what the confounders are, and you have measured them accurately, it is possible to adjust for them, and all of the studies we are looking at do a pretty good job of adjusting for the confounders that we know about, so I guess one of the questions is how well they do at identifying the important confounders.
I would say not bad on a lot of that. That is, if you take, for example, the Graham study or the studies that Wayne Ray did in Medicaid, there are a number of strengths. I will sort of stop and back up on the things that make these look like relatively more credible studies in the scheme of the factors that we care about.
They are inception cohorts of nonsteroidal users, that is, they are individuals who had to have been members of the health plan for at least a year before they received their nonsteroidal.
There was a lot of information about their underlying medical status available to the investigators, who used both claims data and medical record data to ascertain cardiovascular disease along a number of dimensions: utilization of procedures like surgery or angioplasty or diagnostic procedures that are intended to find cardiovascular disease, hospitalizations, emergency room visits, and a substantial amount of information about the medications that these individuals took that were related or plausibly related to cardiovascular risk factors.
That large number of factors was used to create separate risk models using only the unexposed, and those risk models were then used to create risk indexes for the individuals, to serve as an adjuster for underlying cardiovascular risk.
Is it perfect? No. Is it pretty good? It seems to me that it meets the sniff test of saying that it has a reasonable chance of identifying important confounding.
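Below is a minimal sketch of the risk-index approach described above, often called a disease risk score. The steps are to fit a risk model in the unexposed only, score everyone with it, and then compare exposed and unexposed people within strata of that score. To keep the example self-contained, the "model" here is simply the observed MI risk for each covariate pattern among the unexposed; a real study would fit a regression on many covariates. The covariate, the counts, and the Mantel-Haenszel stratified comparison are all assumptions for illustration, not the Graham or Ray methods. In this made-up data the crude relative risk is about 2.1 and the score-adjusted relative risk is 1.0.

```python
from collections import defaultdict

# Hypothetical groups: (covariate pattern, exposed?, number of people, number of MIs).
groups = [
    ("prior CVD", False, 100, 20),
    ("no prior CVD", False, 400, 20),
    ("prior CVD", True, 400, 80),
    ("no prior CVD", True, 100, 5),
]

# Step 1: fit the "risk model" on the unexposed only (observed risk per pattern).
risk_model = {pattern: events / n for pattern, exposed, n, events in groups if not exposed}

# Step 2: score everyone with that model and pool people by score.
# by_score[score][exposed] holds [people, events].
by_score = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
for pattern, exposed, n, events in groups:
    cell = by_score[risk_model[pattern]][exposed]
    cell[0] += n
    cell[1] += events

# Step 3: Mantel-Haenszel risk ratio across score strata.
num = den = 0.0
for cells in by_score.values():
    (n1, a), (n0, b) = cells[True], cells[False]
    total = n1 + n0
    num += a * n0 / total
    den += b * n1 / total
adjusted_rr = num / den

def pooled(exposed_flag):
    """Total (events, people) for one exposure group, ignoring covariates."""
    events = sum(g[3] for g in groups if g[1] == exposed_flag)
    people = sum(g[2] for g in groups if g[1] == exposed_flag)
    return events, people

a_tot, n1_tot = pooled(True)
b_tot, n0_tot = pooled(False)
crude_rr = (a_tot * n0_tot) / (b_tot * n1_tot)

print(f"crude RR = {crude_rr:.2f}, risk-score-adjusted RR = {adjusted_rr:.2f}")
```

Fitting the score only in the unexposed keeps the exposure's own effect from leaking into the adjuster.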
Unfortunately, there are a number of important confounders for which health care systems typically don't have good data, like smoking, OTC NSAID use, obesity, and family history, and those are typically much more problematic.
Some of these studies have worked pretty hard to try to either deal with it or understand whether it could be an important problem. One of the handouts we had, for instance, was the study by Schneeweiss and colleagues, who looked back at one of the studies by Solomon that was performed in the Medicare data set and asked how important these unmeasured confounders could be.
They actually had access to information from the Medicare Beneficiary Survey, which asked representative Medicare beneficiaries detailed questions about many of the things that we would care about. They weren't the people who were involved in that case-control study, but if you assume that the beneficiary survey members were representative and they gave plausible answers, it is possible to extrapolate back to the source population, and the take-home message from that work is that the answer didn't change very much. That is really what we want to know: not sort of the absolute difference, but whether those unmeasured confounders are important enough that they could cause a difference.
I think that, at the end of the day, we still have to be concerned about residual confounding as a potentially important problem.
One way I think that we can draw relative assurance from that work of adjusting for confounding is to ask how much the estimate of risk changed between the unadjusted and the adjusted result.
I think there is a world of difference between having an unadjusted result of 10 and an adjusted result of 1.5, and having an unadjusted result of 1.6 and an adjusted result of 1.5. In the former, I think the reasonable assumption is that we arguably haven't been able to deal with confounding in a way that would let us believe that 1.5 means something.
I think there is a much stronger case to be made, when adjusting for important confounders that we know about doesn't change the risk estimate very much, that that is a relatively more credible answer.
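The two scenarios in the talk can be compared by how far the crude estimate sits from the adjusted one. The sketch uses only the numbers quoted above. Expressing the change relative to the adjusted estimate is one simple convention chosen here for illustration, and the function name is hypothetical.

```python
def change_in_estimate(crude_rr, adjusted_rr):
    """Relative distance of the crude estimate from the adjusted estimate."""
    return (crude_rr - adjusted_rr) / adjusted_rr

for crude, adjusted in [(10.0, 1.5), (1.6, 1.5)]:
    change = change_in_estimate(crude, adjusted)
    print(f"crude {crude} -> adjusted {adjusted}: crude is {change:.0%} above adjusted")
```

A crude estimate of 10 sits about 570 percent above the adjusted 1.5, so adjustment did almost all of the work. A crude 1.6 sits only about 7 percent above it.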
Having said that, I think that observational studies are best at finding relative risks that are more than 2. I think that I would pay some attention to relative risks of 1.5. I get very nervous about adjusted relative risks of 1.2.
That doesn't mean that they are not right, and I don't ignore them, but if we ask, is that for sure the answer, my response to that is I am just less certain about that.
I think we are always left at the end with this: while we spend a lot of time thinking about and adjusting for confounding, and I think we can do a pretty good job of that, it is much harder to adjust for misclassification, and it is essentially impossible to adjust for bias.
So, I think one of the things we have to ask about is, are there plausible sources of misclassification and bias, and if there are, in which direction do they work, and would they seriously change our interpretation?
We talked about the fact that absolute differences are the important ones that we care about.
We have already started to look at data that talk about person-level risk and population-level risk, so I will just say that, at the end of the day, I think these are the answers that we really need to talk about, not relative risk.
Personally, I think that we need two kinds of answers. One is, what is the information that patients and their physicians need to have to make decisions for them personally about whether to accept certain kinds of treatments in exchange for certain kinds of anticipated benefits?
I think there is a population-level concern that we have to have that emerges from the same set of analyses, but takes on a different form.
So, you will be pleased to know
that I am
wrapping it up now, and I would say that
both the
cohort and nested case-control designs,
which are
the bulk of the observational studies
that we are
looking at, are relatively strong ones
and I think
deserve the committee's real attention.
I am sorry that not every one
of these
studies prespecified a primary hypothesis
that we
can attend to, but we should whenever
possible do
that.
Even though we don't find important effects
in some of these studies, I think it is
important
to recognize that they don't exclude one.
As I have said, I am least certain about attaching great weight to relatively small excess risks, even understanding that, when they are extrapolated to a large population, they could account for very important public health problems.
Finally, I would say that the studies' conclusions are supported when subgroup analyses and dose-response analyses strengthen the cause-effect relationship, and I think that there is reason to look for consistency across studies.
I take the point that was made yesterday that it is possible that a dozen studies of naproxen could all have the same underlying bias that shifts the point estimate in the same direction, but it is not so clear to me what that bias is.
So, I think that we would have to have a reasonable idea of what might explain consistent differences across studies and ask whether such biases are of sufficient magnitude to explain them. As I say, I am not clear that there are those kinds of biases.
I think we have to be cautious about the fact that residual confounding, bias, and misclassification are all issues with these studies. So, while they add to our discussion, they have to be considered in light of the fact that they are imperfect vehicles.
Thanks.
(Applause.)
DR. WOOD: Thanks very much.
Let's just go straight on to the next speaker, and then we will take questions for Dr. Platt after David Graham's talk.
The next speaker is Dr. David Graham from the FDA.
Review of Epidemiologic Studies on Cardiovascular Risk with Selected NSAIDs
David Graham, M.D., M.P.H.
DR. GRAHAM: Good morning. Today, I will give a review of epidemiologic studies of cardiovascular risk with selected NSAIDs. I will be evaluating epidemiologic data from the published literature plus two currently unpublished studies that I have evaluated.
My focus will be on providing estimates of the risk of acute myocardial infarction in the setting of use of COX-2 selective NSAIDs or naproxen, although, in light of yesterday's discussion, I will have some comments on other NSAIDs as well.
The methodology was to do a PubMed search by specific NSAID and then cross-check the citations in those articles to see if there were other articles I had missed.
I would also like to take this moment to thank Dr. Crawford for his leadership in making it possible for me to present some of our preliminary data from a study in California Medicaid, which Dr. Gurkirpal Singh from Stanford and I recently completed.
Before I get into the substance of my talk, I just want to comment a little bit on excess cases and on projecting the impact of rofecoxib use to the national population. I am doing this for two reasons. One is that it has been a source of controversy and concern: we cite a number in a paper that I and others have published from Kaiser Permanente, in which we made an estimate of the impact of rofecoxib use.
Tomorrow, FDA will present its estimate of the number harmed by rofecoxib, modeling randomized clinical trial survival curves. There are a couple of things I would like the Committee to be aware of when they see those data tomorrow. First, the model assumes a grace period at the beginning of use, based on the VIGOR and APPROVe studies, during which there is no increased risk of MI: the first six weeks of high-dose use and the first 18 months of low-dose use of rofecoxib.
As I will show later in my talk, I believe that this assumption is unreliable because of low statistical power early on: in each of these studies, we are only talking about a handful of cases early in the study. With two or three cases of MI and wide confidence intervals, you could have divergence of the curves very early and not be able to detect it.
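The point about a handful of early events can be made concrete. The following is a minimal Python sketch, not part of the original presentation, of the standard log-scale (Wald) 95 percent confidence interval for a ratio of two event counts. It assumes equal person-time in both arms, and the event counts are hypothetical, chosen only to contrast a trial-sized handful of events with counts in the hundreds at the same point estimate.

```python
import math


def rate_ratio_ci(exposed_events, comparator_events, z=1.96):
    """Rate ratio and approximate 95% CI for two event counts.

    Assumes equal person-time in both groups, so the rate ratio is simply
    the ratio of the counts, and uses the Wald interval on the log scale:
    SE(log RR) = sqrt(1/a + 1/b).
    """
    rr = exposed_events / comparator_events
    se = math.sqrt(1 / exposed_events + 1 / comparator_events)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: same point estimate (1.5), very different numbers of events.
for a, b in [(3, 2), (300, 200)]:
    rr, lo, hi = rate_ratio_ci(a, b)
    print(f"{a} vs {b} events: RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

With 3 versus 2 events the interval runs from roughly 0.25 to about 9, so curves that fail to separate say very little; with 300 versus 200 events the same point estimate gives an interval of roughly 1.25 to 1.8 that excludes 1.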
The epidemiologic studies that I will present, however, have 3- to 50-fold more events to work with, and therefore more statistical power, and they suggest a different outcome.
The second is that patients enrolled in randomized clinical trials are generally healthier than patients in the real world. So, if you are going to model the number of people who have been harmed in the population, you have to assume a background rate to model from. If you use a background rate from healthy people to model what is happening in a population of people who really aren't so healthy, who have a higher background rate, you will underestimate the actual population impact.
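The background-rate point can be shown with a few lines of arithmetic. This Python sketch assumes the simple model that excess cases equal (relative risk minus 1) times the background rate times the number of people treated for a year. The relative risk of 1.3 and the real-world rate of 1 in 50 per year are the illustrative figures used later in this talk; the healthier trial-population rate of 1 in 200 per year is a hypothetical value chosen only for contrast.

```python
def excess_cases(relative_risk, background_rate_per_year, n_treated, years=1.0):
    """Excess events attributable to a drug under a constant relative risk.

    excess = (RR - 1) * background rate * person-years at risk
    """
    return (relative_risk - 1.0) * background_rate_per_year * n_treated * years


RR = 1.3                  # illustrative relative risk
N = 1_000_000             # people treated for one year
TRIAL_RATE = 1 / 200      # hypothetical MI rate in a healthier trial population
REAL_WORLD_RATE = 1 / 50  # MI rate per year used later in the talk

trial_based = excess_cases(RR, TRIAL_RATE, N)
real_world = excess_cases(RR, REAL_WORLD_RATE, N)
print(f"Modeled from trial background rate: {trial_based:,.0f} excess MIs")
print(f"Modeled from real-world background rate: {real_world:,.0f} excess MIs")
print(f"Underestimate factor: {real_world / trial_based:.1f}x")
```

Because the excess scales linearly with the background rate, a background rate four times lower yields an estimate four times lower for the same relative risk.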
So, in any event, now on to the substance of my talk.
The next three slides provide a very dense overview of the major features of each of the epidemiologic studies that I reviewed. I am looking at COX-2 use and acute myocardial infarction.
You can see that they are divided into several groups. The top three studies I consider, from an epidemiologic perspective, to be stronger, better-done studies. In terms of the things that Dr. Platt just talked about, I thought that these were the stronger studies.
The next two studies from the published literature I thought were less strong, and I will describe why. Finally, I have separated out these last two studies because they are unpublished: one was performed by Ingenix and submitted by Merck to the FDA, and the other is the Medi-Cal study that Dr. Gurkirpal Singh and I recently completed.
You can see we are talking about different source populations, and so if we see consistency of results across different populations, different age groups, and different study designs, I think that adds support to the notion that there is a real effect.
If we begin to see a lack of consistency across the studies, then many of the things that Dr. Platt talked about before need to be considered at the level of the individual studies: what might explain why one study shows something and another one doesn't.
This next slide shows the case definitions and the number of cases that we were working with to come up with the relative risk estimates that I will show you.
All of the studies began with hospitalized acute myocardial infarction. Several of the studies were also able to link members of their base cohorts to death certificate data to identify sudden cardiac deaths; those are the ones marked "+Sudden Cardiac Death."
The asterisk next to the Kimmel study is to remind me, and to remind you, that the Kimmel study was based on nonfatal MIs only. By their design, they had to interview their cases in person, so a patient had to survive the myocardial infarction to be interviewed. So, there are those differences in study design.
In the end, what is very important in an epidemiologic study, and I think in dealing with this issue in particular, is the statistical power of the study, and that is driven primarily by the number of events in the exposed group that we have to work with.
So, in this column here, you will see the total number of cases of myocardial infarction that were identified in each of the studies. The asterisk next to the Ingenix figure of 628 is to remind me that in that study, they identified about 1,700 MIs in total, but they excluded 1,100 of them because they occurred in people who weren't exposed to an NSAID at the time of the myocardial infarction. They left them out because, whereas most of these studies used either non-use or remote use as the reference group, as the previous slide shows, the Ingenix study used active treatment with either diclofenac or ibuprofen.
I would like to say one thing about reference groups, which Dr. Platt brought up before. On this issue, I don't believe that there is a single best or optimal reference group. What you really want is to get as close as you can to a randomized placebo group that has all the risk factors of the people who are getting the drug.
In the observational world we can't get there, and so at the end of the day, if you want to do a study, you are in a sense forced to pick what you think is the least of the evils, and then it comes down to how you define things.
So, non-users, for example, could be viewed as being close to the placebo group: they are not getting the drug. The problem is that people who don't use drugs tend to be healthier than people who do, so that raises a host of problems.
Yes, we can try to adjust for confounding and the like, but you are still left with the concern that non-users may be different, in some way that we can't measure, from the people who get the drug.
In the study I did, and in several other studies that people have done, we opted to use people who had been treated with NSAIDs in the past but weren't taking an NSAID at the time of the event or the study, the reasoning being that whatever selection factors lead to a patient getting an NSAID, some of those factors are also present in people who previously received NSAIDs.
That is still not a perfect group, though, because you could argue that patients who are no longer taking NSAIDs might be healthier than people who are currently taking them.
Finally, there is the problem posed by using an active comparator. If I compare a drug to an active comparator and I see a difference, I don't know what it means. I need some place to anchor the result, and for that reason, although none of them are perfect, I believe that the non-use and remote-use analyses at least give us a way of pegging results, and if we want to compare one drug to another, that common reference point at least allows us to do so.
The one other thing I would like to point out about the number of cases is that for rofecoxib, especially at the high doses, most of these studies had relatively few exposed cases. The exception is the Medicaid study, where we had 157 cases exposed to the higher dose of rofecoxib.
Now, this is a very busy slide, and I won't spend a lot of time going over it, but I will be happy to answer questions later.
Basically, as we heard before, there are unmeasured risk factors in automated databases that frequently can't be accounted for; aspirin use and smoking are among the most common. So, you can see here that for most of these studies, that information isn't obtainable.
Kimmel was able to get both because they interviewed the patients, both the cases and the controls.
In the Medi-Cal study, it turns out that aspirin is reimbursed, and so we have a handle on it there.
In the Graham study, a survey of controls was done to see what these unmeasured factors might look like in the source population. The Solomon study did the same thing, relying on the Medicare Beneficiary Survey that Dr. Platt talked about before.
Important limitations that I think need to be highlighted are that in the Mamdani study, they excluded patients who had less than 30 days of NSAID use, so the survivor bias Dr. Platt talked about before is, in my view, a big concern with this study, and for that reason I ranked it in the category of lower-quality studies.
In the Kimmel study, as Dr. Platt also mentioned, there was a low participation rate. Basically, half of the cases and half of the controls who were approached volunteered to be in the study.
More important in that study, I think, and it's unfortunate, is that there was the potential for what I would refer to as "reverse recall bias."
Normally, with recall bias, we think: I have had a heart attack, so I am going to remember more accurately what happened to me immediately before the heart attack, compared to a control whom I ask, what were you doing four months ago on this particular day?
That is the classic recall bias. This situation, I think, had what I would describe as reverse recall bias. They interviewed the people who had heart attacks within four months of getting out of the hospital and asked, what happened to you the day and the week before you had your heart attack, which could have been as long as four months earlier?
For the controls, they called them on the phone and asked, what happened to you yesterday and the week before? So, it is actually the reverse. The controls would potentially have better recall of what they were actually doing than the cases, and we will see how this is reflected in some of the results.
Finally, with the Medi-Cal study, I think the single greatest concern for the committee in considering these data is (a) that these are preliminary data and (b) that this is a new database for research purposes.
For that reason, I am including a slide to orient people to it. The other databases have been used before. This database has only in the last two years reached a quality sufficient to begin contemplating doing studies.
Its strengths are that it is very large, it captures aspirin use, and it doesn't censor people by age: it combines Medicare coverage once you pass the age of 65 with the prescription benefits of Medicaid, so you get both the drugs and the outcomes.
Matching has been done to the multiple-cause-of-death tapes, so we have death data in this database up through 2002. We didn't include it in the data I will show today because we really want the information up through 2004.
Once people get into Medicaid or Medicare, they don't tend to drop out. The limitations are that we can't get medical records, which is something to understand, and that it is a very complicated database. Dr. Singh from Stanford, who is the principal investigator for our Medi-Cal work and who has worked to bring this database online, spent two years putting things together and working out the kinks before contemplating doing research with it, so that at least you understand its limitations.
There is always the concern about unmeasured risk factors, and Dr. Platt talked about that. I want to review for you very briefly some of the evidence from the published literature where efforts were made to see what unmeasured confounding looked like and whether it differed across NSAID types.
In our study using Kaiser Permanente data, we surveyed a random sample of controls and looked at aspirin use, smoking, and over-the-counter NSAID use. You can see by NSAID that there really were not significant or substantial differences in the distribution of these risk factors.
So, if they don't vary in the control group, they can't really confound the observation very much.
In the Solomon study, these are the data from the beneficiary survey. Dr. Platt already mentioned a further analysis of these data showing that the actual impact of all these unmeasured confounders on the final relative risk estimate was measured in hundredths of an odds ratio: if the odds ratio was 1.34, adjusting for these things and projecting it out would change it to maybe 1.35 or 1.33. We are talking about minuscule differences, not qualitatively important ones.
Finally, in the Kimmel study, they were also able to see through their interviews that for most of these factors there was similarity across NSAID groups, except for current smoking, where the rofecoxib group had much lower current smoking than any of the other NSAID groups. For past smoking, however, the rofecoxib group had more than the other NSAID groups or the remote-use groups, and if you added the two together, the rofecoxib group was very similar to the others, while the celecoxib group had more smoking.
My own conclusion from this is that, yes, it is possible that some of these unmeasured risk factors could be influencing the results. But I don't think there is strong evidence of a systematic bias that would keep us from trusting the results or lead us to think that these factors are confounding the observations that we see.
So, first I will talk about rofecoxib, then celecoxib, and then valdecoxib, in terms of the epidemiologic data.
These studies on the left, with their reference groups, are the ones that looked at myocardial infarction with rofecoxib. For all doses and, where available, for doses of 25 milligrams or less and over 25 milligrams, I have shown the fully adjusted odds ratio and 95 percent confidence interval.
These studies varied in the extent of adjustment they did. The Ray and Graham studies each adjusted for about 30 cardiovascular risk factors. The Solomon and Mamdani studies each adjusted for a somewhat smaller number. Kimmel and the Ingenix study each adjusted for somewhere in the 20s, and the Medi-Cal study adjusted for about 40 cardiovascular risk factors.
What you can see when you look across the All Doses column is that, in general, the point estimates were elevated, and for many the 95 percent confidence intervals excluded 1.
More important, though, is looking at the low-dose and high-dose data, because we know from the clinical trial data, and would suspect on pharmacologic grounds alone, that if there is an association, it might be worse with the higher dose than with the lower.
So, four studies provide estimates at the low and high doses: the Wayne Ray study and our study from Kaiser Permanente, and then the two unpublished studies, one from Ingenix and the other from California Medicaid.
We see there that in three of the four studies, the point estimate is elevated at the low dose. In the Graham study, the confidence interval included 1. When we look at doses over 25 mg, we see greater consistency, although in the Ingenix study there is this paradoxical finding of a basically neutral relative risk. I don't have an explanation for why that happened, but it makes me concerned to some extent about what was going on in that study, because it is a result that goes in a very unexpected direction.
What I would like to point out, because I will come back to it, is this. When we are dealing with drug safety, and now I am not talking about efficacy anymore, I am talking about safety, the goal is to ask what risk I can exclude. If my job is to protect the public from harm, and the question is what risk I can exclude based on the data that I have, I believe it is much more relevant to look at the upper bound of the confidence interval than the lower bound.
What traditionally happens is that we look at the lower bound of the confidence interval, and if the interval includes 1, we say there isn't a problem. But, as Dr. Platt showed in his earlier slide, the biggest reason for a wide distribution and a wide confidence interval is that the study doesn't have enough statistical power to give you a confidence interval narrow enough to say that you have the 95 percent certainty that you want.
So, if your mission is, above all else, to do no harm and to protect patients from harm, then, based on the data you have, I would submit that the upper bound of the confidence interval provides greater assurance to patients. If you are going to weigh a drug's benefit, you might want to consider that benefit against the upper bound of the confidence interval, because that risk is compatible with the data. In any event, that is my view, and not the FDA's.
This is a slide from California Medicaid. These are preliminary data, and I wanted to present them to you because they show a dose-response to rofecoxib from 12.5 mg up to and through 50 mg.
You can see that we have very wide confidence intervals for some of the doses, which reflects the limited number of cases, but I want to draw your attention to the very narrow confidence intervals for 12 to 25 mg and for 25 to 50 mg. Compared with the point estimates on the previous slide, you can now see that we have fleshed them out a little bit more.
Another comparison that I think is important to consider, and it certainly was for us when we did our study in Kaiser Permanente, arises because at the time there were two COX-2 selective inhibitors on the market, celecoxib and rofecoxib.
The VIGOR study raised a question about high-dose rofecoxib. Our question, as researchers and public health scientists, was this: let's suppose that rofecoxib increases the risk of myocardial infarction.
We don't know that it does, but if it does, what about celecoxib? It actually had a larger share of the market, and if it turned out that these drugs have a worthwhile benefit, it would make more sense from a practical perspective to use the drug with the better safety profile.
So, to us, it was very natural to want to compare rofecoxib to celecoxib, and the investigators of several of the epidemiologic studies felt similarly and included that analysis in their design. For some of them it was, as Dr. Platt said, part of a "we are going to compare everything against everything" approach.
The Solomon study, for example, did that. They did not state in that study what their prior hypothesis was. In our study, we did state it. I mean, yes, in a sense we had multiple comparisons, but we were interested in two different things: rofecoxib versus remote use, and rofecoxib versus celecoxib. We thought of it beforehand, and we planned that analysis.
But in any event, what we see is that when you look at the all-dose analysis, in all of the published studies rofecoxib increased the risk compared to celecoxib. When we looked at low-dose rofecoxib, we see the increased risk. When we compare the high doses of rofecoxib to celecoxib, we again see the same pattern.
Dr. Platt, in his talk before, talked about relative risks, risk differences, individual risk, and population risk. The next two slides are intended to address this at the level of the individual and at the level of the population.
No one should interpret these slides as showing what actually happened in the population. The numbers on them are for illustrative purposes only, to help the committee understand what a relative risk of 1.3 translates into at the individual level and at the population level.
Your typical COX-2 user is somebody in their 60s who has several other health problems, so I went to the National Center for Health Statistics and got the myocardial infarction rate for 65- to 74-year-old men in the United States. That rate turns out to be 1 in 50 per year.
I took that as the background rate and asked: if an individual with that background rate is using this drug, and I apply to that person the relative risks or odds ratios found in the studies shown in the previous slides, what would the excess risk to that person be? That is, what would that risk difference translate to for the individual?
For example, in the Ray study, if you remember, for 25 mg or less, the odds ratio was 1.02. Basically, it doesn't change. If we base it on the point estimate, that excess of 0.02 applied to a background rate of 1 in 50 would translate to an increased risk of heart attack of 1 in 2,500 per year.
Another way to view that number is as the number needed to harm. If I treated 2,500 men aged 65 to 74 for a year with rofecoxib, and the odds ratio was the 1.02 that Ray found, treating those 2,500 patients would produce 1 extra heart attack.
Now, with the other studies that found higher estimates for the lower doses of rofecoxib, you can see that the number needed to harm ranges from about 90 to 200. That is to say, for every 90 to 200 people I treat with low-dose rofecoxib, I would generate 1 additional case.
For high doses, because the relative risks were higher, the number needed to harm becomes lower.
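The conversion from a relative risk to a number needed to harm can be written out explicitly. This Python sketch assumes that the odds ratio approximates the relative risk (reasonable for an outcome as uncommon as 1 in 50 per year) and that the excess risk is (relative risk minus 1) times the background rate; the function name is illustrative. It reproduces the 1-in-2,500 figure for Ray's odds ratio of 1.02 and also shows the illustrative relative risk of 1.3 that the talk uses.

```python
BACKGROUND_MI_RATE = 1 / 50  # MI per year, men aged 65-74 (figure quoted in the talk)


def number_needed_to_harm(relative_risk, background_rate=BACKGROUND_MI_RATE):
    """Patients treated for one year per one extra event.

    Treats the odds ratio as an approximation to the relative risk and
    assumes a constant background rate over the year.
    """
    risk_difference = (relative_risk - 1.0) * background_rate
    if risk_difference <= 0:
        raise ValueError("no excess risk: number needed to harm is undefined")
    return 1.0 / risk_difference


for rr in (1.02, 1.3):
    print(f"RR {rr}: number needed to harm ~ {number_needed_to_harm(rr):,.0f}")
```

Passing the upper bound of a confidence interval instead of the point estimate gives the smaller number needed to harm that is still compatible with the data, which is the comparison the talk argues for.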
I have also shown it based on the upper bound of the 95 percent confidence interval, to show you the excess risks that are consistent with the data we have at hand. From a public policy and public health perspective, that is what I react to. When I want to see a benefit and ask whether it exceeds the risks, I want to know what the real benefit is in the population in terms of reduced hospitalizations and lives saved, and whether that benefit exceeds what I can say is possibly the risk of these products.
Now we have gone from the individual to the population level. Remember that in the Wayne Ray study we said it is 1 in 2,500. Well, that would translate to 400 additional heart attacks if we treated a million men aged 65 to 74 with low-dose rofecoxib for a year.
With the others, you can see that relative risks that might not look so impressive, the 1.23, the 1.30, the 1.4, project out to substantial numbers when you multiply them by the large number of people who use these products.
For high doses, the numbers end up even greater, and if we focus on the upper bound of the confidence interval, we again see that the numbers are larger still. This very high number in our study was the result of our having low statistical power for high-dose rofecoxib.
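Scaling the individual excess risk to a population is a single multiplication. The Python sketch below assumes, as the illustrative slide does, a background MI rate of 1 in 50 per year, one year of treatment, and a constant relative risk; it applies that to one million treated men for Ray's 1.02 and for the 1.23, 1.30, and 1.4 estimates quoted in the talk. The resulting counts are illustrative arithmetic only, not estimates of what actually happened.

```python
BACKGROUND_MI_RATE = 1 / 50  # per person-year, from the illustrative slide
POPULATION = 1_000_000       # men aged 65-74 treated for one year


def population_excess(relative_risk, n=POPULATION, background_rate=BACKGROUND_MI_RATE):
    """Extra MIs expected in n people treated for a year at a constant relative risk."""
    return (relative_risk - 1.0) * background_rate * n


for rr in (1.02, 1.23, 1.30, 1.4):
    print(f"RR {rr:.2f}: about {round(population_excess(rr)):,} extra heart attacks per million")
```

The 400 for Ray's estimate matches the figure in the talk; the other three show how modest-looking relative risks become thousands of cases at this scale.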
One other question that I think is important to consider is when the risk of myocardial infarction with rofecoxib kicks in. Yesterday we saw various survival curves presented by both FDA and Merck.
We saw the VIGOR curve, which showed separation after about 6 weeks, with an overall relative risk of about 5. For the APPROVe study, we saw closely overlapping lines up to about 18 months, after which they diverge, with an overall composite hazard ratio of about 2.
I would submit to the committee that the reason these studies fail to show divergence of the lines shortly after the drugs are started is low statistical power: they just don't have enough events to show it. As a result of the low statistical power, you basically presume that there is nothing there, and you err on the side of the drug rather than on the side of what the risk could be to the population.
If you really want to know what is going on in the population, then you want to reduce the uncertainty. The more uncertainty you have, the riskier it is to act on the lower bound of the confidence interval, which is what you are doing when you say the risk doesn't begin until 18 months; you are basically saying that absence of evidence is evidence of absence.
I would say that in safety, what it really means is that you just don't have enough power.
Looking at the epidemiologic studies, I think that we have evidence to suggest that the risk begins much earlier. I will point it out, and you guys and women can consider it for yourselves.
In the Graham study, when we looked at low and high doses of rofecoxib, 50 percent of our cases at the low dose and at the high dose occurred within 2 to 3 months of starting the drug. Remember, these are inception cohorts, so this reflects their total use: 1.8 months in one group and 2.7 months in the other.
That is a lot of power, and it really speaks against the notion that the risk is back-loaded, that for the low dose the risk doesn't appear until after 18 months. Nobody in our study was on rofecoxib for more than about 15 months; I think that was the longest duration of use we had.
Now, in the Solomon study, they looked at the low dose and the high dose, and they presented data in several ways. One was to group use into 1 to 90 days, and what they showed was that for both the low dose and the high dose, there was evidence of risk early on.
In the Kimmel study, for all its deficiencies, most of the use was low-dose rofecoxib, and almost all the patients used it for less than 12 months. So, their finding on rofecoxib, if anything, would also suggest that the low-dose effect kicks in long before 18 months.
Finally, the Solomon and Ingenix studies looked at the first 30 days of use of these products, and both found elevated odds ratios of 4 for cardiovascular risk in the first 30 days.
Now, neither of these studies separated this out by low dose and high dose, so this is a composite, but in both studies about 85 to 90 percent of the use was low dose.
So, basically, what I conclude from this slide is that the risk of myocardial infarction with rofecoxib begins when rofecoxib use begins, and that the inability to separate those curves stems from the fact that if you were to count the actual number of events in the VIGOR study in the first 6 weeks, we are probably talking about 3 or 4 events, and if you look at the confidence intervals, you are going to see that they are wide.
For the APPROVe study, the same thing holds: there are too few events. The whole study had 45 events, and I don't recall how many of those were on rofecoxib and how many were on placebo. But compare that with the number of cases in the epidemiologic studies. For all their problems, and we can talk about those, the epidemiologic studies suggest there is a big discordance, and I think the reason is the absence of statistical power in the clinical trials.
In the epidemiologic literature, this has been recognized, and people have written papers saying that when you are trying to summarize the overall risk from a survival study and you want to look at specific time periods, you are better off taking the overall risk estimate for the entire study than focusing on a small segment at a time, because of this issue of low statistical power. So, I didn't invent this.
Now, switching over to celecoxib, a number of studies have been done to look at celecoxib risk. What I have tried to do here is plot out for you the relative risk or odds ratio, the author of the study, and then the point estimates and 95 percent confidence intervals.
What you will see, basically, is that for most of these studies there is no evidence of a protective or an injurious effect, except for the Kimmel study, which found a substantial protective effect.
Remember the Kimmel study and what I believe is its reverse recall bias, as well as the low participation rate; I personally discount that study. The committee can decide for themselves what they want to do.
What about lower-dose versus higher-dose celecoxib? Well, unfortunately, the only place where this is looked at is in the two unpublished studies, the Ingenix study and the Medi-Cal study.
What I would focus your attention on are the low-dose and high-dose estimates in each study. What we see in both studies is evidence of a dose response. Now, the 95 percent confidence interval in the Ingenix study includes 1, but the point estimate is pretty elevated: 1.18 or so at 400 mg.
In the Medi-Cal study, we go from 1.01 up to about 1.24. Here, you can see the 95 percent confidence intervals.
What I would conclude from this, although these are unpublished studies, is that there is evidence of a dose response and that the higher doses of celecoxib do confer an increased risk of myocardial infarction.
I should point out that the methodology we used in the Medi-Cal study is exactly the methodology we used in our Kaiser Permanente study, which Dr. Platt was gracious enough to say before is one of the better-done studies.
There are no published studies on valdecoxib, so what do we do? Well, in preliminary data from Medi-Cal, we had 54 exposed cases and we found a point estimate of 0.99. Now, this was mostly 10 and 20 mg use. I think that out of all the patients that we had in the study, there were 2 or 3 who had 40 mg valdecoxib use.
In Medi-Cal, they only reimburse for the 10-mg tablet, and they do this in an effort to discourage people from having larger-dose tablets and then taking more of them.
So, this is all the epidemiologic information on valdecoxib that I am aware of and that I have had an opportunity to review.
I will now move to naproxen. The issue of naproxen is important for several reasons. One, with the VIGOR study, the medical community was confronted with the hypothesis that naproxen was the single greatest and most effective cardio-protectant in the history of mankind, that it was far better than aspirin.
We heard yesterday that aspirin reduces cardiovascular risk by about 20 to 25 percent. Naproxen, if we were going to believe the VIGOR results, would have to reduce the risk of cardiovascular events by about 80 to 85 percent.
So, this stimulated a lot of research. Here, I have summarized, in the same fashion as I did for the rofecoxib studies, the various studies that have been done. Again, I have separated them out into the studies that I think are better done, the studies that have more significant limitations, and then the two unpublished studies.
I point out the Rahme study to say that the only reason it is listed among this group of suboptimal studies is that its reference group was other NSAIDs, primarily ibuprofen, because ibuprofen was the predominant other NSAID used in Quebec during the study.
Again, we have the various outcomes that were studied. What I would point out is that you can see the number of cases that we had to work with in these various studies. For the Solomon study, they had about 240 MI cases that they studied overall, but, as you will see in a few minutes, exposure could occur anytime in the past 6 months, so they don't say in the paper how many people were actually on naproxen at the time they had their event, and I can't list how many people were currently exposed.
The Watson study is the only study that used a composite outcome. It included myocardial infarction, stroke, subarachnoid hemorrhage, and subdural hematoma. Why subarachnoid hemorrhage and subdural hematomas are in there is beyond me. In any event, there were 26 cases of that composite outcome and a much smaller number of actual myocardial infarctions. That is why that asterisk is there.
With the Ingenix study, the asterisk next to the 179 is there because this included both prevalent and incident cases. The best results come if you base it on incident cases only, or incident use only as opposed to prevalent use, because prevalent use can have survivor bias. But in any event, in the Ingenix study, they had a number of different analyses, and they didn't always use their full number of cases.
There are important limitations to note. I think the points to focus on are, first, that there is no perfect study, as we have talked about before, and second, that among all the limitations listed here, the most important one was in the Watson study: this composite outcome, which really just makes it very difficult, from an epidemiologic perspective, to study things.
Myocardial infarction is very well validated in claims data, and Dr. Platt has already gone over that with you. Stroke is notoriously difficult to work with in claims data, and subdural hematomas most commonly occur because, as people get older, their brains shrink. They bump their heads and then they get a little bleeding on the surface of the brain. What that has to do with myocardial infarction risk, which is what we are really concerned about today, is beyond me.
I have got two slides on the results. This slide shows the studies that found no protective effect. There are four studies that found a protective effect, and I am saving them for a separate slide, because I want to look at those individually.
What you can see from the majority of these studies, and I would point out that the best done studies, in the top tier, are all on this slide, is that all of them suggest that there is no cardio-protective effect of naproxen. Several of the studies point to the possibility of a small increased risk with naproxen.
But we have four studies with positive results, and we will probably all remember the Archives of Internal Medicine publishing three of the articles in the same issue with an accompanying editorial that stated the issue is solved: naproxen is cardio-protective.
I want to look at those studies and just describe to you my view of them. The top three studies were the ones that were--well, no, not the Kimmel study--Rahme, Solomon, and Watson were the Archives studies.
In the Rahme study, done in Quebec, they compared current naproxen use versus other NSAIDs. That other NSAID was, by and large, ibuprofen, and they found a protective effect. Well, if ibuprofen increases the risk of myocardial infarction, let's just say that it does, and naproxen doesn't, naproxen could look like it's protective compared to ibuprofen without really being protective.
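The comparator problem is a matter of simple division. A minimal sketch, assuming both relative risks are measured against the same non-use reference group; the values are hypothetical, mirroring the "let's just say" scenario above, and are not estimates from the Rahme paper.

```python
def relative_risk_vs_comparator(rr_drug, rr_comparator):
    """Relative risk of one drug against an active comparator, derived from
    each drug's relative risk against a shared non-use reference."""
    return rr_drug / rr_comparator


# Hypothetical values for illustration only.
rr_naproxen_vs_nonuse = 1.00   # assumed: no effect on myocardial infarction
rr_ibuprofen_vs_nonuse = 1.30  # assumed: a modest increase in risk

apparent = relative_risk_vs_comparator(rr_naproxen_vs_nonuse,
                                       rr_ibuprofen_vs_nonuse)
print(f"naproxen vs ibuprofen: {apparent:.2f}")
```

Under these assumptions a drug with no effect at all looks about 23 percent "protective," simply because the comparator itself raises risk by 30 percent.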
If we re-analyze the data presented in that paper versus non-use, we get an odds ratio of 1.28, statistically significant. Now, this is not adjusted. It is not possible from the data there for me to adjust this result, but based on what is in the paper, when you compare the unadjusted to the adjusted point estimates, they don't change very much, and what that suggests to me is that this effect, this 1.28, is probably not far off the mark.
That would then make it comparable to the analyses I showed on the previous slide, all of which use non-use or remote use as the reference, so it would add a fourth study with an elevated point estimate for naproxen.
Now, the Kimmel study: we have already talked about the low participation rate, this reverse recall bias, and a small number of NSAID cases. In fact, they don't even tell us in the paper how many cases they had.
We move on to the Solomon study. This was the result that was reported in the paper and was picked up by the press, a 16 percent reduction in heart attack risk with naproxen. The problem, in my view, was that their definition of exposure in the study was any use of naproxen in the past 6 months, which means that if I took naproxen 6 months ago and stopped it, I could be included in this study as being exposed to naproxen.
So, the question then is, you know, how do we interpret the study? Well, Solomon was good enough to present data by current use and recent use. Recent use included people who had stopped their naproxen: their naproxen prescription's days' supply ran out between 1 day and 60 days before the MI, or before the index date for their controls. For remote users, their naproxen use ended from 61 days to 180 days prior to the event.
So, let's look at what those results are, and what we see is that they are identical. So, unless the committee is prepared to believe that naproxen confers lifetime immunity to cardiovascular disease, I think we have to conclude from these data that what we really have here is selection bias, and it is not the fault of the investigator. Dr. Platt talked before about how there are some things you can't adjust for. You can't adjust for bias. What you can try to do is identify bias, and if you identify it, then at least you know what you are dealing with.
Here, I think we have classic selection bias. It is not naproxen that protects you against myocardial infarction; it is some other factor. In the health plan they used to study this drug, the patients who were being treated with naproxen happened to have lower cardiovascular risk.
I can't explain why that happened. Dr. Solomon probably can't explain why it happened, but it's not due to naproxen.
Finally, the Watson study. This study was sponsored by Merck, and it was authored by Merck investigators. The result that was published as the basis for the conclusion was this top result, a 39 percent reduction in cardiovascular risk.
First, I just want to remind everybody of the composite outcome here: subarachnoid hemorrhage, subdural hematoma, and stroke, as well as heart attack. There were 26 events total, and a much smaller number of heart attacks.
For this event, you can see the checkmarks. These are the various variables that they adjusted for in the study. As for the way they handled cardiovascular risk, if you read the paper, I would have to say that it doesn't measure up to the standards that were set by Dr. Wayne Ray.
We modeled our studies in Kaiser and in Medi-Cal on those standards, and Dr. Wayne Ray, I think, has set the standard for how one needs to go about adjusting for cardiovascular risk. It is not enough to rely on diagnoses. You have to use the medications, because medications are much more accurate predictors of disease than diagnoses in these administrative claims data.
In any event, they didn't adjust for cardiovascular risk, and they didn't adjust for smoking, although they had that data. Then, later on, they present another analysis that now includes cardiovascular risk, and it is no longer, in quotes, "statistically significant," and then they include smoking, and again it is not statistically significant.
My conclusion on the Watson study was (1) that they have got a composite outcome that, in my view, isn't very informative on the question of myocardial infarction; (2) that the numbers are very small; and (3) that a variety of approaches were used in the analysis that inadequately account for the risk factors that could confound the result. So, I have discounted that study, as well.
So, my conclusion, when I look at these, in quotes, "4 positive studies," is that none of them provide credible evidence of a protective effect.
In light of yesterday afternoon's discussion about other NSAIDs and what might explain the differences between, let's say, the celecoxib and rofecoxib studies: the rofecoxib studies used naproxen as a background comparator, while the celecoxib studies used ibuprofen or diclofenac. Dr. FitzGerald was talking and saying, well, you know, all of these drugs could increase the risk, because what is happening biochemically, with the balance of prostacyclin, could be influenced by these different drugs in ways that aren't immediately obvious or detectable in a clinical trial.
I thought I would just share some of that information on other NSAIDs with the committee, recognizing a couple of things: that no single study is definitive, and that what you want to look for, I think, is consistency across studies. As far as randomized trials go, I would just like to mention that they are generally too small, with too few events, and you are not going to get the answers that you need from them unless you make these clinical trials substantially larger than anything people have contemplated up to now.
So, from our California Medicaid study, which is all preliminary and has not been published: for ibuprofen we found a small but statistically significant increased risk. For indomethacin we found a risk of 1.7. I would like to say on indomethacin that we also found an increased risk in our Kaiser Permanente study. It was 1.3 and it was highly statistically significant.
In at least two other studies that I reviewed in preparation for this advisory meeting, indomethacin is noted to have an increased risk of myocardial infarction. It is not commented on in the text because that wasn't a primary analysis, but what I am talking to you about now is consistency, and I would submit to the committee that there is a lot of smoke for indomethacin.
In our Kaiser study, for example, we did not think in advance to look at indomethacin separately. I mean we knew we were going to look at it, but it wasn't a primary hypothesis. We didn't adjust for gout. I mean everyone knows that indomethacin gets used in gout, and gout increases the risk of cardiovascular disease.
Well, in the Medi-Cal study, we adjusted for gout. Yes, gout increases the risk of myocardial infarction, but it didn't change the odds ratio here.
I think this next finding, on meloxicam, is important. Meloxicam is now the number one selling branded NSAID in the country. With the removal of rofecoxib from the market, the medical community, shying away from the coxibs, is moving to other drugs that they perceive would have the advantages of COX-2 selectivity without the bad rep that coxibs appear to be acquiring.
So, you now have a shift in the marketplace to meloxicam. There have been articles in the Wall Street Journal and the New York Times on this. The company recently raised the price on the tablets.
In any event, we are presenting these data just to say that we found an increased risk. It is one study, but I think it is the only study. We looked at this in Kaiser, but meloxicam is almost not used in Kaiser, so we couldn't study it.
In our California Medicaid study, we only looked at drugs that had more than 50 currently exposed cases. Nabumetone came out in this study as not showing a whiff of a problem. With sulindac, there was an increased risk.
Regarding ibuprofen, in our Kaiser study, we found an increased risk of 1.06, which sounds really trivial. It wasn't statistically significant, but the confidence interval was pretty narrow: 0.96 to 1.17.
My concern is, as Dr. Platt talked about, you know, above 2 you feel really comfortable, above 1.5 you can believe it, and below that you begin to get really edgy. The problem is that most of the risks we are probably facing, if it turns out that the non-coxib NSAIDs increase the risk of cardiovascular disease, are going to be at that level, and that is what we are going to have to contend with, because it has tremendous effects on the population.
Finally, dose response. This slide shows it for diclofenac. This is from California Medicaid. What we wanted to do was show evidence of dose response, consistency in the data. Remember we pointed out diclofenac before. Diclofenac in this study overall did not have an increased risk, but at the high doses there is a suggestion of a dose response.
I will skip that. This slide was to say that, depending on your reference point, you can get different results if I use an active comparator versus remote use. This is showing the three NSAIDs from California Medicaid compared to non-coxib NSAIDs, and you can see that rofecoxib is different from them, and the other two aren't necessarily that different.
My conclusions, and I am sorry to have gone so long. Celecoxib: we believe, based on the evidence we have at hand, that there is no apparent increase in risk at doses of 200 mg or less. Above 200 mg, we think that there is evidence of increased risk.
For rofecoxib, we believe that there is evidence of increased risk at both the lower doses and the higher doses, and that the risk begins early in therapy and is apparent during the first 30 days of use.
With valdecoxib, there is a paucity of information, but the information we have at this time suggests that the risk is not increased at doses of 20 mg or less.
As a class, non-coxib NSAIDs may increase the risk, with differences among the individual NSAIDs. I don't think we are going to be able to talk so much about class effects. In the end, it is going to have to be looking at individual drugs.
The COX-2 hypothesis may be true, but if it is, we are still going to have to look at these other drugs in terms of their individual properties and what they do.
Finally, naproxen is not cardio-protective.
Thank you.
(Applause.)
DR. WOOD: Thanks very much. David, it will come as no surprise to you that practically every time I pick up a newspaper, I read about what you are not going to tell us.
So, my question to you is: what have you not told us that you think we should know? I would like to make sure. Lots of other people have shown up here without slides that they forgot, so I just want to be sure that if there is anything else we need to hear, we hear it.
DR. GRAHAM: Well, as far as the science goes, I think I presented the evidence that I thought it was important for the committee to have an opportunity to hear, and I am happy to be able to share it.
The source of controversy surrounding my presentation related to the unpublished studies. I was going to be permitted to present, actually asked to present, the Ingenix results, the unpublished study from Merck, but I was being told not to present the unpublished data from the California Medicaid study. Personally, I had great difficulty standing here before this committee as an investigator, as a scientist, and as a physician, telling you the information that I have that I am allowed to talk about, and remaining silent on things that I know about that I am not allowed to talk to you about.
Fortunately, Dr. Crawford exercised great leadership in making it possible for me to present that data, recognizing it's preliminary. The methods that we used for the California Medicaid study are identical to those of our Kaiser study, and for me, I think the big reservation is that it's an untested database, but I think that everything that could be done to develop the database, to do quality assurance, and to work out the kinks has been done.
If you look at the findings in the California Medicaid study and compare them to the clinical trials data, and to the anomalies and the questions that you were discussing yesterday about the clinical trials data, I think you are going to see great consistency, which might help explain and interpret some of the things that seemed questionable or uncertain yesterday.
So, in any event, I have been able to present what I thought was important to present, and I am happy to have had that opportunity.
DR. WOOD: So, the answer is we have seen it all, is that right?
DR. GRAHAM: You have seen it all.
DR. WOOD: Okay, good.
Let me ask you a question. If you go back to your slide that showed the excess population risk, put that in proportion for us in terms of, say, the other drugs that have been withdrawn from the market. I mean, what sort of numbers would we expect to see?
DR. GRAHAM: That is a great question. For the typical drug that has come off the market in the United States, the leading cause of withdrawal in the last 20 years has probably been acute liver failure. Rezulin (troglitazone) came off the market because of it, as did bromfenac and a number of other drugs.
Acute liver failure in the general population has a background rate of about 1 per million per year. That is about the rate of being struck by lightning, 1 per million per year, and these drugs were pulled off the market because they increased the risk of that. A drug might increase the risk 5-fold, it might increase the risk 10-fold, it might increase the risk 100-fold. The fact is the background rate was 1 in a million, and what that means is that the actual number of people affected by potentially life-threatening liver failure is measured in the tens and the hundreds.
In this situation, and this is why the lower relative risk becomes so critical, we are talking about a serious event that has a very high background rate. Heart attack is not a rare event, and as I pointed out before, there is a 1 in 50 chance that the average American male aged 65 to 74 is going to have a heart attack this year. 1 in 50.
That is an extraordinarily high risk. You increase that risk 5-fold with a high dose; that is what happened with VIGOR. If I have got millions of people taking the high doses, and that is what we had in the United States, and I have increased the risk 5-fold, you are going to get numbers that balloon out like this.
So, there is no comparison between the population impact of the typical drug that has come off the market in the United States and what we are dealing with here, and that is because of the high background rate of the underlying event that we are talking about.
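The population contrast follows from one line of arithmetic: excess cases equal exposed person-years times the background rate times (relative risk minus 1). A minimal sketch using the two background rates quoted in the answer above; the one million exposed person-years and the relative risks of 10, 1.5, and 5 are hypothetical inputs, and applying the 1-in-50 rate for men aged 65 to 74 to every exposed person is a simplifying assumption.

```python
def excess_cases(person_years, background_rate, relative_risk):
    """Expected additional events attributable to a drug exposure."""
    return person_years * background_rate * (relative_risk - 1)


exposed_person_years = 1_000_000  # hypothetical exposure

# Background rates as quoted in the testimony.
liver_failure_rate = 1 / 1_000_000  # about 1 per million per year
mi_rate = 1 / 50                    # about 1 in 50 per year (men aged 65 to 74)

scenarios = [
    ("acute liver failure, 10-fold", liver_failure_rate, 10),
    ("myocardial infarction, 1.5-fold", mi_rate, 1.5),
    ("myocardial infarction, 5-fold", mi_rate, 5),
]
for label, rate, rr in scenarios:
    extra = excess_cases(exposed_person_years, rate, rr)
    print(f"{label}: about {extra:,.0f} excess cases")
```

Under these assumptions, even a modest 1.5-fold increase in a common event yields roughly a thousand times more excess cases than a 10-fold increase in a one-in-a-million event.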
DR. WOOD: So, this would produce many more cases, from what I understand.
DR. GRAHAM: Many more.
Committee Questions to Speakers
DR. WOOD: From the committee, we have questions. Let's start with Dr. Shafer.
DR. SHAFER: Dr. Graham, tomorrow we are going to be asked, as a committee, to consider the question of a class effect for the selective COX-2 antagonists and for the non-selective NSAIDs.
One of the things that I am having trouble putting together here is that we have a lot of conflicting data, and for the COX-2 antagonists we have a lot of data from randomized controlled trials. Certainly for the NSAIDs, we are going to have to go with a lot of these observational studies, because we don't have a lot of data on the topic at hand from randomized controlled trials.
As I look at this, if we come up with some sort of common warning for the class, and it applies to everything, we have, in fact, communicated no relevant information. On the other hand, if we are going to come up with individual drug-specific recommendations, we are going to have to have very different evidentiary standards in some ways, because for some of these drugs we have very little information, as you pointed out, and yet there is your data, particularly the unpublished data from the Medi-Cal study. I appreciate that there are all the issues of it not being peer-reviewed and so on, but we are all familiar with that process and know how it works.
What can you tell us to guide us? Should we try to go drug by drug? How do we set our evidentiary standards when we talk about class effects where, in some cases, we are just not going to have a lot of data?
DR. GRAHAM: Right. What you are going to be getting now, of course, is my opinion, not FDA's opinion. Probably if you were to talk to Bob Temple or John Jenkins, or anybody else, everybody is going to have a slightly different answer.
What we are talking about now is, I think, to some extent philosophy. So, with that preamble: first, I believe, based on the evidence, that there is a COX-2 effect, that that COX-2 effect is dose dependent, and that we see evidence of it with rofecoxib, with celecoxib, and with valdecoxib. The difference between rofecoxib and the other two coxibs on the market is that a safe dose for rofecoxib wasn't identified; the dose wasn't low enough. That raises a question in my mind about what is an appropriate therapeutic index for a drug.
I am giving you my opinion now, but when I listened to Dr. Cryer's presentation yesterday, the bottom-line conclusion I came to at the end of it was that there really doesn't appear to be a need for COX-2 selective NSAIDs, based on what I heard yesterday. There is probably other information out there showing why I am wrong, but that was the conclusion I came to.
So, in any event, that is answer one. I believe there is an effect and it's dose related, and with celecoxib and valdecoxib, I think we have evidence.
You said before that we have a good evidentiary base from clinical trials for the COX-2s. I would challenge that, in the sense of the survival curves and the things that I talked about there: we have a very weak evidentiary base for things like, you know, whether there is a protective grace period for use, and on the dose issue we also really don't have a great evidentiary base. But that being said, you understand me.
Now, for the non-coxib NSAIDs, my own view, as an epidemiologist first, is that I try to report the phenomenon I observe and leave it to brighter minds to figure out why what I observed happens.
You are asking me, sort of, what do I think is happening underneath it all. I am personally attracted to the COX-2 hypothesis. Dr. Gurkirpal Singh, my colleague and co-author on Medi-Cal, has a different view on that, but I think that we can do these in vitro tests that say, oh, this is the COX-2 selectivity of this NSAID, you know, in a test tube, and what happens in the human body could end up being surprisingly different.
We saw yesterday, in the dynamic response of these drugs, that the platelet effect, the thromboxane effect, is very quick, while the prostacyclin effect seems to be more gradual, and that this creates very complex interactions, so that ibuprofen, or any of these drugs, could in the end produce a prostacyclin deficit.
I think Dr. FitzGerald showed that slide yesterday with the normal distribution of the time area under the curve and then this little sliver where they are not protected, and that may be the reason why, for these different drugs, we end up with these different relative risks and these different odds ratios.
In the end, for the non-selective NSAIDs, my own advice would be: let's look to see whether there are studies--it is going to be observational studies--that we believe have been reasonably well done. By "well done," here, I mean they have to be large. The literature is full of really small studies. I mean, I could have presented meloxicam studies: 5 patients, no risk. Well, duh, you know, you have got a confidence interval that goes from zero to infinity. They need to be large. Look in a systematic way to identify what the body of evidence is.
Can we identify bad actors? I believe indomethacin, for example, is clearly a bad actor, and if people looking at the data conclude that, take appropriate action: weed the garden of the bad actors.
Try to identify drugs that, based on the evidence we have, appear to carry less risk in the totality of their evidence, looking for consistency from study to study to study, and then, in a rational way, suggest that these are the drugs we think the public should use. For these other drugs, well, then you have to decide whether you want them on the market or not.
I am not really going to comment on that, but I think that is the approach I would take. I would be trying to identify the bad actors right off the bat and get rid of them. Things that look like they may actually be safe, and when I say "safe" now, I mean that they don't appear to have cardiovascular risk, identify them and shift the market towards them, and then deal with the others.
DR. WOOD: Dr. Friedman.
DR. FRIEDMAN: Thank you. Several comments. First, as both Dr. Graham and Dr. Platt have mentioned, observational studies are essential, but they have a number of limitations, and because of those limitations, it is easy after the fact to critique away those whose results you don't much care for, as we have seen.
But a couple of other points. One, for these particular drugs, in their primary use we are dealing with chronic conditions, conditions that last years, sometimes many years, and so the drugs are intended for use over potentially many years. Yet most of the clinical trials we heard reported yesterday are 12 or 18 weeks; a few of them go longer. You mentioned that one of the reasons we didn't see the problems early on may be numbers, and I agree that is potentially it, but the fact is we didn't see problems arise in the studies until 14 to 18 months.
We often see analyses by patient-years of exposure. In this particular setting, I don't know whether patient-years are always equal to patient-years, and therefore, I guess I would ask: why aren't we doing more bigger, longer randomized clinical trials for these chronic conditions?
DR. GRAHAM: I am not speaking for the agency now.
DR. WOOD: We got that. Don't say it each time.
DR. GRAHAM: Okay. I think they are incredibly expensive and companies don't want to do them. There is not an incentive for them to do them, and you would have to talk to the people from the new drug side of the house, but the fact is that they are not requiring them.
So, that is a very legitimate question. You know, working as an epidemiologist, we try to make do with what there is, and so we use the observational data. You are going to get better quality data if you are able to do these trials, but just to give you a sense of the size of the studies that I think you would need to do: you talked before about the APPROVe study, where we see no effect until 18 months, but there was study 090, which Dr. Villalba talked about briefly yesterday. It was a 6-week study at 12.5 mg, and it showed a difference, the suggestion of a cardiovascular risk within the 6-week study at the lowest dose. Now, it's a small study, as well.
But I am just saying that to say that I think the epidemiologic data, in my mind at least, answer the question about when the effect begins. The question is--and this is the philosophy--how much certainty do you need to make a decision?
Right now, when it comes to efficacy, the effect, does the drug work, you are looking at the lower bound of the confidence interval, and you want to see whether it is different from 1, because if it is, then I will conclude with 95 percent certainty or greater that the drug actually has an effect.
When it comes to safety, you are doing the same thing. You are looking at that lower bound. You want this 95 percent certainty that the drug is harmful. You are presuming that the drug is safe, rather than saying let's presume we want to do no harm to patients.
Let's start off at the beginning assuming that the drug isn't safe, and we want to have a certain level of confidence about how bad this drug could be and still be tolerable to us. We want to cap the risk. It would be a completely different way of looking at studies from a safety perspective, one that actually gives priority to safety and is maximally protective of patient safety, just as that high standard for efficacy is maximally protective of patient safety, because by keeping drugs off the market that don't work, I am protecting patients from unsafe drugs. If I have pneumonia and I am given a drug that doesn't work, well, I am harmed by that.
But that's philosophy, and I think it's an outgrowth, a natural extension, of the development of clinical trials in the United States, where the focus has always been on efficacy.
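The two framings differ only in which end of the confidence interval is examined. A minimal sketch, assuming a case-control 2x2 table with hypothetical counts and a Woolf 95 percent interval; the "tolerable" relative risk of 1.3 is an illustrative margin invented for this sketch, not an FDA or committee threshold.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI (a, b = exposed cases/controls;
    c, d = unexposed cases/controls)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se_log_or),
            math.exp(math.log(odds_ratio) + z * se_log_or))


def harm_demonstrated(lower):
    """Conventional framing: call the drug harmful only if the lower bound exceeds 1."""
    return lower > 1.0


def risk_capped(upper, tolerable_max):
    """Alternative framing: accept the drug only if the upper bound stays
    below a pre-set tolerable relative risk (hypothetical margin)."""
    return upper < tolerable_max


# Hypothetical small safety data set.
estimate, lower, upper = odds_ratio_ci(15, 100, 125, 1250)
print(f"OR {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("harm demonstrated:", harm_demonstrated(lower))
print("risk capped at 1.3:", risk_capped(upper, 1.3))
```

With these counts the same data give "no demonstrated harm" under the first framing but "risk not excluded" under the second, because the interval is wide and its upper end lies well above the margin.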
DR. WOOD: Let's try to keep both the questions and the answers reasonably short; otherwise, we will be here until after midnight.
DR. GRAHAM: I apologize.
DR. WOOD: That's okay. Let's go on to Dr. Elashoff.
DR. ELASHOFF: First, I have one comment and then one question. In terms of confounding, just because you put a lot of variables in some model doesn't necessarily mean that you have adequately removed the confounding effects, even of those variables.
The second has to do with Dr. Graham's slide 13, the excess population risk. I note that the Ingenix data have been left out of the bottom category.
DR. GRAHAM: That's right, because for the high dose.
DR. ELASHOFF: Yes, but the negative sign needs to be on the slide; otherwise, it's a biased presentation.
DR. GRAHAM: Well enough. I take that correction. Okay, fair enough.
DR. WOOD: Dr. Bathon.
DR. BATHON: Yes. As we weigh the risk-benefit ratio of these drugs, one consideration is that there are subgroups of patients in which the benefit might possibly outweigh the risk.
With that in mind, it would be helpful for those of us who are not cardiologists or epidemiologists to be able to put the relative risks that we have been seeing over the past day or two in context with all the cardiovascular risk factors that exist.
So, for example, if you were to take the presumed relative risk of rofecoxib of 1.5 to 2.0, at least at the higher dose, and put it into some context for us among the 20 to 40 cardiovascular risk factors that exist, in a sort of rank order, where would you put the COX-2 drugs?
DR. GRAHAM: For the high dose, it would probably be more significant than smoking or diabetes or hypertension, maybe more important than the combination of several of those factors in a patient.
For the lower dose, it is probably more than hypertension, a little less than diabetes, and a little less than smoking.
I know, David, you know the cardiovascular risk factors much better than I do, and so does Dr. Hennekens, but that would be my ballpark on that.
DR. WOOD: Dr. Abramson.
DR. ABRAMSON: Yes. I want to go back to the question Dr. Shafer asked about whether there was a hierarchy of risk among these classes of drugs, or this group of drugs. You first answered that you thought the coxibs were more risky, but I would challenge you a bit simply on the basis of your own presentation.
I would like you to discuss your data, because you then went on to talk about how indomethacin has a risk and meloxicam has a risk. Based on your data, the message that came through is that there was a dose-response risk for cardiovascular outcomes: we saw it within the coxibs, but we also saw it in the non-selective NSAIDs where the data were available.
There are data that we have seen that ibuprofen might increase risk. We didn't talk about the MacDonald and Wei paper, in which, among patients discharged with cardiovascular disease, people given ibuprofen had a 2-fold higher mortality.
So, as the smoke clears, I am not sure that the simple answer that the coxibs were different was actually supported by your data, nor by your ultimate explanation. Can you defend that?
DR. GRAHAM: I think you are accurate. What I was referring to, I think, was the underlying COX-2 hypothesis, and that it is clearer, I believe. Well, maybe it's an overgeneralization, because the n that we are viewing is so small, but I was looking at rofecoxib as sort of the example where we can see very clearly the dose response at all the levels and its progression, understanding its mechanism of action, and then seeing similar things with celecoxib and valdecoxib.
I think what you are saying is fair. Maybe a better thing to say is, in the end, that you do need to look at it drug by drug.
What I was saying, though, in that answer that I gave to Dr. Shafer, I was really talking more about sort of the COX-2 mechanism and the coxibs as being, in quotes, "COX-2 selective," but I think your observation is correct.
DR. ABRAMSON: To add to that: although there is a hazard that we don't accomplish a lot by simply saying the class of NSAIDs may have risk, I think we have under-appreciated that risk over the last 10 years.
It is not that different from the mid-nineties, when we recognized that there was a class GI effect of these drugs, and that compared to placebo, whether it's hypertension or long-term potential adverse outcomes, this is something that doctors have to be aware of, even the simple thing of checking blood pressures when you put people on any nonsteroidal drug.
So, I don't know that it is necessarily a bad outcome to call attention to this class effect until we get better information on each of these individual drugs.
DR. WOOD: Dr. Day.
DR. DAY: I have a comment about recall bias and reverse recall bias. There is a huge research literature on how memory works, both in the laboratory and in the everyday world, and there are two phenomena that have been very heavily studied that I think might be relevant here.
One is called flashbulb memory, and the idea is that when an emotional, spectacular event happens, such as when you first learned that JFK had been shot, or the Challenger blew up, or the World Trade Center had been hit, it is as if the flashbulb from an old-time flash camera went off and captured all the details, and you remember all of those details forever afterwards associated with the event, details that you might otherwise not even have noticed or might have forgotten.
So, there is a lot of research on flashbulb memory that shows many of those details are indeed correct, but some are notoriously false. For example, there are accounts of people who remember a certain event with great emotional aspects to it, and they remember listening to the World Series when so-and-so was pitching and it was the bottom of the 9th, da-da-da, all these details, and when you go back and check the evidence of what was going on at that day and time, that particular game was not on.
So, that is phenomenon number one, flashbulb memory, and the second is eyewitness testimony. How you ask a person a question will affect what answers you get. So, if you have in the courtroom someone who has witnessed a car accident, and the lawyer asks this witness, "Did you see the broken glass," then the witness is more likely to say yes than if you ask, "Did you see any broken glass," because "the broken glass" presumes that there was some, and so forth.
So, I take your points seriously about potential recall bias and reverse recall bias, but we would have to look at both: whether there is an emotional component or not, which those who have had an MI, for example, would most likely have, but also how the questions are asked in these surveys. It is not trivial how you ask people questions about whether they were taking any medications or were taking medication X, and for how long, and what the dosage was, and so on.
So, I don't think that these details are always published with the studies, and I would like to encourage people who ask others about their experiences with drugs to take a look at the memory literature for some of these points.
DR. WOOD: Dr. Gibofsky.
DR. GIBOFSKY: Dr. Graham, I am wondering if you separated out your populations based on the indication for which they were taking the drug. I ask that because we heard yesterday, and it's well known, that rheumatoid arthritis is itself a risk factor for cardiovascular disease, and higher doses of coxibs, in particular celecoxib, are usually given to patients with rheumatoid arthritis as opposed to osteoarthritis.
So, I am wondering if you looked at that in your breakdown.
DR. GRAHAM: Several of the studies that I reviewed have looked at the indication, but in automated claims data, it is very difficult to be sure whether the patient has rheumatoid arthritis. There are different algorithms one could use, but in general, what has been found in the studies where they have looked at that is that the prevalence of rheumatoid arthritis in the study populations has been low, very low, and that adjusting for it didn't materially affect the results.
Now, in the California Medicaid study, one difference was that our base population was limited to patients who had diagnoses of osteoarthritis or rheumatoid arthritis. Now, these are diagnoses, so does that mean that they really had osteoarthritis or rheumatoid arthritis? I don't know. But what we did try to eliminate in that study, at least, were the people who might be using an NSAID for a muscle injury, a short-term complaint as opposed to a chronic illness.
In none of those does the presence of rheumatoid arthritis seem to affect things, but again I think the prevalence is pretty low in all of these studies.
DR. GIBOFSKY: One quick question for Dr. Platt, if I might. I need to understand the concept of survivor bias somewhat, in that I think there is a difference between a patient who is drug-naive, is then put on a drug, and then has an event, versus a patient who may have seen a drug, perhaps another drug after that, 3 or 4 agents of the class, and is then switched to another agent and something happens.
I think we have talked about remote versus current use, but there is also this issue of sequential effect, and I am wondering how you deal with that as a survivor, particularly because of the paper we saw a few weeks ago in the Archives suggesting that discontinuation of an NSAID may itself be a risk factor for a thrombotic event.
DR. PLATT: Your point is exactly right. I think that the concern about survivor bias is that if we think that some people are particularly susceptible, which is almost certainly the case, then, if we start the clock after a person has already been exposed to a drug, or to one that has the same effect, it is very much less likely that those individuals will have a problem.
That may be the explanation, for instance, for why the literature was so badly wrong about postmenopausal estrogens and heart disease: most of the epi studies started with prevalent users.
I think that in the majority of the studies we were reviewing here, these were individuals who are known to have had at least a year of prior experience without exposure to the nonsteroidals. Your study in Kaiser, I know, was an inception cohort, at least with regard to a year of prior history, but I am not aware that any studies have a longer drug-free prior interval than that.
DR. WOOD: Dr. O'Neil, do you want to comment particularly on this?
DR. O'NEIL: Yes, this is an important point, and a lot of things have been covered in Richard's and David's presentations, but one thing I think is relevant that Richard did not cover is the value of a randomized trial: the ascertainment and follow-up, and knowing the status of individuals in the sense of who goes off therapy and how long they stay on therapy.
That is very critical relative to the time dependency of the risk. It was mentioned, for example, the use in the observational sense of recent, remote, and current use. Those are all terms that are nice, but they don't get at the issue that we are trying to get at with regard to the clinical trials, and that is essentially when does time zero start for you.
So, I think the appropriate question to ask is what is the duration of exposure since your initial exposure to the drug, because I think that is very relevant to the interpretation of the three clinical trials that we have, two of which are in placebo-controlled populations.
There is a rofecoxib-naproxen controlled trial for one year, there is a placebo-controlled trial in polyp prevention for three years, and there is a placebo-controlled trial in Alzheimer's disease for four years, and the time dependency from time zero matters, as you have seen in the plots.
It is relevant to the excess risk calculation. So, I would ask the committee, as well as David: of the observational studies that you have reported, how many of them are cohort studies, and how many of them are able to identify new initial use and then track continued use for that individual, so that one could look at the relationship between the hazard rates and the hazard ratios that we are identifying in the randomized trials and match that to the odds ratios that are being reported in the observational studies?
DR. GRAHAM: On one of my initial slides, you can see what the cohort studies were, and in some of the nested case-control studies, you are also able to get the time on drug. Actually, as in Wayne Ray's cohort study, most of these cohort studies include prevalent and incident users, so they will do what is called a "new user" subanalysis, which is to try to get to this issue of when time zero begins.
We addressed that problem in our study here by the inception cohort design in our base population, so that we can identify what time zero was for the cases.
Now, none of those studies presented data in the form of a survival analysis, which I think, in the end, is what Dr. O'Neil would like to see.
DR. O'NEIL: No, my question is not so much about survival. I don't believe, and again that is why I am asking you, I don't think any of those studies were designed or able to capture the question I am asking.
In fact, if I am not mistaken, in the Wayne Ray study, he defined new use, but he did not define any time from new use, which is essentially critical to when those risks start.
DR. GRAHAM: That study isn't cited as one of the studies where we are able to derive that information. This slide was a slide that I presented to show, from the epidemiologic literature, those studies where the investigators had identified when time zero began for rofecoxib use. They didn't present the data as a survival analysis, but they identified when time zero began and then, in various ways, showed you either what the distribution of the cases was, so that you can see that it was impossible for the risk to have been delayed for 18 months, because nobody in the study used the drug for 18 months, or they parsed time out and looked at the first 30 days of use from time zero, and found the risks that they found down here.
But you are right, those studies aren't designed that way, and we haven't had time in our Medicaid study to do these analyses yet, but we have the data to now do the cohort study and time to event, so we will have an opportunity within the data to actually compare and look at exactly the question you are driving at.
But I would say that from the published data, in each of these studies, time zero for rofecoxib was identified, and in some way or another, those studies contained information that I think could be useful to the committee in establishing when risk begins.
DR. O'NEIL: Well, the other point here is the value of clinical trials, and it was the question that was discussed yesterday with regard to the intent-to-treat analysis, and that is to say, analyzing all outcomes once randomized to the trial, regardless of whether you want to track the individual to 14 days post-exposure.
You maybe can't really get access to this information in the observational studies. That is a conjecture, but it's one of the other biases, and it was an interesting comment, whether one would believe it or not, that discontinuation from an NSAID alone raises risk.
If that were to be the case, that is a different analysis altogether.
DR. GRAHAM: In that actual paper, it could be that people were discontinuing the NSAIDs because they were having chest pain that was being interpreted as dyspepsia or something, and then they went on to have their infarct.
I mean you are right about that, but this is the nature of how epidemiology is done, and I can't change it. I didn't make the rules; I am only following them. Nobody disputes that clinical trials, if they could be large enough, would give all of us answers that we would have greater comfort trusting.
What I am proposing is that we don't have that kind of data in the clinical trials. As large as the clinical trials are, for the questions that this committee is facing, you don't have the data you need, and what I presented is the epidemiologic data. It is imperfect and it has its warts, and that is why I would emphasize looking at consistency and trying to derive from that a general sense.
I mean, does it make pharmacologic sense that you would have an 18-month delay? I suppose it depends on what you think the mechanism of action is for the underlying disease, but even in the clinical trials, study 090 was 6 weeks long, at 12.5 mg, and it had a cardiovascular effect.
DR. WOOD: I am happy to facilitate a discussion among the FDA, but I think we would rather hear from the committee right now. Dr. Farrar, you are next.
DR. FARRAR: I think that the recommendations of the committee tomorrow are going to depend on the assessment of the overall risk and the overall benefit of this class of drugs.
As a researcher, and after all the data that has been presented, I am more than happy to accept the fact that there are serious risks, even of death, from taking NSAIDs. In fact, though, there are serious risks in taking any medication at all.
For some of the NSAIDs, it is cardiovascular risks; for some of them, it is clearly GI bleeding. As a doctor, though, who takes care of patients, I know that not treating pain and not treating the disability of arthritis also has very serious risks, even of death.
Given the extensive work that you have done on the risk of both cardiovascular events and GI bleeding, I wonder what level of risk is acceptable to you. Remembering that the only other drugs that are really available are analgesics or narcotics, and the only other drugs that are really available in terms of limiting inflammation are biologics or immunosuppressants, I wonder what drug is safe enough that you would recommend that I actually be able to use it in patients to prevent some of their suffering.
DR. GRAHAM: Well, I am not going to give a product endorsement. A couple of things, though.
DR. WOOD: Try and make it brief.
DR. GRAHAM: One, on the benefits of treatment with the traditional NSAIDs compared to the COX-2 selective NSAIDs with respect to GI bleeding, we have clinical trial evidence suggesting that there may be a difference, but here, to me, is an anomaly.
Rofecoxib got the indication for being GI-protective; celecoxib didn't, based on the clinical trials data you guys looked at yesterday.
There are two published studies in the literature looking at what I would call actual benefit. There, they were looking at hospitalization for GI bleeding (they didn't look at death from GI bleeding, but I wish they had), and what they found in both of these studies was that celecoxib was actually more beneficial, you know, with a lower rate of hospitalization for GI bleeding than rofecoxib. So, that is the population, two large studies.
You have got your clinical trials that would have said it should be the reverse. So, I throw that out as one sort of conundrum.
The second is that I don't think that the actual benefits of these drugs are understood well enough to weigh these very well. The case fatality rate for myocardial infarction in the United States approaches 40 percent. The case fatality rate for hospitalized GI bleeding is probably somewhere around 5 or 10 percent; it is a much lower case fatality rate.
Nobody that I have seen anywhere has worked this out very well, so I would submit to you and to the committee that you actually know very little about the actual population benefit of any of these products.
DR. WOOD: I don't think we are going to get an answer to that question, so let's move on. Dr. Nissen.
DR. NISSEN: Let me briefly answer the earlier question about what a hazard ratio of 1.5 to 2 means. Before I came to the meeting, I made a point to look this up, because I thought it would be very relevant.
It is equivalent to raising a cholesterol level from 200 to 260, or to taking up smoking. Put another way for the committee, and as a cardiologist I have to deal with this all the time: the most effective drugs we have for prevention of morbidity and mortality are statins, and they reduce risk about 35 percent.
So, a hazard ratio of 1.5 to 2 is really a very, very big effect when you are talking about the most common cause of mortality, and that is why this discussion is so important.
Now, my question is this. We are going to be asked to balance risk and benefit, and so the magnitude of the hazard ratio is very important to all of us, and I am trying to reconcile what we see in the randomized controlled trials, taking rofecoxib for a moment, where it looks like the hazard ratio is in the range of 2, 3, 4, maybe even higher, with the observational data, where it is significantly lower.
I would like to propose a hypothesis to you and just ask you if you think this is right. In your observational data, you are looking at mostly short-term exposure, typically less than 12 months of exposure.
It may well be that the hazard increases over time, so that by the time you get to 18 months, you can actually see it in a much smaller randomized trial. So it doesn't rule out the possibility that, in fact, both observations are right: that there is an early hazard, but that early hazard has a smaller hazard ratio than the hazard at 18 months or 24 months or even 36 months, and if we ever were to look out 5 years, it might still be increasing.
Do you think that is a reasonable hypothesis?
DR. GRAHAM: I think it is more likely that in your clinical trials, early on you don't have enough power to distinguish the risk. The hazard is the same, but the lines are closer together, because we are closer to the origin.
One other explanation for the lower risk ratios in observational studies, I would think, is more likely: misclassification of exposure and misclassification of outcome. It is likely to be nondifferential, so it would tend to reduce the odds ratios and relative risks towards 1.
Exposure, because a lot of these people are taking it on a prn kind of basis. In a clinical trial, you have greater certitude that they are actually taking it every day. That introduces a lot of misclassification, so going into an observational study with an a priori hypothesis, with misclassification going on, you are fighting an uphill battle to see an effect.
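A minimal numerical sketch of the bias Dr. Graham describes follows. It assumes a hypothetical cohort (1,000 truly exposed and 9,000 truly unexposed people, a 1 percent baseline event risk) and hypothetical recording accuracy for exposure (70 percent sensitivity, for example because of intermittent prn use, and 95 percent specificity), applied equally to people with and without events. None of these numbers comes from the studies discussed; the function name is illustrative. For simplicity it computes expected counts for a cohort risk ratio rather than a case-control odds ratio.

```python
def observed_risk_ratio(true_rr, n_exposed=1000, n_unexposed=9000,
                        baseline_risk=0.01, sensitivity=0.7, specificity=0.95):
    """Expected risk ratio after nondifferential exposure misclassification.

    All default values are hypothetical illustration values, not study data.
    """
    # True expected event counts in each exposure group.
    cases_exp = n_exposed * baseline_risk * true_rr
    cases_unexp = n_unexposed * baseline_risk

    # Recorded as exposed: truly exposed people caught by the record
    # (sensitivity) plus truly unexposed people wrongly recorded
    # (1 - specificity). The same error rates apply to cases and non-cases,
    # which is what makes the misclassification nondifferential.
    cls_exp_n = sensitivity * n_exposed + (1 - specificity) * n_unexposed
    cls_exp_cases = sensitivity * cases_exp + (1 - specificity) * cases_unexp

    # Recorded as unexposed: everyone else.
    cls_unexp_n = (1 - sensitivity) * n_exposed + specificity * n_unexposed
    cls_unexp_cases = (1 - sensitivity) * cases_exp + specificity * cases_unexp

    return (cls_exp_cases / cls_exp_n) / (cls_unexp_cases / cls_unexp_n)


for rr in (2.0, 0.5):
    print(f"true RR {rr}: observed RR {observed_risk_ratio(rr):.2f}")
# prints:
# true RR 2.0: observed RR 1.56
# true RR 0.5: observed RR 0.71
```

Under these assumptions, both an increased risk and a protective effect are pulled toward 1; with perfect recording (sensitivity and specificity both 1.0), the function returns the true ratio.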
DR. WOOD: We have got lots of people who want to ask questions. I want to make sure that the people who are asking questions have questions they want to ask for clarification of the speakers who have spoken, rather than just general points. Dr. D'Agostino.
DR. D'AGOSTINO: I have a couple of questions along the way here. I have spent a good part of my career in the Framingham Heart Study, and it's an epidemiological study and a cohort study, and we take joy when somebody runs a controlled trial on our hypotheses and then later on confirms them.
The first question is this: I am concerned that even though you have gone through this careful analysis, your conclusions are "no apparent effect," "probably increased effect," "probable increased risk." They really don't help us in the sense of pinning things down. We have a couple of very strong, I think good, studies, the APPROVe study and the APC study, as placebo-controlled trials.
Tell us quickly where the weight is: how should we look at these two pieces, the controlled trials we have versus what you have produced?
DR. WOOD: Really quickly.
DR. D'AGOSTINO: Really quickly, it can be done quickly.
DR. GRAHAM: My belief is that the controlled clinical trials, for the levels of risk that we are concerned about, do not have the statistical power early on to show risk differences.
DR. D'AGOSTINO: I think Bob O'Neil's comment is very important here.
The other two points, and again I will make them quick: I am very concerned about the high-dose effect you have, and I am really concerned about the MI and the number of cases. I mean blood pressure, cholesterol, diabetes, smoking, this is what drives people to have heart attacks and what have you, and that is completely missing from your assessment of how many new cases, so I guess it is more of a comment that I am really concerned that that sheet needs sobering interpretation.
DR. GRAHAM: But it was based on the odds ratios and relative risks where those factors were adjusted for, so, as well as they are adjusted for, that is what the projection represents: the excess after adjustment.
DR. D'AGOSTINO: Yes, but I mean the comment was made by you that throwing them into the analysis doesn't necessarily adjust for them.
The last one: you made a very nice point about the cardio-protective effect, and you tried to show that these uses, and what have you, somehow or other all have the same risk, and your interpretation is that there must be some confounding going on. Why doesn't that hold for all the studies you gave? Why doesn't that hold for the Solomon study, which you thought was a great study, yet this one result you don't like?
DR. GRAHAM: For what, the Kimmel study?
DR. D'AGOSTINO: Wasn't it the Solomon study that had naproxen as cardio-protective?
DR. GRAHAM: That is because the cardio-protection was present when they were on the drug and when they weren't on the drug.
DR. D'AGOSTINO: I understand what you are saying, but if that's a problem, then it means there is some confounding going on.
DR. GRAHAM: No, it's selection bias.
DR. D'AGOSTINO: Well, it's selection bias, but why isn't it for the whole study? Why do you throw out a result you don't like and keep all the results you like?
DR. GRAHAM: No, that is not what I did. I pointed out a result where they showed the presence of the selection bias. Among other studies, the Ingenix study is the only other one that looked at this. I don't have a slide of it.
DR. D'AGOSTINO: I don't know if it's a selection bias or misinterpretation of the data.
DR. GRAHAM: Well, to me it looks like selection bias.
DR. WOOD: Let's continue that conversation later. Dr. Morris.
DR. MORRIS: David, would you go to slide 14? That is the risk by duration of use. I think one of your points was that if you look at your study, and tell me if I understand this right, with the lower dose the median time to an AMI is sooner than with a higher dose. Did I understand that right?
DR. GRAHAM: Yes.
DR. MORRIS: A month?
DR. GRAHAM: Had more cases, a greater proportion of our cases, but the other thing is, remember, down here we are talking about 18 cases or so. The N here is small; the N here is like 58, and the N here is 10. So, I wouldn't read too much into the difference.
The more important point is that at the low dose, nobody was out there beyond 18 months, so all the action happened before 18 months, and the same for the others. I see what you are saying. I can only say that is what our data were.
DR. MORRIS: One interpretation is what you said earlier, that for this particular drug we are talking about, as you said, no safe level. I was wondering if that is the way you interpreted it: that because we are talking about Vioxx here, and there is no safe level, something is going to happen sooner, or is it that the populations are different?
DR. GRAHAM: The populations could be different, but I think, you know, you would expect the higher dose to have a shorter latency to onset than the lower dose, but the numbers are so small.
DR. MORRIS: Okay, it's a small-number problem.
DR. WOOD: So, the answer is that the numbers at high dose are too small.
Dr. Boulware.
DR. BOULWARE: I just want to make sure I understand something that you proposed in your excess population risk slide, if you would put that back up.
As a rheumatologist, I use these drugs in a population much broader than the 65-to-74 age group you have here, where the risk of an MI is fairly high.
Did you want us to believe that this excess risk you are proposing could be extrapolated to other population groups, too?
DR. GRAHAM: Well, no.
DR. BOULWARE: Do you have any numbers that may demonstrate that?
DR. GRAHAM: Well, the answer to the second is no. This was an example, arising in conversation with people planning the talk, to try to help people connect with what it means.
Cardiovascular risks go up. I mean, in the next age group higher, the risks are higher. In the age groups lower, they are lower, but cardiovascular risk begins to increase in the 40s.
DR. BOULWARE: I understand, but it wouldn't be a linear type of thing.
DR. GRAHAM: No, the background risk isn't linear; the relative risks, though, are adjusted out.
DR. BOULWARE: Because one of the questions we will be faced with is whether there are subpopulations or groups in which these may be safe, and I just want to make sure I understand the relative risk in different age groups.
DR. GRAHAM: None of the studies that have looked at it have reported effect modification, which would mean that the level of risk differs at different ages.
DR. BOULWARE: One more question here. I want to make sure I understand. I think I heard a comment that when the risk approaches 2.0 (maybe I just assumed that you said this), it was an unacceptable level of risk.
Is there ever a case where a drug may have a clinical benefit for which that risk is acceptable? For the patients I see, not giving them any of these drugs will confer a great deal of risk on them, and physical impairment, and we have studies that show that the functional classification of rheumatoid arthritis patients carries with it a significant mortality as that class goes up.
DR. WOOD: I think that is a question for the committee to answer rather than Dr. Graham. Let's move on to Dr. Cryer. Do you have a question?
DR. CRYER: I do. The comment and question I have for Dr. Graham address an issue that I think is an important difference between the observational studies and the prospective studies, and this difference relates to the assessment of drug compliance and missed doses. I think it is critical for assessing drugs that potentially affect platelet function.
A huge difference, as you know, between aspirin's effect and that of every other NSAID, including the COX-2 inhibitors, is that with the non-aspirin NSAIDs, as soon as you remove the drug, whatever potential effect it would have had on the platelet is immediately reversed.
So, with naproxen specifically, my preconceived bias, which may be wrong, but my preconceived bias based upon everything I know about the pharmacology and the things that Dr. FitzGerald has reviewed for us, is that it should have some mild anti-platelet effects, which would only be present when the drug is on board in the system.
So, the specific question is: in the observational studies, recognizing that in clinical practice people miss doses of their NSAIDs and are not taking their NSAIDs consistently, how do you account for the missed doses, recognizing that this could potentially lead to a mitigation of whatever negative or positive effect they may have?
DR. GRAHAM: It ends up being misclassification. Generally, what that means is it will force the observed level of risk, the relative risk or the odds ratio, closer to 1. So, if we had an increased risk, it would make it lower; if we had a protective effect, it would make it higher, closer to 1.
DR. CRYER: Right, we agree on that. The specific question is, is there a way to actually recognize or account for when people do not take their doses in the observational databases?
DR. GRAHAM: No, there isn't, so when you are studying, say, an increased risk, that is why I said if you find something, you have to realize you found it despite the misclassification.
DR. WOOD: Okay. Dr. Domanski.
DR. DOMANSKI: I will save it for tomorrow.
DR. WOOD: Okay, great. Dr. Furberg.
DR. FURBERG: No.
DR. WOOD: Okay, great. Dr. Temple, who does speak for the FDA.
DR. TEMPLE: I am just asking questions.
A couple.
Actually, one point is it seems to me
that since we expect that people are
going to be
getting one drug or another, comparisons
with other
NSAIDs seems like as good a comparison as
we should
make.
You might want to leave out indomethacin if
you are worried about it. That's one thing.
I guess my main question,
though, is
everybody has paid appropriate lip
service to the
idea that very small differences are hard
to
interpret in epidemiology.
People have said 1.5, 2. Actually, I
notice in one of his editorials, Dr.
Furberg cited
a paper of mine where I said anything
less than 2
really needs a lot of questions. Jerry Cornfield,
who sort of invented all this stuff, used
to say 3.
Well, we are talking about differences here that are 0.1 differences, not that they wouldn't be hugely important if they were true; that is absolutely true. So, I guess I want to know what Richard and you make of all this, because the numbers are very small, and yet, just as an example, there is a very great consistency that you cite that celecoxib looks sort of okay, but you found one study where there is a little hint that maybe the higher dose is a problem, and since probably we all think dose response is likely, that looks good to you.
DR. GRAHAM: Two studies, there were 2.
DR. TEMPLE: Okay, 2.
The valdecoxib data, which shows nothing, doesn't look so good because we probably all believe that there is likely to be a class effect.
What I am asking is, with numbers like this, how do you know what to do with them? That seems very fundamental for the epidemiology.
DR. WOOD: But, Bob, there are 4 randomized clinical trials here, and your comments don't apply to them, I assume.
DR. TEMPLE: No, they don't, although they are not perfectly consistent either. But, no, I am asking, what do we make of differences of this magnitude with everybody having given lip service to the idea that small differences are hard to interpret, and yet we seem to be enthusiastically endorsing them, so I just want to know what Richard and David think about that.
DR. GRAHAM: Rich, do you want to go first?
DR. PLATT: I think we have to be cautious about how we interpret it, so I would say the finding of a relative risk of 3 in an epidemiologic study, as David found, is meaningful--
DR. TEMPLE: For high dose rofecoxib.
DR. PLATT: For high dose rofecoxib.
DR. TEMPLE: I would not dispute that at all.
DR. PLATT: It seems to me that in that context, that a dose response effect, that the information about lower doses gains weight by borrowing from that. I think that is also worth keeping in mind when, in other studies that are working in that range that make us all nervous, there appears to be a dose response effect.
It is the kind of consistency that makes the study, in my mind, be worth more attention. I think there is something to be said for giving more weight to relatively small excess risks if they are seen in a number of different environments when we can't have good reason to think that there are similar kinds of bias that might be contributing to it.
After that, I agree with you. We are in relatively difficult terrain. I think that it is not the same as no data, though. I think we ought to distinguish between the situation in which we have no evidence and ones in which we have relatively weak evidence.
We didn't talk at all, for instance, about the enormous number of spontaneous reports of myocardial infarction following exposure to nonsteroidals. There are thousands and thousands of them. In my mind, they don't contribute at all to the discussion, whereas, I think these need to be weighed in the mix when we don't have clinical trial information to depend on.
DR. GRAHAM: My answer is similar to his, but I think that what you are identifying is that we are hitting, or at least right now the frontier is, the limits of the available tools we have to define the levels of risk that we are talking about.
We are talking about small levels of risk that turn out for this particular event to be enormously important at a population level. If you are talking liver failure, we wouldn't be having this conversation. For that reason, it becomes important, and what I would say, sort of emphasizing what Rich said, is I would be looking for consistency across different studies, and if I found a number of studies, say, as with Indocin, for example, to me, that is more persuasive.
If I found a number of studies that pointed to a particular set of NSAIDs that seems to have low risks, I would take comfort in that in the absence of perfect information. I mean some light in a storm is probably better than no light in a storm.
DR. TEMPLE: I take it if the differences were at the level of 10 percent, 1.1 versus 1.2--
DR. GRAHAM: I am thinking more in a very qualitative sense of things that they seem to cluster around 1. I mean 1.1 for ibuprofen, it could be that, for example, maybe naproxen increases the risk 3 percent in the real world, we are never going to figure that out, maybe ibuprofen increases it 10 percent or 15 percent, maybe we could figure that out, I don't know, but there is going to be a place where qualitatively, if we see enough studies kind of sort of pointing to the same place, you know, most of them, they are not all going to say the same thing, there are going to be these conflicts, just like we have in clinical trials data.
But if most of the compass arrows are sort of pointing in the same direction for particular NSAIDs, I think those are the ones that, at least, I sort of place on a suspect list.
DR. TEMPLE: So, very low hazards need at least multiple support before they are credible.
DR. GRAHAM: I think so, and I think that you want to try to encourage collecting that information, sort of to test that out.
DR. TEMPLE: Alastair, could I take half a second to answer a question Larry raised before?
DR. WOOD: Sure, a second.
DR. TEMPLE: Well, it's a very good question, you know: if the drug is going to be used forever, why don't you study them forever? The only thing I would point out here is that what sort of started people thinking was VIGOR, and VIGOR didn't take 3 years to show anything; it showed up in 9 months.
So, what you have seen is, for, say, lumiracoxib, a humongous study of about the same length, but, of course, they didn't know about APPROVe, did they, and whatever you think APPROVe means, whether Bob is right that it's late, or David is right that there weren't enough cases, people were pointing toward a study that by every reasonable thought, if you think platelets are involved, ought to be long enough to show things up.
But then you form a new hypothesis once you have APPROVe, and you have to adapt it, and I think that goes on all the time. It would not be, I must say, for most things my first thought, unless you are looking for cancer, that you need a 3-year study to find it, but maybe you learned that it does.
Just for what it's worth as an example, you can't get an anti-arrhythmic drug approved in this country without showing that you don't alter survival unfavorably. One result is there are hardly any being developed, but, you know, we had bad experiences, we didn't like the results of CAST, so you change.
I think there is no doubt that things evolve and you have to expect that, and APPROVe, depending on what you think of it, changes the nature of what you expect.
DR. GRAHAM: Bob, just one point on that. I think if the APPROVe study had been 5 or 10 times larger than it was--I am talking about retrospect now--you would be able to answer with much greater confidence what is happening from month 1 to 18. I guess what I am saying is that you could also shorten the latency to identification of a problem if it turns out that the risk is early on.
DR. TEMPLE: David, I think that is entirely possible, and if it involves platelets, I would believe you, but if it involves a small, long-term increase in blood pressure, then I am not so sure.
DR. GRAHAM: Right, but we saw yesterday--
DR. TEMPLE: We don't know.
DR. GRAHAM: We don't, but if it's prostacyclin, that effect could occur immediately.
DR. TEMPLE: Yes, but the blood pressure effect could be delayed.
DR. WOOD: Right. So, Bob, what you are saying is that it is easy to be a Monday morning quarterback, but the data were not there before.
DR. TEMPLE: I would never be that rude.
DR. WOOD: I think you are right. Dr. Stemhagen.
DR. STEMHAGEN: I would like to clarify a couple things. First, I am a little concerned in terms of the unpublished data. I appreciate that we are able to get data very quickly, right at the minute that it is being generated, but none of us have had a chance to really review that, so I do have some concerns about the weight being put on this unpublished data when the rest of us haven't had a chance to look at it.
I think there needs to be some clarification. There was some discussion about recall bias, and so on. Certainly, there is a major concern about that in case-control studies, and we don't have the questionnaires, but there were a lot of sort of subanalyses done in the Kimmel study, about trying to look at whether recall bias is a problem, and I am not sure that you have highlighted that enough, that looking at all those different things, there were really no differences found.
Similarly, in the Watson study, it's a GPRD study; it is different than a lot of the large databases, the automated databases.
There is a lot more personal involvement in terms of the data and the data collection and the adjudication of results, and I think it just needs to be clear that all of these studies are not the same in terms of a Medicare study where we can't go back and validate records. A lot of them had a much more careful review, and I am just not sure that that was totally clear if you hadn't read each of the papers.
I would like to just ask a question in terms of your definition of the inception cohort,