DEPARTMENT OF HEALTH AND HUMAN SERVICES
FOOD AND DRUG ADMINISTRATION
CENTER FOR DRUG EVALUATION AND RESEARCH
JOINT MEETING OF
THE ARTHRITIS ADVISORY COMMITTEE AND
THE DRUG SAFETY AND RISK MANAGEMENT ADVISORY COMMITTEE
P A R T I C I P A N T S
Alastair J. Wood, M.D., Chair
Arthritis Advisory Committee:
Allan Gibofsky, M.D., J.D.
Joan M. Bathon, M.D.
Dennis W. Boulware, M.D.
John J. Cush, M.D.
Gary Stuart Hoffman, M.D.
Norman T. Ilowite, M.D.
Susan M. Manzi, M.D., M.P.H.
Drug Safety and Risk Management Advisory Committee:
Peter A. Gross, M.D.
Stephanie Y. Crawford, Ph.D., M.P.H.
Ruth S. Day, Ph.D.
Curt D. Furberg, M.D., Ph.D.
Jacqueline S. Gardner, Ph.D., M.P.H.
Eric S. Holmboe, M.D.
Arthur A. Levin, M.P.H., Consumer Representative
Louis A. Morris, Ph.D.
Richard Platt, M.D., M.Sc.
Robyn S. Shapiro, J.D.
Annette Stemhagen, Dr.P.H., Industry Representative
Steven Abramson, M.D.
Ralph B. D'Agostino, Ph.D.
Robert H. Dworkin, Ph.D.
John T. Farrar, M.D.
Leona M. Malone, L.C.S.W., Patient Representative
Thomas Fleming, Ph.D.
Charles H. Hennekens, M.D.
Steven Nissen, M.D.
Emil Paganini, M.D., FACP, FRCP
Steven L. Shafer, M.D.
National Institutes of Health Participants
Richard O. Cannon, III, M.D.
Michael J. Domanski, M.D.
Guest Speakers (Non-Voting):
Garret A. FitzGerald, M.D.
Ernest Hawk, M.D., M.P.H.
Bernard Levin, M.D.
Jonca Bull, M.D.
David Graham, M.D., M.P.H.
Brian Harvey, M.D.
John Jenkins, M.D., F.C.C.P.
Sandy Kweder, M.D.
Robert O'Neill, Ph.D.
Joel Schiffenbauer, M.D.
Paul Seligman, M.D.
Robert Temple, M.D.
Anne Trontell, M.D., M.P.H.
Lourdes Villalba, M.D.
James Witter, M.D., Ph.D.
Steve Galson, M.D.
Kimberly Littleton Topper, M.S., Executive Secretary
C O N T E N T S
Call to Order:
Alastair J. Wood, M.D.
Conflict of Interest Statement:
Kimberly Littleton Topper, M.S.
Naproxen Investigator Presentation
Alzheimer Prevention Study: ADAPT
(Alzheimer's Disease Anti-Inflammatory Prevention Trial):
Constantine Lyketsos, M.D.
Additional Background Presentations
Interpretation of Observed Differences
in the Frequency of Events When the
Number of Events is Small:
Milton Packer, M.D.
Clinical Trial Design and Patient Safety:
Future Directions for COX-2 Selective NSAIDs
Robert Temple, M.D.
Issues in Projecting Increased Risk of
Cardiovascular Events to the Exposed Population
Robert O'Neill, Ph.D.
Summary of Meeting Presentations:
Sharon Hertz, M.D.
Sponsor Responses
Advisory Committee Discussion of Questions
Question 1
Question 2
Question 3
Question 4
Question 5
Question 6
Question 8
Question 7
Meeting Wrap-up
P R O C E E D I N G S
Call to Order
DR. WOOD: Let's get started. This is our
third day and thanks to everybody for coming back.
We have obviously entertained you sufficiently.
Kimberly has a statement to read.
Conflict of Interest Statement
MS. TOPPER: The following announcement
addresses the issue of conflict of interest with
respect to this meeting and is made a part of the
record to preclude even the appearance of such.
Based on the agenda, it has been determined that
the topics of today's meeting are issues of broad
applicability and there are no products being
approved at this meeting.
Unlike issues before a committee in which
a particular product is discussed, issues of
broader applicability involve many industry
sponsors and academic institutions. All special
government employees have been screened for their
financial interests as they may apply to the
general topics at hand.
To determine if any conflict of interest
existed, the agency has reviewed the agenda and all
relevant financial interests reported by
meeting participants. The Food and Drug
Administration has granted general-matter waivers
to the special government employees participating
in this meeting who require a waiver under Title
waiver statements may be obtained by submitting a
written request to the agency's Freedom of
Information Office, Room 12A-30, of the Parklawn
Building.
Because general topics impact so many
entities, it is not practical to recite all
potential conflicts of interest as they apply to
each member, consultant and guest speaker. FDA
acknowledges that there may be some potential
conflicts of interest but, because of the general
nature of the discussions before the committee,
these potential conflicts are mitigated.
With respect to the FDA's invited industry
representatives, we would like to
disclose that Dr.
Annette Stemhagen is participating in this meeting
as a non-voting industry representative on behalf
of regulated industry. Dr. Stemhagen's role on
this committee is to represent industry interests
in general and not any one particular company. Dr.
Stemhagen is Vice President of Strategic Development
Services for Covance Periapproval Services, Inc.
In the event that the discussions involve
any other products or firms not already on the
agenda for which an FDA participant has a financial
interest, the participants' involvement and their
exclusion will be noted for the record.
With respect to all other participants, we
ask, in the interest of fairness, that they address
any current or previous financial involvement with
any firm whose product they may wish to comment
upon.
There is one administrative announcement.
Would you please make sure that you take your phone
calls outside. It is messing up our audio and
we would really appreciate it. Thank you.
DR. WOOD: The other administrative thing
that the sound person has asked me to say is, to
the committee, try and remember to switch off your
microphones when you are not using them.
Apparently, it messes it up.
MR. LEVIN: Mr. Chairman?
DR. WOOD: Yes, Arthur?
MR. LEVIN: I wanted to express a concern
I have in terms of the agenda for today's meeting.
For those of us who have been at advisory committee
meetings before, we know that there is often a
tendency to sort of squeeze the most important part
of these advisory committee meetings which is the
discussion and answers to the questions and giving
directions to FDA.
My concern is that, given the lengthy
discussions we have had over the past two days and,
given the fact that this is the last day, that we will
not have enough time to fully explore all of the
questions that have been raised over the last two
days and to give some definite direction to the FDA
as to how to pursue these issues.
So I would like to suggest
that we might shorten the presentations, or
eliminate them entirely, in order to have adequate
time to fully discuss all of our concerns and
different points of view around the table. I think
it would be really unacceptable to leave here today
unable, because of a time constraint, to give
direction to the FDA on this issue.
DR. WOOD: Did you have any particular
people you wanted to eliminate? Or do you want to
pass me a note, privately?
MR. LEVIN: It may be something the
committee as a whole should decide.
DR. WOOD: Let me make a suggestion. I
think that is a reasonable approach. I am sure the
committee will want to hear the data from the ADAPT
study and we should hear that in its totality.
Milt Packer has come a long way so we should hear
from him, I think. Milt is always entertaining,
Do we really need to hear from the two
DR. TEMPLE: I don't have any ego involved
in this. A fair amount of--some of what I am
talking about is about the adverse consequences of
blood-pressure elevation which I think I could
skip. So I could shorten it considerably. But you
guys decide. It is there for you to read if you
DR. WOOD: Why don't you do this. Why
don't you distribute your talk to us.
DR. TEMPLE: I think it has been.
DR. WOOD: Right; I understand that. I
will take that as a given. And both of you make
whatever remarks you would like to make from your
seats there at the times that you are allotted, but
brief and pointed. And let's not revisit all the
things we have visited before.
DR. TEMPLE: That's fine.
DR. WOOD: Does that sound fair? Dr. O'Neill?
DR. O'NEILL: Yes; that is fine.
DR. WOOD: That will save us some time.
So that is a good thought. In addition, we have
got Sharon Hertz's talk which, I notice, has
40-something slides here--45 slides--which is a lot
to get through in a few minutes. So I think, while
we are sort of working up to that, she may want to
look at that and decide what she really needs to
say. I mean, after all, it is very unusual for the
FDA to summarize the meeting for the committee,
which is partly what the committee is here to do, I
think.
So let's make sure that she can finish
that taking the time she has been allotted for it
which is 30 minutes. She would do better to remove
some slides rather than rush through it, I think.
Having said all that, let's get to the
first presentation. Does anyone else have any
thoughts on that? Yes, Annette?
DR. STEMHAGEN: I would like to ask
whether the manufacturers could have just one or
two minutes to make some summary comments before we
start our deliberations after lunch.
DR. WOOD: Do they want to do that now?
Is that what you are asking?
DR. STEMHAGEN: No; I think after these
presentations.
DR. WOOD: Okay.
DR. STEMHAGEN: Thank you. I appreciate
DR. WOOD: Let's have some discussion
amongst the committee.
their having--they have had lots of time already to
present their data and had lots of mike time in the
DR. STEMHAGEN: Just in terms of the
deliberations that have gone on, there might be
some clarifying comments.
we can ask for clarifying comments. I think that
is what we--I would suggest--and I agree with
Arthur Levin in that we should get on to discussion
as quickly as possible.
DR. STEMHAGEN: I realize this is sort of
in contrast to try to shorten it. But I would like
to ask that that time be awarded.
DR. WOOD: Any other thoughts on that?
Let me get a sense of the committee. What is the
committee's pleasure about that? Yes?
DR. BOULWARE: I actually support that
recommendation, too, and would suggest you give
them a limited time, like you did with the public
comment where you will cut them off at two minutes,
so we know it will be limited. I would be
interested in the direction they plan to take. We
heard some startling news yesterday about the
possible remarketing of a product that they have
DR. WOOD: Does anyone object to them
getting two minutes apart from Dr.
think, the answer on that is that that is fine.
Remind them that, in contrast to most of their
experiences in the past for senior managers, the
microphone will be cut off.
DR. STEMHAGEN: Thank you very much. I
think we saw evidence of that yesterday.
DR. WOOD: Right. So they got the
message; right? Okay. Let's move along to the
first speaker, Dr. Lyketsos.
Alzheimer's Prevention Study: ADAPT
DR. LYKETSOS: Good morning, everyone. I
do not have slides. My name is Constantine Lyketsos.
I am a professor at
presenting here today on behalf of the ADAPT study,
Alzheimer's Disease Anti-inflammatory Prevention
Trial. I would like to thank the committee for
inviting us to present. I am here today with my
colleague, Steve Piantadosi, who is also on the
steering committee and will be available to answer
any questions that might come up later on as well.
I have a prepared statement that will be
distributed to the committee later on today. I
delivered it to the staff this morning as I was
Before I get into the statement, I just
wanted to take a few moments to remind us of the
public-health importance of Alzheimer's disease to
somewhat set the context about how the ADAPT trial
has started specifically. Alzheimer's, as we all
know, is a major public-health
problem. It is a
devastating disease that typically runs a ten-year
course of neurodegeneration affecting probably
close to 4 or 4-and-a-half million of our citizens
at present and the number is expected to rise given
the aging of the population over the next several
decades to approach, perhaps, 12 to 15 million,
based on current projections.
Because of these public-health
numbers, there has been a very significant effort
in our field for the last several years to develop
preventive strategies for Alzheimer's disease
because, once neuronal degeneration has started,
the evidence that treatments work, so far, is very
These preventive strategies have centered
on several possible treatments but the most
supported by the observational literature have been
nonsteroidals with over 24 studies right now
including four prospective population studies
suggesting substantial reductions of risk of
Alzheimer's disease perhaps with risk ratios, in
some cases, as much as 0.4 or 0.5. So it is within
that context that ADAPT was started with the
support of the National Institute on Aging.
I will move now to reading the prepared statement.
The steering committee of the ADAPT study
welcomes the opportunity to present the rationale
for its decision, on
the NSAID treatments in ADAPT. This presentation
is important because there is much public
misunderstanding about our decisions and their
rationale.
The ADAPT Steering Committee is deeply
committed to the safety of human subjects, even
more so in the context of prevention trials where
risks are typically not balanced by any promise of
tangible near-term benefit. In this notable way,
prevention trials differ from treatment trials
whose participants may hope for relief of symptoms
or improved outcomes in a condition already
The risk:benefit balance in prevention
trials is even further removed from a
the benefits of a proven treatment with its
acknowledged risks. Because ADAPT has not quite
completed the process of auditing and tabulating
the trial's cardiovascular safety on the date of
suspension, we cannot, today, present the trial
safety results at the time of the decision to
We defer that presentation to a
peer-reviewed publication planned for the near
future. For today, we note that, even with the
risk:benefit calculus of a prevention trial, these
data would not, in themselves, have led to our
decision to suspend either treatment. In reality,
those decisions were made in very unusual
circumstances. They reflected events external to
ADAPT that raised strong concerns about the
practicalities of continuing the treatments.
As the advisory committee probably knows,
ADAPT is a randomized, double-masked, multicenter
trial of celecoxib, 200 milligrams twice daily, or
naproxen sodium 220 milligrams twice daily versus
placebo for the primary prevention of
dementia and for the prevention of age-related
cognitive decline which is, in many instances, a
prodrome of Alzheimer's disease.
ADAPT also provides an opportunity to
study the long-term safety of its treatments in a
healthy elderly population. Eligibility criteria
include an age of 70 years or older at enrollment
and a health history that excludes many of the
known risk factors for adverse events with NSAID
treatments; for example, we exclude those with
preexisting uncontrolled hypertension, anemia or a
history of gastrointestinal bleeding, perforation
To provide independent recommendations
regarding continuation of the trial, the ADAPT
Treatment Effects Monitoring Committee, or TEMC,
which, I suppose, is our term for a DSMB, meets
twice a year. In response to emerging concerns
about cardiovascular risks with NSAIDs, membership
of the TEMC was recently expanded to include Dr.
Bruce Psaty, a physician with expertise in
evaluation of cardiovascular risks in
As an additional safeguard for participant
safety, the ADAPT study officers and consultants
also conduct reviews of safety data at intervals
between TEMC meetings. Amid the emerging
controversy about the cardiovascular safety of
selective COX-2 inhibitors, the ADAPT study officers
had been relatively reassured by their periodic
reviews of the celecoxib safety data. The study
chair communicated this information in a telephone
Hertz at FDA.
suspension of treatments and enrollment in ADAPT,
we had enrolled 2,528 participants. Of these,
2,463 had been randomized before October 1 of '04
with some 20 months average duration of
observation. These participants contributed a
total of 3,888 person years of follow up to
analyses that were presented to the TEMC on
Those analyses suggested a weak
suggesting increased risks of cardiovascular and
cerebrovascular events with naproxen. Reviewing
the data, however, we understood well the TEMC's
evident conclusion that this signal was not
sufficiently compelling or definitive to warrant a
recommendation to suspend the treatment or to
otherwise alter the protocol. This was on December
Thus, the study officers were surprised on
December 17 by announcements that two trials of
celecoxib for the prevention of recurrent
adenomatous colon polyps had been suspended citing
increased cardiovascular risks with treatment in
one of these studies, the Adenoma Prevention with
Celecoxib trial, or APC. This news led to
extensive discussion among the steering committee
on that day centering on the following
Number one; one arm of the APC trial had
used the same celecoxib dosing as ADAPT, 200
milligrams twice daily, but over a longer period of
News reports cited a relative risk of 2.5
for cardiac events in this arm of APC. Although
this risk was reported as only "marginally
significant," a greater cardiac-risk signal was
reported with the higher APC dosage of 400
milligrams twice daily.
Thus, we took seriously the possibility of
harm over time to ADAPT participants receiving
celecoxib. Especially in a prevention trial with
no strong prospects of immediate benefit, we had
strong misgivings about continuing celecoxib
Knowing almost nothing at the time about
the particulars of the APC trial and, in light of
the apparent lack of risk with celecoxib in the
other prevention trial, we might have discounted
the APC data and continued celecoxib. To do so,
however, we would clearly have needed the
concurrence of the seven IRBs that oversee ADAPT.
These IRBs began almost immediately to question us
about implications of the APC results and seemed
likely to question a decision to continue.
Even if we had persuaded them
continuation of celecoxib using a revised consent
process, we would surely be involved in lengthy
discussions with these IRBs. In the meantime, we
would be unable to offer much explanation to our
participants, thereby endangering the relationship
of trust that is vital to the success of long-term
Number three; as is common in long-term
trials, ADAPT was experiencing some difficulty with
adherence to treatments. This difficulty grew
following the withdrawal of rofecoxib and we
expected the announcement of the APC results to
exacerbate the problem further with scores of
participants stopping treatment, in effect, "voting
with their feet." This would erode statistical
power and increase the potential for bias in ADAPT.
Thus, even though the ADAPT safety data
did not, themselves, warrant suspension of
celecoxib treatments, there seemed little
practical choice but to do so.
We next confronted the dilemma of what to
do about naproxen and its placebo. As suggested
above, we regarded the accumulated naproxen safety
data as being somewhat more concerning than the
celecoxib safety data. Yet, they, also, were not
compelling. Although some post hoc data composites
barely reached statistical significance for
naproxen versus placebo, no single vascular event
was clearly more frequent with naproxen versus placebo.
Furthermore, vascular risks were not
expected with naproxen treatment. In fact, a
substantial body of prior data at the time had
suggested that naproxen offers some cardiovascular
protection. This lack of prior expectation cast
further doubt on the meaning of the naproxen data
in ADAPT which were vulnerable, in any case, to the
problem of multiple comparisons.
We could, therefore, have attempted to
revise ADAPT to a two-armed trial of naproxen
versus placebo, instructing our participants to stop
taking their "white pills," as they are known in
the study, which are celecoxib and its placebo, but to
continue to take their "blue pills," which contain
naproxen and its placebo.
However, the dangers were several.
Participants might end up getting confused and
taking the wrong pills and many would stop taking
their treatments altogether. We faced an ethical
dilemma. The suspension of celecoxib and
continuation of naproxen would have created the
impression among participants and among the general
public that celecoxib was risky but naproxen was
"safe." At least based on the signals from the
ADAPT data, this impression would have been
What would we then tell participants about
the risks with naproxen as we went through the
inevitable process of revised consent necessitated
by the protocol revision? Would the multiplicity
of IRBs even allow us to follow this course?
Finally, there was another risk to
consider. We began ADAPT expecting to see some
increase with naproxen in gastrointestinal bleeding
and other events. Even though we attempted to
reduce these excess G.I. risks by excluding
participants with prominent risk factors other than
age, the ADAPT data showed a notable increase in
G.I. bleeding with naproxen versus placebo.
Especially amid concerns that ADAPT was
exposing its participants to potential risks that
were immediate, while the trial's hoped-for
benefits lay in the future, the totality of the
above arguments led the steering committee to
suspend both treatments and to also suspend
enrollment into ADAPT.
As noted above, we expect, within a few
weeks, to submit a scientific paper for peer review
and publication. The paper's focus will be on the
process and rationale underlying the decision to
suspend treatments and enrollment in ADAPT.
Because these decisions did rely, in some measure,
on the ADAPT safety data as of 10 December, the
paper will, also, disclose some of these data.
We are also cooperating with ongoing
efforts at the NIH to investigate the
cardiovascular and cerebrovascular risks
In addition, the NIA and the ADAPT Steering
Committee are committed to a further two years of
additional safety monitoring of our participants.
In preparation for a later, more
definitive discussion of the ADAPT safety data, we
plan to revisit a number of the adverse events to
collect additional information and then to submit
all information available now or later to a process
of expert adjudication. Depending on particulars,
the latter process will take months. In the nearer
term, we concur with the expert opinion that,
having taken these widely publicized decisions, the
steering committee must fulfill its obligation to
disclose its reasons for doing so based upon the
At the same time, we are intent that our
public presentation even of the current "working"
data must be at the highest attainable standards of
DR. WOOD: Thank you very much. Are there
questions directed to the speaker? Dr. Nissen?
DR. NISSEN: I fully understand your
rationale and I understand that the trial was
fundamentally stopped because of an issue of
futility. You didn't think that you could keep
people in the celecoxib arm. That is all well and
good. The problem that occurred here is that a
warning was issued on naproxen which had the effect
of being the medical equivalent of screaming "fire"
in a crowded auditorium.
All over the country, many of us got calls
from patients saying, "I want to stop my naproxen
because it causes a cardiovascular risk." I think,
just a comment here, that it would have been far
better to have announced that the trial was
suspended for futility rather than for hazard when
there was a non-statistically significant hazard.
So, one man's comment.
DR. WOOD: I agree with that. Any other questions?
DR. FARRAR: I wonder if you could comment
on the G.I. bleed component since, obviously, one
of the deliberations we have to undertake is the
relative problems with G.I. bleed versus
cardiovascular risk. Certainly, that was known a
priori before starting the study.
As you commented very carefully, that
wasn't the only consideration. But, in a drug
trial where the outcome is unknown and the risk is
really fairly well known, I wondered how you
thought about that in terms of putting patients at
risk of something on the order of a few percentage points
over the course of a five-year trial who might have
serious complications from the G.I. bleeding.
DR. LYKETSOS: I guess you are asking me a
DR. FARRAR: I am asking how, in the
design of the study, obviously the choice was made
to accept that risk for the unknown potential
benefit of reduction in Alzheimer's disease over
the course of the same trial. I am wondering if
you have any insights into how that decision was
made because, clearly, there are issues there about
the use of these drugs and their risks.
DR. LYKETSOS: Well, I am glad you are
asking the question. It certainly is an issue that
we have spent a lot of time discussing and which we
discussed with study sections, IRBs, at quite some
length and continue to discuss.
I think the fundamental point that I would
start with is where I started my presentation which
is the devastation that Alzheimer's disease brings
and the fact that all the study participants were
individuals who had a first-degree relative with
the disease and had, therefore, personal
In that context, we were very careful and
very clear with them about what we thought at the
time the known G.I. risks were so that, in the
process of consent, and that was revealed through
careful discussions in the consent process as well
as the consent form, the risk of G.I. bleed was
stated very clearly and that that, in some cases,
might lead to death.
So I think we felt that this was a
decision that our participants could make, given
that the risks were relatively small, and
that they would develop Alzheimer's disease was
higher and that we felt they could make the
decision for themselves if they were willing to
take the risk:benefit calculus as we saw it.
DR. WOOD: Dr. Gibofsky?
DR. GIBOFSKY: I share Dr. Nissen's
concern about this effect of crying fire in a
crowded theater. Many of our patients called and
suggested that they were going to stop their
celecoxib because of the concerns that were raised
from ADAPT as well. But you raised a very
interesting concern that I confess I hadn't given
enough thought to and that is the difference
between a prevention trial and an outcome trial.
Much of our discussion here later today, I
suspect, is going to focus on what action should be
taken, if any, to restrict drugs used for treatment
based on data from prevention trials. I would be very
curious to hear you expound on that a bit more.
DR. LYKETSOS: That is an interesting
question. Let me just, if I could, because there
have been three comments now--I just
would like to
refer you to the early part of my statement where I
said the presentation is important because there is
much public misunderstanding about our decisions
and their rationale.
Several of you pointed out that there was
a cry of fire. I don't believe that that came from
DR. WOOD: We won't ask you to speculate
where it came from. There is certainly a view on
DR. LYKETSOS: I am not sure where it came
from. But, to address the other issue, I must say
I have not given it much thought as to whether
prevention-trial safety data would generalize in
the way that you are thinking about it. So I will
defer on that because I think it would need a fair
bit more thought by people who are more expert in
DR. WOOD: Dr. Fleming.
DR. FLEMING: It is my understanding, from
what you are saying, that the steering committee
was particularly influenced by the APC
not by the internal data from ADAPT; i.e., there
were, from what you were describing, some emerging
trends that, in my words, were in the unfavorable
direction, but in the context of monitoring trials,
we know that one has to be extremely cautious, when
you are looking at data continually over time, not
to overinterpret emerging trends that can easily
ebb and flow.
So my understanding, from what you are
saying, is it wasn't that there were, at this
point, some emerging trends that happen to be in
the unfavorable direction on naproxen. Rather, it
was the external data on the APC trial for Celebrex
that was the driving issue behind the
DR. WOOD: Just to develop that question,
what I understood you to say was you hadn't passed
some stopping boundary; is that correct?
DR. LYKETSOS: I'm sorry? I didn't hear
DR. WOOD: You hadn't violated your
stopping rule, or whatever stopping
rules, you had
DR. LYKETSOS: I think that our TEMC, our
DSMB, had opined the week before with the same data
from within the trial that they felt that we should
continue. So it was interesting how the two events
DR. FLEMING: I would like to come to that
second. I am leading to that. But first I wanted
to make sure that I understood what was the nature
of the concern. Is my interpretation correct?
DR. LYKETSOS: I think so. Back to how I
put it, the issue really was one of practicalities
more than our internal data, is that we felt we
would have to talk to IRBs and participants and
tell them something about--
DR. FLEMING: Could I first understand
what your sense of the evidence was. I want to
discuss that first, versus the practicality.
DR. LYKETSOS: The sense of the study
DR. FLEMING: The sense of the evidence
that was the basis for the decision in
adverse effects. I have heard two things. One is
the naproxen, but that was not compelling evidence.
That was within the framework of emerging results
that could be by chance alone when you are
monitoring data frequently. But external APC data
was very influential to you. That is what I am
hearing. Is that correct?
DR. LYKETSOS: Well, in fact, we didn't
know all the details of the APC data, as I pointed
out. I think it was that plus the climate that had
been created by rofecoxib coming off the market,
the influence that that had to some extent on our
participants, then the widely publicized APC
results and the sense that, even though the data we
were seeing and that our TEMC the week before had
seen, did not compel us to stop treatment based on
our own data, that there was now a climate created
where, practically speaking, we had to stop and
take stock and get more information, et cetera.
So it was that sort of decision. It
was a complicated decision and that is why it takes
a three-page statement to try and explain
through our minds.
DR. FLEMING: There may not have been, to
the steering committee at this time, access to data
on PRECEPT for celecoxib or to the etoricoxib, the
lumiracoxib, data on naproxen that were very
favorable, but you did have access to the VIGOR
data which was very reassuring for naproxen and you
had evidence from the CLASS trial and some other
data from Celebrex.
I am perplexed that you would look at the
totality of these data and say that the results
were conclusive in terms of at least not being able
to provide information to the IRBs and to the
patients and caregivers in the trial representing
the totality of the data when your data-monitoring
committee had looked at the totality of the
evidence for benefit to risk.
On a data-monitoring committee, I have
always argued, don't just show me the safety data,
even if we are just looking at early assessments
for safety. It always has to be benefit to risk.
Even though, as you are pointing out,
this wasn't a
therapeutic setting, prevention trials also provide
a major opportunity for benefit. Preventing major
diseases is also a very significant benefit.
My understanding is your data-monitoring
committee, in looking at the data, looking at the
benefit as well as the risk, indicated the study
should continue. How did the steering committee
judge, without access to ongoing data, that benefit
to risk couldn't be sufficiently favorable and that
a notification to the investigators, to the
patients and to the IRBs, that the monitoring
committee has carefully looked at benefit and risk
and that the totality of the data is beyond the APC
trial when you are looking at Celebrex and
naproxen? Why wasn't that strategy pursued?
DR. LYKETSOS: First, as I pointed out in
my statement, some members of the steering
committee did have access to the data that the DSMB
had seen. That is the first point. The second
point is, as you point out and as I think this
whole discussion points out, that these are very
difficult judgment calls. They have to take into
account evidence but also practical aspects of
continuing to conduct this sort of a prevention
trial in this sort of a population.
I think it was the judgment call, and I
can tell you, there was substantial discussion
around this when we had the steering committee
meeting, about these very issues. It was the
collective judgement at the time that this was the
right thing to do, given the various issues that I
have articulated in my statement.
DR. FLEMING: I will just pursue one more.
I am dismayed to hear the steering committee, some
steering committee members, had access to the data.
That is also a violation of the principles of
monitoring trials. It should have been in the sole
possession of the data-monitoring committee.
I am also distressed because I am not
hearing that monitoring committee was front and
center in terms of having these issues brought back
to it for reassessment. So, to me, what I am
hearing raises very significant concerns about
putting at risk the integrity of studies by making
prejudgments using access to only partial external data.
DR. WOOD: There was one other thing,
though, at least the word on the street was, and
you sort of mentioned that as well, I understood
there was a very large number of dropouts from the
trial after the Vioxx withdrawal and others and
that one of the perceptions was it was no longer
possible to continue the trial. Is that true?
DR. LYKETSOS: Let me clarify that. The
adherence had been declining on an annual basis
even before rofecoxib was withdrawn from the
market. So adherence was perceived as an issue in
that we felt that now there were data about one of
the study drugs and that that would further erode
adherence. We did not see a huge erosion in
adherence with rofecoxib, specifically, but there
had already been an erosion that was concerning and
we anticipated a further erosion.
DR. WOOD: Right. But the question for
this committee that Dr. Fleming is pursuing
vigorously, and I agree with him, is that the
announcement that you all made--the announcement,
as it was picked up--maybe I should put it like
that--was that this trial was being stopped for a
safety reason.
What I heard in your statement and what I
hear from you now is that the trial was being
stopped for operational problems in the trial and
the safety signal was a convenient moment at which
to do that. But you had operational difficulties.
That is a very different interpretation and a very
different interpretation for the public and
Is that what you are hearing, Tom?
DR. FLEMING: It certainly appears to be.
It is part of what is concerning to me.
DR. LYKETSOS: I think my statement should
speak for itself. In terms of what the data were,
as I have pointed out, they will be submitted very
soon so that you can judge for yourselves.
DR. WOOD: Okay. Any other questions?
Sorry; Dr. Farrar. I beg your pardon. Dr. Farrar,
DR. FARRAR: I think, actually, that this
study provides some vitally important information
with regard to our consideration of the
class of drugs; namely, the NSAIDs. I would like
to just read one sentence from the statement.
It said, "Although some post hoc data
composites barely reached statistical significance
for naproxen versus placebo." Now, clearly, this
discussion would be much clearer after the
presentation of the data, a careful review of the
data. But Dr. Fleming noted that, in the VIGOR
study, there was some reassurance about naproxen.
I would like to just question that.
What is very clear in the VIGOR study is
that naproxen was safer than rofecoxib. But it
does not comment at all with regards to the
potential risk compared to placebo. In fact, I was
surprised when I heard the statement by Dr. Fleming
because, in fact, I have assumed, based on all the
data that we have, that every NSAID will not fare
well against a placebo.
I think that these data will
be supported by the publication, although I don't
want to try and foresee the future, but my guess is
that naproxen will not fare particularly well
against placebo in terms of its cardiovascular
safety. I think we need to be able to accept the
fact that all of them have some risk with regards
to cerebrovascular disease and this study is likely
to provide the data to support that.
DR. WOOD: Dr. Nissen?
DR. NISSEN: I don't want to belabor this
because we have got a lot more to discuss today,
but I think it is extremely important that, as a
medical community, we learn from this episode. In
the kind of media frenzy that was going on during
that period of time, this announcement, this
warning that was issued on a national basis about
naproxen, was inappropriate, led to some panic
amongst the public and we simply can't do business
that way. We can't operate in this kind of a
fashion. I would urge any of the individuals who
were involved in the decision to issue that warning to
go back and look at what happened and try to ensure
that we don't do this sort of thing again, because
once this gets picked up by the media, it passes
through generations of people and becomes the topic
of extensive discussion and may lead patients who
don't have the ability that we have around this
table to filter data--they don't understand
data-safety and monitoring boards. They don't
understand stopping rules. And it caused a panic
that was unnecessary and it shouldn't have
happened, and I hope it doesn't happen again.
DR. WOOD: Thanks very much. Let's move
on to the next speaker, Dr. Packer.
Additional Background Presentations
Interpretation of Observed Differences in the
Frequency of Events When the Number
of Events is Small
DR. PACKER: Thank you, Alastair, members
of the advisory committee, FDA, ladies and
gentlemen. Today I have been invited by FDA to
address a specific question, which is how we should
interpret differences in the observed frequency of
events in a clinical trial when the number of
events is small.
Let me just say arbitrarily that I will
define, for purposes of today, a small number of
events as one that would have provided
less than 70 percent power to have detected a true
treatment difference, assuming an effect size
similar to that generally encountered in clinical
trials.
This is just a thought experiment. Just suppose you
do a trial for a noncardiovascular indication and
you note that there are 13 major adverse
cardiovascular events in the placebo group and 33
such events in the drug-treatment group. How
should this difference be interpreted?
Many would simply perform a statistical
test, derive the p-value, and get excited if the
p-value were less than some arbitrary value such as
0.05. In this example, the p-value of 0.002 would
suggest, to some, that this difference between 13
and 33, in a trial of about 3,000 patients, would
have been observed only two times out of 1,000 by
chance alone, making the effect unlikely to have been
due to the play of chance.
However, before getting excited, we should
remember that p-values must be interpreted in some
context. P-values are most easily interpreted when
they refer to predefined primary endpoints in
trials adequately powered, more than 80, 90 percent
power, to detect differences between treatments.
However, even under such circumstances, p-values
are not necessarily reproducible.
Bob O'Neill and others have made the point
that, if the p-value in a trial is 0.05, the
likelihood of seeing 0.05 or less in a second identical
trial is only about 50 percent. It is only when
the p-value in the first study is 0.001 that the
likelihood of seeing 0.05 or less in the second
identical trial is at least 90 percent.
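The replication figures can be checked with a short calculation. This is a sketch under one simplifying assumption of ours, not necessarily the exact method behind the figures cited: the true effect equals the effect observed in the first trial, and the second trial is identical in size. The first trial's two-sided p-value is converted to a z-score z1, and the chance that the second trial's z-score exceeds 1.96 is then Phi(z1 - 1.96). The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()            # standard normal: mean 0, sd 1
Z_CRIT = STD_NORMAL.inv_cdf(0.975)   # two-sided 0.05 critical value, about 1.96

def replication_probability(p_first):
    """Chance that an identical second trial reaches two-sided p <= 0.05,
    assuming the true effect equals the one observed in the first trial."""
    z_first = STD_NORMAL.inv_cdf(1 - p_first / 2)  # z-score of the first trial
    # The second trial's z-score is centred on z_first with unit variance;
    # the tiny chance of significance in the opposite direction is ignored.
    return STD_NORMAL.cdf(z_first - Z_CRIT)

for p in (0.05, 0.01, 0.001):
    print(f"first-trial p = {p}: replication probability = {replication_probability(p):.2f}")
```

Under this assumption it gives about 0.50 for p = 0.05 and about 0.91 for p = 0.001, in line with the 50 percent and "at least 90 percent" figures quoted above.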
These calculations are the basis of the
frequent FDA guidance that, to demonstrate
persuasive evidence for efficacy, a sponsor needs
to provide two trials each with p of 0.05 or less, or one
trial with a very, very small p-value.
But what if the event was not the primary
endpoint in the study? What, in fact, if the event
was not even precisely defined before the
trial? What if the trial was not adequately
powered to detect a treatment difference for the
endpoint? What does a p-value mean under these
circumstances?
Unfortunately, this happens quite
frequently in clinical trials under a variety of
circumstances. But it is particularly true in the
analysis of adverse events. So let's make a list of
things to worry about when using p-values to
compare the frequency of adverse events in a trial.
First, there are literally hundreds of
adverse events in a clinical trial and, therefore,
there are hundreds of possible comparisons that can
be made. Now, this is classically referred to as
the multiple comparisons problem. For example, if
a typical large-scale clinical trial yields as many
as 500 individual terms describing adverse events
and if a p-value were calculated for each
comparison, one would, of course, by chance alone,
expect about 5 percent of the terms, or about 25
terms, to have a p-value of 0.05 or less and 1 percent
of the terms, or about 5 terms, to have a p-value
of 0.01 or less.
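The arithmetic of the multiple comparisons problem is simple enough to sketch. The helper names below are hypothetical; the expected counts assume the drug has no true effect on any term, and the "at least one" probability additionally assumes the terms are independent, which real adverse-event terms are not.

```python
def expected_false_positives(n_terms, alpha):
    """Expected count of nominally 'significant' terms when the drug has no real effect."""
    return n_terms * alpha

def prob_at_least_one(n_terms, alpha):
    """Chance of at least one false positive, assuming (unrealistically) independent terms."""
    return 1 - (1 - alpha) ** n_terms

N_TERMS = 500  # number of adverse-event terms in the example above
for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: expect {expected_false_positives(N_TERMS, alpha):.0f} "
          f"chance findings; P(at least one) = {prob_at_least_one(N_TERMS, alpha):.4f}")
```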
The second issue in interpreting
comparisons of the frequency of adverse events is the
fact that adverse events are spontaneous,
nonadjudicated reports. Now, adverse events are
reported at the discretion of the investigator and
then translated into standardized terms. There is
little uniformity on how an event is identified,
defined or reported and this uncertainty increases
when the event is in a field remote from the
Now, some of you may believe that you can
fix this problem by carrying out blinded
adjudication of events after the fact.
Unfortunately, the rules guiding post hoc
adjudication are inevitably influenced by the
knowledge that a treatment difference has been
observed.
In fact, any bar set by a post hoc process
is capable of magnifying or diluting an effect.
For example, if you set very strict
criteria, a committee could reduce the number of
events and, therefore, reduce statistical power.
By setting very loose criteria, the committee can
include many questionable events and reduce the
magnitude of a treatment difference.
To make things more complicated,
adjudication committees do not generally examine
individuals who did not report an event to make
sure they didn't have an event.
The third issue in interpreting
comparisons of frequencies is that some signals are
apparent only if adverse events are grouped
together. Now, that is not much of a problem if
the difference is fairly straightforward and
focuses on one single event. But things can become
a little bit more complicated if the analysis
requires combining events and combining trends
across two or more events in order to reach some
magical level of statistical significance.
Now, the problem is that these groupings
are frequently constructed after the fact, making
it possible to include only events that showed the
trend the investigator is interested in. For
example, if an investigator believed the drug
increased the risk of a major cardiovascular event,
he or she might first look at myocardial infarction
and stroke, but, finding little difference here, he
or she might be tempted to look at other related
events; for example, not seeing a difference in
myocardial infarction, an investigator might be
tempted to broaden the definition of a myocardial
ischemic event to include sudden death or unstable
angina if the differences between the groups
supported some predetermined judgment.
Similarly, not seeing a difference in
stroke, an investigator might be tempted to broaden
the definition to include a TIA. But the
possibilities for grouping are very, very large and
the possibilities of finding something, if you want
to be creative, are also quite large, even though
these differences may be related to the play of
chance.
As a result, the definition of grouping
may vary from study to study. Now, some
investigators try to fix this problem by developing
a uniform definition to be used across all studies.
But when the definition is developed after a
concern has been raised, those creating the
definition have frequently already looked at the
data or have communicated with those who have
looked at the data, and know either consciously or
subconsciously what kind of definition is required
to capture the events of interest.
The fourth issue, and the one I want to focus on
the most in my presentation, is the difficulty of
interpreting comparisons of the frequency of adverse
events when the number of adverse events is
small; because the numbers are small, they result in
extremely imprecise estimates.
Now, you may think that investigators
generally understand the difficulties of analyzing
small numbers of events. For example, most
investigators know that, when the number of events
is small, the lack of an observed difference does
not rule out the existence of a true difference.
We have been taught that this should be apparent by
looking at the confidence interval and, as you can
see here, the confidence interval is very wide and
includes the possibility of benefit and harm.
So investigators, basically, consider
this kind of data to be inconclusive. But what is
generally not appreciated is that, when the number
of events is small, the confidence interval is
necessarily so wide that it may not truly represent
the range of values that would include the true
effect of the drug. As a result, even the finding
of an observed difference does not necessarily
prove the existence of a true difference.
To illustrate this point, this slide shows
the effect size and confidence intervals required
to reach statistical significance in a hypothetical
trial of 3,000 patients assuming a range from a
very small to a very large number of events.
Now, assuming the trial shows a
statistically significant effect--that means that
we are only going to look at this if a p-value,
let's say, is less than 0.05--the smaller the
number of events, the larger must be the treatment
effect in order for this effect to be statistically
significant and the wider the confidence intervals
have to be.
To put it another way, if the number of
events is small, the trial will show a significant
difference only if the treatment effect is very
large and the estimate of the effect is very
imprecise.
Unfortunately, when you look at adverse
events in a trial, the number of events will always
be small. This is because the trial, as you know,
was designed to provide enough data to examine the
primary endpoint, of which it produces a very precise
estimate, but it is not powered for any
other analyses and, therefore, at the end of the
trial, you get generally a less precise estimate of
the secondary endpoint and an extremely imprecise
estimate of any specific adverse event.
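The point that fewer events demand a larger effect before a test turns significant can be illustrated with a small search. This sketch assumes two equal-sized arms, rare events (so the relative risk is close to the ratio of event counts) and a Wald test on the log relative risk; the slide may have used a different test, so the numbers are illustrative only, and `min_significant_rr` is a hypothetical name.

```python
import math

def min_significant_rr(total_events, z_crit=1.96):
    """Smallest drug:placebo event ratio that reaches z >= z_crit for a given total.

    Assumes equal arms and rare events, so log(RR) ~ log(a / b) with
    standard error ~ sqrt(1/a + 1/b), where a + b = total_events.
    Returns (ratio, standard error), or None if no split is significant.
    """
    for a in range(total_events // 2 + 1, total_events):
        b = total_events - a
        log_rr = math.log(a / b)
        se = math.sqrt(1 / a + 1 / b)
        if log_rr / se >= z_crit:
            return a / b, se
    return None  # too few events for any split to reach significance

for d in (20, 50, 200, 1000):
    rr, se = min_significant_rr(d)
    print(f"{d:5d} events: RR must reach {rr:.2f} (SE of log RR {se:.2f})")
```

With 20 events in total, the drug arm needs three times as many events as placebo before the test is significant, and the standard error is large; with 1,000 events a far smaller ratio suffices with a much tighter estimate.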
Now, you may ask, what is wrong with an
imprecise estimate? Well, imprecise estimates are
fine if the intent is to withhold judgement until
more data are collected to make the estimates more
precise. But imprecise estimates are problematic
if the intent is to stop and reach a conclusion.
That is because, when calculated in the
usual manner, p-values and 95 percent confidence
intervals are most easily interpreted in the
context of a completed experiment. Unfortunately,
the adverse-event data generated in a typical trial
is not the result of a completed experiment. In
fact, viewed from the amount of data needed for a
precise estimate, the adverse-event data in a
single study only represents a snapshot of an
ongoing experiment to characterize the safety of a drug.
As a result, performing an analysis of
adverse-event data is akin to performing an interim
analysis of primary endpoint data in an ongoing
clinical trial. Now, this is important because we
know a fair amount about how to interpret interim
analyses in a clinical trial, and here I really must
apologize to Tom Fleming because what I am going to
review here very quickly is borrowed heavily from
his extensive work in this area.
But it is really important to think about
small numbers of adverse events as an interim look
in a global effort to characterize the safety of a drug.
Now, as you know, when you look at interim
analyses in a clinical trial, one plots the
treatment difference represented by a z-score
against the amount of information that we have, and
that is generally represented by the fraction of the total information.
We start the trial at zero effect and zero
information. At the end of each interim analysis,
we add a point until we get to the end of
the study. Now, if we have assigned an alpha of
0.05 to the endpoint, we want to make sure that we
evaluate the treatment difference seen at the end
of the trial against an alpha of about 0.05 which
generally corresponds to a z-score of about 2.0.
Now, some might think, naively, that,
during the course of a study, the
difference between treatments will be so
predictable that we would observe a linear march
between the start of the study and the end of the
trial. But we know that when the amount of data is
small, things tend to bounce around a lot, so much
so that early results can be very misleading.
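A small simulation makes the bouncing concrete. It is a sketch that assumes no true treatment difference, five equally spaced looks at accumulating data and a two-sided nominal threshold of |z| >= 1.96 at each look; the number of looks, the number of simulated trials and the seed are arbitrary choices, and `false_alarm_rates` is a hypothetical name.

```python
import math
import random

def false_alarm_rates(n_looks=5, n_sims=20000, z_crit=1.96, seed=2024):
    """Simulate null trials; return (P(significant at final look), P(significant at any look))."""
    rng = random.Random(seed)
    final_hits = any_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        crossed = False
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)  # new batch of data, no true effect
            z = running_sum / math.sqrt(k)      # standardized z-score at look k
            if abs(z) >= z_crit:
                crossed = True
        final_hits += abs(z) >= z_crit  # z here is the value at the last look
        any_hits += crossed
    return final_hits / n_sims, any_hits / n_sims

final_rate, any_rate = false_alarm_rates()
print(f"nominal 'significance' at the final look only: {final_rate:.3f}")
print(f"nominal 'significance' at any of five looks:   {any_rate:.3f}")
```

The single-look rate stays near the intended 5 percent, while declaring victory at any look that crosses 1.96 pushes the chance of a spurious "significant" result well above 10 percent, even though no effect exists.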
It is sort of like the situation of trying
to predict the results of an election when only 1
percent of the precincts have been reported and
they are not even representative. So, as a result,
if we got excited about any z-score difference of
more than 2.0 early in the trial, we would be getting
excited about effects that were not likely to be
seen or sustained if we had more data even though a
z-score of 2.0 would normally correspond to a
p-value of less than 0.05.
In fact, the smaller the amount of data,
the more things can bounce around, and the more
likely it is that what we are seeing is
due to the play of chance. Therefore, to prevent
investigators from reaching a conclusion when the
estimates are imprecise, statisticians,
particularly Tom, have recommended that
investigators refrain from getting excited about
nominally significant z-scores when the amount of
data is scarce.
Specifically, they have proposed that
boundaries must be crossed before we can feel
comfortable that an effect seen early is likely to
be present at the end of an experiment.
Now, Tom, in particular, has proposed a
curvilinear boundary like this. There are many
other boundaries that have been proposed by
others. But this one is very, very commonly used with
an alpha of 0.05 for a primary endpoint. It sort
of looks like this. Because it is curvilinear, to
be significant at the 0.05 level, the treatment
difference must be extreme when the amount of
information is small, as would be the case early in
the trial.
However, as the trial proceeds, treatment
differences required to conclude that there is an
effect at the 0.05 level decrease and get
closer and closer to a z-score of about 2.0 at the
end of the study.
Now, this is a very different thought
process and a very different approach than getting
excited about a p-value less than 0.05 no matter
when you observed it during the study. For
example, a z-score of 2.5--that is right
here--would be meaningful if seen at the end of the
study but it wouldn't be considered significant if
seen early in the study even though the nominal
p-value at this time is less than 0.05.
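The curvilinear boundary can be approximated by a simple rule. This sketch uses the classic O'Brien-Fleming shape, in which the critical z-score at information fraction t is roughly z_final / sqrt(t); the exact constants depend on the number and timing of looks (for several equally spaced looks the final value sits slightly above 1.96), so these numbers are approximations, and `obf_boundary` is a hypothetical name.

```python
import math

def obf_boundary(info_fraction, z_final=1.96):
    """Approximate O'Brien-Fleming critical |z| at a given fraction of total information."""
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    return z_final / math.sqrt(info_fraction)

observed_z = 2.5  # the example z-score discussed above
for t in (0.25, 0.5, 0.75, 1.0):
    boundary = obf_boundary(t)
    verdict = "crosses" if observed_z >= boundary else "does not cross"
    print(f"information {t:.0%}: boundary {boundary:.2f}; z = {observed_z} {verdict}")
```

Under this approximation a z of 2.5 does not cross the boundary when a quarter or half of the information is in, but does from three quarters onward.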
Now, if the number of events is small, the
difference would need to be far more extreme--say,
a z-score up here--to be meaningful at the 0.05
level.
Here is a specific example. This is an
old cardiovascular trial. This is the Coronary
Drug Project. It was carried out more than 30
years ago. It included a comparison of clofibrate,
a lipid-lowering drug, and placebo on coronary
events. At four separate times during the study,
the difference in favor of clofibrate was
statistically significant at a nominal p of 0.05 or
less. But, at the end of the trial, there was no
difference between placebo and clofibrate. The
difference seen early in the trial was related to
the imprecision inherent when analyzing small
numbers of events.
In fact, if a boundary had been used in
this study, at no time during the trial would the
treatment effect have crossed the boundary and led
to the conclusion that clofibrate was better than
placebo.
Now, let me say this kind of fluctuation
early in a study is very, very common. There are
even examples in which a treatment has been associated
with a nominally significant adverse effect which
later was reversed during the course of the trial,
with the reversed effect becoming statistically
significant by the end of the study.
Now, I should mention that the boundary
that I have shown you is a boundary with an alpha
of 0.05. This means that, when the boundary is crossed,
the p-value for the treatment effect is
0.05, not the smaller nominal p-value that
corresponds to the z-score that allowed the
boundary to be crossed.
Now, for each p-value or each alpha, there
is a separate boundary. As the requirement for
strength of evidence becomes more stringent,
the boundary is shifted upward and to the right.
You might ask why am I going through all
this. Because analyzing data derived in an
underpowered trial raises the same concerns as
analyzing data derived from an underpowered interim
analysis in an adequately powered study.
The cardiovascular field is replete with
examples of how misleading small numbers of events
can be. Let me give you a few examples. For
example, in an early pilot trial, the ACE/NEP
inhibitor, Omapatrilat, reduced the risk of a major
cardiovascular event by 47 percent when compared
with an ACE inhibitor. As you can see, the
confidence intervals are extremely wide because the
analysis here was based on only 39 events.
Later, a definitive trial was carried out
that recorded nearly 1,900 events. There was no
difference between Omapatrilat and the comparator
ACE inhibitor on the same endpoint in the same
population.
Here is another example. In an early
pilot trial, amlodipine reduced the risk of a major
cardiovascular event by 45 percent, small p-value
but wide confidence intervals. Later, in a
definitive trial which recorded four times as many
events, there was no effect of amlodipine on the
same endpoint in the same population using the same
There are even examples when the effect
seen in a pilot trial was reversed when the
definitive study was carried out. Two examples.
In two pilot trials, both in heart failure, one
with the drug Vesnarinone, one with the drug
Losartan, both drugs significantly reduced the risk
of death--not a minor endpoint; death--by 50 to 60
percent. But these benefits were seen in trials
that each recorded fewer than 50 events and
thus produced treatment estimates with
wide confidence intervals.
When both drugs were reevaluated in
definitive trials that recorded ten times as many
events, both drugs were associated with increased
risks of death, in one case, significant at the
less than 0.05 level.
Now, notice that the confidence intervals
of the treatment effect in the definitive trials do
not overlap with the confidence intervals of the
treatment effect in the early pilot studies. So
here we have two examples of an
underpowered trial that showed a significant
benefit whereas the definitively powered study
showed significant harm.
Here is another example. This is a
meta-analysis of a small number of trials looking
at the effect of magnesium in acute myocardial
infarction. A meta-analysis of a number of studies
showed intravenous magnesium associated with a
striking reduction in mortality, a 55 percent
reduction in risk of death, but wide confidence
intervals and a very small p-value.
This effect appeared to be reinforced in a
moderately sized trial that showed a
smaller treatment effect but wide confidence
intervals and then, subsequently, in a definitive
trial that recorded 4,000 deaths, there was a
nearly significant adverse effect of magnesium on
the same endpoint in the same population.
Now, again, please note that the
confidence intervals of the treatment estimate in
this definitive study do not overlap at all, with
the confidence intervals of the estimates in the
earlier moderately sized study, and not at all in
the meta-analysis. Again, this is really a
reflection of the imprecision inherent in looking
at small numbers of events.
Let me give you one final example because
it actually deals with an adverse effect. In an
early pilot trial with extended-release
metoprolol--this is a study that looked at a very
small number of events, about 20 events, showed a
three-fold increase in the risk of hospitalization
for heart failure in the metoprolol group compared
with the placebo group. Look at the confidence
intervals here. They go from about Washington to
California: very wide, but nominally significant.
When this trial was replicated in a
similar population with exactly the same drug,
exactly the same formulation, exactly the same
dose, there was now a reduction in the frequency of
hospitalization for heart failure. Let me just
emphasize, this was recorded as an adverse event in
this earlier trial.
So what have we learned from all this?
Well, a couple of thoughts. To achieve statistical
significance in an underpowered analysis, the
effect size must be extreme and the estimate must
be imprecise. Yet the more extreme the effect, the
more imprecise the estimate, the less likely it
will be reproduced in a definitive trial. That is
why I think, of all the things that we can worry
about in looking at adverse events, the most
worrisome is the imprecision inherent in the
analysis of small numbers of events.
Let me just close with a few final
thoughts. You might ask, based on all of this,
what should we do. Well, I think the first step,
perhaps the most important first step, is to
develop an approach to analyzing data in trials
with small numbers of events which actually
accurately reflects the true imprecision of the
treatment effect estimate and its statistical
significance.
Let me just emphasize one thing, and I
just want to put this as a proposal. In no way,
would I propose this as a definitive solution but,
to get the discussion going, this might be an
interesting first way of thinking about this.
The conventional way of comparing small
numbers of events is to calculate 95 percent
confidence intervals followed by the derivation of
the p-value. However, the conventional calculation
of the confidence intervals incorporates into it a
z-score that the investigator designates as the
target value for statistical significance. For
example, most statisticians, in calculating a 95 percent
confidence interval, would simply use a z-score of
1.96. And they would do that because that is the
critical value for the z-score at the end of an
adequately powered trial with an alpha of 0.05. So
what they would do is they would take this z-score
and they will use it to calculate the confidence
interval. What a lot of people, I think, fail to
realize is that this z-score is not the critical
value for decision making if one looks early in the
experiment.
Early in that experiment, the critical
value for a z-score should be determined by the
interim monitoring boundary appropriate for the
information content, not the z-score at the end of the
experiment.
Now, if one uses the boundary z-score in
the calculation of the 95 percent confidence
intervals, the confidence intervals here will be
much, much wider resulting in a p-value that will
no longer be statistically significant. Now this
is important because everyone talks about p-values
at these meetings. I showed you these data before.
Conventionally calculated, the p-value would be
0.002 meaning the likelihood of chance alone being
2 in 1000.
Well, if one recognizes that
the data here really result in a very imprecise
estimate and one incorporates the thinking process
of an O'Brien-Fleming boundary into this, as a
reflection of this imprecision, then the confidence
intervals now truly reflect the imprecision in the
estimate, and now the p-value is a lot less interesting
than it was before.
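The proposal can be sketched in a few lines: compute the usual log-scale interval for a relative risk, then repeat it with a boundary z-score in place of 1.96. This sketch assumes that the example's 33 versus 13 events came from two arms of 1,500 patients each, and assumes, purely hypothetically, that those events represent a quarter of the information a definitive safety evaluation would need; neither figure was specified by the speaker, and `rr_with_interval` is a hypothetical name.

```python
import math

def rr_with_interval(events_drug, n_drug, events_placebo, n_placebo, z_crit):
    """Relative risk with a log-scale Wald interval built from critical value z_crit."""
    rr = (events_drug / n_drug) / (events_placebo / n_placebo)
    # Standard error of log(RR) for two independent binomial proportions
    se = math.sqrt(1 / events_drug - 1 / n_drug + 1 / events_placebo - 1 / n_placebo)
    return rr, math.exp(math.log(rr) - z_crit * se), math.exp(math.log(rr) + z_crit * se)

# Assumed arm sizes: about 3,000 patients split evenly (hypothetical split).
N_PER_ARM = 1500
# Hypothetical information fraction: these events are taken to be a quarter
# of what a definitive safety evaluation would need.
INFO_FRACTION = 0.25
boundary_z = 1.96 / math.sqrt(INFO_FRACTION)  # approximate O'Brien-Fleming critical value

conventional = rr_with_interval(33, N_PER_ARM, 13, N_PER_ARM, z_crit=1.96)
adjusted = rr_with_interval(33, N_PER_ARM, 13, N_PER_ARM, z_crit=boundary_z)
print("conventional:      RR %.2f, interval %.2f to %.2f" % conventional)
print("boundary-adjusted: RR %.2f, interval %.2f to %.2f" % adjusted)
```

With these assumptions the conventional interval excludes 1 while the boundary-adjusted one does not; a different information fraction would give a different width, which is the judgement the proposal leaves open.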
Now, the use of boundary-adjusted
confidence intervals would, I think, appropriately
describe the great uncertainty inherent in the
analysis of small numbers of events, hopefully
markedly reducing the false-positive error rate.
In spite of using a boundary-adjusted
confidence interval, adverse effects that are known
to be characteristic of specific drugs would
generally remain statistically significant.
However, this approach, and it is just an
experiment, would not provide a way to interpret
trends observed in imprecise data.
So, lastly, let me just conclude with some
thoughts about what we should do with worrisome
trends in imprecise data. The first thing we could
do is believe in those that are biologically
plausible. However, we need to be very careful
here. Everyone knows physicians can always be
relied on to propose a biological mechanism to
explain the validity of an unexpected and
potentially preposterous finding simply because it
happens to have an interesting p-value. Anyone who
doesn't believe this, you know, I would be happy to
show you overwhelming evidence that this is the
case.
Second, is we could look for confirmatory
evidence in other studies, remembering that we
shouldn't be selective. But, even if every study
showed the same trend, how would you know that you
had enough evidence to reach a conclusion? Some
have proposed doing a cumulative meta-analysis in
which each trial is considered to be an
interim analysis on the way to a final judgement.
Indeed, Salim Yusuf has proposed that, as
each trial is added to the meta-analysis, that one
use interim monitoring boundaries to interpret this
cumulative meta-analysis. This has, certainly, a
considerable amount of appeal.
Let me just emphasize. Salim has, in
fact, underscored the fact that the conditions here
are not identical to those that exist for a true
interim analysis. In the case of a true interim
analysis, we generally know that the types of
patients in studies are similar at all observation
points. Here it is different.
In the case of a cumulative meta-analysis,
the types of patients in studies differ across the
various trials. So, as a result, Salim has
proposed that, when reaching a conclusion based on
data that has been combined across trials, that a
boundary more strict than 0.05 be used.
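A cumulative meta-analysis monitored in this way can be sketched as follows. The sketch assumes fixed-effect, inverse-variance pooling of each trial's log effect estimate, an approximate O'Brien-Fleming-style boundary, and a hypothetical "target information" standing for the amount of data a definitive answer would require; raising `z_final` above 1.96 (for example to about 2.58, a two-sided 0.01 level) is one way to impose a stricter boundary. The function name and the trial values in the usage example are hypothetical, not real data.

```python
import math

def cumulative_meta(trials, target_information, z_final=1.96):
    """Pool trials one at a time and compare the pooled z-score with a boundary.

    trials: (log_effect, standard_error) pairs in the order the trials finished.
    target_information: hypothetical total information (sum of 1/SE^2) needed
    for a definitive answer; the fraction reached sets the boundary height.
    """
    results = []
    sum_w = sum_wy = 0.0
    for log_effect, se in trials:
        w = 1.0 / se ** 2              # inverse-variance weight (statistical information)
        sum_w += w
        sum_wy += w * log_effect
        pooled = sum_wy / sum_w        # fixed-effect pooled log effect
        z = pooled * math.sqrt(sum_w)  # pooled estimate divided by its standard error
        t = min(sum_w / target_information, 1.0)
        boundary = z_final / math.sqrt(t)
        results.append((pooled, z, boundary, abs(z) >= boundary))
    return results

# Hypothetical trials: log relative risks and standard errors, not real data.
example = [(-0.6, 0.4), (-0.4, 0.3), (0.05, 0.05)]
for pooled, z, boundary, crossed in cumulative_meta(example, target_information=500, z_final=2.58):
    print(f"pooled log RR {pooled:+.2f}, z {z:+.2f}, boundary {boundary:.2f}, crossed: {crossed}")
```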
Now, he has specifically outlined the
importance of this using the example of intravenous
magnesium. I showed you the data on intravenous
magnesium in myocardial infarction. When the early
trials with magnesium were carried out, the z-score
of greater than 2.0 was crossed early. As the
cumulative evidence accrued, the initial boundary
of 0.05 was crossed.
But then a large study, when added to the
other cumulative analyses, brought this treatment
effect down to a 0 level. So Salim, and others, in
fact, have emphasized that, when you are using a
meta-analysis approach and using interim monitoring
boundaries, that maybe one should require a p-value
of less than 0.05 or even, perhaps, a smaller one.
Let me say that most of the effects the
committee has seen over the past two days would not
come even close to meeting these criteria.
Now, some of you may say, why not avoid
all of this uncertainty and simply carry out an
adequately powered definitive trial with the
adverse event as the primary endpoint. Is this
crazy? No; it is not crazy at all. Sponsors
pursue encouraging trends. Most are disappointed,
but they will pursue them. Sponsors, therefore,
should have an obligation to pursue discouraging
trends realizing that most of them probably won't
be confirmed either.
Only a definitive trial can address
ascertainment and classification biases as well as
concerns about multiplicity of comparisons and
imprecision of the data. However, can we really
expect sponsors to pursue every adverse trend?
There are some obvious limitations to doing this.
Furthermore, if you could decide which adverse
trend you wanted to pursue, how easy would it be to
carry out the trial intended to definitively
evaluate an increased risk of an adverse effect?
Can you imagine the consent forms for the
IRBs for such a study? Some may say that we are
being too stringent here, that the criteria for
raising a safety concern need not be as stringent
as the criteria for establishing efficacy. But I
am not so sure that the criteria for establishing
efficacy and safety should be that different.
As a rule, we are very strict in reaching
conclusions about efficacy because saying that
there is a benefit when there is none means that
millions will be treated unnecessarily and subject
to side effects and cost. Now, although some might
advocate being less strict in reaching conclusions
about safety, please remember: saying that there is
an adverse effect when there is none means that
millions will be deprived of an effective treatment.
In conclusion, the findings of controlled
trials are most easily interpreted when they
represent the principal intent of the study. A
non-principal finding is subject to many
interpretive difficulties, many of which we have
reviewed: ascertainment biases, inflated
false-positive rates due to multiplicity of
comparisons and, the one I have emphasized the
most, the imprecision of estimates inherent in the
analysis of small numbers.
I think FDA, industry and academia remain
in a quandary as to how to respond in a responsible
fashion to observed differences in the
frequency of adverse events. Let me just
emphasize, my presentation shouldn't be construed
as favoring one particular side in all the
discussions that have occurred. In my view,
regardless of one's position, it is critical to
understand the limitations of what we know and to
resist the temptation to reach conclusions before
we are justified to do so.
I think only by recognizing our ignorance
will we be able to take the first step towards
developing a rational approach that is in the
interest of all patients.
Thank you. I will be happy to answer any questions.
DR. WOOD: Dr. D'Agostino?
DR. D'AGOSTINO: Thank you very much,
Milt. I have a couple of questions that I think, I
hope, are relevant to our deliberations. In terms
of your sense of large and the idea of chasing
after a safety event and making more out of it than
one should, we have a study, APPROVe, where there
was a serious, up-front, prestated deliberation to
make sure they had good ascertainment and
adjudication of cardiovascular events, and they
came up with 45 versus 25 events, carefully
adjudicated. I am struck by that being small, but I
am also struck by the care with which it was
done, say, as opposed to the APD where they did an
interim analysis that has those problems. Could
you comment on, say, the APPROVe study?
DR. PACKER: I think that, when you have
incomplete data, as you would if you have
small-numbers events, you need to be a lot more
careful about the thinking process. That doesn't
mean you can't make judgments. It doesn't mean you
can't incorporate a set of principles that would
guide decision making by looking at the totality of
the evidence and bringing to the process what you
inherently believe. I think that is what the
committee needs to do today.
What I really wanted to address, however,
is how hard this is and that the normal
reliance--as you know, clinical investigators,
because they don't understand p-values, rely on
them. What I am trying to do is to explain that,
in fact, we are less certain about what we know
here than we, perhaps, should be.
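To make "small numbers" concrete, the sketch below puts an approximate 95 percent confidence interval around the 45-versus-25 event comparison Dr. D'Agostino cites. It is an illustration only: it assumes equal person-time in the two arms (so the rate ratio is simply 45/25) and uses the usual large-sample standard error of a log rate ratio, sqrt(1/a + 1/b). The trial's own analysis would have used its actual exposure data and model. The function name `rate_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def rate_ratio_ci(events_treated, events_control, level=0.95):
    """Approximate CI for a rate ratio, assuming equal person-time per arm."""
    ratio = events_treated / events_control
    se_log = math.sqrt(1 / events_treated + 1 / events_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


ratio, lower, upper = rate_ratio_ci(45, 25)
print(f"rate ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under these assumptions the point estimate is 1.8 but the interval runs from roughly 1.1 to 2.9: the data are compatible with anything from a modest to a nearly threefold increase, which is the imprecision Dr. Packer is describing even when ascertainment is careful.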
DR. D'AGOSTINO: But on the APPROVe
study, it was reasonable, too.
DR. PACKER: I think you need to take that
in the totality of the carefulness in which it was
done, the prospective nature of it. But, remember,
in all the examples that I showed you, there were
sometimes very striking trends in early
pilot trials with prespecified, adjudicated
endpoints but, because they were small-number
events with very imprecise estimates, the
definitive trial was non-confirmatory.
So just because it is up-front and--
DR. D'AGOSTINO: That is my question, yes.
That is my question. You still end up with small
numbers. Let me have just a couple of other
questions. The second question is really bothering
me very much in terms of how we would design
trials. If you decide--if the group decides and
suggests to the FDA that there should be more
trials, more randomized clinical trials, the
sponsors are, then, going to have to go back and
say, well, they are going to set up a trial with
the null hypothesis that the relative risk is 1.0
versus the relative risk is not 1.0.
Now, the best thing a sponsor can do is to
run a very sloppy study and they will accept that
null hypothesis because the confidence intervals
will be so wide and they will contain 1.0. The
alternative is to sort of do a noninferiority-type
design where, at the end of the study, you end up with a
confidence interval, and the upper limit of that interval
has to be below something like 1.3.
Do you have advice for us if you did this
sort of second approach? We are dealing with rates
like 1 percent. Could we live with ruling out only
a 1.3 relative risk?
People may be dying if you do that. So how do you
respond to that?
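The noninferiority framing described here can be written down directly: the safety claim succeeds only if the upper confidence limit of the relative risk falls below a prespecified margin such as 1.3. The sketch below assumes a simple two-arm trial analysed as a risk ratio with the standard log-scale (Wald) interval. The event counts are hypothetical, chosen at the 1 percent rate mentioned, and the function name `rules_out_margin` is illustrative.

```python
import math
from statistics import NormalDist


def rules_out_margin(events_t, n_t, events_c, n_c, margin=1.3, level=0.95):
    """Check whether the upper confidence limit of the risk ratio is below margin.

    Uses the large-sample (Wald) interval on the log risk ratio.
    Returns (risk ratio, upper limit, True if the margin is ruled out).
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, upper, upper < margin


# Hypothetical trials with identical 1 percent event rates in both arms.
for label, counts in [("small", (10, 1000, 10, 1000)),
                      ("large", (300, 30000, 300, 30000))]:
    rr, upper, ok = rules_out_margin(*counts)
    print(f"{label}: RR {rr:.2f}, upper limit {upper:.2f}, rules out 1.3: {ok}")
```

Both hypothetical trials observe a relative risk of exactly 1.0, but only the large one excludes 1.3 (upper limit roughly 1.17, versus roughly 2.4 for the small one). Under this rule a sloppy, underpowered study cannot produce a reassuring answer. The rule does not, by itself, guard against poor adherence pulling the estimate toward 1.0.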
DR. PACKER: I wish I knew the answer to
that. I think that it depends on the type of
adverse reaction. It depends on the particular
drug. It depends on the vulnerability of the
patient population. All of these need to be
factored together with the actual feasibility of
doing the study.
The one thing I would say is that one
learns very little by doing a lousy trial. So,
doing a good trial is the only way to get a
reasonable answer or reasonable estimate of the
DR. D'AGOSTINO: Just one more. I will
make it quick. In these trials, in many of these
trials, people just won't stay in the trial. Can
you give us some advice on how to deal with the
drop-out--now, there are rules that you could say,
the individual wants to leave, has decided to leave
because the blood pressure is building up or
because of G.I. problems building up.
To say, we are only going to look at that
individual for 14 more days after they leave, to
me, is a problem because if the blood pressure is
building up, they may be on their way and it may
take two or three months before they get an M.I.
and so forth. So you have got the sort of
dropouts, terminations, that are part of the
protocol but you also have the individuals who just
stop coming. And they could be substantial. So,
any advice to us?
DR. PACKER: Gee, as you know, when we do
trials for superiority, the effort that we put into
adherence is extreme. We really want people to
stay on treatment and we organize the trials to do
everything we can to ethically and reasonably keep them in the trial.
I take your point that, if the trial were
a noninferiority trial, it is possible that the
investigators and sponsor might be less motivated,
recognizing that poor adherence works in their
favor. I think that there needs to be a reasonable
effort--I mean, you can maintain adherence in most
trials if you really, really want to.
DR. D'AGOSTINO: Thank you.
DR. WOOD: I suspect we are not going to
solve that problem today. Dr. Shapiro?
MS. SHAPIRO: Just a comment on your
comment. We all know, of course, that the Federal
Regulations require that participants be allowed to
withdraw and not be badgered into staying. But
what I really wanted to talk about was your
observations about how it is wrong to suggest that
we should not chase safety quite as rigorously
because we will, then, deprive ourselves and others
of information and access to effective treatment.
I don't think it is as simplistic as that,
in that, when we are looking at potential harm or
safety problems, we have to look not only at
likelihood that it exists but prevalence and
So I think that your response to that
approach has to take account of those factors as well.
DR. PACKER: Let me try to reframe my
response. You can't isolate benefit from risk.
The judgment as to whether a drug should be used on
an individual basis or on a population basis has to
be based on the relative value of benefit to risk. You may
decide that you don't even want to pursue a safety
trend in a non-fatal event when you know the drug
prolongs life. That would be a very reasonable
On the other hand, you might want to
vigorously pursue a very serious safety issue in a
drug for a symptomatic or cosmetic condition. So
the risk-to-benefit relationship is the one that
has to be vigorously defined.
MS. SHAPIRO: Right. I am sure you will
agree with this; you also have to factor in
prevalence of the condition and likely use of that
drug in the population.
DR. PACKER: That's right. But it is
always--it is risk to benefit. The goal here is
not to alter the risk-to-benefit relationship
simply because you want to emphasize one part or
another; it has to be in the context of the
clinical problem and looked at from the patient's
point of view.
DR. WOOD: Dr. Cush?
DR. CUSH: I have two questions. One, I
need some education. You were frequently referring
to very wide confidence intervals where they didn't
seem so wide. It was only, like, 0.3 and 0.4,
where, obviously, when it ranged from 1.0 to 8.0,
that is very wide. But you used those terms in
both situations. Could you explain the difference?
DR. PACKER: Actually, I have used "wide"
to refer to extremely wide, moderately wide and
DR. CUSH: And narrow would be--
DR. PACKER: Narrow is less than wide.
DR. CUSH: Okay.
DR. PACKER: Let me try. All the examples
that I showed you that I characterized as wide
truly reflected estimates that had a high degree of
uncertainty associated with them. On the benefit
side, benefits that range from an 80 percent
reduction in risk on the high side to a 20 percent
reduction in risk--remember, and I guess I should
emphasize this, and I guess Tom would put it more
dramatically: the width of these intervals is not
symmetrical on both sides of 1.0. The lowest you
can go below 1.0 is 0. So wide confidence
intervals below 1.0 can be 0.2 to 0.8. Those would
be wide confidence intervals. There is no limit
for estimates greater than 1.0, so you can have 1.0
to 24 on the adverse side of this. So you have to
sort of think about what is wide differently when
you are looking at estimates below 1.0 than when
you are looking at estimates above 1.0. Maybe that
would be helpful.
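The asymmetry described here follows from the fact that ratio estimates are analysed on the log scale, where confidence intervals are symmetric. The sketch below, an illustration rather than anything from the slides, takes the two intervals mentioned (0.2 to 0.8 and 1.0 to 24), recovers the point estimate each implies (the geometric mean of its limits), and reports its half-width on the log scale. The function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(lower, upper):
    """Return the implied point estimate and log-scale half-width of a ratio CI."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    center = math.exp((log_lo + log_hi) / 2)  # geometric mean of the limits
    half_width = (log_hi - log_lo) / 2        # the interval is symmetric here
    return center, half_width


for lo, hi in [(0.2, 0.8), (1.0, 24.0)]:
    center, hw = log_scale_summary(lo, hi)
    print(f"{lo} to {hi}: estimate {center:.2f}, log half-width {hw:.2f}, "
          f"factor either side {hi / center:.2f}")
```

On the ratio scale the first interval spans only 0.6 units, yet it extends a factor of 2 either side of its estimate of 0.4. The second spans 23 units and extends a factor of about 4.9 either side of 4.9. Comparing half-widths on the log scale puts estimates below and above 1.0 on the same footing.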
DR. CUSH: That does help. Secondly, you
have told us that, when we are dealing with
low numbers of adverse events, the estimates are very
imprecise and hard to draw conclusions from. Is it
even less valid, or an even greater error, to then
take data derived in one situation, like an
Alzheimer's trial, and then try to generalize that
to the general population?
DR. PACKER: But we do that all the time.
There is a general sense that efficacy is not
necessarily extrapolatable across diseases but safety that is
not disease-specific is extrapolatable.
Let me put it this way. If we didn't do
that, the problem that I put forward would be
really impossible, really impossible. So I
actually feel comfortable extrapolating safety data
across indications as long as the safety item is not disease-specific.
DR. WOOD: Dr. Shafer?
DR. SHAFER: Thanks. That was actually a
very informative presentation and I can confirm the
distance from Washington to California.
There are really two questions here that I
think we need to bifurcate. One of them involves
the scientific question of getting at the truth,
whatever that is. I appreciate everything you say
and, prior to a drug being approved, at least
ideally, there would be adequate time and resources
to do exactly what you are proposing.
But there is a second question which is
how to inform clinical and regulatory decision
making based on imprecise information after
approval because, in that setting, a daily decision
is being made by patients and their physicians as
to whether or not they need to take the drug.
One question about how to approach these
sorts of imprecise data when, in fact, a daily
decision is occurring, is whether you can take the
confidence bounds for both the risk and the benefit
and integrate those over the public-health hazard
and the public-health benefit to try to incorporate
the entire--both the point estimates but also the
uncertainty about them into the regulatory decision?
DR. PACKER: Oh, wow. Just a couple of
comments. One, the estimates on
efficacy are almost always more precise, much more
precise, than the estimates on safety. So you have
this very precise estimate on efficacy. You have
this very imprecise estimate, in general, on
safety. And you try to sort of integrate them and
you have to now weigh them because it could be that
the efficacy thing you are looking at is really
important and the safety is sort of not
important. Or it could be the other way around, the
efficacy is sort of very small--the efficacy is
small, but the safety is a big risk.
DR. SHAFER: That is exactly the question.
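One way to make Dr. Shafer's proposal concrete is a simulation that carries the uncertainty in both estimates through to a single net public-health figure. The sketch below is entirely hypothetical. It assumes each relative risk is log-normally distributed around its point estimate, with a standard error recovered from its 95 percent confidence interval. It also assumes baseline event rates for the benefit and harm outcomes and counts net events per 10,000 treated patients as harmful events caused minus beneficial events prevented. All inputs and function names are placeholders, and such a model would at most inform, not replace, the human judgment Dr. Packer defends.

```python
import math
import random


def se_from_ci(lower, upper):
    """Recover the log-scale standard error from a 95% CI on a ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


def simulate_net_events(benefit_rr, benefit_ci, benefit_base,
                        harm_rr, harm_ci, harm_base,
                        per=10_000, n_sims=20_000, seed=7):
    """Return simulated net events per `per` patients (positive = net harm)."""
    rng = random.Random(seed)
    se_b = se_from_ci(*benefit_ci)
    se_h = se_from_ci(*harm_ci)
    results = []
    for _ in range(n_sims):
        rr_b = math.exp(rng.gauss(math.log(benefit_rr), se_b))
        rr_h = math.exp(rng.gauss(math.log(harm_rr), se_h))
        prevented = benefit_base * (1 - rr_b) * per  # events avoided
        caused = harm_base * (rr_h - 1) * per        # extra adverse events
        results.append(caused - prevented)
    return results


# Hypothetical inputs: a precise benefit estimate and an imprecise harm estimate.
net = simulate_net_events(0.5, (0.4, 0.62), 0.02, 1.8, (1.1, 2.9), 0.01)
net.sort()
print("median net events:", round(net[len(net) // 2], 1))
print("95% interval:", round(net[int(0.025 * len(net))], 1), "to",
      round(net[int(0.975 * len(net))], 1))
print("P(net harm):", sum(x > 0 for x in net) / len(net))
```

Because the harm estimate is the imprecise one, it dominates the spread of the net result. This reflects the asymmetry between efficacy and safety precision described in the reply above.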
DR. PACKER: You might think that someone
in the world might be clever enough to create a
statistical model that would allow that to take
place. I am actually much more comfortable with
people doing that than statistical models doing
that. Somehow, people have the ability to
integrate all of this, especially a group of people
have an ability to integrate this, much better than
any mathematical model.
I would be very uncomfortable if someone
were actually to propose a mathematical model that
replaced the human, very important human, element
DR. WOOD: Dr. Farrar.
DR. FARRAR: Every example that I have
seen to date in looking at the risks in
overinterpreting data seems to go from being a
positive study to a negative study. I wonder about
the other way around and whether there are any
inherent differences in thinking about it the other
way around, the bottom line being that if you have
ten studies that show no safety issue with a
well-measured process, whether you can then say,
well, maybe the 11th study is going to show it
DR. PACKER: I think you need to find out
how much information there is in each study, how
easily or how appropriate it is to combine the data
across the studies to determine how precise the
estimates are after you have collected and integrated
all of the data, and put that into a judgment as
to how much data you actually need to be confident
about the precision of the estimate.
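The mechanics of combining studies to see how precise the pooled estimate becomes can be sketched with a fixed-effect, inverse-variance meta-analysis of log relative risks. This is one standard method chosen for illustration. Dr. Packer does not name a method, and whether pooling is appropriate at all is the judgment he describes. The study results below are hypothetical, and the function name `pool_fixed_effect` is illustrative.

```python
import math


def pool_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling of (rr, lower95, upper95) tuples."""
    weights, weighted_logs = [], []
    for rr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1 / se ** 2  # more precise studies receive more weight
        weights.append(w)
        weighted_logs.append(w * math.log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * pooled_se),
            math.exp(pooled_log + 1.96 * pooled_se))


# Hypothetical studies: each alone has a wide interval that includes 1.0.
studies = [(1.5, 0.8, 2.8), (1.3, 0.7, 2.4), (1.4, 0.9, 2.2)]
pooled, lower, upper = pool_fixed_effect(studies)
print(f"pooled RR {pooled:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Each hypothetical study's interval includes 1.0, yet the pooled interval is narrower than any single study's and, in this example, lies just above 1.0. A fixed-effect model assumes the studies share one common effect. When that is doubtful (different drugs, doses or populations), a random-effects model or separate analyses would be the more cautious choice.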
So there isn't a uniform way of thinking
about it. It is not like you will know it when you
see it. There is some guidance, some mathematical
guidance, that needs to be incorporated into the
DR. WOOD: Dr. Domanski.
DR. DOMANSKI: You know, I am not nearly
as sophisticated, really, Milton, as you are about
this sort of thing, nor as some of the people in
the room, but I am a little bit concerned about
some of the examples. I will give you one. I
don't think ISIS 4 was a definitive trial of
magnesium, because I know something about that. We
did the MAGIC study which was a very large study.
Like ISIS 4, it was negative, but ISIS 4
was substantially different methodologically in
terms of when that was given. I think that example
actually, to be honest, is fairly misleading as a
result. I think it is an example of a stopped
clock being right twice a day. But, yeah; it came out
But I am worried if that is the basis for
this--that kind of thing is the basis for this
discussion across more of the landscape.
DR. PACKER: Let me emphasize, Mike, that
I knew that if I picked one study and gave you an
example of one study that I would be at great risk
because everyone knows more about these
studies than what I know about these studies,
although some of the studies I actually mentioned
were studies I was personally involved with and I
think that I know a little more about them.
So I just wanted to--I would not
overemphasize--and, in fact, one might
appropriately underemphasize--the magnesium
example. But the other examples, time and time and
time and time again. It is just like reaching
conclusions during a very early part of a study
based on interim monitoring. When you have small
numbers of events, the estimates are very imprecise
and may not reflect what happens at the end of a
complete experiment. That is just a general principle.
I take your point about ISIS 4 but the
number of examples here is just overwhelming.
DR. WOOD: It is important, Milton, to
remember, we have replication for two of these
drugs and these safety signals here. So it is not
just single studies.
DR. FURBERG: Milton, I think that was a
great presentation. I think, for balance, it would
be nice if you could have examples showing the other
side, how trends in smaller studies were confirmed
in definitive trials. And I know plenty of those.
DR. PACKER: Oh, yes.
DR. FURBERG: That was never discussed.
You are painting a dark picture saying you can't
trust smaller studies. You are right. You never
know where you are going to end up and you need to
be careful. But don't say that you can't rely on them.
DR. WOOD: I was actually on the advisory
committee that turned down Vesnarinone, that looked
at that study. There were lots of issues that came
up at that time that led us to do that. So it
wasn't just that there was a study that was
compelling and that people went with that.
DR. PACKER: Curt, let me just say that--I
think your point is very, very important. What I
have not done is shown many, many examples of
interim monitoring in trials where the
results were reflective of the endpoint. I have
not shown a whole host, probably more than I could
think of, of all of the pilot trials where the
initial trends encouraged someone to pursue it and
that the second study was, in fact, very
Let me just make my point clear. It is
just not as reliable as we think it is. It is not
that it is worthless. I do not want to say that.
If I have implied that, then I do not want to imply
that. I just want to say that the risk of error
early when you have small-number events is much,
much greater than when you have a much more precise
estimate at the end of the trial.
My plea here is that when you don't know,
the best thing you can do is say, "I don't know."
And that is my only plea.
DR. WOOD: Milt, when you have two trials
that replicate one another, with a p-value of less
than 0.05, if that was an efficacy endpoint, we
would approve on the basis of that; correct?
DR. PACKER: That's right.
DR. WOOD: But you are telling us that,
when it is a safety endpoint, we should not act on that.
I think it is counterintuitive.
DR. PACKER: No, no, no.
DR. WOOD: Hang on. That seems to me
counterintuitive. We have, for two of these drugs,
two randomized trials that replicate the outcome.
In three of the four trials, the outcome was
predefined, adjudicated and so on. That is about
as good as any drug that has been approved on the
U.S. market that I can think of.
DR. PACKER: Let me just add one
dimension, Alastair, to the thinking process and
that is that when you have a p less than 0.05 on
two trials, on the primary endpoint because it is
efficacy, you have two trials that were designed
for the endpoint and have fairly narrow confidence
intervals and precise estimates.
That is not the same concept as having a p
less than 0.05 on two imprecise estimates which are
DR. WOOD: No; I understand that very
well. I think we all do. The issue here is both
of the second trials--both of the second
trials--were designed to test the safety issue that
was in the first trial even though they were
efficacy studies. So it is not like they were just
two trials that fell on the ground from Mars that
arrived with something. These were designed, at
least according to the sponsors, to check for that.
So I think you are overselling the point a bit.
Let's move on. Dr. Jenkins?
DR. JENKINS: I found the presentation
very interesting and I wanted to probe a little bit
further on the APPROVe study because that is the
one where I think we were feeling very comfortable
with the finding. Yet, I went back to
Merck's presentation, and their prospective plan
was actually to combine three studies that were
going to be placebo versus rofecoxib in three
different populations. Their plan was to have 25,000
patients to evaluate the cardiovascular signal. Now, in
APPROVe, presumably, they had stopping rules, and
the Data Safety Monitoring Committee saw an extreme
effect that met those criteria so they stopped the
study. But I am just interested in hearing your
thoughts about how should we interpret APPROVe
where the stopping rule is met for an individual
study when the prespecified plan was to have three
studies combined for 25,000 patients.
DR. PACKER: Gee, I must say that I am
delighted to have everyone ask me the hard
questions for this afternoon. I sort of think that
this is what this committee has to do. I only
wanted to add a dimension to the thinking process
here. I don't come with any answers on how to put
all of the data together. As for how
to synthesize these data, I am very comfortable
with the human process of doing so as long as the
human process incorporates an understanding of how
difficult and imprecise this is and the fact that,
in the past, although it has led to predictions
that came true, it also led to predictions that did
not come true.
DR. JENKINS: I think, more specifically,
the point I was trying to get you to comment on is
not the overall interpretation of the rofecoxib
data but the fact that there was a plan for 25,000
patients in three studies. What I am trying to
understand is how should we, then, interpret a
finding from one of those three studies where an
interim analysis crossed the stopping boundary and
met the criteria for stopping the study. What
weight should we give to that finding in that
DR. PACKER: I don't think there is a
precise answer to that. Any time you deviate from
your preplanned attack on the conduct and analysis
of a trial, you weaken, to varying degrees, the
precision of the estimate and the confidence you
have in the data that you are looking at.
DR. WOOD: Dr. Nissen?
DR. NISSEN: Milt, there is an additional
subtlety here. Let me see if I can drill down with
you on it. What we have here is a class of drugs
where we have multiple trials within the class. So
what we are asked to do is not necessarily, in some
respects, for each individual drug, say, well, do
we have replication or not.
But if we take the position that this is a
class effect, then we have got four, or perhaps,
five trials. This came up once before. It was
kind of controversial. I think you may have been
on the committee at the time when we had the
angiotensin-receptor blockers for renal protection.
What the two companies did with two different drugs
is they stipulated that each could use the
data from the other company's trials as supportive.
So the reason that this is really much
harder is that we have a lot of trials here. We
may not have all the evidence we need for an
individual drug, but we have trials across the
class of drugs. I wonder if you have any thoughts
about this because it is obviously a difference
between studying a single agent and studying a
class of agents.
DR. PACKER: I think that, Steve--I mean,
that is why the process works best when there are
human beings involved in the thinking process.
There is no predetermined sense that one should
bring to the process--that you confine the analysis
only to one drug. What you should allow yourself
to do is look at the data with one drug, look at
the data with drugs that you think are related.
If there are data on a drug that you think
really isn't related, you might want to
analyze that separately or do it both ways to see
if it is consistent. There is no statistical
formula that can guide the very important human judgment.
My major point is that the precision that
most clinical investigators think exists here isn't
as precise as we think it is. But that doesn't
mean that you--and Curt would emphasize this--that
doesn't mean that you can't put together your own
picture of the totality of the data and bring to it
a sense of whether it reaches some critical level
In the absence of precision, you have got
to do that. But don't forget that the data are
inherently imprecise.
DR. WOOD: Curt, do you want to say
something else? No. Then let's move on. The next
speaker is Bob Temple who we are going to confine
to his seat.
DR. TEMPLE: Alastair, I have a question.
What am I supposed to do about my slides? Can
someone show them for me? I will delete many of them.
DR. WOOD: Okay. You can come up here if
you do it quickly.
DR. TEMPLE: I don't care where I'm from.
I really don't.
DR. WOOD: Then Kimberly will work the
slides for you.
DR. TEMPLE: Okay; if Kimberly will do
Issues in Projecting Increased Risk of
Cardiovascular Events to the Exposed Population
DR. TEMPLE: I was not in any way trying
to address the main issue the committee is
grappling with, which is what to do about these
drugs. But it seems to me you can't help noticing
that there is some data we would all like to have
that we don't have and that is what I was trying to
address.
Obviously, the main thing we are worried
about is the effect of the COX-2-selective NSAIDs
on cardiovascular outcomes, notably death, stroke
and heart attack. But we are particularly interested
in the single drug effects, whether they are all
the same. We are interested in whether we are
looking at true class effects or differences.
We also can't help noticing there is not a
lot of long-term data on the nonselective NSAIDs
and, of course, as has been pointed out repeatedly, some
of them are sort of selective anyway.
There is major interest in possible
differences in the subpopulations that might be at
different risks. I think there are mechanistic
considerations, how much of this is really likely
to be platelets and could there be a blood-pressure
effect.
The importance of that, to me, is that it
is not quite clear what to do about platelet
effects, but, conceivably, you could manage a
blood-pressure effect if that was a problem.
There is a lot of importance and interest
in the dose and dose interval. And it is important
to think about how long studies have to be to
detect these things. Obviously, some of the trials
seem to have shown things in a matter of seven or
eight months. There is some suggestion that some
of the effects need much longer to detect.
Skip the next one.
With respect to cardiovascular effects,
the main question is whether everything is really
answered. You know, there are lots of studies, as
Alastair was pointing out. They are not perfectly
consistent, maybe, but there are a number of
studies with a number of drugs that seem to be
showing the same thing.
I guess, to me, they don't seem entirely
consistent. There are a number of possible reasons
for that. One is that there really are differences
between drugs, or at least between
is that even the best controlled studies sometimes
give different answers. Another is that small
effects are difficult to evaluate in epidemiologic
and even controlled studies. Then the last is that
the effects may be population-dependent. That has
So it does seem to me there is more to
learn. Skip the next. We all know that. Platelet
One of the things that seems important to
pin down, and I don't think it has been pinned down
yet, is the possibility that blood pressure is a
significant part of all this. There is some
impression that Vioxx has bigger blood-pressure
effects than the other drugs, but I don't think
there is what we would call adequate data on the
effects of all these.
By adequate data, I mean data that gives
you information about the effect of the drug over the
entire dosing interval, that has pinned down dose
response and that has pinned down the effect of
different dosing intervals. There is an
impression, though, that these drugs can reverse
the effect of other anti-hypertensives, perhaps,
especially, ones that work through the
renin-angiotensin system. They seem to have, at least
some of them, an effect on blood pressure generally
and then there are isolated reports of hypertension
in trials reported as adverse reactions, clearly
more common in the treated groups.
I have a bunch of slides showing that
elevated blood pressure is bad for you. You can
deduce that from epidemiologic effects, from a
mountain of clinical studies. The most recent
study of interest, which I will not
describe--keep going--in detail, is a study that
Steve Nissen knows about called CAMELOT which you
can read as saying that a change in blood pressure
of even 5 millimeters of mercury systolic and 3
diastolic might produce a reduction of about
33 percent in the kinds of events we are talking
about in people whose diastolic pressure is only
That is not definitive. This is a subset
of the data and you can look at my slide to see
what I did.
As I said, we don't know as much about the
blood pressure as we should.
So a crucial question in the larger
assessment of cardiovascular effects is what we can
really study further. My own view is that, given
VIGOR and fairly consistent epidemiologic findings,
it would be difficult to study 50 milligrams of
rofecoxib. I doubt you could write a proper
I take Milton's concern to heart but I
guess my own view is there is probably enough
information about that. But what you could do with
respect to other things depends on what you believe.
Suppose you believe that the
cardiovascular risk of 200 or 400 milligrams of celecoxib is
not entirely clear. One polyp study says yes and
other studies are not so clear. And you believe,
also, that a class effect is uncertain or, more
particularly, that the effect might not be present at
certain doses and certain dose intervals even if
you are inclined to believe that the class does
have a problem.
If you also believe that more needs to be
known about the long-term use of all NSAIDs,
including those that are nominally COX-2-selective
and those that are not, if you believe that new
COX-2-selective agents conceivably could be
developed with appropriate information, and if you
believe the pharmacology gives hypotheses that need
to be tested, not necessarily just believed--sorry
Garret--then here is what you might be able to do.
Again, I am not, in any way, saying who
should do this. This will be a massive
undertaking. But it does seem to me that there is
information we all collectively need as a
community. So I am calling it an ALLHAT study for
This is just one of what people could
dream up as what might be compared. The drugs, it
seems to me, one might think about putting in it
include ibuprofen, which we think probably ought to
be neutral, not bad. It may not have the platelet
effects you want. Naproxen--I am embarrassed to
say this but I am letting myself be affected by the
epidemiology studies. Naproxen sort of looks good.
You might even say it is at least a placebo, but I
am not quite ready to say that.
Diclofenac seems a good model of a regular
NSAID that is really COX-2-selective, at least to a
degree. Celecoxib possibly at more than one dose,
although, maybe for caution, one would want to
think about the lower dose first. Then I have two
other groups that I will be interested in people's
comments on, and I am not totally sure you could
bring these off.
But could one include a full-dose aspirin
group, accompanied by a proton pump inhibitor? We
know it is an effective agent in arthritis.
Now, you would have to first show that proton pump
inhibitors really do block the ulcerogenic effects
of aspirin. That is a short-term study and maybe
one could do that. So I will be interested in
whether people think you can bring that off.
The reason for doing it is we know the
effects of aspirin are not unfavorable and we think
they are probably favorable in at least some
populations, in populations at high risk and
probably not unfavorable in people at low risk.
The last one that seems worth considering,
and my understanding is that, in many parts of the
world, at least osteoarthritis is treated this way,
is to use acetaminophen, with codeine added as needed,
and try to do something about the constipation.
That would be as close to a true placebo
group as I think you can get in a setting like
this. So it seems quite interesting.
It is worth saying that, if one had a new single
agent and still thought that drugs like this should
be developed, my suggestion is that the
single agent might be compared to naproxen, and I
would still hope for one of the last two
comparisons as a true placebo.
Obviously, these are all people who need
chronic pain medications. You would want O.A. and
R.A. stratified. I don't believe you could use the
APAP group for rheumatoid arthritis but others may
not agree with that. You probably want to study a
range of cardiovascular risks but you probably
would want to study the lower-risk people first.
The reason I say that is anyone with known
coronary-artery disease really has to be given
aspirin just because that is part of treatment and
it isn't clear yet, to me, how aspirin interacts
with the COX-2-selective drugs. You would think it
would make them unselective but the data don't seem
to necessarily say that.
A good question is how big the sample
would have to be and that depends on what you want
to find out. If you are really trying to compare
the drugs with a true placebo, they wouldn't have
to be that large to rule out, say, a two-fold risk
or something like that. We have seen studies with
about 1,000 per group that have distinguished
between drugs. So that is not so huge.
But if you really wanted to get at whether
one drug is a little bit different from another,
you are talking about studies of massive size. I
have asked various numerically qualified people and
the general impression is that if you wanted to
rule out a 20 or 30 percent difference, you are
talking about 50,000 per group. That is beyond my
hopes even for ALLHAT 2.
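A rough sense of why ruling out a small difference needs so many more patients than ruling out a two-fold risk comes from a standard event-count approximation. The sketch below uses Schoenfeld's formula for a 1:1 randomized comparison, D = 4(z_{1-alpha/2} + z_{1-beta})^2 / (ln HR)^2, and then converts events to patients using an assumed per-patient event probability. The 2 percent event probability, 80 percent power and two-sided 5 percent alpha are illustrative assumptions, not figures from the meeting, and the sketch is not meant to reproduce the per-group numbers quoted above. When the aim is to rule out a margin while the true hazard ratio is 1, the same expression applies with the margin in place of HR and a one-sided alpha.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total events (both arms) needed to detect `hazard_ratio` with 1:1
    randomization, using Schoenfeld's approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def patients_per_group(hazard_ratio: float, event_prob: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm, assuming (hypothetically) that each patient has
    probability `event_prob` of a qualifying event during follow-up."""
    total_events = events_needed(hazard_ratio, alpha, power)
    return ceil(total_events / (2 * event_prob))


if __name__ == "__main__":
    # Hypothetical 2 percent event probability over the trial's follow-up.
    for hr in (2.0, 1.3, 1.2):
        print(hr, events_needed(hr), patients_per_group(hr, 0.02))
```

Because the required event count scales with 1/(ln HR)^2, moving from a two-fold risk (66 events under these assumptions) to a 20 percent difference (945 events) multiplies the events needed roughly fourteen-fold, before any allowance for dropout.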
Obviously, the outcomes of major interest
are cardiovascular death, stroke, AMI and bleeding.
I have heard some thoughts that maybe heart failure
should be looked at in addition but I wouldn't make
that the primary endpoint. I think you can look at
A big problem is what to do about blood
pressure. My first thought was that you would
monitor it and treat anything over 120 over 80, but
that really isn't standard practice. So a question
I would raise is whether one could let people
go to 130 over 90. Would that be acceptable?
A question one could raise is why do this
at all? Do you really need these drugs? We have
heard fairly strong feelings that G.I. intolerance
is not trivial. But my answer is more that we
really don't know enough about the whole array of
these drugs. There is no question that people are
going to get something for their arthritis. I am
not entirely comfortable with looking at the data
and saying we know what we need to.
You could sort of deduce that naproxen
usually looks pretty good. It usually beats what
is there except we just heard about a study where
it was a little worse. But it is not clear where
ibuprofen fits. It doesn't show the same thing.
It seems to me there is a serious population need
to find out about these things and to understand
more whether all selectivity is the same.
We have been through diclofenac at length
and it is not clear what one needs. So I think the
idea of doing a large study has weight.
If you believe that it is really all
settled, that cardiovascular risk is clearly
increased with all of the COX-2-selective agents,
ignoring for now which ones are actually selective,
there still are things one might want to know.
It might be of interest to do a study that
still would have the ibuprofen and
and might still have my aspirin or APAP groups.
One might consider trying celecoxib with the
addition of aspirin. I know the results of that
have not shown that any adverse effect seems to be
mitigated, but that still doesn't make much sense
and it might be something one could still want to
test. It would seem that if you added aspirin to a
selective agent, you ought to have a de facto
unselective agent. Of course, that presumes
mechanism and you shouldn't presume mechanism. You
should test it.
Anyway, those are my thoughts. I think my
main point is that there is really a very important
need for better information on the whole array of
these drugs and the kind of study needed to do that
is mind-bogglingly large. However, people are
already undertaking studies with 25,000 and 30,000
patients. So it is not as outlandish as I
would have said it was before we started this
DR. WOOD: Okay.
I am just interested,
why didn't you suggest a PPI with naproxen? For
your ALLHAT study, why didn't you suggest a PPI
DR. TEMPLE: That is a fair question. I
think the answer on--what did I suggest it with?
DR. WOOD: With aspirin. It doesn't
DR. TEMPLE: I will tell you the reason.
Full-dose aspirin is just plainly impossible to use
because of massive G.I. intolerance. I believe,
based on history, that it is worse than we would
expect with naproxen. So I thought you had to do it there
urgently. You could do it with naproxen, too.
That would be okay.
I have to point out that we do not have
definitive labeling or evidence that those drugs
really do prevent this but we have heard about some
studies that suggest it. I do think that is an
early thing to discover.
DR. WOOD: Okay. Understood. Let's move
straight on to Bob O'Neill's presentation; he
is also going to give it from his seat.
Issues in Projecting Increased Risk
of Cardiovascular Events to the Exposed Population
DR. O'NEILL: I won't go through the
slides. I might point your attention to a few of
them. I will try and do this in five or ten minutes.
DR. WOOD: Do you want us to have the
slides up, Bob?
DR. O'NEILL: What I was asked to do is
essentially provide a framework. This is a very
difficult problem of projecting risk to the
population. Very little has been published about
how to do this appropriately, so I was intending to go
through sort of the logic and the framework of how
you might think about this.
It requires the integration of exposure
data at the national population level and it needs
information relative to how long people are on
drugs and it uses information from the clinical
trials as well as from the epidemiology studies to
the extent that they are relevant to the question
that is being asked.
This is a very difficult problem. It was
not intended to give any estimate, any single
number. It was intended to show how hard it is to
get there and, at the end of the day, how variable
and sensitive the estimate might be to all the
assumptions you have to make.
So I used the Vioxx VIGOR and APPROVe
studies as an example of the process that one might
go through. I made the point that event
definitions and many things matter. But I guess if
there is anything that I would like people to take
home, it is that time matters. The hazard rate
matters. And the hazard ratio matters as a
function of time when you do any of these
I would just recall two slides. One would
be the VIGOR study, which is Slide 12, so that
everybody could remind themselves, and the other is Slide 16.
The VIGOR study shows a separation of curves.
Behind that is what is called a hazard rate. I
believe the data support that the risk escalates
with duration of exposure. Merck and we have talked
about this in the past and sort of have different
views of this, but we seem to feel that that risk does escalate.
That does not mean that there is no risk
in that picture early on. I think David Graham has
made this point that it may be a power issue but,
nonetheless, it is what it is and I am not
convinced that the epidemiological studies at this
stage add anything to our knowledge about early
risk, for the points I made yesterday, because I
think time zero matters in looking at the risk in
relation to how long you are on the drug.
The next slide is Slide 16 which is the
APPROVe study. Similar pattern, only delayed a
year. So instead of the curve separating at
approximately six months, four months, they
separate a little later on. The idea here is that
the summary relative risks for both of these trials
(approximately 2.28 for thrombotic events in VIGOR
and approximately 1.92 for confirmed thrombotic
events in APPROVe) are average relative risks over
all the time points, whereas the relative risk at
any given time is a function of time.
That is an important concept when, then,
you go and you look at the national projection of
how many people are exposed for how long a period
of time. I won't go through that because they are
in the slides. But we have no data in the United
States to do this. So we did a projection based
upon the IMS National Prescription data, another
separate database that allowed us to look at how
long exposure, successive exposures, might be to
get an idea of how long individuals may stay on the drug.
Surprisingly enough, a very small
percentage of the millions of people that are
prescribed the drug are on the drug for more than a
year. That is in one of the slides on the
Caremark data. So what this meant is you multiply all
these estimates, which, essentially, are time-dependent. We
calculated a time-specific difference in absolute
incidence rates for the different trials, made a
projection and essentially used in that projection
a number of assumptions, many of which are not
verifiable, and then came up with some crude
estimate of what might even be an upper bound on a
confidence interval for any estimate.
We probably don't believe it because there
is no real methodology to support that estimate but
nonetheless to say that an estimate is very
So the bottom line, and the conclusion
here, given the time frame, is that the purpose of the
projection effort (this is the last slide; it is Slide 47)
was essentially to provide a framework for
considering how you would think about developing an
estimate, to provide a range of estimates and,
also, to point out that there are many
limitations to any estimate that you would provide.
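The projection described above, in which national exposure counts by duration are combined with time-specific differences in absolute incidence, can be sketched in a few lines. Every input here is a hypothetical placeholder: the duration groups, the head counts and the excess rates per 1,000 person-years are invented for illustration and are not the IMS, Caremark, VIGOR or APPROVe figures. The sketch also ignores depletion of people at risk and treats each duration group as if everyone in it stopped at exactly that duration, which are the kinds of unverifiable assumptions the presentation warns about.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interval:
    """Time window since starting the drug, with an assumed excess event rate."""
    start_years: float
    end_years: float
    excess_per_1000_py: float  # drug incidence minus comparator incidence


def excess_events_per_person(duration_years: float, intervals: List[Interval]) -> float:
    """Excess events expected for one person treated for `duration_years`,
    accumulating each interval's rate over the time actually spent in it."""
    total = 0.0
    for iv in intervals:
        time_in_interval = max(0.0, min(duration_years, iv.end_years) - iv.start_years)
        total += time_in_interval * iv.excess_per_1000_py / 1000
    return total


def project_excess_events(exposure: Dict[float, int], intervals: List[Interval]) -> float:
    """Sum over duration groups of (people in group) x (excess events per person)."""
    return sum(n * excess_events_per_person(d, intervals) for d, n in exposure.items())


if __name__ == "__main__":
    # Hypothetical: no excess in year 1, 10 excess events per 1,000 person-years after.
    intervals = [Interval(0.0, 1.0, 0.0), Interval(1.0, float("inf"), 10.0)]
    # Hypothetical exposure: most people stop well before a year.
    exposure = {0.25: 800_000, 1.0: 150_000, 2.0: 50_000}
    print(project_excess_events(exposure, intervals))
```

With these made-up inputs no excess accrues in the first year, so only the 50,000 people treated for two years contribute, giving 500 excess events. A different assumed shape for the hazard over time, or a different duration distribution, would move the answer substantially.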
We are not supporting any, or putting
forward any, one estimate but I do believe that we
need to understand this problem by moving away from
summarizing nonproportional hazards in person-years.
It is not a good idea. It begs the
question as to whether the risk is constant or
whether the risk is dependent on time.
If there is one problem with the
epidemiological literature, it is that it constantly reports
person-year risk, whereas every one of the
clinical trials we have seen presents a
Kaplan-Meier curve that looks at the time-dependent
risk. Unless you understand that, you can't come
to grips with comparing one drug to another.
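The problem with pooled person-year summaries when the hazard is not proportional can be shown with a toy calculation. Suppose, purely for illustration, that the hazard ratio is 1.0 in the first year of treatment, 2.0 in the second and 2.5 in the third; these values are hypothetical, not estimates from VIGOR or APPROVe. The sketch assumes a constant comparator event rate, equal person-time in each year and event rates low enough that depletion of people at risk can be ignored, so the comparator rate cancels out of the ratio.

```python
from typing import Sequence


def pooled_rate_ratio(hr_by_year: Sequence[float], years_followed: int) -> float:
    """Person-year event-rate ratio (drug vs comparator) pooled over the first
    `years_followed` years, assuming a constant comparator rate, equal
    person-time per year and negligible depletion of people at risk."""
    if not 1 <= years_followed <= len(hr_by_year):
        raise ValueError("years_followed must be between 1 and len(hr_by_year)")
    # Comparator events are proportional to person-years: one unit per year.
    comparator_events = float(years_followed)
    # Drug events in year k are the comparator rate times hr_by_year[k].
    drug_events = sum(hr_by_year[:years_followed])
    return drug_events / comparator_events


if __name__ == "__main__":
    hypothetical_hr = [1.0, 2.0, 2.5]
    for years in (1, 2, 3):
        print(years, round(pooled_rate_ratio(hypothetical_hr, years), 2))
```

The same time-specific risks yield a pooled ratio of 1.0, 1.5 or about 1.83 depending only on whether follow-up lasted one, two or three years, which is why a person-year summary from a cohort dominated by short exposures cannot be compared directly with a trial whose Kaplan-Meier curves separate late.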
You can't come to grips with comparing a
drug to itself. If you look at the VIGOR study
relative to the APPROVe study, they are in
different populations. One is in a population of
R.A. The other is in a polyp prevention trial.
One is at 50 milligrams. The other is at 25 milligrams.
There are many things that need to be
sorted out. So the point here is that this is a
very difficult exercise to project. This was just
a framework to say, here is how you might think
about it. Most of the estimates are fraught with a
lot of danger and have to have many
on them were you to bank on any one estimate alone.
That is pretty much my bottom line.
DR. WOOD: Bob, just to make sure
everybody in the audience understands what you are
talking about with estimates, what you are talking
about are absolute numbers of people--
DR. O'NEILL: An estimate of the absolute
numbers of individuals that might have been at risk
and had these events if they were exposed--if they
were exposed. This is a model projection.
DR. WOOD: Right. I just wanted to
clarify that. So it is not the relative risk. It
is not the same as what Milt was talking about.
DR. O'NEILL: Right. Exactly. This is a
long discussion to get into the concept of
attributable risk in its own right. Given the
time, I wouldn't be able to do that.
DR. WOOD: So you are talking about the
number of people, these sort of numbers that are
DR. O'NEILL: Right; to go through that
It is hard enough to interpret a single
study or a collection of studies. To go to an
estimate of what the increased number of events
might be at the exposed level is what this effort
was about, all the different, five different
separate interlinked but disparate databases that
you would need to get there to make this kind of an
DR. WOOD: Okay. Good. Thanks.
DR. WOOD: We will take a few minutes, a
very few minutes, for questions to the last two
speakers and then we will take a break and be back.
So the panel needs to remember that they are eating
into their break.
DR. NISSEN: Quickly, Bob, Bob Temple.
The difficulty, of course, in the ALLHAT study is
that it is very--it seems unlikely that it will get
done. So the question is, putting some constraints
on this, and I thought about this last night in
some detail into the wee hours of the morning, it
seems to me that what we really need for this class
of drugs is a reference standard. That reference
standard, unlike many studies, can't be placebo
because you can't treat arthritis patients with placebo.
So I would submit to you that, if you are
going to do comparisons, that the reference
standard, the best reference standard we have, is
naproxen because we know as much about it as
anything else. We think it is, at worst, neutral
and maybe a little better than neutral.
So I would argue that, if you want to do
ALLHAT light, then what you do is you test every
agent, both those that stay on the market and those that are
proposed to come onto the market, against naproxen
with an adequately sized trial, and you set an upper
bound, which we have to talk about, on the
hazard you are willing to accept,
and the test that you run is on efficacy and on
If your drug is beaten by naproxen, you
don't make it. If you can show equivalence within
a reasonable upper bound of naproxen, then we would
be pretty comfortable--I think I would be
comfortable that the drug is not going to create a
What do you think about that strategy?
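The upper-bound strategy described here is, in effect, a noninferiority comparison against naproxen: a drug passes only if the upper confidence limit for its cardiovascular event ratio versus naproxen falls below a prespecified margin. The sketch below assumes 1:1 randomization with equal person-time in both arms, so the ratio of event counts stands in for the hazard ratio, and uses a Wald interval on the log scale. The margin of 1.3 and the event counts are hypothetical; the speaker left the margin to be decided.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def upper_confidence_limit(events_test: int, events_ref: int, alpha: float = 0.05) -> float:
    """Upper limit of a two-sided (1 - alpha) Wald interval for the event
    rate ratio test/reference, assuming equal person-time in both arms."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    log_ratio = log(events_test / events_ref)
    se = sqrt(1 / events_test + 1 / events_ref)  # Poisson-based SE of the log ratio
    return exp(log_ratio + z * se)


def within_margin(events_test: int, events_ref: int, margin: float) -> bool:
    """True if the upper confidence limit rules out a ratio as large as `margin`."""
    return upper_confidence_limit(events_test, events_ref) < margin


if __name__ == "__main__":
    # Hypothetical: equal event counts in each arm, margin of 1.3.
    for events in (100, 400):
        print(events, round(upper_confidence_limit(events, events), 3),
              within_margin(events, events, 1.3))
```

With 100 events in each arm the upper limit is about 1.32, so a 1.3 margin is not met even though the point estimate is exactly 1.0; with 400 events per arm the limit falls to about 1.15. It is the number of events, not the number of patients enrolled, that determines which margin a trial can exclude.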
DR. TEMPLE: That is actually--I went
through it very fast, but that is actually what I
said at the bottom of one slide. I still would
like to know better whether the naproxen is less
bad or is really good. Therefore, as I said on the
slide, in my heart, I would like to see somebody
try to give full-dose aspirin for a while because
we are really pretty sure that won't be bad.
I think the community, in the long run,
needs that. Who is going to do it? That is a
perfectly good question. I do want to point out,
though, that the way some of the trials were done,
like TARGET, they could have given answers on some
of this, or at least closer. But, because they did
separate trials, instead of randomizing to each of
the treatments, that was obscured.
You could have had a very substantial
naproxen-ibuprofen comparison, but you didn't get
it because of the structure of the
trials. So I
think it is very important to randomize to each of
the treatments, obviously, whatever it is. But
that would be my best guess at the moment. But, in
line with what Alastair asked before, when you do
naproxen and you are looking at G.I. effects, do
you add a proton pump inhibitor? I think you need
a little more information before you do that, but
you might say that, which then raises the
fundamental question of how much help you get from
DR. WOOD: Dr. Cryer?
DR. CRYER: I wanted to comment on several
of the questions, Dr. Temple, that you raised as
well to ask a question. I guess I will just ask
the question first. When you say "full-dose
aspirin," are you referring to full
anti-inflammatory doses of aspirin, 3.9 grams a day
DR. TEMPLE: Which I assume most people
will not tolerate and there will be huge bleeding.
So you have got to do something.
DR. CRYER: Right.
See, I think that is a
non-practical experiment design and I think we have
come a long way from 3.9 grams of aspirin per day,
particularly because of the concerns of the adverse
events, the salicylism, the G.I. events. Clearly,
100 percent of those people are going to have
gastric ulcerations assessed endoscopically.
So I also would prefer one of the newer
NSAIDs, traditional NSAIDs, in that comparison.
With regard to--
DR. TEMPLE: Actually, before you leave
that, do you know what would happen if you added a
proton pump inhibitor to aspirin?
DR. CRYER: Not at 3.9 grams a day. I
don't think anybody thought that would be a
DR. TEMPLE: Short term, then, just to
look at endoscopic ulcers.
DR. CRYER: I don't know and I don't think
that it will ever be known.
DR. TEMPLE: Then I won't get the answer.
DR. CRYER: What I do know is that, if you
give 3.9 grams of aspirin per day in the
short-term, greater than 90 percent of your
patients who take aspirin will have endoscopic
ulceration. I don't know what the effect of the
PPI would be.
I wanted to address your last kind of
question that you threw out there of whether or not
a short-term study would show that celecoxib plus
80 milligrams of aspirin would have a favorable
effect, a G.I. effect, compared to a non-selective
NSAID. Those experiments have been done.
With respect to endoscopic ulcer, COX-2
plus aspirin equals traditional NSAID. With regard
to hospitalizations, there is a recent
epidemiologic study from Canada, not yet published,
indicating that hospitalizations with COX-2 plus
aspirin are fewer than hospitalizations with
non-selective NSAIDs plus aspirin. Then we have
outcome studies, not yet fully published but
available in abstract form, which indicate that
G.I. events on COX-2 plus aspirin are similar to
events on non-selective NSAID plus aspirin.
DR. TEMPLE: It is possible that if you
add aspirin (I mean, it is sort of what I would
expect) you would get something that is a lot
closer, in a cardiovascular sense, to being just a
regular NSAID, and maybe
you would still have some residual advantage in a
But, I must say, the data so far don't
show that. But they didn't seem definitive to me.
It raises the question of--you know, the
idea of COX-2 selectivity is, at least, in part, a
conceptual and promotional idea. As Garret pointed
out the first day, five or six of those old drugs
that aren't coxibs are COX-2-selective. So there
is a whole range. My feeling is we need to
understand the consequences of what all that means
and there is a somewhat artificial separation
between the coxibs and the others because those old
drugs at least are partially selective and may have
some of the same properties.
So one of my hopes is that we could look at a
range of these.
DR. CRYER: With respect to your last
comment, I am entirely in agreement with that.
DR. WOOD: Let's move on. Dr. Cush?
DR. CUSH: ALLHAT, I like the intention of
it. I would suggest, though, that if you are going
to have a study long enough to pick up
these events, a year or two, it is going to be
very, very hard to keep O.A. patients on one of
So maybe actually stratifying into
pure COX-2-specific drugs, COX-2-selective drugs,
and non-selective drugs that are more
predominantly COX-1, and then having a totally
non-NSAID group, which would be the Tylenol
group you talked about, or other analgesic agents,
might work over the long term.
DR. TEMPLE: That would answer a lot of
the questions. My real hope--you have a better
idea whether it is possible than I do--is that you
could actually find a population that could be
given what we are pretty sure is a
cardiovascular-neutral treatment. That is really
the only way to pin this down and it does seem
worth pinning down.
DR. WOOD: Dr. Hennekens?
DR. HENNEKENS: I think I gleaned from Dr.
O'Neill that if we determine there is a class
effect that it varies not just by drug and dose but
by duration of therapy. From Dr. Temple, the
comment that--I am very attracted to the concept of
what I would call a large simple trial rather than
an ALLHAT trial. I think there is merit in seeing
aspirin studied in therapeutic doses and I think
there is evidence that anti-inflammatory effects
are seen at doses far lower than the 3.9 grams.
But the question I have for Bob is there
are three currently marketed FDA-approved coxibs.
So would you include valdecoxib and 25 milligrams
of rofecoxib in your design?
DR. TEMPLE: Part of the reason I didn't
address that is I figured that is what the
committee is going to talk about. I was willing to
say that the celecoxib data look funny enough so
that you might consider it.
DR. WOOD: That is part of what we are
going to discuss.
DR. TEMPLE: That is what you are going to
discuss so I didn't address it.
DR. WOOD: Let's move that to later. Dr. Domanski?
DR. DOMANSKI: I will pass.
DR. WOOD: Dr. Abramson?
DR. ABRAMSON: Thank you. I want to
probably say something rather naive in support of
the study, Bob, and that is that we are at a moment
where we can do a paradigm shift, meaning that the
study you propose is an important one but it
is very large and it is going to be very hard to
get any resources to do that.
I think we are at a moment for the
companies, the FDA and the government to think
about a collaborative study where, if you have a
drug for which this information is important,
we put together a collaboration among industry
to do a multi-arm study of multiple drugs. It is
something, you know, in the
the companies have supported largely this
osteoarthritis initiative through the NIH to look
at outcomes in large numbers of patients.
I think what we need is a similar COX-2
initiative where either with the FDA or the NIH
participating, with collaboration among industry,
we are doing a multi-armed large study with
biomarkers, with pharmacogenomics studies, with
genetics and other blood pressure, but try and do
it in a utopian way.
I think everyone here wants to get the
right answer, whether it is in industry or here at
the table. This could be a good opportunity to do
something very different from what we have done before
in a large trial.
DR. TEMPLE: I don't disagree at all. I
mean, some of the drugs are generic. They don't
have any company that is massively interested in
them. So it is going to be a mixture of
government, generosity and a wide variety of other
things that are scarce. So I don't know how
to--you noticed I didn't have a slide on
how to do
DR. WOOD: Dr. Ilowite?
DR. ILOWITE: Just a minor point. I
understand the need for a cardiovascular-neutral
anti-inflammatory drug in an ALLHAT study. But I
was a little confused because I am aware of some
literature directed at people who are interested in
Kawasaki disease suggesting that high-dose
anti-inflammatory aspirin is actually prothrombotic
because of differential effects on prostacyclin
DR. TEMPLE: There are aspirin studies
going back to at least moderate doses that show
beneficial effects. It is not just 80 milligrams.
It is certainly at least a gram a day. Some of the
early ones were more than that. That is worth
thinking about. I am encouraged by the thought
that you might be able to get away with doses less
than 3 grams. So I didn't know that it was
considered prothrombotic. I thought aspirin always
looked good. But that is not up to grams. I don't
think any of the studies have done
DR. WOOD: We will give Dr. Fleming the
DR. FLEMING: I am just debating whether
to do it now or after the break.
DR. WOOD: Let me help you. Go ahead.
DR. FLEMING: Now?
DR. WOOD: After the break will be great.
DR. FLEMING: All right. I will wait.
DR. WOOD: We will take a break and then
we will be back here in ten minutes.
DR. WOOD: Okay, folks. Let's get
started. The next presentation will be given by
Sharon Hertz who is Deputy Director of the
DR. HERTZ: Thank you. I am just going to
spend a very few minutes summarizing some of our--
DR. WOOD: Let me, in fact, just before
Sharon begins--Sharon Hertz has passed out a
handout that includes a lot of her slides. In the
interest of time, she has graciously agreed to
delete some of these slides and just focus on a
smaller subset of what is in the handout.
However, the committee does have the
handout and the committee may find that handout
useful for referring to some of the data.
DR. HENNEKENS: Alastair, a quick comment.
I want to make a quick clarification on the earlier
comment about prothrombotic effects of high
doses of aspirin.
DR. WOOD: Sorry; I missed that. About
DR. HENNEKENS: In the randomized trials,
135 randomized trials with over 212,000 randomized
subjects, whether the doses of aspirin are 75
milligrams or up to 2 grams a day, there are
significant cardiovascular benefits to aspirin even
at high doses. The issue, as Bob pointed out, at
the high doses, is not that there is a reversal of
the benefit but that the side effects are
So I think that is an important point to
DR. ILOWITE: I just wanted to say that in
pediatrics, we think of anti-inflammatory doses as
100 milligrams per kilogram. So those are the
doses I was speaking of.
DR. GIBOFSKY: Finally, the high-dose
aspirin that would be necessary to treat patients
with rheumatoid arthritis, 3.9 grams or greater,
would cause significant problems for the stomach, as
Dr. Cryer said, significant problems with the
patient's hearing and, perhaps, significant problems
in other organ systems as well. It is not a study
that could be easily undertaken.
DR. HENNEKENS: I won't debate the value
of the study of 3.9 grams of aspirin but, from the
perspective of anti-inflammatory effects, they have
been observed at doses of 2 grams of aspirin a day
and, in fact, there are randomized studies going on
testing directly whether somewhat higher doses of
maybe 1 to 1-and-a-half grams a day might have
significant anti-inflammatory as well as
anti-atherogenic effects as measured by endothelial
function, nitric oxide formation and
So I don't think that the traditionally
high doses are the ones that necessarily would need
to be done. But I don't want to debate whether we
should be studying doses of 4 grams of aspirin.
DR. WOOD: What you are telling us,
Charlie, is that you are comfortable that there is
an antithrombotic effect at the high doses of
aspirin. Is that right? Okay. Good.
Dr. Cush wants to say something.
DR. CUSH: Again, you need not
anti-inflammatory doses but analgesic doses, which
can be substantially lower. I do want to make a
statement with regard to a study that wasn't
presented here that I think is germane and we
should know about it, and this is quick. There is
a very large trial that is NIH supported that is
called the GAIT study, glucosamine in
osteoarthritis of the knee.
This is a 1,588-patient study that has been completed and
is currently being analyzed. The Data Safety
Monitoring Board of the study has analyzed it for
cardiovascular risk because there is a Celebrex
arm. There are five arms in this 1500-patient
study; placebo, Celebrex 200 milligrams once a day,
glucosamine only, chondroitin sulfate only, and
glucosamine and chondroitin sulfate.
The outcome here, in a six-month trial, is
pain reduction in osteoarthritis in the knee.
Because of all this press and whatnot, they have
looked at the safety outcomes and they have not
shown any increase in cardiovascular events,
including M.I., or any difference between the Celebrex
group and the other four control groups.
DR. WOOD: Let's move on to the program.
Summary of Meeting Presentations
DR. HERTZ: There are now several versions
of my slides around and you are free to look at
whichever interests you. There is one correction
on the lumiracoxib slides from the original set
where I substituted the word diclofenac for
ibuprofen. So those of you looking at those slides
just be aware of that, please.
What I am really just going to do now is
just focus down again some of the reasons why we
This would not be the current slide set.
Any help here?
I am looking at the most recent set that was
handed out, and we will just work from there
because there is not a lot of data left to
present. Basically, I want to point out
that we are here because we recognize that pain
drugs are critically important, that the
COX-2-selective NSAIDs have been extensively
studied and that, over time, studies have
revealed new potential uses as well as new risks.
We need to determine how we feel about
these risks. Are they limited to individual
products? Are they applicable across the group of
COX-2 selectives and how far does this extend to
the nonselective anti-inflammatories?
There is a slide that describes--
DR. WOOD: Sharon, apparently everybody
has hard copies of your slides.
DR. HERTZ: Right.
DR. WOOD: So if you want to just go
through them and refer to the slide number, that
would probably be helpful to people.
DR. HERTZ: Okay. If we go to the third
slide, you can get a sense of the sizes of the
databases that were presented in the
reviewer descriptions of FDA reviews.
A couple of points. The numbers there
reflect predominantly patients on the drug of
interest as opposed to the entire database. The
outcome studies are more reflective of the entire
populations including comparators. These drugs
were assessed and have been assessed over time in
fairly large numbers of patients.
I think it is useful to note that we have
not approved, in this country, all of the
COX-2-selective NSAIDs that have come to us in
applications for a variety of reasons. Some of
these may be related to cardiovascular-risk
assessment. Some may be related to
non-cardiovascular-risk assessment which we really
haven't gotten into in this setting.
In addition, you may also note that
parecoxib has not yet been approved in this country
although it has been approved elsewhere. So I
think that we have a lot of issues to consider with
When we reviewed the studies that have
been presented, we see that there is some increased
risk for cardiovascular events but one of the key
issues here is that the results are not consistent
across studies and across situations. We also have
seen that there is risk that is being associated
with some of the nonselective products.
So we have a story of conflicting data. I
am up to Slide 5. We have data that has been
present across short- and long-term studies, the
epidemiologic studies. The challenge is to compare
across populations, across comparators. It is
striking that sometimes very similar study designs
have very different results.
It is possible there is more than one
mechanism. Again, the data has been inconsistent
with the NSAIDs. We also have conflicting
information coming back on what occurs in the
context of concurrent aspirin use. It is really
unclear if aspirin use has a truly meaningful
effect on whether there is any G.I. benefit of the
COX-2-selective products. That has not been clear
I have been asked to point out that, in
addition, time to onset of risk is something that
we need to consider very importantly, too, which,
again, is something that is evident when we look at
the study data and important in our deliberations
So, in spite of this conflicting data and
the many questions, we have to move forward. We
have to determine what the role of the approved
products on the market today is, what additional
studies are necessary, what studies would be most
I am going to summarize and combine some
of the questions that we have posed. These are
questions on which we would dearly like input from the
committee. To start, if we think about the first
three questions: Does the available data support a
conclusion that celecoxib, rofecoxib and valdecoxib
significantly increase the risk of cardiovascular
events? Does the overall risk-versus-benefit
profile for each of these support marketing in the
U.S.? If yes, in whom? And which of the potential
benefits of celecoxib or the others outweigh the
potential risks and what actions would you
recommend that we consider implementing to ensure
I think it is also important to understand
that some of these answers are going to depend on
if we think that this is a fairly uniform class
effect and, if not, we are going to have to weigh the
amount of information available for each of the
products. It is not the same. We don't have the
longer outcome studies, for instance, with
valdecoxib at this point.
Question 4 asks if the available data
support a conclusion that one or more of the
COX-2-selective agents increase the risk of
cardiovascular events and what is the role of
concomitant aspirin in attempting to mitigate that
risk? What additional clinical trials or
observational studies, if any, would you recommend
as essential for us to further evaluate celecoxib,
rofecoxib and valdecoxib?
What about to further evaluate the
potential G.I. benefits for these same products?
Would you recommend that the labeling for these
products include information regarding the absence
of long-term controlled clinical-trial data
assessing potential cardiovascular effects and if
you have a recommendation for how that should be
conveyed in terms of warnings, boxes and such.
What additional trials would be essential
to evaluate the nonselective nonsteroidal
anti-inflammatory drugs, particularly with respect
to cardiovascular risk? Similarly, what studies will now
become essential for products under development
to gain approval?
We have to determine what studies would be
necessary to evaluate the cardiovascular risk of
these products, and how much information do we need
about the gastrointestinal risk? If
preapproval studies recommended as essential do not
demonstrate an increased risk of a cardiovascular
event, how would you propose the FDA handle that
information in the labeling? Would the absence of
a cardiovascular-risk signal preclude the need for
any warnings or precautions in the labeling of a
new product, or should we rely more on a class
warning or precaution in the absence of a signal of
increased risk in the preapproval databases?
If you think a class warning is
appropriate, please advise with particular
attention to whether you recommend it apply to all
NSAIDs or only COX-2-selective NSAIDs.
So I want to thank everybody here for
their time and their commitment to helping us
through this extremely challenging program, and we
really look forward to hearing your deliberations
and your recommendations.
DR. WOOD: Thank you very much.
The companies have also asked for a few
minutes to respond. We all heard the rules
yesterday, so it is two minutes. The microphone gets
turned off at two minutes, so just keep moving.
DR. HARRIGAN: Could I have Slide No. 1?
This is Harrigan from Pfizer. What I would like to
do is first to summarize what we know about
celecoxib and what we think that tells us about the
benefit:risk equation for that drug.
I make the point in this slide about
Celebrex being extensively studied, and I remind
the committee of the contrast with the very widely
used nonspecific NSAIDs. On the next point, we see
that efficacy has been demonstrated in arthritis
pain and familial adenomatous polyposis. Our
prescription data and observational study data tell
us that approximately three-quarters of patients
who are taking celecoxib are receiving daily doses
of 200 milligrams or less.
Celebrex does have a favorable G.I. safety
profile, a point emphasized by the very relevant
G.I. safety findings that we heard about this
morning from ADAPT, compared with over-the-counter
doses of naproxen.
Cardiovascular risk was not detected in
the setting of treating arthritis patients,
understanding all the caveats about those data that
we have heard over the past two days. In APC, an
increase in cardiovascular risk was reported,
apparently in a dose-related pattern. In contrast,
two additional long-term placebo-controlled trials
did not find evidence of increased cardiovascular
risk at daily doses of 400 milligrams.
The comment about the ADAPT findings is
supported by the initial announcements from the
National Institute on Aging. We await those data
with great interest, particularly given the size,
the duration and the elderly population studied, which
would lead us to expect that the number
of events in that trial will exceed the number of
events in either or both of the other two trials.
The final ADAPT data and the polyp
efficacy data will make significant