Comment on
Draft Guidance for
Industry
Patient-Reported Outcome
Measures:
Use in Medical Product
Development to Support Labeling Claims
Docket No.
2006D-0044
Submitted by:
Arthur A. Stone, Ph.D.
Department of Psychiatry
Stony Brook University
and
invivodata, inc.
The FDA is to be commended for a
well-constructed Draft Guidance addressing the important issue of
Patient-Reported Outcomes (PROs) in clinical studies. The science of self-report that underlies PRO
assessment is reflected in the Draft Guidance, and the document will
undoubtedly be instrumental for clinical researchers using PRO instruments in
the evaluation of drugs, biologics, and devices as the basis for labeling
claims and as primary endpoints for approval.
In the same spirit of using
the best science available to inform the use of PRO data in clinical trials, I
wish to support and further substantiate the FDA’s decision to encourage the
use of shorter recall periods in PRO assessment. Below, I outline
a number of lines of research and provide several references from the scientific literature
that support and further strengthen this position.
The FDA clearly and rightly
considers the recall interval of PRO assessments as a vital component for
evaluating PRO instruments and the accuracy of PRO data. This position is
reflected in the Draft Guidance, which states (Lines 328-343):
Sponsors should also evaluate the rationale and the
appropriateness of the recall period for a PRO instrument. To this end, it is
important to consider patients’ ability to accurately recall the information
requested as proposed. The choice of recall period that is most suitable
depends on the purpose and intended use of the instrument, the characteristics
of the disease/condition, and the treatment to be tested. When evaluating
PRO-based claims, the FDA intends to review the study protocol to determine
what steps were taken to ensure that patients understand the appropriate recall
period. If a patient diary or some other form of unsupervised data entry is
used, the FDA plans to review the protocol to determine what measures are taken
to ensure that patients make entries according to the study design and not, for
example, just before a clinic visit when their reports will be collected.
PRO instruments
that require patients to rely on memory, especially if they must recall over a
period of time, or to average their response over a period of time may threaten
the accuracy of the PRO data. It is usually better to construct items that ask
patients to describe their current state than to ask them to compare their
current state with an earlier period or to attempt to average their experiences
over a period of time.
I would like to support these statements and the FDA’s
preliminary guidance on the reporting period used in PRO assessments. In general, shorter reporting periods are preferred
over longer ones. There are several lines of research that support this
position.
First, cognitive scientists have
discovered a tremendous amount about the nature of memory, including how
information is stored and how it is retrieved. One overwhelming conclusion from
this research is that the recall of experiences is both limited and selective (Bradburn,
Rips & Shevell, 1987; Schwarz & Sudman, 1994). An implication of this fact
for choosing recall assessments is that patients do not actually have access to
an unbiased memory of experiences, which some recall assessments appear to
assume. Relying on longer durations of recall may be fruitless because the
information is simply no longer available to the individual (Schwarz &
Sudman, 1994). A further complication for the recall process is evidence that
the circumstances under which the recall task is conducted (e.g., symptoms,
mood at time of recall) can affect the accessibility of information, and thus
what is recalled and reported (Kihlstrom, Eich, Sandbrand & Tobias, 2000). There
is, then, considerable evidence that memories are distorted when recall periods
are long. The definition of “long” is relative and relates to the content of
information considered; “long” may be hours for the recall of pain intensity or
it may be months for very salient, discrete health experiences.
Second, studies have also shown
that when respondents are asked to summarize information (e.g., reporting an
average level of symptom intensity, or an overall frequency), the process is
affected by several cognitive heuristics, including the peak-end heuristic (Redelmeier
& Kahneman, 1996; Redelmeier, Katz & Kahneman, 2003), the salience
heuristic (Stone, Schwartz, Broderick & Shiffman, 2005), duration neglect (Fredrickson
& Kahneman, 1993), and the accessibility heuristic (Tversky & Kahneman,
1973). These heuristics are cognitive shortcuts that we unconsciously use to
recall or summarize our past experience.
Each of these heuristics indicates that particular aspects or components
of experience are highlighted and exercise undue influence when experiences are
summarized over time. Examples of experiences that are given undue weight in
summaries are ones that are very intense, those that are easily accessible (Robinson
& Clore, 2002; Tversky & Kahneman, 1973), and experiences that are
proximal or even concurrent with testing (Eich, Reeves, Jaeger &
Graff-Radford, 1985). Unfortunately, we do not know how consistently people employ
these heuristics over time or if the heuristics apply to all members of a group
reporting a particular type of experience. This leads to the possibility that
the information yielded by recall PROs represents one aspect of experience for
some individuals and other aspects of the experience for other people. Blurring
of the constructs measured by a PRO instrument could lead to conceptual
confusion and uninterpretable or misinterpreted data.
Third, cognitive interviewing
of chronic pain patients who have completed recall PRO questions (DeMaio &
Rothgeb, 1996) indicates, at least for pain assessment, that, contrary to a
common view of recall, patients do not systematically review the reporting period
and create a summary adequately representing the entire period. Instead,
particular instances of pain or pain-related impairment appear to dominate such
ratings (Broderick, Stone, Calvanese, Schwartz & Turk, 2006). Cognitive
interview data, then, support the conceptual view outlined in the prior points
and the concern about long recall periods.
Fourth, direct comparisons of
real-time data capture (multiple within-day diary entries) with recall have
shown (again, for pain) that only about 50% to 60% of the variance is shared
between the average of the real-time reports and recall over the same period of
time (Stone, Broderick, Shiffman & Schwartz, 2004). This means that there
is a considerable “signal” that weekly recall captures, although a substantial
proportion of this association may be attributable to an individual’s level of
current pain, which is readily accessible (i.e., reporting a “read-out” of
current pain can sometimes be a reasonable proxy for pain over the past
week). However, these data also mean that
there is considerable error, or “noise,” when recall measures are intended to
provide an average of an experience over a period of time. The workings of
memory, limitations of recall, and cognitive heuristics are likely explanations
for the differences between real-time and recall assessments. Many individuals
who score high on real-time measures are likely to score considerably lower on
recall, and vice versa.
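For reference, the shared-variance figures in this point can be restated as correlations. This restatement assumes that "variance shared" denotes the squared Pearson correlation between the average of the real-time reports and the recalled rating; that reading is mine, and the cited study should be consulted for the exact statistic used.

```latex
\[
r^{2} \in [0.50,\ 0.60]
\quad\Longrightarrow\quad
r \in \left[\sqrt{0.50},\ \sqrt{0.60}\right] \approx [0.71,\ 0.77],
\qquad
1 - r^{2} \in [0.40,\ 0.50].
\]
```

Under that assumption, a correlation that looks strong (roughly 0.7 to 0.8) still leaves 40% to 50% of the variance in recalled ratings unaccounted for by the real-time average.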
Fifth, cognitive processes
may have particular relevance for recalled PRO assessments used to evaluate
treatment efficacy in clinical trials. A new treatment could, for example,
reduce the proportion of time patients are symptomatic relative to the effects
of placebo. If the duration neglect heuristic described above affected
retrospective PRO outcomes, this important effect might not be detected. The
unfortunate result in this case is that an effective treatment would be missed.
Another example is derived from the concept that long recall periods invoke
responses that are based more on beliefs than on actual experience (Ross, 1989). Consistent with comments in the Draft
Guidance (Lines 717-733), this suggests that even a small degree of unblinding in
clinical trials may lead to biased recall assessments, because self-reported
outcomes would reflect beliefs about
treatment efficacy and not actual symptom change. This would result in the
erroneous conclusion that a treatment was effective. These and similar effects
due to cognitive processing of recall information have the potential to
systematically bias the results of clinical trials.
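The duration-neglect example in this fifth point can be made concrete with a small numerical sketch. It is purely illustrative: the ratings below are invented rather than taken from any cited study, and the "peak-end" rule (averaging the most intense moment and the final moment) is used here as one simple, assumed model of a duration-neglecting recall summary. The function names are hypothetical.

```python
from statistics import mean


def true_average(ratings):
    """Mean of all momentary ratings (what a recall 'average' is meant to capture)."""
    return mean(ratings)


def proportion_symptomatic(ratings, threshold=0):
    """Share of momentary reports with a rating above the threshold."""
    return sum(r > threshold for r in ratings) / len(ratings)


def peak_end_summary(ratings):
    """Assumed recall model: average of the peak rating and the last rating.

    The duration of the symptomatic periods plays no role in this summary.
    """
    return (max(ratings) + ratings[-1]) / 2


# Invented momentary symptom ratings on a 0-10 scale (illustration only).
placebo = [6, 7, 8, 7, 6, 7, 6, 5]   # symptomatic at every report
treated = [0, 0, 8, 0, 0, 0, 0, 5]   # symptomatic at only two reports

for name, arm in (("placebo", placebo), ("treated", treated)):
    print(
        name,
        "true average:", true_average(arm),
        "proportion symptomatic:", proportion_symptomatic(arm),
        "peak-end summary:", peak_end_summary(arm),
    )
```

With these invented numbers, the proportion of symptomatic reports falls from 100% to 25% and the true average falls from 6.5 to 1.625, yet the peak-end summary is 6.5 in both arms. A recall measure that behaved this way would show no difference between the arms.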
In summary, current science
supports the FDA Draft Guidance that PROs should use relatively short reporting
periods and avoid relying on recall.
References
Bradburn, N., Rips, L., & Shevell,
S. (1987). Answering autobiographical questions: The impact of memory and
inference on surveys. Science, 236, 157-161.
Broderick, J., Stone, A., Calvanese,
P., Schwartz, J., & Turk, D. (2006). Recalled pain ratings: A complex and
poorly defined task. Journal of Pain, 7, 142-149.
DeMaio, T., & Rothgeb, J. (1996).
Cognitive interviewing techniques: In the lab and in the field. In N. Schwarz
& S. Sudman (Eds.), Answering questions: Methodology for determining
cognitive and communicative processes in survey research (pp. 177-195).
Eich, E., Reeves, J., Jaeger, B., &
Graff-Radford, S. (1985). Memory for pain: Relation between past and present
pain intensity. Pain, 23, 375-379.
Fredrickson, B., & Kahneman, D.
(1993). Duration neglect in retrospective evaluations of affective episodes. Journal
of Personality and Social Psychology, 65, 45-55.
Kihlstrom, J., Eich, E., Sandbrand, D.,
& Tobias, B. (2000). Emotion and memory: Implications for self-report. In
A. Stone, J. Turkkan, C. Bachrach, J. Jobe, H. Kurtzman & V. Cain (Eds.), The
science of self-report: Implications for research and practice (pp. 81-99).
Redelmeier, D., & Kahneman, D.
(1996). Patients' memories of painful medical treatments: Real-time and
retrospective evaluations of two minimally invasive procedures. Pain, 66,
3-8.
Redelmeier, D., Katz, J., &
Kahneman, D. (2003). Memories of colonoscopy: A randomized trial. Pain, 104,
187-194.
Robinson, M., & Clore, G. (2002).
Belief and feeling: Evidence for an accessibility model of emotional
self-report. Psychological Bulletin, 128, 934-960.
Ross, M. (1989). Relation of implicit
theories to the construction of personal histories. Psychological Review,
96, 341-357.
Schwarz, N., & Oyserman, D. (2001).
Asking questions about behavior: Cognition, communication and questionnaire
construction. American Journal of Evaluation, 22, 127-160.
Schwarz, N., & Sudman, S. (1994). Autobiographical
memory and the validity of retrospective reports.
Stone, A., Shiffman, S., Atienza, A.,
& Nebeling, L. (In press). The science of real-time data capture.
Stone, A., Broderick, J., Shiffman, S.,
& Schwartz, J. (2004). Understanding recall of weekly pain from a momentary
assessment perspective: Absolute accuracy, between- and within-person
consistency, and judged change in weekly pain. Pain, 107, 61-69.
Stone, A., Schwartz, J., Broderick, J.,
& Shiffman, S. (2005). Variability of momentary pain predicts recall of
weekly pain: A consequence of the peak (or salience) memory heuristic. Personality
and Social Psychology Bulletin, 31, 1340-1346.
Tversky, A., & Kahneman, D. (1973).
Availability: A heuristic for judging frequency and probability. Cognitive
Psychology, 5, 207-232.