Comment on

Draft Guidance for Industry

Patient-Reported Outcome Measures:

Use in Medical Product Development to Support Labeling Claims

Docket No. 2006D-0044

 

 

Submitted by:

Arthur A. Stone, Ph.D.

Department of Psychiatry

Stony Brook University

and

invivodata, inc.

 

The FDA is commended for a well-constructed Draft Guidance addressing the important issue of Patient-Reported Outcomes (PROs) in clinical studies. The science of self-report that underlies PRO assessment is reflected in the Draft Guidance, and the document will undoubtedly be instrumental for clinical researchers using PRO instruments in the evaluation of drugs, biologics, and devices as the basis for labeling claims and as primary endpoints for approval.

 

In the same spirit of using the best science available to inform the use of PRO data in clinical trials, I wish to support and further substantiate the FDA’s decision to encourage the use of shorter recall periods in PRO assessment. Below, I outline several lines of research and provide references from the scientific literature that support and further strengthen this position.

 

The FDA clearly and rightly considers the recall interval of PRO assessments a vital component for evaluating PRO instruments and the accuracy of PRO data. This position is reflected in the Draft Guidance, which states (Lines 328-343):

 

Sponsors should also evaluate the rationale and the appropriateness of the recall period for a PRO instrument. To this end, it is important to consider patients’ ability to accurately recall the information requested as proposed. The choice of recall period that is most suitable depends on the purpose and intended use of the instrument, the characteristics of the disease/condition, and the treatment to be tested. When evaluating PRO-based claims, the FDA intends to review the study protocol to determine what steps were taken to ensure that patients understand the appropriate recall period. If a patient diary or some other form of unsupervised data entry is used, the FDA plans to review the protocol to determine what measures are taken to ensure that patients make entries according to the study design and not, for example, just before a clinic visit when their reports will be collected.

 

PRO instruments that require patients to rely on memory, especially if they must recall over a period of time, or to average their response over a period of time may threaten the accuracy of the PRO data. It is usually better to construct items that ask patients to describe their current state than to ask them to compare their current state with an earlier period or to attempt to average their experiences over a period of time.

 

I would like to support these statements and the FDA’s preliminary guidance on the reporting period used in PRO assessments. In general, shorter reporting periods are preferred over longer ones. There are several lines of research that support this position.

 

First, cognitive scientists have discovered a tremendous amount about the nature of memory, including how information is stored and how it is retrieved. One overwhelming conclusion from this research is that the recall of experiences is both limited and selective (Bradburn, Rips & Shevell, 1987; Schwarz & Sudman, 1994). An implication of this fact for choosing recall assessments is that patients do not actually have access to an unbiased memory of experiences, which some recall assessments appear to assume. Relying on longer durations of recall may be fruitless because the information is simply no longer available to the individual (Schwarz & Sudman, 1994). A further complication for the recall process is evidence that the circumstances under which the recall task is conducted (e.g., symptoms, mood at time of recall) can affect the accessibility of information, and thus what is recalled and reported (Kihlstrom, Eich, Sandbrand & Tobias, 2000). There is, then, considerable evidence that memories are distorted when recall periods are long. The definition of “long” is relative and depends on the content of the information considered; “long” may be hours for the recall of pain intensity or it may be months for very salient, discrete health experiences.

 

Second, studies have also shown that when respondents are asked to summarize information (e.g., reporting an average level of symptom intensity, or an overall frequency), the process is affected by several cognitive heuristics, including the peak-end heuristic (Redelmeier & Kahneman, 1996; Redelmeier, Katz & Kahneman, 2003), the salience heuristic (Stone, Schwartz, Broderick & Shiffman, 2005), duration neglect (Fredrickson & Kahneman, 1993), and the accessibility heuristic (Tversky & Kahneman, 1973). These heuristics are cognitive shortcuts that we unconsciously use to recall or summarize our past experience.  Each of these heuristics indicates that particular aspects or components of experience are highlighted and exercise undue influence when experiences are summarized over time. Examples of experiences that are given undue weight in summaries are ones that are very intense, those that are easily accessible (Robinson & Clore, 2002; Tversky & Kahneman, 1973), and experiences that are proximal or even concurrent with testing (Eich, Reeves, Jaeger & Graff-Radford, 1985). Unfortunately, we do not know how consistently people employ these heuristics over time or if the heuristics apply to all members of a group reporting a particular type of experience. This leads to the possibility that the information yielded by recall PROs represents one aspect of experience for some individuals and other aspects of the experience for other people. Blurring of the constructs measured by a PRO instrument could lead to conceptual confusion and uninterpretable or misinterpreted data.
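
The contrast between a peak-end style summary and the actual average of momentary experience can be illustrated with a minimal sketch. The ratings below are hypothetical, and the simple "average of the peak and the final rating" rule is an illustrative assumption about how such a summary might be formed, not a model taken from the cited studies.

```python
from statistics import mean

def true_average(ratings):
    """Average of every momentary rating in the reporting period."""
    return mean(ratings)

def peak_end_summary(ratings):
    """Illustrative peak-end rule: mean of the most intense and the final rating."""
    return (max(ratings) + ratings[-1]) / 2

# Hypothetical momentary pain ratings on a 0-10 scale.
short_episode = [2, 4, 8, 7]
# Same episode followed by a milder, tapering period (more total discomfort).
long_episode = [2, 4, 8, 7, 5, 4, 3, 2]

for name, ratings in [("short", short_episode), ("long", long_episode)]:
    print(name, true_average(ratings), peak_end_summary(ratings))
```

In this sketch the peak-end summary overstates the average of the short episode, and it ignores how long each episode lasted, which is the sense in which duration can be neglected when experiences are summarized.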

 

Third, cognitive interviewing of chronic pain patients who have completed recall PRO questions (DeMaio & Rothgeb, 1996) indicates, at least for pain assessment, that, contrary to a common view of recall, patients do not systematically review the reporting period and create a summary that adequately represents the entire period. Instead, particular instances of pain or pain-related impairment appear to dominate such ratings (Broderick, Stone, Calvanese, Schwartz & Turk, 2006). Cognitive interview data, then, support the conceptual view outlined in the prior points and the concern about long recall periods.

 

Fourth, direct comparisons of real-time data capture (multiple within-day diary reports) with recall have shown (again, for pain) that only about 50% to 60% of the variance is shared between the average of the real-time reports and recall over the same period of time (Stone, Broderick, Shiffman & Schwartz, 2004). This means that there is a considerable “signal” that weekly recall captures, although a substantial proportion of this association may be attributable to an individual’s level of current pain, which is readily accessible (i.e., a “read-out” of current pain can sometimes be a reasonable proxy for pain over the past week). However, these data also mean that there is considerable error, or “noise,” when recall measures are intended to provide an average of an experience over a period of time. The workings of memory, limitations of recall, and cognitive heuristics are likely explanations for the differences between real-time and recall assessments. Many individuals who score high on real-time measures are likely to score considerably lower on recall and vice versa.
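
As a rough illustration of what "shared variance" means here, the sketch below computes the squared Pearson correlation between each patient's average real-time rating and that patient's recalled rating for the same period. The data values and function names are hypothetical and are not taken from the cited study.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def shared_variance(x, y):
    """Proportion of variance shared by x and y (r squared)."""
    return pearson_r(x, y) ** 2

# Hypothetical data: one value per patient, 0-10 pain scale.
momentary_means = [3.1, 4.5, 2.0, 6.2, 5.0, 7.4]  # average of within-day reports
weekly_recall = [4.0, 4.0, 3.5, 8.0, 4.5, 7.0]    # single end-of-week recall rating

print(f"shared variance: {shared_variance(momentary_means, weekly_recall):.2f}")
```

A shared variance of 0.50 to 0.60 corresponds to a correlation of roughly 0.71 to 0.77, so the remaining 40% to 50% of the variance in recall is not accounted for by the real-time average.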

 

Fifth, cognitive processes may have particular relevance for recalled PRO assessments used to evaluate treatment efficacy in clinical trials. A new treatment could, for example, reduce the proportion of time patients are symptomatic relative to the effects of placebo. If the duration neglect heuristic described above affected retrospective PRO outcomes, this important effect might not be detected. The unfortunate result in this case is that an effective treatment would be missed. Another example is derived from the concept that long recall periods invoke responses that are based more on beliefs than on actual experience (Ross, 1989). Consistent with comments in the Draft Guidance (Lines 717-733), this suggests that even a small degree of unblinding in clinical trials may lead to biased recall assessments, because self-reported outcomes would reflect beliefs about treatment efficacy and not actual symptom change. This would result in the erroneous conclusion that a treatment was effective. These and similar effects due to cognitive processing of recall information have the potential to systematically bias the results of clinical trials.
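
The first example in the paragraph above can be made concrete with a small sketch. It assumes, purely for illustration, that recall follows a simple peak-and-end rule and that the two hypothetical patients differ only in how much of the day they are symptomatic; none of the numbers come from a real trial.

```python
def proportion_symptomatic(ratings, threshold=0):
    """Fraction of momentary reports with symptom intensity above the threshold."""
    return sum(r > threshold for r in ratings) / len(ratings)

def recalled_intensity(ratings):
    """Illustrative duration-neglecting recall: mean of the peak and final rating."""
    return (max(ratings) + ratings[-1]) / 2

# Hypothetical momentary symptom ratings over one day (0 = symptom-free).
placebo_day = [6, 6, 6, 6, 0, 0, 0, 6]
treated_day = [6, 0, 0, 0, 0, 0, 0, 6]

for arm, day in [("placebo", placebo_day), ("treated", treated_day)]:
    print(arm, proportion_symptomatic(day), recalled_intensity(day))
```

Under these assumptions the real-time data show the treated patient symptomatic for a much smaller share of the day, while the recall-style summary is identical in both arms, so a trial relying on the recalled value alone would not register the benefit.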

 

In summary, current science supports the FDA Draft Guidance that PROs should use relatively short reporting periods and avoid relying on recall.

 

References

 

Bradburn, N., Rips, L., & Shevell, S. (1987). Answering autobiographical questions: The impact of memory and inference on surveys. Science, 236, 157-161.

Broderick, J., Stone, A., Calvanese, P., Schwartz, J., & Turk, D. (2006). Recalled pain ratings: A complex and poorly defined task. Journal of Pain, 7, 142-149.

DeMaio, T., & Rothgeb, J. (1996). Cognitive interviewing techniques: In the lab and in the field. In N. Schwarz & S. Sudman (Eds.), Answering questions: Methodology for determining cognitive and communicative processes in survey research (pp. 177-195). San Francisco, CA: Jossey-Bass.

Eich, E., Reeves, J., Jaeger, B., & Graff-Radford, S. (1985). Memory for pain: Relation between past and present pain intensity. Pain, 23, 375-379.

Fredrickson, B., & Kahneman, D. (1993). Duration neglect in retrospective evaluations of affective episodes. Journal of Personality and Social Psychology, 65, 45-55.

Kihlstrom, J., Eich, E., Sandbrand, D., & Tobias, B. (2000). Emotion and memory: Implications for self-report. In A. Stone, J. Turkkan, C. Bachrach, J. Jobe, H. Kurtzman & V. Cain (Eds.), The science of self-report: Implications for research and practice (pp. 81-99). Mahwah, NJ: Erlbaum.

Redelmeier, D., & Kahneman, D. (1996). Patients' memories of painful medical treatments: Real-time and retrospective evaluations of two minimally invasive procedures. Pain, 66, 3-8.

Redelmeier, D., Katz, J., & Kahneman, D. (2003). Memories of colonoscopy: A randomized trial. Pain, 104, 187-194.

Robinson, M., & Clore, G. (2002). Belief and feeling: Evidence for an accessibility model of emotional self-report. Psychological Bulletin, 128, 934-960.

Ross, M. (1989). Relation of implicit theories to the construction of personal histories. Psychological Review, 96, 341-357.

Schwarz, N., & Oyserman, D. (2001). Asking questions about behavior: Cognition, communication and questionnaire construction. American Journal of Evaluation, 22, 127-160.

Schwarz, N., & Sudman, S. (1994). Autobiographical memory and the validity of retrospective reports. New York: Springer-Verlag.

Stone, A., Shiffman, S., Atienza, A., & Nebling, L. (In press). The science of real-time data capture. New York: Oxford University.

Stone, A., Broderick, J., Shiffman, S., & Schwartz, J. (2004). Understanding recall of weekly pain from a momentary assessment perspective: Absolute accuracy, between- and within-person consistency, and judged change in weekly pain. Pain, 107, 61-69.

Stone, A., Schwartz, J., Broderick, J., & Shiffman, S. (2005). Variability of momentary pain predicts recall of weekly pain: A consequence of the peak (or salience) memory heuristic. Personality and Social Psychology Bulletin, 31, 1340-1346.

Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.

 

 

