
Draft Guidance for Industry and FDA Staff: Clinical Performance Assessment: Considerations for Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Approval (PMA) and Premarket Notification [510(k)] Submissions

DRAFT GUIDANCE

This guidance document is being distributed for comment purposes only.
Document issued on: October 21, 2009

Comments and suggestions regarding this draft document should be submitted within 90 days of publication in the Federal Register of the notice announcing the availability of the draft guidance. Submit written comments to the Division of Dockets Management (HFA-305), Food and Drug Administration, 5630 Fishers Lane, rm. 1061, Rockville, MD 20852. Alternatively, electronic comments may be submitted to http://www.regulations.gov. All comments should be identified with the docket number listed in the notice of availability that publishes in the Federal Register.

For questions regarding this draft guidance document contact Nicholas Petrick (OSEL) at 301-796-2563, or by e-mail at Nicholas.Petrick@fda.hhs.gov; or Joyce Whang (ODE) at 301-796-6516, or by e-mail at Joyce.Whang@fda.hhs.gov.


U.S. Department of Health and Human Services
Food and Drug Administration
Center for Devices and Radiological Health
Division of Imaging and Applied Mathematics
Office of Science and Engineering Laboratories
Radiological Devices Branch
Division of Reproductive, Abdominal, and Radiological Devices
Office of Device Evaluation

Preface

Additional Copies

Additional copies are available from the Internet. You may also send an e-mail request to dsmica@fda.hhs.gov to receive an electronic copy of the guidance or send a fax request to 301-847-8149 to receive a hard copy. Please use the document number (1698) to identify the guidance you are requesting.


Table of Contents

  1. INTRODUCTION
    • The Least Burdensome Approach
  2. SCOPE
  3. RATIONALE
  4. CLINICAL STUDY DESIGN
    • Evaluation Paradigm and Study Endpoints
    • Control Arm
    • Reading Scenarios and Randomization
    • Rating Scale
    • Scoring
    • Training of Study Participants
  5. STUDY POPULATION
    • Data Poolability
  6. REFERENCE STANDARD
  7. REPORTING
  8. POSTMARKET PLANNING FOR PMAS


This draft guidance, when finalized, will represent the Food and Drug Administration's (FDA's) current thinking on this topic. It does not create or confer any rights for or on any person and does not operate to bind FDA or the public. You can use an alternative approach if the approach satisfies the requirements of the applicable statutes and regulations. If you want to discuss an alternative approach, contact the FDA staff responsible for implementing this guidance. If you cannot identify the appropriate FDA staff, call the appropriate number listed on the title page of this guidance.

1. Introduction

This draft guidance document provides recommendations to industry, systems and service providers, consultants, FDA staff, and others regarding clinical performance assessment of computer-assisted detection (CADe1) devices applied to radiology images and radiology device data (often referred to as “radiological data” in this document). CADe devices are computerized systems that incorporate pattern recognition and data analysis capabilities (i.e., combine values, measurements, or features extracted from the patient radiological data) intended to identify, mark, highlight, or in any other manner direct attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the intended user (i.e., a physician or other health care professional), referred to as the “clinician” in this document. In drafting this document, we considered the recommendations on documentation and performance testing for CADe devices made during the public meeting of the Radiological Devices Advisory Panel on March 4-5, 2008.2 This draft guidance is issued for comment purposes only.

FDA's guidance documents, including this guidance, do not establish legally enforceable responsibilities. Instead, guidances describe the Agency's current thinking on a topic and should be viewed only as recommendations, unless specific regulatory or statutory requirements are cited. The use of the word should in Agency guidances means that something is suggested or recommended, but not required.

The Least Burdensome Approach

This draft guidance document reflects our careful review of what we believe are the relevant issues related to clinical performance studies for CADe devices applied to radiological data and what we believe would be the least burdensome way of addressing these issues. If you have comments on whether there is a less burdensome approach, however, please submit your comments as indicated on the cover of this document.

2. Scope

This document provides guidance regarding clinical performance assessment studies for CADe devices applied to radiology images and radiology device data. Radiological data include those that are produced during patient examination with ultrasound, radiography, magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), etc.3 As stated above, CADe devices are computerized systems intended to identify, mark, highlight, or in any other manner direct attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data, by the clinician.

By design, a CADe device can be a unique detection scheme specific to only one type of potential abnormality, or a combination or bundle of multiple parallel detection schemes, each one specifically designed to detect one type of potential abnormality revealed in the patient radiological data. Examples of CADe devices that fall within the scope of this draft guidance include:

  • a CADe algorithm designed to identify and prompt microcalcification clusters and masses on digital mammograms,
  • a CADe device designed to identify and prompt colonic polyps on CT colonography studies,
  • a CADe device designed to identify and prompt filling defects on thoracic CT examinations, and
  • a CADe device designed to identify and prompt brain lesions on head MRI studies.

This draft guidance does not cover clinical performance assessment studies for CADe devices that are intended for use during intra-operative procedures or for computer-assisted diagnostic devices (CADx) and computer-triage devices, whether marketed as unique devices or bundled with a CADe device that, by itself, may be subject to this draft guidance. Below is further explanation of the CADx and computer-triage devices not covered by this draft guidance:

  • CADx devices are computerized systems intended to provide information beyond identifying, marking, highlighting, or in any other manner directing attention to portions of an image, or aspects of radiology data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the clinician. CADx devices include those devices intended to provide an assessment of disease or other conditions in terms of the likelihood of the presence or absence of disease, or devices intended to specify disease type (i.e., specific diagnosis or differential diagnosis), severity, stage, or intervention recommended. An example of such a device would be a computer algorithm designed both to identify and prompt potential microcalcification clusters and masses on digital mammograms and also to provide a probability score to the clinician for each potential lesion as additional information.
  • Computer-triage devices are computerized systems intended to in any way reduce or eliminate any aspect of clinical care currently provided by a clinician, such as a device for which the output indicates that a subset of patients (i.e., one or more patients in the target population) are normal and therefore do not require interpretation of their radiological data by a clinician. An example of this device is a prescreening computer scheme that identifies patients with normal MRI scans that do not require any review or diagnostic interpretation by a clinician.

For any of these types of devices, we recommend that you contact the Agency to inquire about regulatory pathways, regulatory requirements, and recommendations about nonclinical and clinical data.

3. Rationale

This draft guidance makes recommendations as to how you should design and conduct your clinical performance assessment studies (i.e., well-controlled clinical investigations) for your CADe device. These studies may be part of your premarket submission to FDA.4 The recommendations in this document are meant to guide you as you develop and test your CADe device; they are not meant to specify the full content or type of premarket submission that may be applicable to your device.5 If you would like the Agency's advice about the classification and the regulatory requirements that may be applicable to your device, you may submit a request under Section 513(g) of the Federal Food, Drug, and Cosmetic Act (the Act).6

Regardless of the type of premarket submission you are required to submit for your device, we recommend that you request the Agency’s review of your protocols prior to initiating your standalone performance assessment and clinical performance assessment studies for your CADe device. To request the Agency’s review of your protocols, you may submit a pre-submission to the Agency.

4. Clinical Study Design

The clinical performance assessment of a CADe device is intended to demonstrate the clinical safety and effectiveness of your device for its intended use, when used by the intended user and in accordance with its proposed labeling and instructions.

As described above in the scope, a CADe device, by design, is intended to identify data that may reveal abnormalities during interpretation of patient images or data by the clinician. There is a complex relationship between the CADe output and the clinician such that clinical performance may depend on a variety of factors that should be considered in any study design including:

  • timing of CADe application in the interpretive process;
  • physical characteristics of the CADe mark, i.e., size and shape, type of boundary (e.g., solid, dashed, circle, isocontour), and proximity of the CADe mark to the abnormality;
  • user’s knowledge of the type of abnormalities that the CADe is designed to mark; and
  • number of CADe marks.

Your clinical performance assessment should be well-controlled, especially if performed in a laboratory setting (i.e., off site of the clinical arena), to preclude or limit various biases that might impact conclusions on the device safety or effectiveness. Study designs that may be utilized to assess your CADe device include:

  • A field test or prospective reader study (e.g., randomized controlled trial) that evaluates a device in actual clinical conditions. A field test may not be practical in situations, for example, where there is very low disease prevalence that may necessitate enrollment of an excessively large number of patients.
  • A retrospective reader study consisting of a retrospective case collection enriched with diseased/abnormal cases is a possible surrogate for a field test.
  • A stress test is another option for the clinical performance assessment of some CADe devices. A stress test is a retrospective study enriched with patient cases that contain more challenging imaging findings (or other image data) than normally seen in routine clinical practice but that still fall within the device’s intended use population (see Section 5. Study Population). Note that the use of sample enrichment will likely alter reader performance in the trial compared with clinical practice because of the differences in disease prevalence (and case difficulty for stress testing) between the trial and clinical practice.

The clinical performance assessment of CADe devices is typically performed by utilizing a multiple reader multiple case (MRMC) study design, where a set of readers evaluate image data under multiple reading conditions or modalities (e.g., readers unaided versus readers aided by CADe). The MRMC design can be “fully-crossed” whereby all readers independently read all of the cases. This design offers the greatest statistical power for a given number of cases. However, non-fully crossed study designs may be acceptable, for example in prospective studies where interpretations of the same patient data by multiple clinicians may not be feasible.

Whether you decide on a fully-crossed study design or not, we recommend the use of an MRMC evaluation paradigm to assess the clinical performance of a CADe device using one of the study designs described above. A complete clinical study design protocol should be included in your submission. Pre-specification of the statistical analysis is a key factor for obtaining consistent and convincing scientific evidence. We recommend you provide:7

  • a description of the study design;
  • a description of how the imaging data are to be collected (e.g., make and model of the imaging device and the imaging protocol) and the expertise of the person collecting the data (e.g., x-ray technician);
  • a copy of the protocol, including the following:
    • hypothesis to be tested and study endpoints,
    • plans for checking any assumptions required to validate the tests,
    • alternative procedures/tests to be used if the required assumptions are not met,
    • study success criteria that indicate which hypotheses should be met in order for the clinical study to be considered a success,
    • statistical and clinical justification of the selected case sample size,
    • statistical and clinical justification of the selected number of readers,
    • image interpretation methodology and relationship to clinical practice,
    • randomization methods, and
    • reader task including rating scale used (see Section 4, subsection Rating Scale);
  • the reader qualifications and experience;
  • a description of the reader training;
  • a statistical analysis plan (i.e., endpoints, statistical methods) with description of:
    • the process for defining truth (see Section 6. Reference Standard),
    • the details of the scoring technique used (see Section 4, subsection Scoring), and
    • any results from a pilot study supporting the proposed design.

Valid estimation of clinical performance for CADe devices is dependent upon sound study design. Aspects of sound clinical study design should include the following:

  • study populations (both diseased and normal cases) are appropriately representative of the intended use population;
  • study design avoids confounding of the CADe effect (e.g., by reading session effects);
  • sample size is sufficient to demonstrate performance claims;
  • truth definition is appropriate for assessment of performance, and uncertainty in the reference standard is correctly accounted for in the study analysis, if applicable;
  • appropriate data cohorts are represented in the data set;
  • readers are selected such that they are representative of the intended population of clinical users; and
  • imaging hardware is selected such that it is consistent with current clinical practice.

Evaluation Paradigm and Study Endpoints

Study endpoints should be selected to demonstrate that your CADe device is effective (i.e., that in a significant portion of the target population, the use of the device for its intended uses and conditions of use, when accompanied by adequate directions for use and warnings against unsafe use, will provide clinically significant results).8 Selection of the primary and secondary endpoints will depend on the intended use of your device and should be fixed prior to initiating your evaluation. Performance metrics based on the receiver operating characteristic (ROC) curve or a variant of ROC (e.g., free-response receiver operating characteristic (FROC) curve or location-specific receiver operating characteristic (LROC) curve), in addition to sensitivity (Se) and specificity (Sp) at a clinical action point, are likely candidates as endpoints. Considering Se/Sp together with an ROC-based endpoint allows evaluation of the device over the entire range of operating points as well as at the usual cut point a reader would act on in practice. Data collection for both sets of endpoints can be done simultaneously within a single reader study.

Sensitivity (Se) is defined as the probability that a test is positive for a population of patients with the disease/condition/abnormality, while specificity (Sp) is defined as the probability that the test is negative for a population of normal patients (i.e., patients without the disease/condition/abnormality). An ROC curve plots sensitivity against the false positive fraction (1 − Sp) across all possible decision thresholds; it summarizes the diagnostic performance of a device or a clinician. An FROC curve is a plot of sensitivity versus the number of false positive marks. FROC metrics summarize diagnostic performance when multiple disease sites per patient are accounted for in the analysis. See Wagner, et al.9 and the ICRU Report 79 10 for additional details on these assessment paradigms.
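The definitions of Se, Sp, and the area under the ROC curve can be illustrated with a minimal sketch. It assumes each case carries one reader rating and a binary truth label; the function names, the toy ratings, and the action threshold of 5 are hypothetical and chosen only for illustration. The AUC shown is the non-parametric (Mann-Whitney) estimate, which is one of several possible estimators.

```python
from typing import Sequence


def sensitivity_specificity(
    ratings: Sequence[float], truth: Sequence[int], threshold: float
) -> tuple[float, float]:
    """Se and Sp when a case is called positive if its rating >= threshold."""
    tp = sum(1 for r, t in zip(ratings, truth) if t == 1 and r >= threshold)
    fn = sum(1 for r, t in zip(ratings, truth) if t == 1 and r < threshold)
    tn = sum(1 for r, t in zip(ratings, truth) if t == 0 and r < threshold)
    fp = sum(1 for r, t in zip(ratings, truth) if t == 0 and r >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(ratings: Sequence[float], truth: Sequence[int]) -> float:
    """Non-parametric AUC: probability that a diseased case is rated higher
    than a normal case, with ties counted as one half."""
    pos = [r for r, t in zip(ratings, truth) if t == 1]
    neg = [r for r, t in zip(ratings, truth) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ratings on a 7-point scale: four diseased, four normal cases.
ratings = [2, 5, 6, 7, 1, 3, 4, 5]
truth = [1, 1, 1, 1, 0, 0, 0, 0]

se, sp = sensitivity_specificity(ratings, truth, threshold=5)
auc = empirical_auc(ratings, truth)
print(f"Se={se:.2f} Sp={sp:.2f} AUC={auc:.5f}")
```

The Se/Sp pair describes performance at a single action point, whereas the AUC summarizes performance over all thresholds, which is why the two kinds of endpoint complement each other.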

Various summary performance metrics to assess the effectiveness of the use of your CADe device by readers may be employed (and may vary based on the specific device and clinical indication). Examples of these include:

  • area, partial area, or any other measures, under ROC curve,
  • area, partial area, or any other measures, under the FROC curve,
  • area, partial area, or any other measures, under the LROC curve,
  • reader Se/Sp (or recall rate11) pair, and
  • reader localization accuracy.

We recommend the inclusion of lesion-based, patient-based, and any other relevant anatomical or image unit-based measures of performance in the assessment. The selection of lesion-based, patient-based or another unit-based measure of performance as a primary or secondary endpoint will depend on the intended use and the expected impact of the device on clinical practice.

For study endpoints based on the area under the ROC/FROC/LROC curve or partial area under the ROC/FROC/LROC curve, we recommend that you provide plots of the actual curves along with summary performance information for both parametric and non-parametric analysis approaches when possible. See Gur et al.12 for potential limitations of relying on only one type of ROC analysis. As mentioned above, we also recommend that you include a sensitivity/specificity (or recall rate) endpoint in your analysis when an area-based endpoint is used because it is not always straightforward to translate the magnitude of an area under the curve (AUC) change into the magnitude of change expected in clinical practice. Reporting sensitivity/specificity (or recall rate) may provide additional information for understanding the expected impact of a device on clinical practice.

We recommend that you describe your statistical evaluation methodology, and provide results including:

  • overall reader performance;
  • stratified performance by relevant confounders or effect modifiers (e.g., lesion type, lesion size, lesion location, scanning protocol, imaging hardware, concomitant diseases) (see Section 5, Study Population); and
  • confidence intervals (CIs) that account for reader variability, case variability, and truth variability or other sources of variability when appropriate.
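One way to see what it means for a confidence interval to account for both reader and case variability is a two-way bootstrap that resamples readers and cases together. The sketch below is an assumption-laden illustration, not one of the validated MRMC analysis approaches: the data layout (one 0/1 detection outcome per reader per diseased case), the function names, and the toy data are all hypothetical, and truth variability is not modeled.

```python
import random


def se_difference(unaided, aided, readers, cases):
    """Mean (aided - unaided) detection rate over the given reader and
    diseased-case indices. unaided[r][c] and aided[r][c] are 0/1 outcomes."""
    total = sum(aided[r][c] - unaided[r][c] for r in readers for c in cases)
    return total / (len(readers) * len(cases))


def bootstrap_ci(unaided, aided, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI that resamples readers AND cases with
    replacement, so both sources of variability widen the interval."""
    rng = random.Random(seed)
    n_readers, n_cases = len(unaided), len(unaided[0])
    stats = []
    for _ in range(n_boot):
        readers = [rng.randrange(n_readers) for _ in range(n_readers)]
        cases = [rng.randrange(n_cases) for _ in range(n_cases)]
        stats.append(se_difference(unaided, aided, readers, cases))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical outcomes: 3 readers x 6 diseased cases (1 = lesion detected).
unaided = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 1]]
aided = [[1, 1, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 1]]

point = se_difference(unaided, aided, range(3), range(6))
ci_low, ci_high = bootstrap_ci(unaided, aided)
print(f"Se gain = {point:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Resampling only cases would treat the study readers as fixed and would typically understate the uncertainty that applies to the wider population of clinical users.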

We recommend that you identify and validate your analysis software.13 You should provide a reference to the analysis approach used, clarify the software implementation, and specify a version number if appropriate. Certain validated MRMC analysis approaches, examples of which can be found in the literature or obtained online, may be appropriate for your device evaluation depending on its intended use and conditions of use.14,15 If you plan to write your own analysis software we recommend you submit a copy of the code developed along with your validation data.

The definitions of a true positive, true negative, false positive, and false negative CADe mark should be consistent with the intended use of the device and the characterization of the reference standard (see Section 6, Reference Standard).

Control Arm

We recommend you assess the clinical performance of your CADe device relative to a control modality. For PMA submissions, a study control arm that uses conventional clinical interpretation (i.e., interpretation without the CADe device) should generally be the most relevant comparator in CADe performance assessment. For CADe devices intended as second readers, another possible control is double reading by two clinicians. For 510(k) submissions, direct comparison with the predicate CADe device may be useful for establishing substantial equivalence. Other control arms can be valid. We recommend you contact the Agency to discuss your choice of a control arm prior to conducting your clinical study.

The study control arm should utilize the same reading methodology as the device arm and be consistent with clinical practice. The same population of cases, if not the same cases themselves, should be in all study arms to minimize potential bias. For designs that include distinct cases in each study arm, we recommend you provide a description and flow chart demonstrating how patients and readers were randomized into the different arms.

Reading Scenarios and Randomization

Reading scenarios should be consistent with the intended use of the device. We suggest the following as possible reading scenarios for inclusion as part of the clinical testing:

  • a conventional reading without the CADe device (i.e., reader alone);
  • a second-read in which the CADe output is displayed immediately after conducting a conventional interpretation; and
  • a concurrent or simultaneous read in which the CADe output is available at any time during the interpretation process.

You should randomize readers, cases, and reading scenarios to reduce bias in performance measures. We recommend you describe your randomization methodology and provide an associated flowchart. One approach to randomization is to make use of the principle of Latin squares. For example, when evaluating both concurrent and second-reader modes with a set of 450 cases, a possible study design may consist of first dividing the cases into three groups of 150 cases, A, B and C. Each group is further divided into subsets of fifty cases, which are read with the same reading scenario. If α, β and γ denote the conventional reading, the second-read mode and the concurrent reading mode respectively, then reading scenarios and cases can be assigned as follows:

Image Group   Subset   Reading Session I   Reading Session II   Reading Session III
A (150)       (50)     α                   β                    γ
              (50)     β                   γ                    α
              (50)     γ                   α                    β
B (150)       (50)     β                   γ                    α
              (50)     γ                   α                    β
              (50)     α                   β                    γ
C (150)       (50)     γ                   α                    β
              (50)     α                   β                    γ
              (50)     β                   γ                    α

If the study enrolled four readers, the example above would result in 150 × 4 = 600 readings per group per reading session. The order in which the 150 cases are read should be randomized within each group and reading session. Note that the sample sizes used here are for illustrative purposes only. Generally, the sample sizes needed for clinical studies should be representative of the intended use population. Likewise, this example study design illustrated above is not the only one that could be used to validate the effectiveness of your CADe device.
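The assignment in the table above follows a cyclic Latin square pattern, which can be generated programmatically. The following is a minimal sketch under stated assumptions: the scenario labels, function names, and seed are hypothetical, and the cyclic shift is simply the rule that reproduces the example table, not the only valid randomization scheme.

```python
import random

# alpha = conventional reading, beta = second-read, gamma = concurrent read.
SCENARIOS = ["alpha", "beta", "gamma"]


def latin_square_schedule(groups=("A", "B", "C"), n_subsets=3):
    """Map (group, subset number) to the scenario used in sessions I, II, III.
    Each row is a cyclic shift of SCENARIOS, so every subset is read once
    under every scenario and every session contains all scenarios."""
    n = len(SCENARIOS)
    schedule = {}
    for g, group in enumerate(groups):
        for s in range(n_subsets):
            start = (g + s) % n
            schedule[(group, s + 1)] = [
                SCENARIOS[(start + k) % n] for k in range(n)
            ]
    return schedule


def reading_order(case_ids, seed):
    """Randomized reading order for one group in one reading session."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled


schedule = latin_square_schedule()
order = reading_order(range(150), seed=2009)  # hypothetical seed
readings_per_group_session = 150 * 4  # 150 cases read by 4 readers

for (group, subset), sessions in schedule.items():
    print(group, subset, sessions)
```

Recording the seed used for each group and session makes the randomization reproducible, which helps when describing the methodology and its flowchart.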

In case of multiple reading sessions where the same cases are read multiple times, we recommend that each reading session be separated in time by at least four weeks to avoid memory bias. However, longer time gaps may be advisable. For shorter or longer time gaps between reading sessions, we recommend you provide data supporting your proposed time gaps.

Rating Scale

You should use conventional medical interpretation and reporting for lesion location, extent, and patient management. ROC-based endpoints (see Section 4, subsection Evaluation Paradigm and Study Endpoints) may support collecting data with a finer rating scale (e.g., a 7-point or 100-point scale) when readers rate the lesion and/or disease status in a patient. We recommend providing training to the readers on the use of the rating scale (see Section 4, subsection Training of Study Participants).

Scoring

We refer to the procedure for determining the correspondence between the reader’s interpretation and the truth (e.g., disease status) as the scoring process. The scoring process and the scoring definition are important components in the clinical assessment of a CADe device and should be described. We recommend you describe the process (i.e., rationale, definition, and criteria) for determining whether a reader’s interpretation corresponds to the truth status established during the truthing process (see Section 6. Reference Standard for information on the truthing process).

In this document, we describe scoring in terms of the clinical performance assessment. A different type of scoring is used to evaluate device standalone performance which is described in the draft guidance entitled Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Notification [510(k)] Submissions.16

The scoring process for the clinical studies should be consistent with the abnormalities marked by the CADe and the intended use of your device. The scoring process should be described and fixed prior to initiating your evaluation. In your description of the scoring process, we recommend you indicate whether the scoring is based on:

  • electronic or non-electronic means;
  • physical overlap of the boundary, area, or volume of a reader mark in relation to the boundary, area, or volume of reference standard;
  • relationship of the centroid of a reader mark to the boundary or spatial location of reference standard;
  • relationship of the centroid of the reference standard to the boundary or spatial location of a reader mark;
  • interpretation by reviewing reader(s); or
  • other methods.
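The overlap-based and centroid-based criteria listed above can be made concrete with a short sketch. It assumes, purely for illustration, that reader marks and reference standard regions are axis-aligned rectangles in image coordinates; the `Box` type, the function names, and the example coordinates are hypothetical, and any acceptance threshold on the overlap would need to be pre-specified and justified for the device in question.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image coordinates (an illustrative choice)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(b: Box) -> float:
    return (b.x_max - b.x_min) * (b.y_max - b.y_min)


def centroid(b: Box) -> tuple[float, float]:
    return (b.x_min + b.x_max) / 2, (b.y_min + b.y_max) / 2


def contains(b: Box, point: tuple[float, float]) -> bool:
    """True if the point lies inside the box or on its boundary."""
    x, y = point
    return b.x_min <= x <= b.x_max and b.y_min <= y <= b.y_max


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by union area (0.0 when boxes are disjoint)."""
    w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    return inter / (area(a) + area(b) - inter)


reader_mark = Box(12, 12, 32, 32)
reference = Box(20, 20, 40, 40)
distant_mark = Box(100, 100, 110, 110)

# Criterion: centroid of the reader mark falls within the reference region.
mark_centroid_hit = contains(reference, centroid(reader_mark))
# Criterion: centroid of the reference region falls within the reader mark.
ref_centroid_hit = contains(reader_mark, centroid(reference))
# Criterion: physical overlap of the two regions.
overlap = overlap_ratio(reader_mark, reference)
print(mark_centroid_hit, ref_centroid_hit, round(overlap, 4))
```

The criteria can disagree for the same pair of regions (for example, a centroid hit with a small overlap), which is one reason the scoring rule should be fixed before the evaluation begins.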

For scoring that relies on interpretations by reviewing readers, we recommend you provide the number of readers involved, their qualifications, their levels of experience and expertise, the specific instructions conveyed to them prior to their participation in the scoring process, and any specific criteria used as part of the scoring process. When multiple readers are involved in scoring, you should describe the process by which their interpretations are combined to make an overall scoring determination or how their interpretations are incorporated in the performance evaluation, including how any inconsistencies are addressed.

Training of Study Participants

We recommend you specify instructions and provide training to study participants on the use of the CADe device and the details on how to participate in the clinical study. Training should include a description of the device and instructions for how to use the device. For specialized reading instructions or rules (e.g., rules for changing an initial without-CADe interpretation when reviewing the CADe marks), we recommend you justify their clinical relevance according to reading task, clinical workflow, and medical practice.

We also recommend that training be provided to the readers on the use of the rating scale (see Section 4, subsection Rating Scale), especially if such a rating scale is not generally utilized in clinical practice. Such training helps avoid incorrect or un-interpretable results. We recommend that reader training include rating a representative set of normal and abnormal cases according to the study design methodology, and making use of cases that are not part of the testing database.

5. Study Population

Patient data (i.e., cases) may be collected prospectively or retrospectively based on well-defined inclusion and exclusion criteria. We recommend that you provide the protocol for your case collections. Note that cases collected for your clinical trial should be independent of the cases used during your device development and should be new to the readers participating in the clinical assessment of the device. An acceptable approach for acquiring data is the collection of consecutive cases that are within the inclusion and outside of the exclusion criteria from each participating collection site.

Enrichment with diseased/abnormal cases is permissible for an efficient and less burdensome representative case dataset. You may also enrich the study population with patient cases that contain imaging findings (or other image data) that are challenging to clinicians but that still fall within the device’s intended use population. This enrichment is often referred to as stress testing. For example, if assessing a CADe device designed to assist in detecting colon polyps, the study population may be enriched with cases containing small polyps. Enrichment may affect reader performance so the extent of enrichment should be weighed against the introduction of biases into the study design.

The study sample size should be large enough to give the study adequate statistical power to support your proposed performance claims. If performance claims are proposed for individual subsets, then the sample sizes for these subsets should be determined accordingly so that those claims can also be shown with statistical significance. Formal subset analysis calls for a pre-specified statistical adjustment for the testing of multiple subsets.
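As an illustration of what a pre-specified adjustment for multiple subsets might look like, the sketch below applies the Holm step-down procedure to a set of subset p-values. The choice of Holm is an assumption made for this example; the guidance text does not name a particular method, and the function name and p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order.
    The smallest p-value is multiplied by m, the next by m - 1, and so on,
    with a running maximum so adjusted values never decrease with rank."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three pre-specified subsets.
subset_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(subset_p)
significant = [p <= 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

In this toy example only the first subset remains significant at the 0.05 level after adjustment, although all three unadjusted p-values are below 0.05.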

The study population should be representative of the intended use population for your device. Your study dataset should include the full range of diseased/abnormal and normal cases. The study should also contain a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, and concomitant diseases) such that clinical performance estimates can be obtained for these individual subsets. As stated above, powering these subsets for statistical significance may not be recommended unless specific subset performance claims are being included.

When describing your study population, we recommend you provide specific information, where appropriate, including:

  • the patient demographic data (e.g., age, ethnicity, race);
  • the patient medical history relevant to the CADe application;
  • the patient disease state and indications for the radiologic test;
  • the conditions of radiologic testing (e.g., technique, including whether the test was performed with/without contrast, contrast type and dose per patient, patient body mass index, radiation exposure, T-weighting for MRI images) and views taken;
  • a description of how the imaging data were collected (e.g., make and model of imaging devices and the imaging protocol) and the expertise of the person collecting the data (e.g., x-ray technician);
  • the collection sites;
  • the processing sites if applicable (e.g., patient data digitization);
  • the number of cases:
    • the number of diseased cases;
    • the number of normal cases;
    • methods used to determine disease status, location and extent (see Section 6. Reference Standard);
  • the case distributions stratified by relevant confounders or effect modifiers, such as lesion type (e.g., hyperplastic vs. adenomatous colonic polyps), lesion size, lesion location, disease stage, organ characteristics (e.g., breast composition), concomitant diseases, imaging hardware (e.g., makes and models), imaging or scanning protocols, collection sites, and processing sites (if applicable); and
  • a comparison of the clinical, imaging, and pathologic characteristics of the patient data with those of the target population.
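
As an illustration of the stratified case distributions described above, the following minimal sketch tabulates diseased and normal case counts for each level of a single factor. The record layout, the field names (`case_id`, `diseased`, `site`), and the example values are hypothetical and are not a format prescribed by this guidance.

```python
# Illustrative sketch only: field names and values are hypothetical.
cases = [
    {"case_id": "A01", "diseased": True, "site": "Site 1"},
    {"case_id": "A02", "diseased": False, "site": "Site 1"},
    {"case_id": "A03", "diseased": True, "site": "Site 2"},
    {"case_id": "A04", "diseased": True, "site": "Site 2"},
    {"case_id": "A05", "diseased": False, "site": "Site 2"},
]


def stratify(case_records, factor):
    """Count diseased and normal cases within each level of one factor."""
    table = {}
    for record in case_records:
        row = table.setdefault(record[factor], {"diseased": 0, "normal": 0})
        row["diseased" if record["diseased"] else "normal"] += 1
    return table


by_site = stratify(cases, "site")
for level, row in sorted(by_site.items()):
    print(level, row)
```

The same function could be applied to any other recorded factor (for example, lesion type or imaging hardware) by passing a different field name.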

Data Poolability

Premarket approval applications based solely on foreign data and otherwise meeting the criteria for approval may be approved if, among other requirements, the foreign data are applicable to the United States (U.S.) population and U.S. medical practice and the studies have been performed by clinical investigators of recognized competence (21 CFR 814.15). You should justify why non-U.S. data reflect what is expected for a U.S. population with respect to disease occurrence, characteristics, practice of medicine, and clinician competency. In accordance with good clinical study design, you should justify, both statistically and clinically, the poolability of data from multiple sites. We recommend that premarket notification applications follow similar quality data practices with regard to foreign data and data poolability. You are encouraged to contact the Agency if you intend to make use of foreign data as the basis of your premarket submission.

6. Reference Standard

For purposes of this document, the reference standard (also often called the “gold standard” or “ground truth” in the imaging community) for patient data indicates whether the disease/condition/abnormality is present and may include such attributes as the extent or location of the disease/condition/abnormality. We refer to the characterization of the reference standard for the patient, e.g., disease status, as the truthing process.

We recommend that you provide the rationale for your truthing process and indicate if it is based on:

  • the output from another device;
  • an established clinical determination (e.g., biopsy, specific laboratory test);
  • a follow-up clinical imaging examination;
  • a follow-up medical examination other than imaging; or
  • an interpretation by a reviewing clinician(s) (i.e., truther(s)).

We also recommend that you describe the methodology utilized to make this reference standard determination (e.g., based on pathology or based on a standard of care determination). For truthing that relies on the interpretation by a reviewing clinician(s), we recommend you provide:

  • the number of truthers involved;
  • their qualifications;
  • their levels of experience and expertise;
  • the instructions conveyed to them prior to participating in the truthing process;
  • all available clinical information from the patient utilized by the truthers in the identification of disease/condition/abnormality and in the marking of the location and extent of the disease/condition/abnormality; and
  • any specific criteria used as part of the truthing process.

When multiple truthers are involved, you should describe the process by which their interpretations are combined to make an overall reference standard determination and how your process accounts for inconsistencies between clinicians participating in the truthing process (truth variability) (see Section 4, subsection Evaluation Paradigm and Study Endpoints). Note that clinicians participating in the truthing process should not be the same as those who participate in the core clinical performance assessment of the CADe device.
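
One possible way to combine interpretations from multiple truthers is a simple majority vote that also flags non-unanimous cases so that truth variability can be reported. The sketch below is an assumption for illustration only; this guidance does not prescribe majority voting, and the function name `combine_truthers` is hypothetical.

```python
def combine_truthers(labels):
    """Combine per-truther labels for one case.

    labels: list of booleans, one per truther (True = abnormality present).
    Returns (consensus, unanimous). consensus is None on a tie, which
    signals that the case needs a separate adjudication step.
    """
    if not labels:
        raise ValueError("at least one truther label is required")
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == negatives:
        consensus = None  # tie: no majority exists
    else:
        consensus = positives > negatives
    unanimous = positives == 0 or negatives == 0
    return consensus, unanimous


# Three truthers, two of whom mark the case as abnormal.
print(combine_truthers([True, True, False]))
```

Reporting the fraction of cases that are not unanimous is one simple way to summarize inconsistencies between truthers.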

7. Reporting

Reporting of performance results may be guided by the FDA Guidance entitled Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests; Guidance for Industry and FDA Reviewers.17 We recommend submitting electronically the data used in any statistical analysis in your study including the following:

  • patient information,
  • disease or normal status,
  • concomitant diseases,
  • lesion size,
  • lesion type,
  • lesion location,
  • disease stage,
  • organ characteristics,
  • imaging hardware,
  • imaging or scanning protocol,
  • imaging and data characteristics (e.g., characteristics associated with differences in digitization architectures for a CADe using scanned films),
  • and statistical analysis.

For more information on submitting data electronically, please see the FDA white paper entitled Clinical Data for Premarket Submissions.18

8. Postmarket Planning for PMAs

FDA applies the “Total Product Life Cycle (TPLC)” model to promote and protect the public health. Premarket approval (PMA) applications should include a postmarket plan to assess the continued safety, effectiveness, and reliability of an approved device for its intended use.

One potential piece of a postmarket plan is a post-approval study (PAS). FDA may require you to conduct a post-approval study as a condition of approval in a PMA approval order (21 CFR 814.82(a)(2)). A post-approval study is not always necessary as a condition of approval. FDA determines whether one is necessary on a case-by-case basis.

In the event your PMA approval order does require a post-approval study, we suggest that the study population characterization include race, age and target population baselines. FDA recommends that the target population include baselines for prevalence of the abnormality to be detected, as well as current screening method sensitivity, positive predictive value (PPV), specificity, negative predictive value (NPV), biopsy rate, and recall rate. FDA further recommends that you include in your study protocol, at a minimum, the following:

  • Radiologist training and experience for those participating in the PAS
  • User training with the CADe device
  • Adjustments to CADe systems that may occur during the study period
  • Types of abnormalities detected
  • Type of imaging center
  • Consecutive enrollment of subjects
  • Study sensitivity, PPV, specificity, NPV, biopsy rate, recall rate, false-negative rate, number of missed abnormalities (may consider evaluation of readings at next exam for comparison of missed abnormalities)
  • Area under the curve (AUC) and/or ROC analysis
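
The summary measures listed above can be derived from per-patient counts of true-positive, false-positive, true-negative, and false-negative results. The sketch below is illustrative only: the function name and the example counts are hypothetical, and it assumes that every positive reading leads to the patient being recalled, so that recall rate equals the fraction of all patients with a positive reading.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute summary measures from per-patient outcome counts."""

    def ratio(numerator, denominator):
        # Return None when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else None

    total = tp + fp + tn + fn
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "false_negative_rate": ratio(fn, tp + fn),
        # Assumption: each positive reading results in a recall.
        "recall_rate": ratio(tp + fp, total),
    }


# Hypothetical counts for 1,000 screened patients.
metrics = screening_metrics(tp=8, fp=40, tn=950, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Biopsy rate and ROC-based measures need additional data (biopsy referrals and reader confidence scores, respectively) and are not covered by this sketch.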

FDA will work interactively with you to finalize the postmarket plan and/or any post-approval study protocol prior to approval decisions so that they are ready to implement if the device is approved.

For additional information, please refer to the FDA Guidance entitled Procedures for Handling Post-Approval Studies Imposed by PMA Order; Guidance for Industry and FDA Staff.19


1 The use of the acronym CADe for computer-assisted detection may not be a generally recognized acronym in the community at large. It is used here to identify the specific type of devices discussed in this document.

2 http://www.fda.gov/ohrms/dockets/ac/cdrh08.html#radiology

3 For any use of a contrast imaging agent, we recommend that you verify that such comports with the regulation, labeling, and indications of the imaging drugs and devices. You may wish to consult the draft guidance New Contrast Imaging Indication Considerations for Devices and Approved Drug and Biological Products (DRAFT) (http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM126051.pdf) for new contrast imaging drugs and devices indications.

4 This submission may be a premarket notification (510(k)), an application for premarket approval (PMA), an application for a product development protocol (PDP), an application for a humanitarian device exemption (HDE), or an application for an investigational device exemption (IDE).

5 A 510(k) submission and a PMA application are the most common submission types for the CADe devices addressed in this draft guidance. As described in the draft guidance Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data - Premarket Notification [510(k)] Submissions (http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm187249.htm), some CADe devices are Class II regulated under 21 CFR 892.2050 and require a 510(k) while others are Class III and require a PMA. For more information on the various device classes, see Section 513(a)(1) of the Federal Food, Drug, and Cosmetic Act (the Act) (21 U.S.C. 360c(a)(1)).

6 Section 513(g) of the Act (21 U.S.C. 360c(g)) provides a means for obtaining the Agency's views about the classification and the regulatory requirements that may be applicable to your device.

7 Precisely what information you should provide to FDA will depend largely on the type of premarket submission required for your device.

8 See 21 CFR 860.7(e).

9 Wagner, R. F., Metz, C. E., and Campbell, G., “Assessment of medical imaging systems and computer aids: A tutorial review,” Acad. Radiol. 14:723–48, 2007.

10 ICRU Report 79, “Receiver Operating Characteristic Analysis in Medical Imaging,” Vol.8 No.1 (2008), Oxford University Press (ISSN 1473-6691).

11 Recall rate refers to the percentage of patients (including diseased and non-diseased patients) that are called back or recalled for additional medical assessment.

12 Gur, D., Bandos, A.I., and Rockette, H.E., “Comparing Areas under Receiver Operating Characteristic Curves: Potential Impact of the Last Experimentally Measured Operating Point,” Radiology 247:12–15, 2008.

13 For more information on MRMC analysis software, see, for example, Obuchowski, N. A., Beiden, S. V., Berbaum, K. S., Hillis, S. L., Ishwaran, H., Song, H. H., and Wagner, R. F., “Multi-reader, multi-case ROC analysis: An empirical comparison of five methods,” Acad. Radiol. 11: 980–995, 2004.

14 For MRMC literature references, see, for example: Metz, C. E., “Fundamental ROC analysis,” Handbook of Medical Imaging. Vol. 1. Physics and Psychophysics. Beutel J, Kundel HL, and VanMetter RL (Eds.) SPIE Press, 751–769, 2000; Wagner, R. F., Metz, C. E., and Campbell, G., “Assessment of medical imaging systems and computer aids: A tutorial review,” Acad. Radiol. 14:723–48, 2007; Obuchowski, N. A., Beiden, S. V., Berbaum, K. S., Hillis, S. L., Ishwaran, H., Song, H. H., and Wagner, R. F., “Multi-reader, multi-case ROC analysis: An empirical comparison of five methods,” Acad. Radiol. 11: 980–995, 2004.

15 For online access to software that analyzes MRMC data based on validated techniques, see, for example: LABMRMC software and general ROC software, The University of Chicago: http://xray.bsd.uchicago.edu/krl/roc_soft6.htm (for either quasi-continuous or categorical data); University of Iowa MRMC software: ftp://perception.radiology.uiowa.edu/PUBLIC (for categorical data); OBUMRM software: http://www.bio.ri.ccf.org/html/obumrm.html.

16 http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm187249.htm

17 http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071148.htm

18 http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/HowtoMarketYourDevice/PremarketSubmissions/ucm136377.htm

19 http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm070974.htm

    