Evaluation Methods for Artificial Intelligence (AI)-Enabled Medical Devices: Performance Assessment and Uncertainty Quantification
As part of the Artificial Intelligence (AI) Program in the FDA’s Center for Devices and Radiological Health (CDRH), this regulatory science research aims to help device developers, reviewers, and other stakeholders determine and use least-burdensome metrics for the appropriate evaluation of AI-enabled medical devices.
Overview
Different intended applications of AI-enabled medical devices in medicine require distinct metrics for performance assessment. For example, tasks such as classification, estimation, image segmentation, time-to-event analysis, and detection/localization of abnormalities may each require different performance metrics. Furthermore, within each task, there are nuances in how the output is presented to the user and how the data are collected and structured. These factors affect which performance metric(s) are best suited to assess device performance.
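To illustrate why different tasks call for different metrics, the following minimal sketch (an illustrative example, not a regulatory tool) contrasts an overlap metric for segmentation (the Dice coefficient) with a ranking metric for classification (the area under the ROC curve, computed here via the Mann-Whitney pairwise formulation):

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Overlap metric for binary segmentation: 2*|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    # Convention: two empty masks agree perfectly.
    return 2.0 * intersection / total if total > 0 else 1.0

def auc(scores, labels):
    """Classification AUC via the rank-sum (Mann-Whitney) formulation."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```

A segmentation model is judged on spatial overlap per case, while a classifier is judged on how well its scores rank diseased above non-diseased cases across a population; neither metric is meaningful for the other task.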
The first goal of this effort is to develop tools for selecting appropriate metrics in the assessment of AI-enabled device performance. This challenge is amplified by the fact that for many AI-enabled medical devices, the reference standard, or the “label,” for a case often has high uncertainty or variability. For example, the label may need to be defined based on the subjective review of experts, which can lead to high variability in the reference standard. This labeling uncertainty combines with other sources of uncertainty, such as lack of knowledge or data and random effects in machine learning, and is reflected in the output of AI devices.
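As a toy sketch of the reference-standard variability described above (illustrative only; real truthing protocols are more involved), one common approach is to derive a label from multiple expert reads by majority vote and to record how strongly the readers agreed on each case:

```python
import numpy as np

def majority_vote_reference(reader_labels):
    """Derive a binary reference standard from multiple expert reads.

    reader_labels: array of shape (n_cases, n_readers) with 0/1 reads.
    Returns the majority-vote label per case and the fraction of readers
    agreeing with that label, a simple per-case measure of label variability.
    """
    reads = np.asarray(reader_labels, dtype=float)
    votes = reads.mean(axis=1)                 # fraction of positive reads
    reference = (votes >= 0.5).astype(int)     # majority label (ties -> 1)
    agreement = np.where(reference == 1, votes, 1.0 - votes)
    return reference, agreement
```

Cases with low agreement flag exactly the label uncertainty that propagates into any performance metric computed against this reference standard.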
The second goal of this effort is to develop methods and tools to quantify this uncertainty in AI-enabled algorithms and, if applicable, convey it in the device output to users and measure its effect on them. Accurate quantification of uncertainty and a thorough understanding of the factors impacting it would allow review teams and regulatory scientists to assess the calibration of uncertainty outputs. These outputs, when adequately validated, will enable clinicians to make more informed clinical decisions that benefit patients and the public health.
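One standard way to assess the calibration of uncertainty outputs, sketched below for illustration, is the expected calibration error (ECE): predictions are binned by their stated confidence, and the gap between the stated probability and the observed event frequency is averaged across bins:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error for binary predicted probabilities.

    Bins predictions by confidence and averages the absolute gap between
    mean predicted probability and observed positive frequency per bin,
    weighted by the fraction of cases in each bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left so probability 0.0 is included.
        in_bin = (probs >= lo) & (probs <= hi) if i == 0 else (probs > lo) & (probs <= hi)
        if in_bin.any():
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

A well-calibrated device that reports 80% confidence should be correct about 80% of the time in that bin; an ECE near zero indicates the stated uncertainties match observed outcomes.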
Project
- AI assessment metric selection tree with the development of novel assessment methods for image segmentation models.
Resources
- “AI/ML Classification Metric Decision Tree for Medical Imaging,” OSEL Regulatory Science Tools Catalog, 2024.
- Drukker K, Sahiner B, Hu T, Kim GH, Whitney HM, Baughan N, Myers KJ, Giger ML, McNitt-Gray M., “MIDRC-MetricTree: a decision tree-based tool for recommending performance metrics in artificial intelligence-assisted medical image analysis,” J. Med. Imag. 11(2), 024504 (2024), doi: 10.1117/1.JMI.11.2.024504.
For more information, email OSEL_AI@fda.hhs.gov.