Performance Evaluation Methods for Evolving Artificial Intelligence (AI)-Enabled Medical Devices
This regulatory science research, conducted under the Artificial Intelligence (AI) Program in the FDA’s Center for Devices and Radiological Health (CDRH), aims to develop methods for evaluating the performance of model updates to artificial intelligence/machine learning (AI/ML)-enabled devices.
Overview
On March 30, 2023, the FDA’s Center for Devices and Radiological Health (CDRH) published the draft guidance document: Marketing Submission Recommendations for a Predetermined Change Control Plan (PCCP) for Artificial Intelligence/Machine Learning (AI/ML)-Enabled Device Software Functions. This draft guidance aims to enable device manufacturers to include a plan in an FDA submission so that the device can evolve within controlled boundaries while on the market. This approach is expected to enable manufacturers to make modifications and updates to their devices more easily, while also maintaining the FDA’s ability to assure continued device safety and effectiveness. While the draft guidance outlines a sound approach, some areas of the premarket evaluation of devices with PCCPs require further technical analysis to support a least burdensome path to market.
Well-curated, labeled, and representative datasets in medical applications are difficult and resource-intensive to collect, so device sponsors naturally wish to reuse their test datasets when evaluating devices with PCCPs. However, repeatedly using the same test dataset to evaluate a sequence of AI model updates can be problematic, because the models can end up overfitting to the test dataset. When this happens, the performance evaluation yields misleading, overly optimistic results, and the models fail to generalize to new data. Methods are therefore needed to reuse evaluation datasets safely for devices with a PCCP. Other knowledge gaps in this area include the implications of potential changes to the reference standard, how much change is acceptable to maintain an appropriate benefit-risk profile, and how to balance the plasticity and stability of continuously learning models.
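The optimistic bias from test data reuse can be illustrated with a short, self-contained simulation. The sketch below is a toy example only, not one of the methods developed under this project: the true accuracy, test set size, number of updates, and the "adopt an update if it beats the best score so far" selection rule are all illustrative assumptions. It evaluates a sequence of candidate updates, none of which truly improves on its predecessor, on a single reused test set; the reported score drifts above the true accuracy purely through selection on test-set noise.

```python
# Toy simulation (illustrative assumptions only): repeatedly reusing one fixed
# test set to select among a sequence of model "updates" inflates the reported
# performance even when no update truly improves on its predecessor.
import numpy as np

rng = np.random.default_rng(0)

true_accuracy = 0.70   # every candidate update has the same true accuracy
n_test = 500           # size of the single, reused test set
n_updates = 50         # number of sequential model updates evaluated

best_observed = 0.0
for _ in range(n_updates):
    # Each candidate's score on the fixed test set is a fresh binomial draw
    # with the same underlying accuracy (an idealized assumption).
    observed = rng.binomial(n_test, true_accuracy) / n_test
    # Adaptive rule: adopt the update only if it "beats" the current best
    # score on the reused test set, then report that best score.
    best_observed = max(best_observed, observed)

print(f"True accuracy of every candidate: {true_accuracy:.3f}")
print(f"Reported accuracy after {n_updates} reused evaluations: {best_observed:.3f}")
# The reported value exceeds 0.70 only because of selection on test-set noise,
# so the apparent improvement would not generalize to new data.
```

With these settings the reported accuracy typically lands several percentage points above the true value, which is exactly the kind of adaptive bias that safe test data reuse methods aim to control.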
The goal of this effort is to address these issues by:
- Developing statistical methods and theoretical results, and performing empirical experiments and studies.
- Releasing regulatory science tools that can be used to design studies that continuously measure performance for evolving algorithms under a postmarket assurance plan (a simple illustrative sketch of such monitoring follows this list).
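The sketch below shows one simple way performance might be monitored on successive batches of new labeled data. It is a hypothetical illustration, not a released FDA regulatory science tool: the `batch_check` function, the acceptance threshold, the significance level, and the batch counts are all assumptions chosen for the example, and the one-sided exact binomial test is just one of many possible monitoring rules.

```python
# Toy monitoring sketch (illustrative assumptions only): flag a batch of new
# postmarket data if observed accuracy is significantly below a prespecified
# acceptance threshold, using a one-sided exact binomial test.
from scipy import stats

def batch_check(n_correct, n_total, acceptance_threshold=0.80, alpha=0.05):
    """Return True if the batch suggests accuracy has fallen below the threshold."""
    # One-sided p-value for H0: accuracy >= acceptance_threshold,
    # computed as P(X <= n_correct) under Binomial(n_total, threshold).
    p_value = stats.binom.cdf(n_correct, n_total, acceptance_threshold)
    return p_value < alpha

# Example: three monthly batches of new labeled data (hypothetical counts).
batches = [(415, 500), (402, 500), (371, 500)]
for month, (correct, total) in enumerate(batches, start=1):
    flagged = batch_check(correct, total)
    print(f"Month {month}: accuracy {correct / total:.3f}, flagged={flagged}")
```

In this example the third batch (accuracy 0.742 against a threshold of 0.80) is flagged, prompting further investigation under the assurance plan; a real design would also prespecify batch sizes, endpoints, and multiplicity control.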
Project
- Develop Methods for Performance Evaluation of Model Updates for AI/ML-Enabled Devices with a PCCP
Resources
- Feng, J., Pennello, G., Petrick, N., Sahiner, B., Pirracchio, R., & Gossmann, A. (2022). Sequential Algorithmic Modification with Test Data Reuse. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, 674–684.
- Burgon, A., Sahiner, B., Petrick, N., Pennello, G., & Samala, R. K. (2023). Methods for improved understanding of evolving AI model learning and knowledge retention across sequential modification steps. RSNA Program Book.
- Gossmann, A. (2022). Test Data Reuse for the Evaluation of Continuously Evolving Machine Learning Algorithms in Medicine. Invited talk at the Tutorial on AI for medical image analysis in practice at MICCAI 2022. September 21, 2022.
For more information, email OSEL_AI@fda.hhs.gov.