CERSI Collaborators: University of California San Francisco: Jean Feng, PhD; Romain Pirracchio, MD, PhD
FDA Collaborators: Center for Devices and Radiological Health: Berkman Sahiner, PhD; Alexej Gossmann, PhD; Gene Pennello, PhD; Nicholas Petrick, PhD
Project Start Date: 12/2020
Regulatory Science Challenge
With the growing use of artificial intelligence/machine learning (AI/ML)-based algorithms in medical devices, the FDA is looking to develop statistical frameworks that allow these algorithms to evolve continuously and safely over time. Compared to locked or fixed algorithms, continuously learning algorithms have the potential to learn from the vast amounts of data generated by the daily delivery of healthcare, adapt in dynamic environments, and improve over time. For more information, please visit the AI/ML in Software as a Medical Device webpage.
A primary focus of the FDA’s proposed framework is the Predetermined Change Control Plan (PCCP). A PCCP is intended to detail the specific changes a manufacturer plans to implement to modify their medical device and would allow developers to explain how they will validate the safety and effectiveness of any modifications to their algorithm. However, the FDA has only discussed PCCPs at a high level. An important next step is to provide recommendations on the design of performance evaluation methods in PCCPs to ensure that a device continues to maintain its safety and effectiveness.
Project Description and Goals
This project will develop performance evaluation methods that provide performance guarantees for frequently updated ML algorithms. Researchers will consider two common motivations for introducing algorithmic updates and design the performance evaluation component of the PCCPs for each use case:
- ML algorithms are known to decay in performance over time due to shifts in clinical practice patterns, changes in patient characteristics, the emergence of new diseases (e.g., COVID-19), and other factors. As such, ML algorithms need to be regularly updated so that their performance is maintained over time. Researchers will design performance evaluation methods that approve algorithmic modifications to protect against performance decay.
- After the initial release of an ML algorithm, the model can be refined by retraining on subsequently gathered data, incorporating newly discovered features, and so on. It may not always be practical to assemble a new dataset for testing each modification, especially when most modifications are minor or are implemented in rapid succession. Researchers will design a performance evaluation method that approves modifications by reusing a central benchmark dataset.
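To make the first use case concrete, the core idea of an approval policy can be sketched as a statistical gate: a candidate update is deployed only if monitoring data provide evidence that it does not degrade performance. The sketch below is illustrative only — the function name, the choice of accuracy as the metric, and the one-sided paired z-test are assumptions for exposition, not the project's published methodology (which offers stronger guarantees, e.g., for repeated test data reuse).

```python
from statistics import NormalDist


def approve_update(deployed_correct, candidate_correct, margin=0.0, alpha=0.05):
    """Illustrative approval gate for an algorithmic modification.

    Both inputs are 0/1 lists marking whether each model classified each
    monitoring case correctly (paired on the same cases). The candidate is
    approved only if a one-sided paired z-test rejects, at level `alpha`,
    the hypothesis that the candidate's accuracy is worse than the
    deployed model's by more than `margin` (a non-inferiority check).
    """
    n = len(deployed_correct)
    diffs = [c - d for c, d in zip(candidate_correct, deployed_correct)]
    mean_diff = sum(diffs) / n
    var = sum((x - mean_diff) ** 2 for x in diffs) / (n - 1)
    se = max((var / n) ** 0.5, 1e-12)  # guard against zero variance
    z = (mean_diff + margin) / se
    return z > NormalDist().inv_cdf(1 - alpha)


# Example: on 400 paired monitoring cases, the candidate fixes 40 errors
# the deployed model made and breaks nothing, so it is approved; swapping
# the roles (candidate strictly worse) is rejected.
deployed = [1] * 200 + [0] * 200
candidate = [1] * 240 + [0] * 160
print(approve_update(deployed, candidate))   # True
print(approve_update(candidate, deployed))   # False
```

For the second use case (repeated reuse of a central benchmark dataset), a naive version of this gate would inflate the overall error rate across a sequence of modifications; one simple mitigation is to split the total `alpha` budget across the planned tests, though more powerful sequential procedures exist.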
Ultimately, these studies will inform the FDA and model developers on how to design PCCPs that ensure the safety and effectiveness of AI/ML-enabled software as a medical device.
Publications
Feng J. Learning to safely approve updates to machine learning algorithms. In: Proceedings of the Conference on Health, Inference, and Learning (CHIL '21). New York, NY: Association for Computing Machinery; 2021:164-173. https://doi.org/10.1145/3450439.3451864
Feng J, Gossmann A, Sahiner B, Pirracchio R. Bayesian logistic regression for online recalibration and revision of risk prediction models with performance guarantees. J Am Med Inform Assoc. 2022:ocab280. Epub ahead of print. PMID: 35022756. https://doi.org/10.1093/jamia/ocab280
Feng J, Pennello G, Petrick N, Sahiner B, Pirracchio R, Gossmann A. Sequential algorithmic modification with test data reuse. In: Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence. Proceedings of Machine Learning Research. 2022;180:674-684. https://proceedings.mlr.press/v180/feng22a.html