Addressing the Limitations of Medical Data in AI
As part of the Artificial Intelligence (AI) Program in the FDA’s Center for Devices and Radiological Health (CDRH), this regulatory science research studies the possibilities and limitations of supplementing medical patient datasets with synthetic data, that is, artificial data that has been partially or fully generated using computational techniques.
Overview
Rapid development and regulatory assessment of medical AI models can support timely and accurate diagnosis for patients and reduce disparities in health access. However, development and assessment may also require large datasets spanning diverse patient populations and imaging conditions. For medical device developers, obtaining representative patient datasets with appropriate annotations may be burdensome due to high acquisition costs, safety limitations, patient privacy restrictions, or low disease prevalence. Synthetic (also known as in silico) data may allow labeled examples to be obtained more safely and effectively than collecting real patient data.
Projects
- REALYSM: Regulatory Evaluation of Artificial Intelligence using Physics Simulation
- Generative Data Augmentation using Adversarial Examples for Increasing Model Generalizability
- In Silico CT Imaging Datasets for Pediatric Device Assessment of Intracranial Hemorrhage
- Synthetic Medical Data Evaluation Beyond Similarity Metrics
Figure: Real patient datasets can be supplemented by creating realistic digital object models and digital replicas of acquisition devices, which together produce large-scale synthetic datasets.
Resources
- “M-SYNTH: Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI,” Catalog of Regulatory Science Tools (RST).
- “VICTRE: In Silico Breast Imaging Pipeline,” Catalog of Regulatory Science Tools (RST).
- “MCGPU: GPU-accelerated Monte Carlo X-ray Imaging Simulator,” Catalog of Regulatory Science Tools (RST).
- Sengupta A, Lago MA, Badano A, “In situ tumor model for longitudinal in silico imaging trials,” Physics in Medicine and Biology 2024.
- Sizikova E, Saharkhiz N, Sharma D, Lago M, Sahiner B, Delfino JG, Badano A, “Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses,” Advances in Neural Information Processing Systems (NeurIPS) 2023.
- Badano A, Lago M, Sizikova E, Delfino JG, Guan S, Anastasio MA, Sahiner B, “The stochastic digital human is now enrolling for in silico imaging trials – Methods and tools for generating digital cohorts,” Progress in Biomedical Engineering 2023.
- Sizikova E, Saharkhiz N, Sharma D, Lago M, Sahiner B, Delfino JG, Badano A, “Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI,” NeurIPS Workshop on Synthetic Data Generation with Generative AI 2023.
- Sengupta A, Badal A, Makeev A, Badano A, “Computational models of direct and indirect X-ray breast imaging detectors for in silico trials,” Medical Physics 2022; 49: 6856–6870.
For more information, email OSEL_AI@fda.hhs.gov.