FDA Digital Health and Artificial Intelligence Glossary – Educational Resource

This glossary is a compilation of commonly used terms in the digital health, artificial intelligence, and machine learning space and their definitions. These definitions are either directly from, or adapted from, various public sources, including consensus standard organizations and published literature.

The glossary is intended for general educational purposes. The glossary does not constitute agency guidance, policy, or recommendations. The glossary also does not constitute legally enforceable requirements, nor does it affect any requirements of the Federal Food, Drug, and Cosmetic Act or implementing regulations.

A

Artificial Intelligence (AI)

A machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. Artificial intelligence systems use machine- and human-based inputs to perceive real and virtual environments; abstract such perceptions into models through analysis in an automated manner; and use model inference to formulate options for information or action.

Source: 15 U.S.C. 9401(3). https://www.govinfo.gov/content/pkg/USCODE-2020-title15/html/USCODE-2020-title15-chap119.htm

Related Term(s): Machine Learning Model

Artificial Intelligence Performance Monitoring (AI Performance Monitoring)

Refers to the process of regularly collecting and analyzing data on the use of a deployed AI system to evaluate its performance in accomplishing its intended tasks in real-world settings. The assessment of an AI model’s performance involves various performance metrics and criteria depending on the specific application. This monitoring typically aims to assess the performance of these AI systems in practice, detect performance degradation or changes (e.g., due to data drift), identify instances of misuse, and address any safety or usability concerns.
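
As a rough illustration of the monitoring loop described above, the sketch below tracks accuracy over a sliding window of recent cases and flags possible degradation. The class name, window size, and alert threshold are hypothetical choices for illustration only; a real monitoring plan would select metrics and criteria suited to the specific application.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling accuracy of a deployed model and flag degradation.

    Hypothetical helper: window_size and alert_threshold are illustrative.
    """

    def __init__(self, window_size=100, alert_threshold=0.85):
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, prediction, ground_truth):
        # Store 1 for a correct prediction, 0 for an incorrect one.
        self.outcomes.append(1 if prediction == ground_truth else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.alert_threshold


# Made-up stream of (prediction, confirmed outcome) pairs.
monitor = PerformanceMonitor(window_size=4, alert_threshold=0.75)
for prediction, truth in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(prediction, truth)

print(monitor.rolling_accuracy(), monitor.degraded())
```

Accuracy is used here only because it is simple to compute; the same pattern could wrap any metric for which confirmed outcomes become available after deployment.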

Source: Adapted from:

Sahiner, B., Chen, W., Samala, R. K., & Petrick, N. (2023). Data drift in medical machine learning: implications and potential remedies. British Journal of Radiology, 96(1150). https://doi.org/10.1259/bjr.20220878

Related Term(s): Data Drift

Artificial Intelligence System (AI System)

Engineered system that generates outputs such as content, forecasts, recommendations, or decisions for a given set of human-defined objectives.

Source: International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html

Assistive Artificial Intelligence (Assistive AI)

AI-enabled products designed to assist human decision making. The AI only provides suggestions, information, or data that helps users make more informed decisions.

Assistive AI and Autonomous AI exist on a spectrum. Examples of Assistive AI might include a wearable device that monitors a patient’s vital signs and alerts the user or a healthcare provider when certain metrics are out of the normal range or a product that assists radiologists by showing the location of a potential abnormality.

Source: Adapted from Bajwa, J., Munir, U., Nori, A., & Williams, B. (2021). Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthcare Journal, 8(2), e188–e194. https://doi.org/10.7861/fhj.2021-0095

Related Term(s): Autonomous Artificial Intelligence

Autonomous Artificial Intelligence (Autonomous AI)

AI-enabled products that have the ability to perform tasks, operate independently, and make decisions without human intervention. The level of autonomy can vary based on the product.

Assistive AI and Autonomous AI exist on a spectrum. An example of Autonomous AI could be a product that autonomously identifies normal X-rays and creates reports without the need for radiologist intervention.

Source: Adapted from:

International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html
Sáenz, A.D., Harned, Z., Banerjee, O., Abràmoff, M. D., & Rajpurkar, P. (2023). Autonomous AI systems in the face of liability, regulations and costs. Npj Digital Medicine, 6(1). https://doi.org/10.1038/s41746-023-00929-1
Yang, G., Cambias, J., Cleary, K., Daimler, E., Drake, J., Dupont, P. E., Hata, N., Kazanzides, P., Martel, S., Patel, R. V., Santos, V. J., & Taylor, R. H. (2017). Medical robotics—Regulatory, ethical, and legal considerations for increasing levels of autonomy. Science Robotics, 2(4). https://doi.org/10.1126/scirobotics.aam8638

Related Term(s): Assistive Artificial Intelligence

C

Continual Machine Learning

The ability of a model to adapt its performance by incorporating new data or experiences over time while retaining prior knowledge/information.

The model changes are implemented such that for a given set of inputs, the output may be different before and after the changes are implemented. These changes are typically implemented and validated through a well-defined process that aims at improving performance based on analysis of new data. In contrast to a locked model, a continual machine learning model has a defined learning process to change its behavior.
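
The contrast between a locked model and a continual learning model can be sketched with a toy one-parameter classifier. Everything here is an assumption made for illustration: the class names, the threshold-based model, and the rule that a candidate update is adopted only if it improves accuracy on a validation set.

```python
def accuracy(threshold, labeled):
    """Fraction of (value, label) pairs classified correctly by the threshold."""
    correct = sum(1 for x, y in labeled if (1 if x >= threshold else 0) == y)
    return correct / len(labeled)


class LockedModel:
    """Same input always gives the same output; the parameter never changes."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if x >= self.threshold else 0


class ContinualModel(LockedModel):
    """Adds a defined learning step that may change future outputs."""

    def learn(self, new_data, validation_data):
        positives = [x for x, y in new_data if y == 1]
        negatives = [x for x, y in new_data if y == 0]
        if not positives or not negatives:
            return False
        # Candidate parameter: midpoint between the two class means.
        candidate = (sum(positives) / len(positives)
                     + sum(negatives) / len(negatives)) / 2
        # Validation gate: adopt the change only if it performs better.
        if accuracy(candidate, validation_data) > accuracy(self.threshold, validation_data):
            self.threshold = candidate
            return True
        return False


locked = LockedModel(threshold=5.0)
continual = ContinualModel(threshold=5.0)

new_data = [(6.0, 0), (7.0, 0), (9.0, 1), (10.0, 1)]
validation = [(6.5, 0), (7.5, 0), (8.5, 1), (9.5, 1)]
updated = continual.learn(new_data, validation)

print(updated, locked.predict(7.0), continual.predict(7.0))
```

After the update, the same input (7.0) yields different outputs from the two models, which is the behavior the definition describes.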

Source: Adapted from International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html

Synonym(s): Adaptive Model, Lifelong Learning

Related Term(s): Locked Model

Convolutional Neural Network (CNN)

A specialized deep neural network architecture that consists of one or more convolution layers and is suited for processing grid-like data, such as images. In a convolution layer, a “filter” (window or template) slides over regions of the input image to identify low-level patterns (e.g., edges) by applying convolution (a mathematical dot-product operation applied to the input data). Different filters can be applied to extract different features, such as edges, textures, or curves in images. Additionally, CNNs can include pooling layers, whose function is to reduce the feature dimensionality while retaining relevant features. Stacking these convolution and pooling layers on top of each other enables the network to build up a hierarchical understanding of patterns, which makes CNNs effective at tasks like image recognition and computer vision. An important aspect of this network is its ability to conserve spatial information of the original input while still performing the feature extraction.
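
The sliding-filter and pooling steps can be illustrated with a minimal pure-Python sketch. The toy image, the hand-picked edge filter, and the function names are assumptions for illustration; in a trained CNN the filter values are learned from data, and the operation is usually implemented by an optimized library.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position the output is the sum of element-wise products of the
    kernel and the image patch under it (no kernel flip, as is common in
    deep learning libraries).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds where brightness rises from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = convolve2d(image, vertical_edge)  # 4x4, each row [0, 2, 0, 0]
pooled = max_pool2d(feature_map)                # 2x2: [[2, 0], [2, 0]]
print(feature_map)
print(pooled)
```

The pooled map is smaller than the feature map but still shows the edge on the left side, which illustrates how spatial information is retained while dimensionality is reduced.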

Source: Adapted from:

Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013
International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html
Li, Z., Liu, F., Yang, W., Peng, S., & Zhou, J. (2022). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6999–7019. https://doi.org/10.1109/tnnls.2021.3084827

Related Term(s): Deep Learning, Neural Network

D

Data Card

A structured report of relevant characteristics of datasets needed by stakeholders for AI development and evaluation. It contains a descriptive section with information such as the number of samples, collection protocols, and associated metadata, and a scorecard section with a quantitative analysis that reports dataset characteristics using relevant criteria and metrics.

Source: Adapted from Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22). Association for Computing Machinery. https://doi.org/10.1145/3531146.3533231

Related Term(s): Model Card

Data Drift

Refers to the change in the input data distribution a deployed model receives over time, which can cause the model's performance to degrade. This occurs when the properties of the underlying data change. Data drift can affect the accuracy and reliability of predictive models.

For example, medical AI-enabled products can experience data drift due to statistical differences between the data used for model development and the data used in clinical operation, arising from variations in medical practices or context of use between training and clinical use, and due to changes in patient demographics, disease trends, and data collection methods over time.
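
One common way to quantify a shift in an input distribution is the population stability index (PSI), which compares binned frequencies between a reference dataset and current data. The sketch below is an illustration only and is not drawn from the sources cited for this entry; the bin edges, the small floor value for empty bins, and the example ages are all assumptions.

```python
import math


def bin_fractions(values, edges, floor=1e-6):
    """Fraction of values in each bin [edges[k], edges[k+1]).

    The last bin also includes its right edge. Assumes the edges cover all
    values. Empty bins get a small floor so the logarithm below is defined.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            below_upper = v < edges[k + 1] or (is_last and v == edges[-1])
            if edges[k] <= v and below_upper:
                counts[k] += 1
                break
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(reference, current, edges):
    """Sum over bins of (current - reference) * ln(current / reference)."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Made-up patient ages at development time and in later clinical use.
development_ages = [30, 35, 40, 45, 50, 55, 60, 65]
clinical_ages = [50, 55, 60, 65, 70, 75, 80, 85]
age_bins = [20, 40, 60, 80, 100]

psi_same = population_stability_index(development_ages, development_ages, age_bins)
psi_shifted = population_stability_index(development_ages, clinical_ages, age_bins)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and larger values indicate a larger shift. What value should trigger action depends on the application and is not specified here.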

Source: Adapted from:

International Organization for Standardization. (2020). Software and systems engineering — Software testing — Part 11: Guidelines on the testing of AI-based systems (ISO/IEC TR 29119-11:2020). https://www.iso.org/standard/79016.html
Sahiner, B., Chen, W., Samala, R. K., & Petrick, N. (2023). Data drift in medical machine learning: implications and potential remedies. British Journal of Radiology, 96(1150). https://doi.org/10.1259/bjr.20220878

Deep Learning

A specialized branch of ML that involves training neural networks with multiple intermediary (hidden) layers that operate between an input layer that receives data and an output layer that presents the final network output. Each layer learns to transform its input data into a slightly more abstract and composite representation and produces an output that serves as an input for the next layer. As data propagates through successive layers, these models are able to learn hierarchical feature representations from the input data.

For example, in healthcare, deep learning models can be used to identify tumors or suspicious lesions in medical images to support physicians and radiologists in the evaluation of disease.
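
The layer-by-layer flow of data can be sketched as a forward pass through a small fully connected network. The weights below are hand-picked for illustration rather than learned, and the function names and the choice of ReLU as the hidden-layer activation are assumptions.

```python
def relu(values):
    """Common hidden-layer activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass data through successive layers; each output feeds the next layer."""
    activation = inputs
    for index, (weights, biases) in enumerate(layers):
        activation = dense(activation, weights, biases)
        if index < len(layers) - 1:  # hidden layers apply the activation
            activation = relu(activation)
    return activation


# Input layer of size 2, two hidden layers, and one output value.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer 1 (2 units)
    ([[1.0, 1.0]], [0.5]),                    # hidden layer 2 (1 unit)
    ([[2.0]], [-1.0]),                        # output layer
]

output = forward([1.0, 2.0], layers)
print(output)  # [3.0]
```

Training, which adjusts the weights and biases so that the output matches known targets, is omitted from this sketch.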

Source: Adapted from:

Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., & Dean, J. (2019). A guide to deep learning in healthcare. Nature Medicine, 25(1), 24–29. https://doi.org/10.1038/s41591-018-0316-z
International Organization for Standardization. (2020). Software and systems engineering — Software testing — Part 11: Guidelines on the testing of AI-based systems (ISO/IEC TR 29119-11:2020). https://www.iso.org/standard/79016.html

Synonym(s): Deep Neural Network Learning

Related Term(s): Neural Network

Digital Health Technology (DHT)

A system that uses computing platforms, connectivity, software, and/or sensors for healthcare and related uses. These technologies span a wide range of uses, from applications in general wellness to applications as a medical device. They include technologies intended for use as a medical product, in a medical product, or as an adjunct to other medical products (devices, drugs, and biologics). They may also be used to develop or study medical products.

Source: From U.S. Food and Drug Administration. (2023). Digital Health Technologies for Remote Data Acquisition in Clinical Investigations, Guidance for Industry, Investigators, and Other Stakeholders. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/digital-health-technologies-remote-data-acquisition-clinical-investigations.

Digital Twin

A set of information constructs that mimics the structure, context, and behavior of a physical asset, is dynamically updated with data from its physical twin throughout its lifecycle, and informs decisions. The bidirectional interaction between the virtual and the physical is central to the digital twin.

Digital twins can enable personalized medicine applications. For example, the digital twin of a patient could inform clinical decisions, such as treatment options and clinical assessments. In addition, digital twins can play a role in assembling large cohorts for in silico clinical trials, and in quality assessment and process optimization of drug manufacturing processes.

Source: Adapted from:

American Institute of Aeronautics and Astronautics (2020). Digital Twin: Definition & Value. An AIAA and AIA Position Paper. https://www.aiaa.org/docs/default-source/uploadedfiles/issues-and-advocacy/policy-papers/digital-twin-institute-position-paper-(december-2020).pdf
Badano, A., Lago, M. A., Sizikova, E., Delfino, J. G., Guan, S., Anastasio, M. A., & Sahiner, B. (2023). The stochastic digital human is now enrolling for in silico imaging trials—methods and tools for generating digital cohorts. Progress in Biomedical Engineering, 5(4), 042002. https://doi.org/10.1088/2516-1091/ad04c0
National Academies of Sciences, Engineering, and Medicine. (2023). Foundational Research Gaps and Future Directions for Digital Twins. https://doi.org/10.17226/26894

E

Ensemble Methods

ML techniques that combine multiple models to improve the overall predictive performance compared to using a single model.

This involves training a set of base models, such as neural networks, and then aggregating their predictions to make the final prediction. Some common ensemble methods include bagging (i.e., training multiple models on different subsets of the training data and averaging their predictions), boosting (i.e., training models sequentially where each new model focuses on correcting the errors of the previous model), and stacking (i.e., using the predictions of multiple base models as input features for a higher-level “meta-model” that learns how to best combine them).
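
Bagging can be sketched with very simple base models. In this illustration, each base model is a one-threshold classifier (a "decision stump") trained on a bootstrap sample, meaning a sample drawn with replacement, and the ensemble aggregates by majority vote rather than averaging. The function names, the toy data, and the number of models are assumptions.

```python
import random


def train_stump(samples):
    """Pick the observed value that, used as a threshold, makes the fewest
    errors on `samples` (predict 1 when x >= threshold)."""
    best_threshold, best_errors = None, None
    for candidate, _ in samples:
        errors = sum(1 for x, y in samples
                     if (1 if x >= candidate else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def bagging_fit(data, n_models=15, seed=0):
    """Train each base model on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Aggregate the base models' predictions by majority vote."""
    votes = sum(1 for t in thresholds if x >= t)
    return 1 if votes * 2 > len(thresholds) else 0


# Made-up one-feature dataset: low values are class 0, high values class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = bagging_fit(data)

print(bagging_predict(ensemble, 0.5), bagging_predict(ensemble, 9.5))
```

Boosting and stacking differ in how the base models are trained and combined and are not shown here.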

Source: Adapted from:

Dietterich, T.G. (2000). Ensemble Methods in Machine Learning. In: Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45014-9_1
IEEE Standards. (2022). IEEE Standard for Performance and Safety Evaluation of Artificial Intelligence Based Medical Devices: Terminology (IEEE Std 2802™‐2022). https://standards.ieee.org/ieee/2802/7460/

Synonym(s): Ensemble Learning

Explainability

"Refers to a representation of the mechanisms underlying AI systems’ operation." (Source: NIST)

Explainability may help overcome the opaqueness of black-box systems (i.e., systems where the internal workings and decision-making processes are not transparent or readily understandable). These explanations can take various forms, including free-text explanations, saliency maps, SHapley Additive exPlanations (SHAP), or relevant input examples from data. The primary intent is to answer the question of why an AI system made a particular decision. Appropriate Explainable AI (XAI) methods may enable the development of more accurate, interpretable, and transparent AI systems to safely augment human decision-making in healthcare.

Source: Adapted from:

Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012
International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html

Synonym(s): Explainable AI (XAI)

Related Term(s): Interpretability

F

Feature Engineering

An ML process where attributes from raw data that best represent the underlying patterns are identified for use in training a specific ML model. It involves selecting, transforming, or creating relevant input variables (known as features) to enhance the performance of ML models.

Domain knowledge and data analysis techniques can be used to craft features that capture the inherent relationships in the data. For example, for a model that can predict heart failure, feature engineering on patient data may involve creating a “risk score” by combining relevant features such as age, blood pressure, cholesterol levels, and a history of cardiovascular disease.
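
The "risk score" idea can be sketched as follows. The field names, cut-off values, and equal weighting are placeholders chosen purely for illustration; they are not clinical guidance and are not taken from the cited source.

```python
def engineer_features(patient):
    """Turn a raw patient record into binary features.

    All cut-offs are illustrative placeholders, not clinical thresholds.
    """
    return {
        "older_age": 1 if patient["age"] >= 65 else 0,
        "elevated_bp": 1 if patient["systolic_bp"] >= 140 else 0,
        "high_cholesterol": 1 if patient["total_cholesterol"] >= 240 else 0,
        "prior_cvd": 1 if patient["history_of_cvd"] else 0,
    }


def risk_score(features):
    """Combine the engineered features into one score (here, a simple count)."""
    return sum(features.values())


# Made-up raw record.
patient = {
    "age": 70,
    "systolic_bp": 150,
    "total_cholesterol": 200,
    "history_of_cvd": True,
}

features = engineer_features(patient)
score = risk_score(features)
print(features, score)  # score is 3 for this record
```

The resulting score could be supplied to a model as one input variable alongside, or in place of, the raw measurements.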

Source: Adapted from International Organization for Standardization. (2020). Software and systems engineering — Software testing — Part 11: Guidelines on the testing of AI-based systems (ISO/IEC TR 29119-11:2020). https://www.iso.org/standard/79016.html

Federated Learning

A decentralized approach to training ML models. Models are trained by each site on data that are kept locally, and model updates are sent to a central server, whereby the central server aggregates these updates to improve a global model. This method is designed to preserve data privacy, as raw data remain at the local sites and are not centralized.

For example, federated learning can allow hospitals to collaborate on a heart disease prediction model without sharing patient data. The model is sent to be trained locally at each hospital, and only the model updates from each hospital, not raw data, are sent back and aggregated. This way, individual patient information remains localized, addressing privacy concerns while still benefiting from a collectively improved model.
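
The train-locally, aggregate-centrally loop can be sketched with a weighted average of site updates, an approach often referred to as federated averaging. The toy linear model, the learning rate, the number of rounds, and the two "hospital" datasets are assumptions for illustration; the cited sources do not prescribe this particular procedure.

```python
def local_update(global_weights, local_data, learning_rate=0.05):
    """One gradient step of y ~ w*x + b on data that stays at the site.

    Returns the updated weights and the number of local samples; only
    these, never the raw data, are sent to the central server.
    """
    w, b = global_weights
    n = len(local_data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in local_data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in local_data) / n
    return (w - learning_rate * grad_w, b - learning_rate * grad_b), n


def federated_average(updates):
    """Central server: average site weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


def train(sites, rounds=500):
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # The current global model is sent to every site for local training.
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates)
    return global_weights


# Two hypothetical hospitals whose data follow y = 2x + 1.
hospital_a = [(0.0, 1.0), (1.0, 3.0)]
hospital_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

w, b = train([hospital_a, hospital_b])
print(round(w, 3), round(b, 3))
```

With a single local step per round, this weighted average is equivalent to one gradient step on the pooled data, which is why the global model can recover the shared trend without either site seeing the other's records.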

Source: Adapted from:

Darzidehkalani, E., Ghasemi-rad, M., & van Ooijen, P. M. A. (2022). Federated Learning in Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems. Journal of the American College of Radiology, 19(8), 969–974. https://doi.org/10.1016/j.jacr.2022.03.015
IEEE Standards. (2022). IEEE Standard for Performance and Safety Evaluation of Artificial Intelligence Based Medical Devices: Terminology (IEEE Std 2802™‐2022). https://standards.ieee.org/ieee/2802/7460/

Synonym(s): Federated Machine Learning

Foundation Models

AI models trained using large, typically unlabeled datasets and significant computational resources, that are applicable across a wide range of contexts, including some that the models were not specifically developed and trained for (i.e., emergent capabilities). These models can serve as a foundation upon which further models can be built and adapted for specific uses through further training (i.e., fine-tuning). These models can perform a range of general tasks, such as text synthesis, image manipulation, and audio generation. These models are based on deep learning architectures like transformers and can use either unimodal or multimodal input data.

Source: Adapted from:

Jones, E. (2023). Explainer: What is a foundation model? Ada Lovelace Institute. https://www.adalovelaceinstitute.org/resource/foundation-models-explainer/
Wornow, M., Xu, Y., Thapa, R., Patel, B., Steinberg, E., Fleming, S., Pfeffer, M.A., Fries, J., & Shah, N. H. (2023). The shaky foundations of large language models and foundation models for electronic health records. Npj Digital Medicine, 6(1). https://doi.org/10.1038/s41746-023-00879-8

G

Generative Adversarial Network (GAN)

A deep learning-based model architecture that normally consists of two competing neural networks: a generator and a discriminator. The goal of the “generator” is to synthesize fake data to fool the “discriminator”, while the “discriminator” tries to discriminate between the synthesized examples (generator’s output) and the original training data distribution. The goal of the training is to find a point of equilibrium between the two competing networks, and after the training process, the generator learns to generate new data with the same distribution as the training set. This approach can be used to generate synthetic images.

Source: Adapted from:

Pan, Z., Yu, W., Yi, X., Khan, A., Yuan, F., & Zheng, Y. (2019). Recent Progress on Generative Adversarial Networks (GANs): A Survey. IEEE Access, 7, 36322–36333. https://doi.org/10.1109/access.2019.2905015
Singh, N. K., & Raza, K. (2021). Medical Image Generation Using Generative Adversarial Networks: A Review. In Patgiri, R., Biswas, A., Roy, P. (Eds.), Health Informatics: A Computational Perspective in Healthcare. Studies in Computational Intelligence (pp. 77–96). Springer, Singapore. https://doi.org/10.1007/978-981-15-9735-0_5

Generative Artificial Intelligence (Generative AI)

“The class of AI models that emulate the structure and characteristics of input data in order to generate derived synthetic content. This can include images, videos, audio, text, and other digital content.” (Source: FedRAMP.gov)

This is usually done by approximating the statistical distribution of the input data. For example, in healthcare, generative AI can be used to generate annotations on synthetic medical data (e.g., image features, text labels) to help expand datasets for training algorithms.

Source: Adapted from:

Meskó, B., & Topol, E. J. (2023). The imperative for regulatory oversight of large language models (or generative AI) in healthcare. Npj Digital Medicine, 6(1). https://doi.org/10.1038/s41746-023-00873-0
Zohny, H., McMillan, J., & King, M. (2023). Ethics of generative AI. Journal of Medical Ethics, 49(2), 79–80. https://doi.org/10.1136/jme-2023-108909

Related Term(s): Foundation Models, Large Language Model, Multimodal

H

Human in the Loop Machine Learning

An approach where humans interact with ML models to enhance accuracy and end-user trust in the machine. In human in the loop ML, human interaction is iterative and can lead to continuous performance improvement over time. This interaction is especially relevant in scenarios where the model might be uncertain about its predictions and needs human guidance for verification.

Unlike human in the loop ML, supervised machine learning primarily involves human input during the data labeling phase, after which the algorithm trains independently. Labeling or annotation is the process of attaching descriptive information to data. Data itself are unchanged in the annotation process.
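
One simple pattern for involving humans where the model is uncertain is to route mid-range predictions to a reviewer and keep the reviewer's labels for later retraining. The sketch below assumes hypothetical probability cut-offs, route names, and an `ask_human` callback that stands in for the reviewer.

```python
def route_prediction(probability, lower=0.3, upper=0.7):
    """Accept confident predictions; send uncertain ones to a human.

    The cut-offs are illustrative assumptions.
    """
    if probability >= upper:
        return "accept_positive"
    if probability <= lower:
        return "accept_negative"
    return "human_review"


def run_loop(cases, ask_human):
    """Process cases and collect human labels for future model updates."""
    decisions = {}
    new_labels = []
    for case_id, probability in cases:
        route = route_prediction(probability)
        if route == "human_review":
            label = ask_human(case_id)          # human verifies the case
            new_labels.append((case_id, label))  # kept for retraining
            decisions[case_id] = label
        else:
            decisions[case_id] = 1 if route == "accept_positive" else 0
    return decisions, new_labels


# Made-up model outputs; the stand-in reviewer labels every case as 1.
cases = [("a", 0.95), ("b", 0.5), ("c", 0.1)]
decisions, new_labels = run_loop(cases, ask_human=lambda case_id: 1)
print(decisions, new_labels)
```

Feeding `new_labels` back into training on a recurring basis is what makes the interaction iterative rather than a one-time labeling step.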

Source: Adapted from Mosqueira-Rey, E., Hernández-Pereira, E., Alonso-Ríos, D., Bobes-Bascarán, J., & Fernández-Leal, Á. (2022). Human-in-the-loop machine learning: a state of the art. Artificial Intelligence Review, 56(4), 3005–3054. https://doi.org/10.1007/s10462-022-10246-w

I

Internet of Things (IoT) Device

“Devices that have at least one transducer (sensor or actuator) for interacting directly with the physical world and at least one network interface (e.g., Ethernet, Wi-Fi, Bluetooth) for interfacing with the digital world.” (Source: NIST)

For example, in healthcare, IoT devices can include wearable devices like smartwatches that can collect vital signs data, such as heart rate and activity levels, and smart inhalers with sensors that can track medication usage for asthma patients. This data can be sent to a mobile app or a central system and transmitted to healthcare providers to enable remote monitoring of patients.

Source: Adapted from Islam, S. M. R., Kwak, D., Kabir, M. H., Hossain, M., & Kwak, K. (2015). The Internet of Things for Health Care: A Comprehensive Survey. IEEE Access, 3, 678–708. https://doi.org/10.1109/access.2015.2437951

Interoperability

The ability to communicate and exchange data accurately, effectively, securely, and consistently with different information technology systems, software applications, and networks in various settings, and exchange data such that clinical or operational purpose and meaning of the data are preserved and unaltered.

Source: From U.S. Food and Drug Administration. (2024). Study Data Technical Conformance Guide - Technical Specifications Document. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/study-data-technical-conformance-guide-technical-specifications-document (This Technical Specifications Document is incorporated by reference into the Guidance for Industry Providing Regulatory Submissions in Electronic Format – Standardized Study Data. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/providing-regulatory-submissions-electronic-format-standardized-study-data)

Interpretability

Refers to the meaning of AI systems’ output in the context of their designed functional purposes.

Source: From National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf.

Related Term(s): Explainability

L

Large Language Model (LLM)

A type of AI model trained on large text datasets to learn the relationships between words in natural language. These models can apply these learned patterns to predict and generate natural language responses to a wide range of inputs or prompts they receive, to conduct tasks like translation, summarization, and question answering. These models are characterized by a vast number of model parameters (i.e., internal learned variables within a trained model).

LLMs build on foundational AI models by developing more comprehensive language understanding beyond basic linguistic patterns. For example, in the context of LLMs, a chatbot is a program that enables communication between the LLM and the human through text or voice commands in a way that mimics human-to-human conversation.

Source: Adapted from:

Gottlieb, S., & Silvis, L. (2023). How to Safely Integrate Large Language Models Into Health Care. JAMA Health Forum, 4(9), e233909. https://doi.org/10.1001/jamahealthforum.2023.3909
Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutiérrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29(8), 1930–1940. https://doi.org/10.1038/s41591-023-02448-8

Locked Model

A model that provides the same output each time the same input is applied to it and does not change with use, as its parameters or configuration cannot be updated. In the case of AI-enabled medical products, locked models can help ensure consistent performance.

Source: Adapted from Benjamens, S., Dhunnoo, P., & Meskó, B. (2020). The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. Npj Digital Medicine, 3(1). https://doi.org/10.1038/s41746-020-00324-0

Related Term(s): Continual Machine Learning

M

Machine Learning (ML)

A set of techniques that can be used to train AI algorithms to improve performance at a task based on data.

Source: 15 U.S.C. 9401(3). https://www.govinfo.gov/content/pkg/USCODE-2020-title15/html/USCODE-2020-title15-chap119.htm

Machine Learning Algorithm (ML Algorithm)

A step-by-step procedure or set of instructions followed to perform a task or solve a problem.

For example, in ML, algorithms are used to train models using data to solve a specific problem.

Source: Adapted from El Naqa, I., Murphy, M.J. (2015). What Is Machine Learning? In: El Naqa, I., Li, R., Murphy, M. (eds) Machine Learning in Radiation Oncology. Springer, Cham. https://doi.org/10.1007/978-3-319-18305-3_1

Related Term(s): Machine Learning Model

Machine Learning Algorithmic Bias (ML Algorithmic Bias)

The term “bias” is used in various contexts in different fields and industries. In the context of AI, bias refers to the systematic deviation in model predictions or outcomes for certain data points or groups compared to others. Here we focus on algorithmic bias, where such deviations can stem from various sources, such as the characteristics of the training dataset, choices made during model development, data processing irregularities, or biases introduced during data collection or from human decisions. Algorithmic bias can lead to a systematic difference or error in treatment of certain objects, people, or groups in comparison to others, or prediction failures that can result in other risks, where treatment is any kind of action, including perception, observation, representation, prediction, or decision.

Source: Adapted from International Medical Device Regulators Forum. (2022). Machine Learning-enabled Medical Devices: Key Terms and Definitions. https://www.imdrf.org/documents/machine-learning-enabled-medical-devices-key-terms-and-definitions

Machine Learning Model (ML Model)

A mathematical construct that generates an inference or prediction for input data. This model is the result of an ML algorithm learning from data. Models are trained by algorithms, which are step-by-step procedures used to process data and derive results. AI systems (e.g., AI-enabled medical devices) employ one or more models to achieve their intended purpose.

Source: Adapted from:

International Medical Device Regulators Forum. (2022). Machine Learning-enabled Medical Devices: Key Terms and Definitions. https://www.imdrf.org/documents/machine-learning-enabled-medical-devices-key-terms-and-definitions
International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html

Related Term(s): Machine Learning Algorithm

Model Calibration

The process of adjusting predicted probabilities generated by an ML model to ensure that they accurately reflect the observed frequencies of events or outcomes in the real world.

For example, if a model is well calibrated and predicts 20% probability of breast cancer for a patient, then the observed frequency of breast cancer should be approximately 20 out of 100 patients that were given such a prediction by the model.
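
Whether predicted probabilities match observed frequencies can be checked by grouping predictions into bins and comparing each bin's mean prediction with its observed event rate. The sketch below only measures calibration; it does not perform the adjustment step. The function names, the number of bins, and the example data are assumptions, and the summary statistic is one commonly used measure (expected calibration error), not one prescribed by the cited source.

```python
def calibration_table(probabilities, outcomes, n_bins=5):
    """For each non-empty probability bin, return
    (mean predicted probability, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            table.append((mean_predicted, observed_rate, len(members)))
    return table


def expected_calibration_error(table):
    """Count-weighted average gap between predicted and observed rates."""
    total = sum(n for _, _, n in table)
    return sum(n * abs(p - o) for p, o, n in table) / total


# Ten made-up patients in which 2 of 10 have the disease.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

well_calibrated = calibration_table([0.2] * 10, outcomes)
overconfident = calibration_table([0.8] * 10, outcomes)

print(expected_calibration_error(well_calibrated))  # close to 0
print(expected_calibration_error(overconfident))    # close to 0.6
```

A model that assigns 20% to these patients matches the observed rate, while one that assigns 80% shows a gap of about 0.6 that a calibration procedure would aim to reduce.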

Source: Adapted from Chen, W., Sahiner, B., Samuelson, F., Pezeshk, A., & Petrick, N. (2018) Calibration of medical diagnostic classifier scores to the probability of disease. Statistical Methods in Medical Research, 27(5). https://doi.org/10.1177/0962280216661371

Model Card

A structured report of relevant technical characteristics of an AI model and benchmark evaluation results relevant to the intended application domains. Model cards also provide information about the context in which models are intended to be used and details of how their performance was assessed.

Source: Adapted from Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I.D., & Gebru, T. (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). Association for Computing Machinery. https://doi.org/10.1145/3287560.3287596

Related Term(s): Data Card

Model Fitting

The process of training an ML model to capture underlying patterns in the data by adjusting the training parameters to make the model’s predictions as close as possible to the target values in the training data.

This adjustment of the parameters enables the model to generalize its understanding of the data, making it useful for making predictions on new, unseen data. A well-fit model does not overfit or underfit but performs well both on the training data and on new, unseen data, due to correctly capturing the relationships between the input and target variables.
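
Fitting and then checking performance on unseen data can be sketched with a straight-line model fitted by ordinary least squares. The data, the function names, and the constant "baseline" model used as an example of underfitting are assumptions for illustration.

```python
def fit_line(points):
    """Choose slope and intercept that minimize squared error on `points`."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, points):
    """Mean squared error of a prediction function on labeled points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)


# Made-up training data (roughly y = 2x + 1 with noise) and unseen data.
train = [(0, 1.5), (1, 2.5), (2, 5.5), (3, 6.5)]
held_out = [(4, 9.0), (5, 11.0)]

slope, intercept = fit_line(train)


def fitted(x):
    return slope * x + intercept


# Underfit baseline: ignores x and always predicts the training mean.
train_mean = sum(y for _, y in train) / len(train)


def baseline(x):
    return train_mean


print(slope, intercept)
print(mse(fitted, train), mse(fitted, held_out))
print(mse(baseline, train), mse(baseline, held_out))
```

The fitted line has low error on both the training and held-out points, while the constant baseline has high error on both, which is the signature of underfitting. Overfitting would show the opposite pattern: very low training error but much higher error on the held-out points.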

Source: Adapted from Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O'Reilly Media, Inc. https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/

Related Term(s): Overfitting, Underfitting

Model Robustness

The ability of an ML model to maintain its target or specified level of performance under different circumstances. These circumstances can include noisy data (e.g., data containing errors, inconsistencies, and missing values), unseen data or data drift, or adversarial attacks that manipulate the data to deceive the model.

For example, in healthcare, challenges in model robustness can arise in medical image classification, where variations in imaging conditions, like lighting or resolution, can affect the performance of a tumor classification model trained on standardized images.

Source: Adapted from:

International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html
Li, K., DeCost, B., Choudhary, K., Greenwood, M., & Hattrick‐Simpers, J. (2023). A critical examination of robustness and generalizability of machine learning prediction of materials properties. Npj Computational Materials, 9(1). https://doi.org/10.1038/s41524-023-01012-9

Related Term(s): Data Drift
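
One simple way to probe robustness is to compare a model's accuracy on original inputs with its accuracy on perturbed inputs. The sketch below assumes a hypothetical threshold classifier and a uniform brightness shift of 0.25; both are invented for illustration and do not represent any real device or test protocol.

```python
def classify(value, threshold=0.5):
    """Hypothetical model: predict class 1 when the input exceeds a threshold."""
    return 1 if value > threshold else 0


def accuracy(values, labels):
    """Fraction of inputs whose predicted class matches the true class."""
    correct = sum(1 for v, y in zip(values, labels) if classify(v) == y)
    return correct / len(labels)


# Hypothetical normalized intensity readings and their true classes.
clean_values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Simulate a change in acquisition conditions as a uniform brightness shift.
shifted_values = [v + 0.25 for v in clean_values]

clean_accuracy = accuracy(clean_values, labels)
shifted_accuracy = accuracy(shifted_values, labels)
print(clean_accuracy, shifted_accuracy)
```

A drop in accuracy under the shift indicates that the model is sensitive to that kind of change in its inputs.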

Model Weight

Learnable parameters that encode the core capabilities of an AI model.

Source: Adapted from RAND Corporation. Sella Nevo, Dan Lahav, Ajay Karpur, Yogev Bar-On, Henry Alexander Bradley, Jeff Alstott; Securing AI Model Weights, May 30, 2024. https://www.rand.org/content/dam/rand/pubs/research_reports/RRA2800/RRA2849-1/RAND_RRA2849-1.pdf

Multimodal

An approach for processing and integrating multiple different data types, aiming to capture and leverage the relationships between them for a better understanding of the input information or improved prediction performance. These data types may include text, images, audio, video, genomics, sensor data, etc. These different data types may be processed using a single multimodal network (e.g., based on neural network or other architectures) or through separate unimodal networks (e.g., LLMs for text and CNNs for images) where the unimodal outputs are combined.

For example, in healthcare, data from electronic health records and wearable biosensors can be combined to enable remote monitoring of patients.

Source: Adapted from:

Acosta, J.N., Falcone, G. J., Rajpurkar, P., & Topol, E. J. (2022). Multimodal biomedical AI. Nature Medicine, 28(9), 1773–1784. https://doi.org/10.1038/s41591-022-01981-2
Kline, A., Wang, H., Li, Y., Dennis, S., Hutch, M., Xu, Z., Wang, F., Cheng, F., & Luo, Y. (2022). Multimodal machine learning in precision health: A scoping review. Npj Digital Medicine, 5(1). https://doi.org/10.1038/s41746-022-00712-8

Synonym(s): Multimodal Approach, Multimodal Learning

N

Natural Language Processing (NLP)

A subfield of AI and linguistics that enables computers to understand, process, interpret, and generate human language. NLP systems can perform tasks such as text classification, sentiment analysis, and translation, using techniques from computational linguistics and ML to process and analyze natural language data.

Natural Language Generation is one application of NLP, which involves using AI systems to produce human-readable text outputs like summaries, reports, stories, or responses.

Source: Adapted from:

International Organization for Standardization. (2022). Information technology — Artificial intelligence — Artificial intelligence concepts and terminology (ISO/IEC 22989:2022). https://www.iso.org/standard/74296.html
Network of the National Library of Medicine. (n.d.). Natural language processing. National Institutes of Health. Retrieved July 31, 2024, from https://www.nnlm.gov/guides/data-glossary/natural-language-processing

Neural Network

A computational model inspired by the structure of the human brain. It is composed of interconnected nodes, or “neurons,” organized into layers: an input layer that receives data, one or more hidden layers that process and identify patterns in the data, and an output layer that presents the final network output.

Source: Adapted from:

International Organization for Standardization. (2020). Software and systems engineering — Software testing — Part 11: Guidelines on the testing of AI-based systems (ISO/IEC TR 29119-11: 2020). https://www.iso.org/standard/79016.html
Network of the National Library of Medicine. (n.d.). Neural Networks. National Institutes of Health. Retrieved July 31, 2024, from https://www.nnlm.gov/guides/data-glossary/neural-networks

Synonym(s): Artificial Neural Network, Neural Net

Related Term(s): Deep Learning
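
The layered structure can be sketched with a very small network: two inputs, one hidden layer of two nodes, and one output node. The weights below are hand-picked for illustration (not learned) so that the network computes the XOR function; the names forward and xor_network are hypothetical.

```python
def relu(value):
    """Rectified linear unit, a common hidden-layer activation."""
    return max(0.0, value)


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: input layer -> hidden layer (ReLU) -> single output."""
    hidden = [
        relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hand-picked (not learned) weights that make the network compute XOR.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [1.0, -2.0]
OUTPUT_BIAS = 0.0


def xor_network(x1, x2):
    return forward([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)


for pair in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(pair, xor_network(*pair))
```

In practice the weights are learned from training data rather than set by hand, and networks typically have many more nodes and layers.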

O

Overfitting

In ML, overfitting occurs when a model learns the training data too thoroughly, capturing not just the fundamental patterns but also noise or random fluctuations. Such a model might excel on the training data but struggle to generalize to new, unseen data.

Source: Adapted from IEEE Standards. (2022). IEEE Standard for Performance and Safety Evaluation of Artificial Intelligence Based Medical Devices: Terminology (IEEE Std 2802™‐2022). https://standards.ieee.org/ieee/2802/7460/

Related Term(s): Model Fitting, Underfitting
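
The contrast between memorizing and generalizing can be sketched with toy data. The example below assumes a hypothetical true relationship y = 2x, training targets with alternating noise of plus or minus 1.5, and two models chosen for illustration: a 1-nearest-neighbor "memorizer" and a least-squares line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def memorizer_predict(x, train_x, train_y):
    """1-nearest-neighbor: return the target of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: the true relationship is y = 2x; training targets are noisy.
train_x = [float(i) for i in range(10)]
noise = [1.5 if i % 2 == 0 else -1.5 for i in range(10)]
train_y = [2.0 * x + e for x, e in zip(train_x, noise)]
test_x = [x + 0.4 for x in train_x]
test_y = [2.0 * x for x in test_x]

slope, intercept = fit_line(train_x, train_y)
memorizer_train_mse = mse([memorizer_predict(x, train_x, train_y) for x in train_x], train_y)
memorizer_test_mse = mse([memorizer_predict(x, train_x, train_y) for x in test_x], test_y)
line_train_mse = mse([slope * x + intercept for x in train_x], train_y)
line_test_mse = mse([slope * x + intercept for x in test_x], test_y)
print(memorizer_train_mse, memorizer_test_mse, line_train_mse, line_test_mse)
```

The memorizer reproduces the noisy training targets exactly but carries that noise into its predictions on new inputs, while the line has a nonzero training error and a lower error on the unseen data.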

P

Performance Metrics

In the context of AI, quantitative or qualitative measures that can be used to assess the ability of a model to produce the desired output for a given task. The choice of the metrics depends on the specific task and the model objectives.

Examples of quantitative metrics include accuracy, precision, sensitivity (recall), specificity, F1-score, and Area under the Receiver Operating Characteristic curve (AUC-ROC). Qualitative measures may involve heatmap evaluations or visual interpretations. These metrics enable systematic evaluation, comparison, and refinement of models, and aid in the assessment of whether the model meets its intended objectives.
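
Several of the quantitative metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below uses hypothetical binary labels and predictions, and assumes each class is both present and predicted at least once so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Common binary-classification metrics (assumes nonzero denominators)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical reference labels (1 = condition present) and model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```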

Privacy Enhancing Technology

Any software or hardware solution, technical process, technique, or other technological means of mitigating privacy risks arising from data processing, including by enhancing predictability, manageability, disassociability, storage, security, and confidentiality. These technological means may include secure multiparty computation, homomorphic encryption, zero-knowledge proofs, federated learning, secure enclaves, differential privacy, and synthetic-data-generation tools.

Source: Adapted from International Organization for Standardization. (2024). Information technology – Security techniques – Privacy framework (INCITS/ISO/IEC 29100:2024). https://www.iso.org/obp/ui/#iso:std:iso-iec:29100:ed-2:v1:en

Synonym(s): Privacy-preserving technology
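
As one concrete illustration of secure multiparty computation, the sketch below uses additive secret sharing so that several parties can compute the sum of their private values without any party revealing its own value. The modulus, the function names, and the example values are assumptions for this example; a real deployment would also need secure communication channels and a defined threat model.

```python
import secrets

# Shares are integers modulo this value (an arbitrary choice for the sketch).
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Sum private values using only shares and per-party partial sums."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [make_shares(v, n) for v in private_values]
    # Party j adds up the j-th share it received from every party.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % MODULUS


# Hypothetical private counts held by three separate sites.
total = secure_sum([12, 7, 30])
print(total)
```

Each individual share is uniformly random on its own, so a single share gives no information about the value it came from.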

R

Reference Standard (in Artificial Intelligence)

The best available method for establishing or measuring the true state or property of the phenomenon being examined, often represented in the form of labeled data in AI. It serves as a benchmark against which the outputs of a model are evaluated. In clinical settings and medical research, a reference standard is a diagnostic measure or method that is considered to be the gold standard clinically and is used to validate the results. For instance, a reference standard can indicate the presence, extent, and location of diseases or abnormalities.

Labeling or annotation is the process of attaching descriptive information to data. The data themselves are unchanged in the annotation process.

Source: Adapted from Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C.A., Glasziou, P.P., Irwig, L., Lijmer, J. G., Moher, D., Rennie, D., de Vet, H. C.W., Kressel, H. Y., Rifai, N., Golub, R. M., Altman, D. G., Hooft, L., Korevaar, D. A., & Cohen, J. F., for the STARD Group. (2015). STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. https://doi.org/10.1136/bmj.h5527

Synonym(s): Gold Standard, Ground Truth

Reinforcement Learning

An ML approach where a model (or agent) learns by taking actions and getting rewards or penalties through its interactions with an environment. The model learns from the consequences of its actions, rather than from being explicitly taught, and selects its actions based on its past experiences (exploitation) and by making new choices (exploration), which is essentially trial-and-error learning.

For example, in healthcare, reinforcement learning can be used for recommending personalized treatment plans for patients with chronic diseases. The model is given patient data, including their medical history, current health status, and treatment responses, and then suggests a treatment plan. The key is the feedback loop: as patient data is continually updated with information on how well they are responding to the treatment, the model adjusts its recommendations accordingly. This process involves a lot of trial and error, as the model learns from each patient interaction. Over time, through many such interactions, the model becomes more adept at predicting and recommending the most effective treatment plans for individual patients.
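
The balance between exploration and exploitation can be sketched with an epsilon-greedy agent on a multi-armed bandit, a deliberately simplified setting with no patient data. The reward probabilities, the step count, the value of epsilon, and the function name run_bandit are all hypothetical choices for this example.

```python
import random


def run_bandit(reward_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_arms = len(reward_probabilities)
    estimates = [0.0] * n_arms  # running average reward observed per action
    counts = [0] * n_arms       # how often each action has been taken
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # exploration: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploitation
        # The environment returns a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < reward_probabilities[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for the chosen action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical success probabilities for three candidate actions.
estimates, counts = run_bandit([0.2, 0.8, 0.5])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, counts)
```

The agent is never told which action is best; its estimates come only from the rewards it observes after acting.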

S

Self-Supervised Machine Learning

ML algorithms that generate their own labels from the available unlabeled data. Unlike supervised learning, where labeled data are provided, and unsupervised learning, which uncovers hidden patterns without labels, self-supervised learning leverages the inherent structure within the data to create its own labels. This approach is useful when labeled data are limited or unavailable.

Source: Adapted from:

Krishnan, R., Rajpurkar, P., & Topol, E. J. (2022). Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering, 6(12), 1346–1352. https://doi.org/10.1038/s41551-022-00914-1
Shurrab, S., & Duwairi, R. (2022). Self-supervised learning methods and applications in medical imaging analysis: a survey. PeerJ Computer Science, 8, e1045. https://doi.org/10.7717/peerj-cs.1045

Synonym(s): Self-Supervised Learning
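
One common way to derive labels from unlabeled data is to hide part of each example and ask the model to predict the hidden part. The sketch below builds such training pairs from a token sequence; the sentence, the mask token, and the function name make_masked_examples are illustrative assumptions, and no model is trained here.

```python
def make_masked_examples(tokens, mask_token="[MASK]"):
    """Turn one unlabeled token sequence into (masked input, label) pairs."""
    examples = []
    for position, token in enumerate(tokens):
        masked = list(tokens)
        masked[position] = mask_token  # hide one token
        examples.append((masked, token))  # the hidden token becomes the label
    return examples


# A hypothetical unlabeled sentence, already split into tokens.
unlabeled = ["chest", "x-ray", "shows", "no", "abnormality"]
examples = make_masked_examples(unlabeled)
for masked_input, label in examples:
    print(masked_input, "->", label)
```

No human annotation is involved: every label is taken from the data itself.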

Semi-Supervised Machine Learning

ML algorithms that leverage both unsupervised and supervised techniques. Supervised learning techniques are trained using labeled data, while unsupervised learning techniques are trained using unlabeled data. Labeling or annotation is the process of attaching descriptive information to data. The data themselves are unchanged in the annotation process.

For example, consider the task of diagnosing lung diseases from chest X-rays. A semi-supervised learning model would initially be trained on a small set of labeled X-ray images, where each image has been marked by radiologists as showing signs of specific lung conditions or being normal. The model then uses this knowledge to start making predictions on a larger set of unlabeled images.

Synonym(s): Semi-Supervised Learning
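
One semi-supervised technique, often called self-training or pseudo-labeling, follows the pattern in the example above: train on the small labeled set, predict labels for the unlabeled data, then retrain on both. The sketch below assumes one-dimensional hypothetical measurements and a nearest-centroid classifier, and it accepts every pseudo-label, whereas practical methods usually keep only confident predictions.

```python
def fit_centroids(points, labels):
    """Nearest-centroid 'training': average the points of each class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict(point, centroids):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: abs(point - centroids[label]))


# Hypothetical data: a small labeled set and a larger unlabeled set.
labeled_points = [1.0, 9.0]
labeled_targets = [0, 1]
unlabeled_points = [2.0, 3.0, 7.5, 8.0]

# Step 1: train on the labeled data only.
initial = fit_centroids(labeled_points, labeled_targets)
# Step 2: use that model to assign pseudo-labels to the unlabeled data.
pseudo_labels = [predict(p, initial) for p in unlabeled_points]
# Step 3: retrain on the labeled and pseudo-labeled data together.
final = fit_centroids(labeled_points + unlabeled_points, labeled_targets + pseudo_labels)
print(pseudo_labels, final)
```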

Supervised Machine Learning

ML algorithms that are trained using provided labeled data. Labeling or annotation is the process of attaching descriptive information to data. The data themselves are unchanged in the annotation process.

Synonym(s): Supervised Learning

Synthetic Data

Data that have been created artificially (e.g., through statistical modeling, computer simulation) so that new values and/or data elements are generated. Generally, synthetic data are intended to represent the structure, properties and relationships seen in actual patient data, except that they do not contain any real or specific information about individuals.

For example, in healthcare, synthetic data are artificial data that are intended to mimic the properties and relationships seen in real patient data. Synthetic data are examples that have been partially or fully generated using computational techniques rather than acquired from a human subject by a physical system.

Source: Adapted from:

Chen, R. J., Lu, M.Y., Chen, T.Y., Williamson, D. F. K., & Mahmood, F. (2021). Synthetic data in machine learning for medicine and healthcare. Nature Biomedical Engineering, 5(6), 493–497. https://doi.org/10.1038/s41551-021-00751-8
Giuffré, M., & Shung, D.L. (2023). Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. Npj Digital Medicine, 6(1). https://doi.org/10.1038/s41746-023-00927-3
Myles, P., Ordish, J., & Tucker, A. (2023). The potential synergies between synthetic data and in silico trials in relation to generating representative virtual population cohorts. Progress in Biomedical Engineering, 5(1), 013001. https://doi.org/10.1088/2516-1091/acafbf

T

Test Data

These data are used to characterize the performance of an AI system. These data are never shown to the algorithm during training and are used to estimate the AI model’s performance after training. Testing is conducted to generate evidence to establish the performance of an AI system before the system is deployed or marketed.

For AI-enabled medical products, test data should be independent of data used for training and tuning.

Related Term(s): Training Data, Tuning Data

Training Data

These data are used by the manufacturer of an AI system in procedures and training algorithms to build an AI model, including to define model weights, connections, and components.

Related Term(s): Test Data, Tuning Data

Transfer Learning

A strategic approach within ML wherein a model developed for a particular task is adapted for a second task. This approach leverages the knowledge and patterns acquired from a previously solved problem (source task) to boost the performance and learning efficiency of a model on a subsequent, often similar, problem (target task).

For example, in healthcare, a model trained to identify tumors in lung X-ray images might leverage the learned patterns to improve the identification of abnormalities in liver ultrasound images.

Source: Adapted from Yu, X., Wang, J., Hong, Q., Teku, R., Wang, S., & Zhang, Y. (2022). Transfer learning for medical images analyses: A survey. Neurocomputing, 489, 230–254. https://doi.org/10.1016/j.neucom.2021.08.159

Tuning Data

These data are typically used by the manufacturer of an AI system to evaluate a small number of trained models. This process involves exploring various aspects, including different architectures or hyperparameters (i.e., parameters used to tune the model for the task). The tuning phase happens before the testing phase of the AI system and is part of the training process. While the AI and ML communities sometimes use the term “validation” to refer to the tuning data and phase, the FDA will not typically use the word “validation” in this context due to its specific regulatory definition (see 21 CFR 820.3(z)).

Related Term(s): Test Data, Training Data
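
The relationship among training, tuning, and test data can be sketched as a partition of one dataset into three non-overlapping subsets. The fractions, the seed, and the function name split_dataset below are arbitrary assumptions for illustration; how independence is established for a real product (for example, keeping all records from one patient or site in a single subset) depends on the application.

```python
import random


def split_dataset(records, train_fraction=0.7, tuning_fraction=0.15, seed=0):
    """Shuffle once, then cut the records into three disjoint subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    n_tuning = int(len(shuffled) * tuning_fraction)
    training = shuffled[:n_train]                    # used to build the model
    tuning = shuffled[n_train:n_train + n_tuning]    # used to compare candidate models
    test = shuffled[n_train + n_tuning:]             # held out to estimate performance
    return training, tuning, test


# Hypothetical record identifiers.
record_ids = list(range(100))
training, tuning, test = split_dataset(record_ids)
print(len(training), len(tuning), len(test))
```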

U

Underfitting

In ML, underfitting happens when a model does not capture the patterns and complexity of the training data, leading to poor performance on both the training and new, unseen data.

Related Term(s): Overfitting

Unsupervised Machine Learning

ML algorithms that only make use of unlabeled data during training. Unsupervised learning seeks to uncover hidden patterns or structures within the data.

Synonym(s): Unsupervised Learning
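
Clustering is one familiar example of uncovering structure without labels. The sketch below runs a basic k-means procedure on hypothetical one-dimensional values; the data, the starting centroids, and the function name k_means_1d are assumptions chosen to keep the example small.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled measurements that fall into two groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The algorithm is given no labels; the two groups emerge from the distances between the values.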

W

Watermarking

The process of embedding an identifying pattern into a piece of media, including outputs such as images, audio, video, and digital text, in order to track its origin and to verify the authenticity of the output or the identity or characteristics of its provenance, modifications, or conveyance.

Source: Adapted from Brookings Institution. Detecting AI fingerprints: A guide to watermarking and beyond, Srinivasan, S.; January 4, 2024. https://www.brookings.edu/articles/detecting-ai-fingerprints-a-guide-to-watermarking-and-beyond/?b=1
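
A classic and very simple embedding scheme writes the watermark into the least significant bit of each pixel value, which changes each value by at most 1. The sketch below is only meant to show the embed-then-extract idea; the pixel values and bit pattern are hypothetical, and watermarks used for AI-generated content are generally designed to be far more resistant to editing and removal.

```python
def embed_watermark(pixels, bits):
    """Write each watermark bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the cover data")
    marked = list(pixels)
    for index, bit in enumerate(bits):
        marked[index] = (marked[index] & ~1) | bit  # clear the last bit, then set it
    return marked


def extract_watermark(pixels, length):
    """Read the watermark back from the least significant bits."""
    return [pixel & 1 for pixel in pixels[:length]]


# Hypothetical 8-bit grayscale pixel values and an 8-bit identifying pattern.
cover = [200, 13, 77, 54, 255, 0, 128, 91]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_watermark(cover, watermark)
recovered = extract_watermark(marked, len(watermark))
print(marked, recovered)
```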

The FDA plans to routinely update the Digital Health and Artificial Intelligence Glossary as appropriate.

Citation: U.S. Food and Drug Administration. (2024, September 26). Digital Health and Artificial Intelligence Glossary – Educational Resource. https://www.fda.gov/science-research/artificial-intelligence-and-medical-products/fda-digital-health-and-artificial-intelligence-glossary-educational-resource