BERTox Initiative

Initiative to apply advanced AI-powered NLP to analyze FDA documents for improved efficiency and accuracy of information retrieval and toxicity assessment.

Objective: To apply an advanced AI-powered natural language processing (NLP) technology, Bidirectional Encoder Representations from Transformers (BERT), to FDA documents and public literature in order to make information retrieval and toxicity assessment more efficient and accurate.

Introduction: During the product-review process, FDA has generated, and continues to generate, many documents whose text cannot be readily indexed or mapped onto standard database fields and that often lack metadata. Analyzing semantic relationships in the text is therefore vital for extracting useful information from FDA documents, both to facilitate regulatory-science research and to improve the FDA product-review process. Meanwhile, AI-based methods have advanced the NLP field significantly through language models (LMs) that are trained on large quantities of biomedical corpora (collections of text) and can perform a broad range of semantic text-analysis tasks. This initiative aims to assess how well LMs work on FDA documents, both by using publicly available LMs (e.g., BERT, BioBERT, and ClinicalBERT) and by developing content-specific LMs to facilitate regulatory science at FDA.

Approaches: BERTox is a suite of NLP applications covering diverse functions such as information retrieval, sentiment analysis, text classification, and named-entity recognition. In several pilot studies, the BERTox approach has been applied to classifying drug-induced liver injury based on FDA drug labeling, causal inference on the FDA Adverse Event Reporting System (FAERS) database, studying AI bias in the interpretation and classification of drug properties (e.g., safety and efficacy), summarizing text to provide highlights of labeling sections, and automatic anomaly analysis.
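To make the information-retrieval function concrete, the following is a minimal sketch of similarity-based retrieval and not the BERTox implementation. It assumes a toy bag-of-words vector as a stand-in for a BERT sentence embedding so that it runs without any model download; the function names (`embed`, `cosine`, `retrieve`) and the sample snippets are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Rank documents by similarity to the query; return the top_k (score, text) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Hypothetical labeling-style snippets, invented for illustration only.
DOCS = [
    "Cases of severe liver injury have been reported in patients taking this drug.",
    "Store at room temperature and protect from moisture.",
    "The most common adverse reactions were headache and nausea.",
]

best_score, best_doc = retrieve("reports of liver injury", DOCS)[0]
print(best_doc)
```

In a BERT-based system, `embed` would presumably be replaced by a dense vector produced by a language model, which can match passages that share meaning but not exact words; the ranking step by cosine similarity would stay the same.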

Potential impact: Reviewing text documents is a crucial step in assessing the safety and efficacy of FDA-regulated products. However, the current manual process is time-consuming and resource-intensive. BERTox offers a set of AI tools and systems that process FDA documents and extract critical information from them, with the aim of improving and expediting the product-review process. In addition, BERTox can serve as an institutional memory, giving reviewers effective access to the past documents that are often referenced to ensure consistent, evidence-based decision-making in the review of new products.


  1. AI-Based Language Models Powering Drug Discovery and Development.
    Liu Z., Roberts R.A., Lal-Nag M., et al.
    Drug Discovery Today. 2021, 26:2593-2607.
  2. BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk.
    Wu Y., Liu Z., Wu L., et al.
    Frontiers in Artificial Intelligence. 2021, 4:729834.
  3. DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction.
    Bhatt A., Roberts R., Chen X., et al.
    Frontiers in Artificial Intelligence. 2021, 4.
  4. InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance.
    Wang X., Xu X., Tong W., et al.
    Frontiers in Artificial Intelligence. 2021, 4:659622.
  5. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies.
    Wu L., Ali S., Ali H., Brock T., Xu J., and Tong W.
    International Journal of Environmental Research and Public Health. 2022, 19:9974.
  6. DeepCausality: A General AI-Powered Causal Inference Framework for Free Text: A Case Study of LiverTox.
    Wang X., Xu X., Tong W., Liu Q., and Liu Z.
    Frontiers in Artificial Intelligence. 2022, 5:999289.
