BERTox Initiative

Initiative to apply LLMs to facilitate analysis of FDA documents and public literature for improved efficiency and accuracy.

Objective: To apply large language models (LLMs), such as BERT and GPT, to the analysis of FDA documents and public literature, improving the efficiency and accuracy of information retrieval and toxicity assessment.

Introduction: FDA has historically generated, and continues to generate, many documents during the product-review process. These documents contain text that cannot be readily indexed or mapped onto standard database fields, and they often lack metadata. Analysis of semantic relationships therefore plays a vital role in extracting useful information from these documents to facilitate regulatory science research and improve the FDA product-review process. Meanwhile, AI-based Natural Language Processing (NLP) has focused on developing LLMs trained on large text corpora to perform a broad range of NLP tasks. This initiative aims to assess the application of LLMs to FDA documents and to develop content-specific LLMs that facilitate regulatory science tasks at FDA, such as information retrieval and text summarization.

Approaches: BERTox is a suite of NLP applications whose functions include information retrieval, sentiment analysis, text classification, and Named Entity Recognition (NER). In several pilot studies, the BERTox approach has been applied to drug-induced liver injury classification based on FDA drug labeling, causal inference on the FDA Adverse Event Reporting System (FAERS) database, AI bias in the interpretation and classification of drug properties (e.g., safety and efficacy), text summarization to provide highlights of labeling sections, and automatic anomaly analysis. The initiative places specific emphasis on developing responsible AI models: customized LLMs that can be operated in a local environment for specific regulatory applications, with an understanding of their bias, context of use, causal inference, and explainability.

Potential impact: Reviewing text documents is a crucial step in assessing the safety and efficacy of FDA-regulated products. However, the current manual process is time-consuming and resource-intensive. BERTox offers a set of LLM-based AI tools and systems that process FDA documents and extract critical information from them to improve and expedite the product-review process. In addition, BERTox can serve as an institutional memory, providing effective access to frequently referenced past documents to ensure consistency and evidence-based decision-making in the review of new products.


Publications:

  1. A Framework Enabling LLMs into Regulatory Environment for Transparency and Trustworthiness and its Application to Drug Labeling Document.
    Wu L., Xu J., Thakkar S., Gray M., Qu Y., Li D., and Tong W.
    Regulatory Toxicology and Pharmacology. 2024, 149: 105613. doi:10.1016/j.yrtph.2024.105613.
  2. Bidirectional Encoder Representations from Transformers-like Large Language Models in Patient Safety and Pharmacovigilance: A Comprehensive Assessment of Causal Inference Implications.
    Wang X., Xu X., Liu Z., and Tong W. 
    Experimental Biology and Medicine. 2023, 248(21):1908-1917. doi:10.1177/15353702231215895. 
  3. Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science.
    Gray M., Samala R., Liu Q., Skiles D., Xu J., Tong W., and Wu L. 
    Clinical Pharmacology and Therapeutics. 2023, 115(4):687-697. doi:10.1002/cpt.3117.
  4. RxBERT: Enhancing Drug Labeling Text Mining and Analysis with AI Language Modeling.
    Wu L., Gray M., Dang O., Xu J., Fang H., and Tong W. 
    Experimental Biology and Medicine. 2023, 248(21):1937-1943. doi:10.1177/15353702231220669.
  5. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies.
    Wu L., Ali S., Ali H., Brock T., Xu J., and Tong W.
    International Journal of Environmental Research and Public Health. 2022, 19:9974.
  6. DeepCausality: A General AI-Powered Causal Inference Framework for Free Text: A Case Study of LiverTox.
    Wang X., Xu X., Tong W., Liu Q., and Liu Z.
    Frontiers in Artificial Intelligence. 2022, 5:999289.
  7. AI-Based Language Models Powering Drug Discovery and Development.
    Liu Z., Roberts R.A., Lal-Nag M., et al.
    Drug Discovery Today. 2021, 26:2593-2607.
  8. BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk.
    Wu Y., Liu Z., Wu L., et al.
    Frontiers in Artificial Intelligence. 2021, 4:729834.
  9. DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction.
    Bhatt A., Roberts R., Chen X., et al.
    Frontiers in Artificial Intelligence. 2021, 4.
  10. InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance.
    Wang X., Xu X., Tong W., et al.
    Frontiers in Artificial Intelligence. 2021, 4:659622.
