BERTox Initiative

Initiative to apply LLMs to facilitate analysis of FDA documents and public literature for improved efficiency and accuracy.

Objective: To apply large language models (LLMs), such as BERT and GPT, to the analysis of FDA documents and public literature, improving the efficiency and accuracy of information retrieval and toxicity assessment.

Introduction: FDA has historically generated, and continues to generate, many documents during the product-review process. These documents contain text that cannot be readily indexed or mapped onto standard database fields, and they often lack metadata. Analysis of semantic relationships therefore plays a vital role in extracting useful information from these FDA documents to facilitate regulatory science research and improve the FDA product-review process. Meanwhile, AI-based Natural Language Processing (NLP) has focused on developing LLMs trained on large text corpora to perform a broad range of NLP tasks. This initiative aims to assess the application of LLMs to FDA documents and to develop content-specific LLMs that facilitate regulatory science at FDA, for tasks such as information retrieval and text summarization.

Approaches: BERTox is a suite of NLP applications whose functions include information retrieval, sentiment analysis, text classification, and Named Entity Recognition (NER). In several pilot studies, the BERTox approach has been applied to drug-induced liver injury classification based on FDA drug labeling, causal inference on the FDA Adverse Event Reporting System (FAERS) database, AI bias in the interpretation and classification of drug properties (e.g., safety and efficacy), text summarization to provide highlights of labeling sections, and automatic anomaly analysis. The initiative places specific emphasis on developing responsible AI models: customized LLMs that can be operated in a local environment for specific regulatory applications, with an understanding of their bias, context of use, causal inference, and explainability.
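
To make the information-retrieval function concrete, the following is a minimal sketch, not the BERTox implementation. It ranks a few short text snippets against a query using TF-IDF weights and cosine similarity, a classical technique used here as a simple stand-in for LLM-based retrieval. The snippets, the document names, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `search`) are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for labeling text; not real FDA data.
DOCS = {
    "doc_a": "Hepatotoxicity including liver failure has been reported in patients.",
    "doc_b": "Store tablets at room temperature away from moisture.",
    "doc_c": "Monitor liver enzymes before and during treatment.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        vectors[name] = {t: term_freq[t] * idf[t] for t in term_freq}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return (name, score) pairs sorted from most to least similar."""
    vectors, idf = tfidf_vectors(docs)
    query_tf = Counter(tokenize(query))
    # Query terms never seen in the corpus carry no weight and are dropped.
    query_vec = {t: c * idf[t] for t, c in query_tf.items() if t in idf}
    scores = [(name, cosine(query_vec, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in search("liver injury", DOCS):
        print(f"{name}: {score:.3f}")
```

A term-matching approach like this cannot relate "hepatotoxicity" to "liver injury" unless the exact words overlap. Closing that kind of semantic gap is one motivation for applying language models to retrieval.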

Potential impact: Reviewing text documents is a crucial step in assessing the safety and efficacy of FDA-regulated products. However, the current manual process is time-consuming and resource-intensive. BERTox offers a set of LLM-based AI tools and systems that process FDA documents and extract critical information from them to improve and expedite the product-review process. BERTox can also serve as an institutional memory, providing effective access to past documents that are often referenced to ensure consistent, evidence-based decision-making in the review of new products.

 

References
 

Tong W. and Baran S.W. 50 Shades of AI in Regulatory Science. Drug Discovery Today. 2024, 29(8): 104058. doi:10.1016/j.drudis.2024.104058.
Li D., Wu L., Zhang M., Shpyleva S., Lin Y.-C., Huang H.-Y., Li T., and Xu J. Assessing the Performance of Large Language Models in Literature Screening for Pharmacovigilance: A Comparative Study. Frontiers in Drug Safety and Regulation. 2024, 4: 1379260. doi:10.3389/fdsfr.2024.1379260.
Tong W. and Renaudin M. GCRSR Interagency LLMs Taskforce. Context is Everything in Regulatory Application of Large Language Models (LLMs). Drug Discovery Today. 2024, 29(4): 103916. doi:10.1016/j.drudis.2024.103916.
Neyarapally G.A., Wu L., Xu J., Zhou E.H., Dang O., Lee J., Mehta D., Vaughn R.D., Pinnow E., and Fang H. Description and Validation of a Novel AI Tool, LabelComp, for the Identification of Adverse Event Changes in FDA Labeling. Drug Safety. 2024. doi:10.1007/s40264-024-01468-8.
Ying L., Liu Z., Fang H., Kusko R., Wu L., Harris S., and Tong W. Text Summarization with ChatGPT for Drug Labeling Documents. Drug Discovery Today. 2024, 29(6): 104018. doi:10.1016/j.drudis.2024.104018.
Wu L., Xu J., Thakkar S., Gray M., Qu Y., Li D., and Tong W. A Framework Enabling LLMs into Regulatory Environment for Transparency and Trustworthiness and its Application to Drug Labeling Document. Regulatory Toxicology and Pharmacology. 2024, 149: 105613. doi:10.1016/j.yrtph.2024.105613.
Wang X., Xu X., Liu Z., and Tong W. Bidirectional Encoder Representations from Transformers-like Large Language Models in Patient Safety and Pharmacovigilance: A Comprehensive Assessment of Causal Inference Implications. Experimental Biology and Medicine. 2023, 248(21): 1908-1917. doi:10.1177/15353702231215895.
Gray M., Xu J., Tong W., and Wu L. Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents. Chemical Research in Toxicology. 2023, 36(8): 1290-1299. doi:10.1021/acs.chemrestox.3c00028.
Wu L., Chen S., Guo L., Shpyleva S., Harris K., Fahmi T., Flanigan T., Tong W., Xu J., and Ren Z. Development of Benchmark Datasets for Text Mining and Sentiment Analysis to Accelerate Regulatory Literature Review. Regulatory Toxicology and Pharmacology. 2023, 137: 105287. doi:10.1016/j.yrtph.2022.105287.
Gray M., Samala R., Liu Q., Skiles D., Xu J., Tong W., and Wu L. Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science. Clinical Pharmacology and Therapeutics. 2023, 115(4): 687-697. doi:10.1002/cpt.3117.
Wu L., Gray M., Dang O., Xu J., Fang H., and Tong W. RxBERT: Enhancing Drug Labeling Text Mining and Analysis with AI Language Modeling. Experimental Biology and Medicine. 2023, 248(21): 1937-1943. doi:10.1177/15353702231220669.
Wang X., Xu X., Tong W., Liu Q., and Liu Z. DeepCausality: A General AI-Powered Causal Inference Framework for Free Text: A Case Study of LiverTox. Frontiers in Artificial Intelligence. 2022, 5: 999289.
Wu L., Ali S., Ali H., Brock T., Xu J., and Tong W. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies. International Journal of Environmental Research and Public Health. 2022, 19: 9974.
Liu Z., Roberts R.A., Lal-Nag M., et al. AI-Based Language Models Powering Drug Discovery and Development. Drug Discovery Today. 2021, 26: 2593-2607.
Wu Y., Liu Z., Wu L., et al. BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk. Frontiers in Artificial Intelligence. 2021, 4: 729834.
Bhatt A., Roberts R., Chen X., et al. DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction. Frontiers in Artificial Intelligence. 2021, 4.
Wang X., Xu X., Tong W., et al. InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance. Frontiers in Artificial Intelligence. 2021, 4: 659622.

 
