BERTox Initiative
Initiative to apply LLMs to facilitate analysis of FDA documents and public literature for improved efficiency and accuracy.
Objective: To apply large language models (LLMs), such as BERT and GPT, to the analysis of FDA documents and public literature, improving the efficiency and accuracy of information retrieval and toxicity assessment.
Introduction: FDA has historically generated, and continues to generate, multiple documents during the product-review process. These documents contain text that cannot be readily indexed or mapped onto standard database fields, and they often lack metadata. Analysis of semantic relationships therefore plays a vital role in extracting useful information from these documents to facilitate regulatory science research and improve the FDA product-review process. Meanwhile, AI-based Natural Language Processing (NLP) has focused on developing LLMs trained on large text corpora to perform a broad range of NLP tasks. This initiative aims to assess the application of LLMs to FDA documents and to develop content-specific LLMs that facilitate regulatory science tasks at FDA, such as information retrieval and text summarization.
Approaches: BERTox is a suite of NLP applications powered by diverse functions, including information retrieval, sentiment analysis, text classification, and Named Entity Recognition (NER). In several pilot studies, the BERTox approach has been applied to drug-induced liver injury classification based on FDA drug labeling, causal inference on the FDA Adverse Event Reporting System (FAERS) database, AI bias in the interpretation and classification of drug properties (e.g., safety and efficacy), text summarization to provide highlights of labeling sections, and automatic anomaly analysis. The initiative places specific emphasis on developing responsible AI models with customized LLMs that can be operated in a local environment for specific regulatory applications, with an understanding of their bias, context of use, causal inference, and explainability.
Potential impact: Reviewing text documents is a crucial step in assessing the safety and efficacy of FDA-regulated products. However, the current manual process is time-consuming and resource-intensive. BERTox offers a set of LLM-based AI tools and systems that intelligently process and extract critical information from FDA documents to improve and expedite the product-review process. BERTox can also serve as an institutional memory, providing effective access to frequently referenced past documents to ensure consistency and evidence-based decision-making in the review of new products.
References
Year | Title | Authors | Full Citation |
---|---|---|---|
2024 | 50 Shades of AI in Regulatory Science. | Tong W. and Baran S.W. | 50 Shades of AI in Regulatory Science. Tong W. and Baran S.W. Drug Discovery Today. 2024, 29(8): 104058. doi:10.1016/j.drudis.2024.104058. |
2024 | Assessing the Performance of Large Language Models in Literature Screening for Pharmacovigilance: A Comparative Study. | Li D., Wu L., Zhang M., Shpyleva S., Lin Y.-C., Huang H.-Y., Li T., and Xu J. | Assessing the Performance of Large Language Models in Literature Screening for Pharmacovigilance: A Comparative Study. Li D., Wu L., Zhang M., Shpyleva S., Lin Y.-C., Huang H.-Y., Li T., and Xu J. Frontiers in Drug Safety and Regulation. 2024, 4:1379260. doi:10.3389/fdsfr.2024.1379260. |
2024 | Context is Everything in Regulatory Application of Large Language Models (LLMs). | Tong W. and Renaudin M. GCRSR Interagency LLMs Taskforce. | Context is Everything in Regulatory Application of Large Language Models (LLMs). Tong W. and Renaudin M. GCRSR Interagency LLMs Taskforce. Drug Discovery Today. 2024, 29(4): 103916. doi:10.1016/j.drudis.2024.103916. |
2024 | Description and Validation of a Novel AI Tool, LabelComp, for the Identification of Adverse Event Changes in FDA Labeling. | Neyarapally G.A., Wu L., Xu J., Zhou E.H., Dang O., Lee J., Mehta D., Vaughn R.D., Pinnow E., and Fang H. | Description and Validation of a Novel AI Tool, LabelComp, for the Identification of Adverse Event Changes in FDA Labeling. Neyarapally G.A., Wu L., Xu J., Zhou E.H., Dang O., Lee J., Mehta D., Vaughn R.D., Pinnow E., and Fang H. Drug Safety. 2024, doi:10.1007/s40264-024-01468-8. |
2024 | Text Summarization with ChatGPT for Drug Labeling Documents. | Ying L., Liu Z., Fang H., Kusko R., Wu L., Harris S., and Tong W. | Text Summarization with ChatGPT for Drug Labeling Documents. |
2024 | A Framework Enabling LLMs into Regulatory Environment for Transparency and Trustworthiness and its Application to Drug Labeling Document. | Wu L., Xu J., Thakkar S., Gray M., Qu Y., Li D., and Tong W. | A Framework Enabling LLMs into Regulatory Environment for Transparency and Trustworthiness and its Application to Drug Labeling Document. Wu L., Xu J., Thakkar S., Gray M., Qu Y., Li D., and Tong W. Regulatory Toxicology and Pharmacology. 2024, 149: 105613. doi:10.1016/j.yrtph.2024.105613. |
2023 | Bidirectional Encoder Representations from Transformers-like Large Language Models in Patient Safety and Pharmacovigilance: A Comprehensive Assessment of Causal Inference Implications. | Wang X., Xu X., Liu Z., and Tong W. | Bidirectional Encoder Representations from Transformers-like Large Language Models in Patient Safety and Pharmacovigilance: A Comprehensive Assessment of Causal Inference Implications. Wang X., Xu X., Liu Z., and Tong W. Experimental Biology and Medicine. 2023, 248(21):1908-1917. doi:10.1177/15353702231215895. |
2023 | Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents. | Gray M., Xu J., Tong W., and Wu L. | Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents. Gray M., Xu J., Tong W., and Wu L. Chemical Research in Toxicology. 2023, 36(8): 1290-1299. doi:10.1021/acs.chemrestox.3c00028. |
2023 | Development of Benchmark Datasets for Text Mining and Sentiment Analysis to Accelerate Regulatory Literature Review. | Wu L., Chen S., Guo L., Shpyleva S., Harris K., Fahmi T., Flanigan T., Tong W., Xu J., and Ren Z. | Development of Benchmark Datasets for Text Mining and Sentiment Analysis to Accelerate Regulatory Literature Review. |
2023 | Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science. | Gray M., Samala R., Liu Q., Skiles D., Xu J., Tong W., and Wu L. | Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science. Gray M., Samala R., Liu Q., Skiles D., Xu J., Tong W., and Wu L. Clinical Pharmacology and Therapeutics. 2023, 115(4): 687-697. doi:10.1002/cpt.3117. |
2023 | RxBERT: Enhancing Drug Labeling Text Mining and Analysis with AI Language Modeling. | Wu L., Gray M., Dang O., Xu J., Fang H., and Tong W. | RxBERT: Enhancing Drug Labeling Text Mining and Analysis with AI Language Modeling. Wu L., Gray M., Dang O., Xu J., Fang H., and Tong W. Experimental Biology and Medicine. 2023, 248(21):1937-1943. doi:10.1177/15353702231220669. |
2022 | DeepCausality: A General AI-Powered Causal Inference Framework for Free Text: A Case Study of LiverTox. | Wang X., Xu X., Tong W., Liu Q., and Liu Z. | DeepCausality: A General AI-Powered Causal Inference Framework for Free Text: A Case Study of LiverTox. Wang X., Xu X., Tong W., Liu Q., and Liu Z. Frontiers in Artificial Intelligence. 2022, 5:999289. |
2022 | NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies. | Wu L., Ali S., Ali H., Brock T., Xu J., and Tong W. | NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies. Wu L., Ali S., Ali H., Brock T., Xu J., and Tong W. International Journal of Environmental Research and Public Health. 2022, 19:9974. |
2021 | AI-Based Language Models Powering Drug Discovery and Development. | Liu Z., Roberts R.A., Lal-Nag M., et al. | AI-Based Language Models Powering Drug Discovery and Development. Liu Z., Roberts R.A., Lal-Nag M., et al. Drug Discovery Today. 2021, 26:2593-2607. |
2021 | BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk. | Wu Y., Liu Z., Wu L., et al. | BERT-Based Natural Language Processing of Drug Labeling Documents: A Case Study for Classifying Drug-Induced Liver Injury Risk. Wu Y., Liu Z., Wu L., et al. Frontiers in Artificial Intelligence. 2021, 4:729834. |
2021 | DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction. | Bhatt A., Roberts R., Chen X., et al. | DICE: A Drug Indication Classification and Encyclopedia for AI-Based Indication Extraction. Bhatt A., Roberts R., Chen X., et al. Frontiers in Artificial Intelligence. 2021, 4. |
2021 | InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance. | Wang X., Xu X., Tong W., et al. | InferBERT: A Transformer-Based Causal Inference Framework for Enhancing Pharmacovigilance. Wang X., Xu X., Tong W., et al. Frontiers in Artificial Intelligence. 2021, 4:659622. |