Understanding Female-Specific Treatment Outcomes in Inflammatory Bowel Disease Using Novel Artificial Intelligence Tools

CERSI Collaborators: Vivek Rudrapatna, MD, PhD, Michelle Wang, PharmD, PhD, Joanne Chun, PharmD, PhD
FDA Collaborators: Jae (Mike) Lee, PhD, Artur Belov, PhD, Asha Willis, MD, Zhihua Li, PhD
Project Start Date: July 11, 2024

Regulatory Science Challenge:

Inflammatory bowel disease (IBD; e.g., Crohn’s disease and ulcerative colitis) is a chronic immune disorder of the gastrointestinal tract. Several FDA-approved treatments exist, including small molecules and biologics. However, the effects of these drugs on many aspects unique to female physiology, such as conception, pregnancy, lactation, and menopause, are less well understood, because patients in these groups are typically not recruited into pivotal clinical trials.

Because clinical trial data are lacking, real-world studies are needed to evaluate the safety and effectiveness of IBD treatments in women, particularly in subpopulations such as those experiencing pregnancy, lactation, or menopause. Electronic health record (EHR) data are a promising source of complementary evidence on treatment outcomes, including safety, effectiveness, and effects on female-specific health factors. However, much of this information is recorded in free-text clinical notes, and more accurate methods for analyzing these notes are needed. This study will develop and evaluate computational approaches based on large language models (LLMs) to extract granular variables from clinical notes, enabling more accurate assessments of drug safety and effectiveness in real-world patient populations. By leveraging LLMs and other advanced data science methods, this study aims to generate high-quality real-world evidence on the safety and effectiveness of novel IBD therapeutics in women.

Project Description and Aims/Goals:

The primary goal of this project is to explore potential associations between FDA-approved treatments for active IBD and female-specific outcomes, both overall and in understudied subgroups (e.g., women of childbearing age, pregnant patients, and patients in menopause). Secondary goals are to 1) identify new patient-level predictors of treatment outcomes in women, and 2) assess the value of LLMs for harmonizing structured data and unstructured text in the EHR to improve the quality of real-world evidence.

The specific aims are (1) to develop and test language model-based methods for curating multimodal EHR data to support the study of women with IBD, and (2) to study treatment outcomes in the overall cohort as well as in subgroups, and to identify novel predictors of outcomes using statistical methods.

Data sources include de-identified structured and unstructured EHR data from the University of California, San Francisco (UCSF), and structured EHR data from the affiliated hospitals and clinics managed by the San Francisco Department of Public Health. GPT-4o, accessed through the HIPAA-compliant Azure API, will be the primary LLM, with the option to expand to open-source models hosted on-premises.

Anticipated Outcomes/Impact:

Investigators expect that, by using novel artificial intelligence (AI) computational tools, this project will generate high-quality real-world evidence that enhances FDA’s understanding of the safety and effects of FDA-approved treatments in women with IBD. This evidence may help 1) modernize the development and evaluation of FDA-regulated products, 2) improve clinical care, and 3) support regulatory decision-making by strengthening post-market surveillance and labeling of existing and future IBD therapeutics. Investigators also plan to present this work at conferences, communicate with expert groups, and publish manuscripts to maximize knowledge dissemination and promote use of the LLM tool in real-world evidence generation.