Large Language Models to Support Bioequivalence Evaluation

CERSI Collaborators: Russ B Altman, MD, PhD; Percy Liang, PhD; Kathy Giacomini, PhD

FDA Collaborators: Liang Zhao, PhD; Meng Hu, PhD

Project Start Date: August 28, 2023

Regulatory Science Framework

Charge I, “Modernize development and evaluation of FDA-regulated products,” and Focus Area C, “Analytical and computational methods.”

Regulatory Science Challenge

The Food and Drug Administration (FDA) faces a significant challenge in efficiently integrating and assessing the vast amounts of complex data needed to ensure the safety and effectiveness of drugs. One such regulatory assessment is bioequivalence (BE) assessment, which compares proposed generic drugs with their brand-name counterparts and requires meticulous analysis of text, images, and tables. However, existing methods can be time-consuming and resource-intensive. To mitigate this challenge, researchers are investigating whether advanced technology, specifically Large Language Models (LLMs), can be used as part of the review process. LLMs can interact with various data types, summarize information, and cross-check data integrity, potentially enhancing the FDA's capacity to assess these data efficiently, support drug approval decisions, and protect public health.

Project Description and Goals

The aim of this project is to develop a suite of innovative tools that leverage LLMs to assist FDA reviewers in their critical work. The primary goal is to create an interactive expert system, trained on public FDA data and related publications, that can quickly answer queries and summarize complex study information. Researchers also aim to make the model portable so that it can be further trained within the FDA firewall. In parallel, researchers will explore natural language processing applications capable of extracting data from images and tables and verifying consistency across different data formats. The project will use open-source LLMs and publicly available FDA data. It will help FDA collaborators understand how LLMs might streamline the BE assessment process, and it will produce a transferable protocol for building FDA in-house LLMs from internal data. The goal is to improve efficiency without compromising the rigorous standards required to safeguard public health.

A Large Language Model was used in the editing process to ensure this text follows plain language guidelines for public readership.