Improving the Efficiency and Rigor of Pharmacovigilance at FDA: visualization of multi-source information and unsupervised learning to support causal inference

CERSI Collaborators: Taxiarchis Botsis, MSc, MPS, PhD; Gary Rosner, ScD; Harold Lehmann, MD, PhD; Jarushka Naidoo, MD; Kory Kreimeyer, MSc; Jonathan Spiker, BS

FDA Collaborators: Robert Ball, MD, MPH, ScM; Oanh Dang, PharmD, BCPS

Project Start Date to End Date: 09/01/2018 - 12/04/2019

Regulatory Science Challenge

FDA has a reporting system to which manufacturers are required to report adverse events. Consumers, healthcare practitioners, and others also can report adverse events that occur after use of a drug or a biologic. This FDA Adverse Event Reporting System, or FAERS, receives nearly 2 million reports each year. FDA experts evaluate these reports to identify safety “signals,” which can be indicators of possible adverse events from the use of a drug or biologic.

As part of their evaluation, the experts also consult other sources, such as published articles and drug or biologic product labeling. The number of reports and other sources available today makes it increasingly difficult for human experts to review every report in detail. Solutions are needed to streamline the current processes and help classify the reports based on the quality of the information they contain and to identify medical product safety patterns in the FAERS system.

Project Description and Goals

This project focused on three primary goals. The team:

Developed a system for collecting and combining data from multiple sources and creating powerful visual presentations for the experts,
Evaluated and built advanced analytic methods¹ for classifying FAERS reports by the quality of information they contain, and
Used an existing computer analytic tool to simplify the review of the text in the narrative section of the FAERS reports.

This project is intended to enhance the review of FAERS reports and other external data.

Research Outcomes/Results

We investigated two different approaches for classifying postmarket reports by the quality of information they contain with respect to the adverse event. By developing and testing a large number of features based on structured fields, narratives, and external information, we demonstrated that complex narrative and external features are significant in enabling a computer model to capture the different classification groups of reports.

We also concluded that a two-step approach might best support this task. The first step identifies the reports with the lowest quality of information. The second step focuses on distinguishing, among the remaining reports, those that most likely describe a strong relationship between the drug and the adverse event.
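The two-step approach can be illustrated with a minimal sketch. It assumes each report has already been assigned two numeric scores; the names `quality_score` and `association_score`, the `Report` class, and both thresholds are hypothetical and are not taken from the project, which does not specify how such values would be computed.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    quality_score: float      # hypothetical: 0 (little usable information) to 1
    association_score: float  # hypothetical: strength of drug-event relationship, 0 to 1


def two_step_triage(reports, min_quality=0.3, min_association=0.7):
    """Split reports into (low_quality, likely_strong, other).

    Both thresholds are illustrative assumptions, not values from the project.
    """
    # Step 1: set aside the reports with the lowest quality of information.
    low_quality = [r for r in reports if r.quality_score < min_quality]
    remaining = [r for r in reports if r.quality_score >= min_quality]

    # Step 2: among the remaining reports, pick out those that most likely
    # describe a strong relationship between the drug and the adverse event.
    likely_strong = [r for r in remaining if r.association_score >= min_association]
    other = [r for r in remaining if r.association_score < min_association]
    return low_quality, likely_strong, other


# Made-up example data for illustration only.
reports = [
    Report("A", quality_score=0.1, association_score=0.9),
    Report("B", quality_score=0.8, association_score=0.9),
    Report("C", quality_score=0.8, association_score=0.2),
]
low_quality, likely_strong, other = two_step_triage(reports)
```

In this sketch, report A is set aside in the first step even though its association score is high, because the second step only considers reports that passed the quality filter.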

We also developed an interactive prototype Information Visualization Platform (InfoViP) for monitoring drug safety. InfoViP first collects information from postmarket reports, product labels, and biomedical literature. It then presents this information to the Safety Evaluators (SEs) at FDA’s Center for Drug Evaluation and Research (CDER), using efficient and compelling visualizations. To build InfoViP, we worked closely with the SEs, who described their workflows and their expectations for automating time-consuming tasks, including access to external sources and the visualization of all information. We subsequently designed InfoViP based on the SEs’ initial input and additional discussions with them. InfoViP supports many functions, notably the processing of free-text information in all sources and the construction of time plots. Overall, we received positive comments from the SEs during the InfoViP evaluation and were able to address most of their suggestions in the final version.

Research Impacts

We conducted work in one of FDA’s research priority areas, which is to “Develop methods and tools to improve and streamline clinical and postmarket evaluation of FDA-regulated products, including approaches to leveraging large, complex data to inform regulatory decision-making, including use of real-world data sources and mobile technologies.” The interactive prototype InfoViP, which was built in close collaboration with SEs, aggregates and visualizes information from multiple sources, including FAERS, DailyMed (a provider of labeling information), and biomedical literature. An enhanced version of this platform will be installed on FDA’s production environment to inform regulatory decision-making.

Publications: Jonathan Spiker, Kory Kreimeyer, Oanh Dang, Debra Boxwell, Vicky Chan, Connie Cheng, Paula Gish, Allison Lardieri, Eileen Wu, Suranjan De, Jarushka Naidoo, Harold Lehmann, Gary L. Rosner, Robert Ball, Taxiarchis Botsis. Information Visualization Platform for Post-market Surveillance Decision Support. To be submitted to the Applied Clinical Informatics Journal.

¹Such as Natural Language Processing and unsupervised learning. Natural Language Processing is a type of technology that enables computers to comb through large texts, analyze human language, and extract key data. Unsupervised learning is a type of machine learning in which a computer program first learns to find patterns in data without human interference and can then identify the same patterns in other data.