
Data Mining at the Center for Biologics Evaluation and Research

Evaluation of spontaneous reports of adverse events following the administration of CBER-regulated medical products remains a key component of CBER’s safety surveillance strategy. Improving the efficiency of analysis of these data and the validity of the inferences drawn from them are important goals of CBER’s work in this area. CBER has been involved with data mining of spontaneous reports for more than a decade, with an emphasis on the Vaccine Adverse Event Reporting System (VAERS). Previous work focused on Empirical Bayesian (EB) methods, the effects of stratification, and the use of simulations, as well as the policy implications of data mining. We followed CDER’s adoption of EB methods for safety surveillance, expanding their application to CBER-regulated products such as vaccines and biologics.

Over the last 5 years, CBER has launched a new program for the development of advanced methodologies for safety surveillance including, but not limited to, the natural language processing (NLP) of free-text report narratives and network analysis of adverse event reports. We continue to search for better ways to leverage novel and existing approaches by incorporating these into our regulatory environment and evaluating their impact to improve our ability to fulfill our public health mission.

Current Initiatives

We have developed the Event-based Text-mining of Health Electronic Records (ETHER) system, which retrieves key clinical and temporal information from free-text VAERS and FAERS narratives and combines it with data from structured fields to support safety surveillance. We have also built the Pattern-based and Advanced Network Analyzer for Clinical Evaluation and Assessment (PANACEA), which supports the application of various pattern recognition and network analysis approaches to adverse event reports. ETHER and PANACEA are tightly integrated and installed on FDA servers to support medical experts and epidemiologists in the decision-making process.

ETHER is an advanced NLP tool that processes the adverse event free-text descriptions (i.e. narratives) found in post-market reports and extracts clinical terms and time statements. Each clinical term is assigned one of the following types: vaccine, primary diagnosis, secondary diagnosis, medical history, or family history. ETHER also generates a coded version for each term using a standardized dictionary. A complete example is shown in the figure below, where clinical terms and time statements are highlighted in the narrative text in the top box. The middle box shows the clinical terms grouped by type and the conversion to coded terms. In the bottom box, the coded terms for the diagnostic feature types (primary and secondary diagnosis, and symptoms) are used in PANACEA to represent the clinical features of this particular report, which is only one among many.
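As a rough illustration of the kind of structured output described above, the following Python sketch models extracted clinical terms and selects the coded terms of the diagnostic feature types that would be passed on to PANACEA. It is not ETHER's actual data model: the class name, field names, type labels, and the sample terms are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Feature types taken from the description above; the label spellings are assumed.
DIAGNOSTIC_TYPES = {"primary_diagnosis", "secondary_diagnosis", "symptom"}


@dataclass(frozen=True)
class ClinicalTerm:
    """One clinical term extracted from a narrative (hypothetical structure)."""
    text: str                              # term as written in the narrative
    feature_type: str                      # e.g. "vaccine", "primary_diagnosis"
    coded_term: str                        # term from a standardized dictionary
    time_statement: Optional[str] = None   # associated time expression, if any


def diagnostic_codes(terms: List[ClinicalTerm]) -> List[str]:
    """Return the unique coded terms of the diagnostic feature types, in order."""
    codes: List[str] = []
    for term in terms:
        if term.feature_type in DIAGNOSTIC_TYPES and term.coded_term not in codes:
            codes.append(term.coded_term)
    return codes


# Made-up example report, for illustration only.
example_terms = [
    ClinicalTerm("flu shot", "vaccine", "Influenza vaccine"),
    ClinicalTerm("febrile seizure", "primary_diagnosis", "Febrile convulsion", "2 days later"),
    ClinicalTerm("fever", "symptom", "Pyrexia", "the next day"),
    ClinicalTerm("asthma", "medical_history", "Asthma"),
]

print(diagnostic_codes(example_terms))
```

In this sketch, the vaccine and medical history terms are kept with the report but are left out of the feature list used to represent it.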

PANACEA supports the analysis of adverse event reports using multiple network types. In a report network, like the one shown on the left in the figure, the dots represent adverse event reports and are connected when two reports contain the same coded term(s). In the element network on the right, the dots and the squares represent the coded terms and vaccine names found in the reports. In this network type, two elements are connected when they co-appear in at least one report. Multiple analyses can be performed in PANACEA and findings can be further explored in ETHER. For example, the user may select a group of highly connected reports and launch ETHER to view the narratives of these reports.
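The two network types described above can be sketched in a few lines of Python. This is a minimal illustration of the stated connection rules only, not PANACEA's implementation; the function names, the input layout, and the sample reports are assumptions.

```python
from itertools import combinations


def report_network(reports):
    """Connect two reports when they share at least one coded term.

    Returns a dict mapping (report_a, report_b) to the set of shared terms.
    """
    edges = {}
    for a, b in combinations(sorted(reports), 2):
        shared = reports[a]["terms"] & reports[b]["terms"]
        if shared:
            edges[(a, b)] = shared
    return edges


def element_network(reports):
    """Connect two elements (coded terms or vaccine names) that co-appear in a report.

    Returns a dict mapping (element_x, element_y) to the number of reports
    in which the pair appears together.
    """
    edges = {}
    for report in reports.values():
        elements = sorted(report["terms"] | report["vaccines"])
        for x, y in combinations(elements, 2):
            edges[(x, y)] = edges.get((x, y), 0) + 1
    return edges


# Made-up reports, for illustration only.
reports = {
    "R1": {"terms": {"Pyrexia", "Seizure"}, "vaccines": {"VaccineA"}},
    "R2": {"terms": {"Pyrexia"}, "vaccines": {"VaccineB"}},
    "R3": {"terms": {"Rash"}, "vaccines": {"VaccineA"}},
}

print(report_network(reports))
print(element_network(reports))
```

Here R1 and R2 are linked in the report network because both contain the same coded term, while R1 and R3 are not linked, since this sketch counts only shared coded terms and not shared vaccine names. Storing the co-occurrence count on each element edge is a design choice of the sketch; the rule in the text only requires that the count be at least one.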


In addition to internal efforts, the Office of Biostatistics and Epidemiology at CBER is collaborating with the Division of Cancer Prevention and Control at the Centers for Disease Control and Prevention on a big data mining project funded by the Office of the Assistant Secretary for Planning and Evaluation. This interagency effort will result in the development of an NLP Workbench Web Service that will structure and code unstructured data from the cancer and safety surveillance domains (https://aspe.hhs.gov/os-pcortf-funded-projects). Further guidance will be provided to federal and public health agencies, as well as the research community, on how to use and expand this open-source resource, which will be released to the public in the summer of 2018.

Publications of Interest

1. Kreimeyer K, Menschik D, Winiecki S, Paul W, Barash F, Woo EJ, Alimchandani M, Arya D, Zinderman C, Forshee R, Botsis T. Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems. Drug Saf. 2017 Mar 14.

2. Botsis T, Jankosky C, Arya D, Kreimeyer K, Foster M, Pandey A, Wang W, Zhang G, Forshee R, Goud R, Menschik D, Walderhaug M, Woo EJ, Scott J. Decision support environment for medical product safety surveillance. J Biomed Inform. 2016 Dec;64:354-362.

3. Wang W, Kreimeyer K, Woo EJ, Ball R, Foster M, Pandey A, Scott J, Botsis T. A new algorithmic approach for the extraction of temporal associations from clinical narratives with an application to medical product safety surveillance reports. J Biomed Inform. 2016 Aug;62:78-89.

4. Baer B, Nguyen M, Woo EJ, Winiecki S, Scott J, Martin D, Botsis T, Ball R. Can Natural Language Processing Improve the Efficiency of Vaccine Adverse Event Report Review? Methods Inf Med. 2016;55(2):144-50.

5. Botsis T, Scott J, Woo EJ, Ball R. Identifying Similar Cases in Document Networks Using Cross-Reference Structures. IEEE J Biomed Health Inform. 2015 Nov;19(6):1906-17.

6. Martin D, Menschik D, Bryant-Genevier M, Ball R. Data mining for prospective early detection of safety signals in the Vaccine Adverse Event Reporting System (VAERS): a case study of febrile seizures after a 2010-2011 seasonal influenza virus vaccine. Drug Saf. 2013 Jul;36(7):547-56.

7. Botsis T, Woo EJ, Ball R. Application of information retrieval approaches to case classification in the vaccine adverse event reporting system. Drug Saf. 2013 Jul;36(7):573-82.

8. Botsis T, Buttolph T, Nguyen MD, Winiecki S, Woo EJ, Ball R. Vaccine adverse event text mining system for extracting features from vaccine safety reports. J Am Med Inform Assoc. 2012 Nov-Dec;19(6):1011-8.

9. Ball R, Botsis T. Can network analysis improve pattern recognition among adverse events following immunization reported to VAERS? Clin Pharmacol Ther. 2011 Aug;90(2):271-8.

10. Banks D, Woo EJ, Burwen DR, Perucci P, Braun MM, Ball R. Comparing data mining methods on the VAERS database. Pharmacoepidemiol Drug Saf. 2005 Sep;14(9):601-9.