Archived Content

The content on this page is provided for reference purposes only. This content has not been altered or updated since it was archived.

Evolving Data Mining System is Designed to Speed Identification of Adverse Events Following Vaccinations

The US Food and Drug Administration (FDA) has developed a computerized text mining system that will help agency scientists quickly review reports submitted to the Vaccine Adverse Event Reporting System (VAERS) to find evidence of adverse events following vaccination. Text mining is the process by which a specially designed computer program extracts specific high-quality information from a large amount of text, usually by recognizing specific terms. This enables scientists to analyze the data more thoroughly than would otherwise be possible.

The research is being led by the Office of Biostatistics and Epidemiology (OBE) in the Center for Biologics Evaluation and Research.

VAERS is a so-called “spontaneous reporting system” (SRS), that is, a voluntary online reporting system that physicians, consumers, and others use to report what they believe are adverse events linked to the use of a particular vaccine. FDA scientists routinely analyze these reports to determine whether the adverse events reported after vaccination are actually linked to a particular vaccine or represent medical problems that occurred by coincidence after vaccination.

The OBE project is important because a text mining system designed specifically for VAERS could shorten the time (e.g., from weeks to hours) it takes for FDA to determine if a vaccine is causing rare adverse events. For example, during major events such as the pandemic of Influenza A (H1N1) 2009 (swine flu), when a new vaccine was rapidly deployed, a team of FDA medical experts had to spend considerable time and effort analyzing VAERS reports for evidence of adverse events following vaccination. A text mining system could significantly reduce the time and effort needed to respond to such situations.

The commercially available statistical data mining approaches currently used at the FDA are limited in their ability to thoroughly evaluate data and find links between the administered vaccines and the reported adverse events; therefore, the medical experts at the FDA must spend significant time manually analyzing individual VAERS reports.

To overcome that problem, OBE developed a computer program that can quickly find the various terms of interest in these reports and organize them in a way that facilitates their review as well as further research at the FDA. The long-term goal is to develop high-performing methods that will automatically generate alarms for unexpected adverse events related to the administration of vaccines.

In the initial phase of development, the OBE researchers used a process called text classification (TC) to extract information from VAERS about anaphylaxis (a potentially fatal allergic reaction) reported after vaccination against Influenza A (H1N1) 2009. TC assigns labels to a document for a specific concept based on a set of key words or phrases in the text; in this case, TC labeled spontaneous reports for anaphylaxis based on terms such as facial swelling and urticaria (hives) on the face or neck. Those terms relate to the diagnosis, symptoms, or even treatment of anaphylaxis described in a VAERS report, and they might suggest that the patient experienced anaphylaxis following vaccination.
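The general idea of labeling a report from key terms can be sketched in a few lines of Python. This is only an illustration of keyword-based labeling, not the classifier described in the published work: the function name `classify_report`, the `min_hits` threshold, and the terms marked "assumed" are hypothetical, and only "facial swelling" and "urticaria" come from the description above.

```python
import re

# Illustrative term list. The first two terms come from the article text;
# the others are assumptions added for the example, not OBE's term set.
ANAPHYLAXIS_TERMS = [
    "facial swelling",
    "urticaria",
    "anaphylaxis",   # assumed
    "epinephrine",   # assumed (a treatment term)
]

def classify_report(text, terms=ANAPHYLAXIS_TERMS, min_hits=1):
    """Label a report 'possible anaphylaxis' if enough key terms appear.

    Returns the label together with the terms that were found, so a
    reviewer can see why the label was assigned.
    """
    lowered = text.lower()
    hits = [
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
    label = "possible anaphylaxis" if len(hits) >= min_hits else "other"
    return label, hits

# Made-up report narratives for demonstration only.
reports = [
    "Patient developed facial swelling and urticaria 20 minutes after vaccination.",
    "Sore arm at the injection site the next day.",
]
for narrative in reports:
    print(classify_report(narrative))
```

A label produced this way only flags a report for closer review; deciding whether a case is truly anaphylaxis still requires a medical expert.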

Building upon the initial TC work, the OBE team developed a more sophisticated text mining strategy called the Vaccine adverse event Text Mining (VaeTM) system. VaeTM uses dedicated grammar rules to identify key phrases routinely used in VAERS reports, such as symptoms, diagnoses, drugs, medical and family history, vaccines, and information on when events occurred or were noticed. This work represents the first effort to create a fully automated tool of this type for quickly extracting information from VAERS.
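To give a sense of what rule-based feature extraction looks like, the following Python sketch applies a few regular-expression rules to a report narrative and groups the matched phrases by category. The rules, category names, and the `extract_features` function are hypothetical and chosen only for illustration; VaeTM's actual grammar rules are not described in this summary and are likely far richer.

```python
import re

# Hypothetical patterns for illustration only; these are NOT VaeTM's rules.
RULES = {
    "symptom": re.compile(r"\b(?:rash|fever|urticaria|facial swelling)\b", re.I),
    "vaccine": re.compile(r"\b(?:H1N1|influenza) vaccine\b", re.I),
    "time": re.compile(
        r"\b\d+\s+(?:minutes?|hours?|days?)\s+(?:after|later)\b", re.I
    ),
}

def extract_features(text):
    """Return {category: [matched phrases]} for every rule that matches."""
    features = {}
    for category, pattern in RULES.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            features[category] = matches
    return features

# Made-up narrative for demonstration only.
report = "Fever and rash began 2 days after H1N1 vaccine."
print(extract_features(report))
```

Grouping the extracted phrases by category is what makes later review easier: a reviewer can scan symptoms, vaccines, and timing separately instead of rereading the full narrative.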

The development of VaeTM is important because VAERS reports require intensive review by the medical experts at FDA. The different groups who report adverse events related to the use of medical products generally use somewhat different terminologies to describe similar products, medical procedures, and adverse events, depending on their level of knowledge and experience. While standardized coding languages are helpful, they do not currently capture the full range and complexity of the descriptions of clinical events. Text mining provides a complementary approach that allows more complete extraction of information from the reports. VaeTM’s ability to process thousands of reports in only a few seconds and extract the key information for medical products could significantly shorten the amount of time experts must spend analyzing these reports.

When fully evaluated and put into operation, VaeTM will be a valuable tool for identifying rare safety issues that are recognized only after a vaccine is approved for marketing and is used in populations larger and more diverse than those participating in clinical trials.

The OBE team further refined their ability to analyze VAERS data by using network analysis, a method that enables users to view multiple relations among the vaccines and adverse events reported by patients, families, physicians, or others.
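One simple way to picture such a network is as a co-occurrence graph, in which each vaccine or adverse event term is a node and an edge counts how many reports mention both terms. The Python sketch below builds those edge counts from made-up data; the `build_cooccurrence_network` function and the sample reports are assumptions for illustration and do not reproduce the method used in the cited articles.

```python
from collections import Counter
from itertools import combinations

# Made-up reports: each is the set of terms (vaccines and adverse events)
# mentioned in one report.
reports = [
    {"H1N1 vaccine", "urticaria", "facial swelling"},
    {"H1N1 vaccine", "urticaria"},
    {"H1N1 vaccine", "fever"},
]

def build_cooccurrence_network(reports):
    """Count, for every pair of terms, how many reports mention both.

    Each key is a sorted (term_a, term_b) tuple, so a pair is counted
    under the same key regardless of the order in which it appears.
    """
    edges = Counter()
    for terms in reports:
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges

network = build_cooccurrence_network(reports)
for (a, b), weight in network.items():
    print(f"{a} -- {b}: {weight}")
```

Heavier edges point to term pairs that are reported together more often, which is the kind of pattern a reviewer might then examine in the underlying reports.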

OBE is now working to extend this approach to the Adverse Event Reporting System (AERS), a database that collects information on adverse events reported for drugs and therapeutic biologic products. AERS receives reports from healthcare professionals, consumers, and others, and FDA uses it to look for new safety concerns that might be related to a marketed product.

The above summary is based on the following four articles:


“Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection”

Journal of the American Medical Informatics Association

2011 September; 18(5): 631-638

DOI: 10.1136/amiajnl-2010-000022


Taxiarchis Botsis (1,2), Michael D Nguyen (1), Emily Jane Woo (1), Marianthi Markatou (3,4), and Robert Ball (1)

(1) Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research (CBER), Food and Drug Administration (FDA), Rockville, Maryland, USA

(2) Department of Computer Science, University of Tromsø, Tromsø, Norway

(3) Department of Statistical Sciences, Cornell University, New York, New York, USA

(4) IBM T.J. Watson Research Center, Hawthorne, New York, USA

Corresponding author:


“Network analysis of possible anaphylaxis cases reported to the US vaccine adverse event reporting system after H1N1 influenza vaccine”

Studies in Health Technology and Informatics

2011; 169:564-8

doi: 10.3233/978-1-60750-806-9-564


Taxiarchis Botsis (a,b,1) and Robert Ball (a)

(a) Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, MD

(b) Department of Computer Science, University of Tromsø, Tromsø, Norway

(1) Corresponding author:


“Vaccine adverse event text mining system for extracting features from vaccine safety reports”

Journal of the American Medical Informatics Association

2012 Nov 1;19(6):1011-8.

doi: 10.1136/amiajnl-2012-000881. Epub 2012 Aug 25.


Taxiarchis Botsis (1,2), Thomas Buttolph (1), Michael D. Nguyen (1), Scott Winiecki (1), Emily Jane Woo (1), and Robert Ball (1)

(1) Center for Biologics Evaluation and Research (CBER), Food and Drug Administration (FDA), Rockville, MD

(2) Department of Computer Science, University of Tromsø, Tromsø, Norway



“Can Network Analysis Improve Pattern Recognition Among Adverse Events Following Immunization Reported to VAERS?”


Robert Ball and Taxiarchis Botsis

Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, MD

Correspondence: T. Botsis (

Page Last Updated: 01/22/2015