2021 FDA Science Forum

Standardizing the Isolation Source Metadata for the Genomic Epidemiology of Foodborne Pathogens Using LexMapr

Authors:
Poster Author(s)
Balkey, Maria, FDA/CFSAN; Batz, Michael, FDA/CFSAN; Gopinath, Gopal, FDA/CVM; Gosal, Gurinder, University of British Columbia; Griffiths, Emma, Simon Fraser University; Tate, Heather, FDA/CVM; Timme, Ruth, FDA/CFSAN
Center:
Contributing Office
Center for Food Safety and Applied Nutrition

Abstract

Poster Abstract

Introduction

FDA’s GenomeTrakr is a public/private genomic epidemiology network for foodborne pathogen surveillance, specifically targeting pathogens isolated from food or environmental sources. The raw genome plus a small set of associated metadata are made publicly available at the National Center for Biotechnology Information (NCBI). Metadata include organism name, geographical location, collection date, isolate contributor and isolation source. The isolation source field is currently a free text field, requiring no standard terminologies or structure. As the GenomeTrakr database grew to over 100K isolates and the diversity of isolation sources became more complex, this field became difficult to analyze and interpret using computational approaches.

Purpose

To maximize the use of GenomeTrakr data and make this resource FAIR (findable, accessible, interoperable and reusable), we have standardized the isolation source metadata accompanying whole-genome sequencing (WGS) data for publicly available GenomeTrakr records.

Methods

We evaluated and used LexMapr, a rule-based text-mining tool, to automate the curation of isolation source metadata and to assign categories from IFSAC+, an expanded source categorization schema based on the Interagency Food Safety Analytics Collaboration (IFSAC) categories. LexMapr processes the isolation source text and extracts entities that are mapped to standard terms from relevant ontologies such as FoodOn, ENVO, and UBERON, among others. These terms provide new standard descriptors for the isolation source.
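The general idea of rule-based mapping from free text to ontology terms can be sketched as follows. This is not LexMapr's code or API; it is a minimal illustration that assumes a tiny hand-written lexicon. The function names (`normalize`, `map_source`), the term IDs (`EXAMPLE:...`), and the category labels are all hypothetical placeholders, not real FoodOn, ENVO, or UBERON identifiers and not actual IFSAC+ categories.

```python
import re

# Hypothetical mini-lexicon: phrase -> (placeholder term ID, placeholder category).
# None of these IDs or categories come from a real ontology or from IFSAC+.
LEXICON = {
    "chicken breast": ("EXAMPLE:0001", "poultry"),
    "chicken": ("EXAMPLE:0002", "poultry"),
    "spinach": ("EXAMPLE:0003", "produce"),
    "swab": ("EXAMPLE:0004", "environmental"),
}


def singular(token):
    """Crude plural stripping: drop a trailing 's' from longer tokens."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(text):
    """Lowercase, replace punctuation with spaces, split, and singularize."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [singular(tok) for tok in cleaned.split()]


def map_source(text, lexicon=LEXICON, max_ngram=3):
    """Return (phrase, term_id, category) tuples using longest-match-first lookup."""
    tokens = normalize(text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                term_id, category = lexicon[phrase]
                matches.append((phrase, term_id, category))
                i += n
                break
        else:
            # No phrase starting at this token matched; move on.
            i += 1
    return matches
```

Under these assumptions, a free-text value such as "Chicken Breasts, raw" is normalized and matched to the longer phrase "chicken breast" rather than to "chicken" alone, while a value with no lexicon entry yields an empty list and would need manual curation.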

Results

GenomeTrakr has a total of 9,452 unique isolation sources. LexMapr successfully processed 88% of these records, as determined by manual curation and verification. After the evaluation of LexMapr, 71,886 publicly available records were curated, assigned ontology terms, and categorized using the IFSAC+ categorization schema.
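A success rate of this kind can be computed from manual review verdicts along the following lines. The abstract does not state exactly how the 88% figure was calculated, so this is only a sketch under the assumption that each unique isolation source receives a single accept/reject verdict; the function name `processed_fraction` and the toy data are hypothetical.

```python
def processed_fraction(verdicts):
    """Fraction of unique isolation sources whose mapping was accepted on manual review.

    verdicts: dict mapping each unique isolation source string to True
    (mapping accepted) or False (mapping rejected or missing).
    """
    if not verdicts:
        raise ValueError("no verdicts supplied")
    accepted = sum(1 for ok in verdicts.values() if ok)
    return accepted / len(verdicts)


# Toy illustration only; these strings and verdicts are made up.
toy_verdicts = {
    "chicken breast": True,
    "environmental swab": True,
    "spinach": True,
    "misc. sample 12": False,
}
toy_rate = processed_fraction(toy_verdicts)
```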

Significance

The use of standard terminologies in the context of metadata for WGS is essential to facilitate data exchange and generate machine-readable resources that can expand our understanding of the dynamics of pathogen transmission across the food chain.

