Expanding next-generation sequencing tools to support pandemic preparedness and response

FDA-ARGOS database updates may help researchers rapidly validate diagnostic tests and use qualified genetic sequences to support future product development

Performer: Embleema and George Washington University
Project leaders: Vahan Simonyan, PhD, and Raja Mazumder, PhD
Contract value: $1.99 million
Contract option award (2022): $3.47 million
Project dates: September 2021 – September 2025

Background

In 2014, FDA established FDA-ARGOS, a public database of reference-grade microbial sequences that contains curated, high-quality genomic sequence data to support research and regulatory decisions. For example, researchers can use the FDA-ARGOS database, a validated source of reference datasets, along with bioinformatics tools to validate the performance, sensitivity, and specificity of diagnostic tests through computer modeling (in silico). By providing a standardized and reliable knowledge base, FDA-ARGOS genome data could reduce the testing burden on industry.
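
As a rough illustration of what in silico validation measures, the sketch below computes sensitivity and specificity by comparing a test's predicted results against the expected results for a set of reference sequences. The function name and the example data are hypothetical and are not taken from FDA-ARGOS or from any FDA tool; only the standard definitions of the two metrics are assumed.

```python
def sensitivity_specificity(expected, predicted):
    """Return (sensitivity, specificity) for paired boolean results.

    expected[i]  -- True if reference sequence i contains the target organism
    predicted[i] -- True if the simulated test reports a detection for it
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)          # true positives
    tn = sum(1 for e, p in pairs if not e and not p)  # true negatives
    fp = sum(1 for e, p in pairs if not e and p)      # false positives
    fn = sum(1 for e, p in pairs if e and not p)      # false negatives

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical data: four target and four non-target reference sequences.
expected = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(expected, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice the predicted results would come from a bioinformatics step, such as checking whether a test's primer or probe sequences match each reference genome, which this sketch does not attempt to model.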

FDA-ARGOS genome submissions require that raw reads, assembled genomes, and associated metadata are publicly available. FDA-ARGOS developers initially focused on advancing the technology used to generate the sequences, improving access to raw sequence information and relevant metadata, and generating provenance information that links the microbial organism identification to the sequence reads.

During the initial phase, FDA-ARGOS focused on acquiring high-quality sequences by directly sequencing genomes from curated organisms. This in-house sequencing is not sustainable and duplicates sequencing efforts already under way worldwide. The next step toward making FDA-ARGOS more usable and expanding it substantially is to develop tools that obtain sequences through submissions and by mining publicly available sequence databases.

Project description

Embleema and George Washington University will conduct bioinformatic research and system development, focusing on expanding the FDA-ARGOS database.

This project will expand datasets publicly available in FDA-ARGOS, improve quality control by developing quality matrix tools and scoring approaches that will allow the mining of public sequence databases, and identify high-quality sequences for upload to the FDA-ARGOS database as regulatory-grade sequences. Building on expansions during the COVID-19 pandemic, this project aims to further improve the utility of the FDA-ARGOS database as a key tool for medical countermeasure development and validation.

Project outcomes

The primary outcomes of this project are to:

Identify genomes of microbial species of high clinical relevance from public resources that could qualify as regulatory-grade sequences, and generate an annotation data model covering the capture, annotation, and harmonization of sequence data.
Develop novel analysis tools for quality control (QC): more comprehensive quality assessments to improve utility and reliability of the current FDA-ARGOS database in the context of sequence representativeness.
Prepare NCBI submission packages, upload model sequences to the FDA-ARGOS database, and provide documentation, outreach, and training to FDA personnel.
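
To make the idea of scoring public assemblies for quality more concrete, here is a minimal sketch of one possible screening step. It is not the project's method: the `Assembly` record, the accession strings, and all thresholds are hypothetical, and the actual FDA-ARGOS qualification criteria are not described here. The sketch only uses N50, a widely used assembly contiguity statistic, together with mean coverage and contig count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assembly:
    accession: str             # hypothetical identifier, for illustration only
    contig_lengths: List[int]  # length of each contig in base pairs
    mean_coverage: float       # average sequencing depth


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def passes_qc(asm: Assembly,
              min_n50: int = 100_000,      # illustrative threshold
              min_coverage: float = 20.0,  # illustrative threshold
              max_contigs: int = 100) -> bool:  # illustrative threshold
    """Apply simple, assumed cut-offs to one candidate assembly."""
    return (n50(asm.contig_lengths) >= min_n50
            and asm.mean_coverage >= min_coverage
            and len(asm.contig_lengths) <= max_contigs)


candidates = [
    Assembly("EXAMPLE_1", [500_000, 300_000, 200_000], 40.0),
    Assembly("EXAMPLE_2", [10_000] * 50, 35.0),
]
selected = [a.accession for a in candidates if passes_qc(a)]
print(selected)
```

A real pipeline would likely combine many more signals, such as contamination checks, completeness of metadata, and how well a genome represents its species, which is the kind of broader assessment the outcomes above describe.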

This project was funded through the MCMi Regulatory Science Extramural Research program.