Data Mining at the National Center for Toxicological Research

NCTR Data Mining Work

The National Center for Toxicological Research (NCTR) conducts a full range of product safety and translational safety studies in support of FDA’s product portfolio, including foods, cosmetics, dietary supplements, human and animal drugs, tobacco, and devices. Text mining and data mining are pursued mostly in the Division of Bioinformatics and Biostatistics and are designed to align with both current and prospective FDA Product Center needs.

Current Initiatives

Next-generation sequencing (NGS) quality control; application of RNA-seq; quality control metrics for improving reproducibility of SNPs called from DNA-seq data.
Development of qualitative and quantitative prediction models for different toxicological endpoints.
Data analysis and data mining for pathogen detection and characterization.
Safety signal detection and analysis in the FDA’s adverse event reporting systems.
Data mining of the FDA Adverse Event Reporting System (FAERS) and electronic medical records to detect drug-induced cardiac disorders and drug associations using big data analysis.

Publications of Interest

Luo H, Ye H, Ng HW, Sakkiah S, Mendrick DL, Hong H*. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides. Scientific Reports 2016. 6: 32115.
Zhao W, Chen JJ, Perkins R, Liu Z, Wang Y, Hong H, Tong W, and Zou W*. A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinformatics 2016, 17:213, DOI: 10.1186/s12859-016-1075-9.
Ye H, Wei J, Tang K, Feuers R, Hong H*. Drug repositioning through network pharmacology. Current Topics in Medicinal Chemistry, 2016, 16:30, 3646-3656.
Wang, S-H., Ding, Y., Zhao, W., Yu, K., Huang, Y-X., Perkins, R., Zou W*, and Chen, J.J*. Text mining for identifying topics in the literatures about adolescent substance use and depression. BMC Public Health 2016, 16:279, DOI: 10.1186/s12889-016-2932-1.
Ye H, Luo H, Ng HW, Meehan J, Ge W, Tong W, Hong H*. Applying network analysis and Nebula (neighbor-edges based and unbiased leverage algorithm) to ToxCast data. Environment International 2016, 89, 81-92.
Zhao W, Chen JJ, Zou W*. Biomarker identification from next-generation sequencing data for pathogen bacteria characterization and surveillance. Biomarkers in Medicine. 2015, doi:10.2217/bmm.15.88.
Zhao W, Chen JJ, Liu Z, Ge W, Ding Y, and Zou W*. A heuristic approach to determine an appropriate number of topics in topic modeling. BMC Bioinformatics Proceedings. 2015, 16 Suppl 14:S8.
Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H*. Machine learning methods for predicting HLA–peptide binding activity. Bioinformatics and Biology Insights, 2015, 9 (Suppl 3), 21.
A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. 2014, Nature Biotechnology 32 (9), 903–914.
The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nature Biotechnology, 2014, 32 (9), 926–932.