ArrayTrack™, developed by FDA’s National Center for Toxicological Research (NCTR), is an integrated suite designed for management, analysis, and interpretation of microarray experiment data. It is publicly available. There are three integrated components:
- Database: a MIAME-supportive database that stores and annotates essential information for an experiment.
- Library: a number of libraries that provide gene annotation, protein function, and pathway information that are directly hyperlinked within the data-analysis process.
- Tools: analysis tools with an intuitive user interface providing the ability to search, filter, apply statistical operations, and graphically visualize data.
- Accepts microarray data from various platforms and scanners. The upload function has been tested with six platforms (Affy, Agilent, GE Healthcare, Applied Biosystems, Illumina, and custom two-color arrays).
- Accommodates many toxicological parameters, such as dose, chemicals, treatment schedule, and sacrifice time.
- Accepts Affymetrix (Affy) data in the CEL file format and converts CEL files to probe-set files (RMA, dChip, expresso, PLIER, PLIER+16). An R server with Bioconductor must be installed to use this function.
- Contains a high-throughput data uploading (batch import) function that uploads entire datasets of an experiment in a single procedure.
- Contains a comprehensive reporting system that provides a summary of variables associated with an experiment.
- Allows data to be shared easily and securely through a data-security function.
- Exports original data files, image files, and CEL files, and can export multiple datasets to a single spreadsheet.
- Accepts proteomics and metabolomics data.
- IDConverter—converts between about ten types of identifiers used by public databases, including GenBank, LocusLink, UniGene, and IMAGE.
- Gene Library and Protein Library—contain functional information about genes and proteins to facilitate interpretation of microarray data. All of these data are derived from public databases, including LocusLink, GenBank, UniGene, SWISS-PROT, and KEGG (Kyoto Encyclopedia of Genes and Genomes). Users can quickly retrieve functional information for a set of significant genes identified by analysis by searching these libraries and the other libraries in this category.
- Pathway Library—provides a collection of pathways from PathArt and KEGG (both described in the next bullets). More pathways will be added as the software is further developed. Using this library, users can identify statistically significant pathways (Fisher's exact test) for a list of genes, proteins, or metabolites. Useful for genomics, proteomics, and metabonomics research.
- PathArt—commercial software from Jubilant Biosys that provides manually curated pathways (mainly regulatory and disease pathways) for interpreting microarray results. ArrayTrack™ is integrated with PathArt; a PathArt license must be purchased separately from the vendor to use this function in ArrayTrack™.
- KEGG—mainly contains metabolic pathways. ArrayTrack™ is integrated with KEGG. Although KEGG is a public pathway package, commercial users need to contact Pathway Solution, Inc. (firstname.lastname@example.org) to acquire a KEGG license for accessing this function through ArrayTrack™.
- IPA (Ingenuity Pathways Analysis)—delivers systems biology expertise to biologists and bioinformaticians through pathways analysis software, genome-scale computable network databases and knowledge-management services and infrastructure. The user needs to have a license from Ingenuity to log in to IPA through ArrayTrack™.
- GOFFA (Gene Ontology For Functional Analysis)—analyzes microarray results using Gene Ontology (GO) resources. For example, GOFFA makes it straightforward to identify the GO terms that are statistically significant for a list of genes derived from a microarray experiment, using Fisher's exact test. GOFFA in ArrayTrack™ provides a GO path plot, a pruned GO tree plot, an all-gene list, and term clustering, categorized by molecular function, biological process, and cellular component.
- IPI Library—a nonredundant protein database downloaded from the European Bioinformatics Institute (EBI) Web site; particularly useful for proteomics research.
- Orthologene Library—contains data from the NCBI HomoloGene database, augmented with other functional information. This resource is particularly useful for cross-species research based on gene homology.
- Chip Library—contains all the chips that are used to generate the data stored in the ArrayTrack™ database. The chips are organized according to species, manufacturer, and platform. The manufacturer-provided information for each chip, including sequence information, is also available.
- Toxicant Library and EDKB Library—contain chemical structures together with toxicological endpoints. The chemicals can be directly mapped to various metabolic pathways. The Toxicant Library was initially populated with data from the Carcinogenic Potency Database, and the EDKB Library contains data associated with endocrine disruptors. Thus, these libraries are useful for integrating traditional toxicology data with genomics data. Since chemicals with similar structures are likely to exhibit similar biological (or toxicological) activities, we are also implementing an algorithm for assessing the structural similarity of chemicals and exploring structure-toxicity relationships based on substructure features and physicochemical properties derived from the structure.
- Microbial Library—stores bacterial gene data downloaded from NCBI. Currently holds data for Escherichia coli, Salmonella enterica, Shigella spp., and Vibrio spp.
- SNP (single nucleotide polymorphism)—Data for over 15 million human SNPs were downloaded from the UCSC Genome Bioinformatics Site and the NCBI dbSNP.
- QTL (quantitative trait locus)—Mouse data were taken from the Mouse Genome Database at The Jackson Laboratory, and human and rat data from the Rat Genome Database developed by the Medical College of Wisconsin. Useful for finding overlaps between the map position of a gene and QTLs in these species.
Note: All libraries within ArrayTrack™ are interlinked, which is convenient for users searching for multiple types of data.
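The Pathway Library and GOFFA both rank pathways or GO terms with Fisher's exact test. The minimal Python sketch below illustrates that enrichment step under one assumption: the one-sided (over-representation) form of the test, which for a 2×2 table equals the upper tail of the hypergeometric distribution. The function and parameter names are hypothetical and are not part of ArrayTrack™.

```python
from math import comb


def enrichment_p_value(overlap: int, list_size: int,
                       pathway_size: int, background_size: int) -> float:
    """One-sided Fisher's exact test for over-representation (hypothetical helper).

    overlap         -- genes in the list that belong to the pathway/GO term
    list_size       -- genes in the significant list
    pathway_size    -- background genes annotated to the pathway/GO term
    background_size -- all genes considered (e.g. all genes on the chip)
    """
    if not 0 <= overlap <= min(list_size, pathway_size):
        raise ValueError("overlap must be between 0 and min(list_size, pathway_size)")
    # Hypergeometric upper tail P(X >= overlap), summed with exact integers
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(overlap, min(list_size, pathway_size) + 1)
    )
    return tail / comb(background_size, list_size)


# Hypothetical example: 10 genes on the chip, 3 in the pathway,
# 3 significant genes, 2 of which fall in the pathway.
print(enrichment_p_value(2, 3, 3, 10))
```

In this toy case the tail is (3·7 + 1) / 120 = 22/120, about 0.183, so the pathway would not be called significant. Exact integer arithmetic avoids the underflow that multiplying many small probabilities could cause on chip-sized backgrounds.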
- Several normalization methods, including MAS5, RMA, dChip, and PLIER, are available for Affymetrix CEL files. In addition, traditional normalization methods (LOWESS, Linear LOWESS, Total-Intensity Normalization, Mean/Median Scaling, GenePix Mean Log Ratio Normalization, Quantile Normalization, and Reference Average Comparison Normalization) are implemented for both one-channel and two-channel microarray data.
- T-Test—calculates p-values for each gene on the chip. This function includes the standard and Welch t-tests as well as a permutation t-test, for one-class and two-class samples.
- ANOVA—allows statistical testing across multiple groups or variables, unlike the t-test, which compares at most two groups. This function performs one-way ANOVA; two-way ANOVA is listed separately below.
- After p-values are obtained from the t-test or ANOVA, ArrayTrack™ provides several methods for selecting a list of significant genes for further analysis or biological interpretation:
- Stricter multiple-testing corrections, such as the Bonferroni correction, to select a list of genes
- False Discovery Rate (FDR) control based on the Benjamini-Hochberg method to identify a list of significant genes
- Ranking genes based on p-value cut-off, fold-change, intensity cut-off or combinations thereof
- Volcano Plot to select a list of genes based on both p-value cut-off and fold-change
- P-Value Plot—determines a list of genes by adjusting the rates of false positives and false negatives
- Two-way Hierarchical Clustering Analysis (HCA)—provides an unsupervised clustering approach that groups samples by the similarity of their gene expression patterns or gene presence/absence calls; similar genes are clustered together at the same time. The genes can be linked to ArrayTrack™'s libraries, and the image of the HCA or of a subcluster can be saved.
- Principal Component Analysis—provides another unsupervised learning method to investigate the sample clustering based on the gene expression profiles.
- Correlation Matrix—computes the correlation coefficients (R) between arrays and displays the matrix as a heat map. The R values can be exported, and flag concordance can also be computed.
- ScatterPlot and Mixed ScatterPlot—provide pairwise scatter plots. ScatterPlot applies specifically to two-color array data and plots Cy3 intensity against Cy5 intensity. Mixed ScatterPlot is a general pairwise plotting function that can plot any one measure (intensity or ratio) against another similar measure in the same experiment.
- MA Plot—a function specific to two-color arrays that plots the log intensity ratio M = log2(Cy5/Cy3) against the mean log intensity A = 0.5 × log2(Cy3 × Cy5). It may allow better visual inspection of the concordance and quality of two-color expression data than the scatter plot.
- Virtual Array Viewer—displays expression data in the format of the original array image. This function reconstructs the original array image from either the raw or the normalized expression data and provides a visual representation of the data for further exploration, analysis, and interpretation. It works with both one-channel and two-channel data, including Affy data.
- Rank Intensity Plot—sorts gene intensities in descending order along the y-axis, and each gene is given an ordinal number along the x-axis to reflect its relative position on the chip. The shape of the curves characterizes the general properties of the expression data and provides an overall assessment of data quality. This function is particularly useful for examining the quality of two-color array data. For example, if the green curve represents the Cy3-labeled sample and the red curve represents the Cy5-labeled sample, well-balanced two-channel data should show superimposed or parallel green and red curves, whereas a crossover of the two curves indicates a bias between the two channels.
- BarChart—allows comparison of the expression level of a gene across the array data within a single experiment or across multiple experiments or platforms.
- VennDiagram—displays the overlap among two or three gene lists. Users can draw the diagram by common ID (gene ID, Locus ID, Spot ID, etc.), common pathway, or common Gene Ontology term.
- Quality Control—enables evaluation of the overall quality of two-color array GenePix data using visual inspection, statistical metrics, and experiment annotation.
- Quality Filtering—provides a means to examine the quality of each spot in two-color GenePix array data.
- Two-way ANOVA—compares groups defined by two different variables. A Sources of Variation (SV) plot graphically depicts how much variance is attributed to each source (individual groups, combinations of groups, and error).
- SAM Test (significance analysis of microarrays)—determines whether gene expression changes are statistically significant. Several options are available, including one-class, multiclass, quantitative, and survival analyses.
- K-Means Clustering—partitions the samples into a user-chosen number of clusters such that the elements within each cluster are similar to each other.
- K-Nearest Neighbors—uses a predefined grouping of samples and classifies known or unknown data into these groups based on the K closest samples in feature space, where K is chosen by the user.
- Linear Discriminant Analysis—used on a predefined grouping of samples and classifies additional samples into these groups by minimizing variance within each group and maximizing variance between groups, placing similar samples together.
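Quantile Normalization, one of the traditional methods in the normalization bullet, forces every array to share the same intensity distribution. The Python sketch below is a minimal illustration under stated assumptions: intensities arrive as one list per array with probes in the same order, and tied values are ranked by position (many implementations average tied ranks instead). The function name is hypothetical, not ArrayTrack™'s code.

```python
from statistics import mean


def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Replace each value with the mean of the values sharing its rank across arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the i-th smallest value across arrays
    reference = [mean(col) for col in zip(*(sorted(a) for a in arrays))]
    result = []
    for a in arrays:
        # Rank probes within this array (ties broken by position)
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization both arrays contain exactly the values of the reference distribution, only in different probe positions, so between-array differences in overall intensity are removed.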
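The T-Test bullet mentions a permutation t-test for two-class samples. The minimal Python sketch below shows the idea, assuming Welch's t as the test statistic and exact enumeration of every relabeling (practical only for small groups; tools typically sample random permutations instead). Each group needs at least two values. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic (does not assume equal variances)."""
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        # Both groups constant: treat any mean difference as infinitely extreme
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_t_test(x, y):
    """Exact two-sided permutation p-value for Welch's t.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and counts splits at least as extreme as observed.
    """
    pooled = list(x) + list(y)
    observed = abs(welch_t(x, y))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Small tolerance so the observed split counts despite rounding
        if abs(welch_t(gx, gy)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


print(permutation_t_test([1, 2, 3], [4, 5, 6]))
```

With three values per group there are only 20 possible splits, and just the observed split and its mirror image are as extreme, giving p = 2/20 = 0.1. This shows why permutation tests need enough replicates to reach small p-values.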
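The gene-selection options after the t-test/ANOVA include the Bonferroni correction and Benjamini-Hochberg FDR. The minimal Python sketch below computes both sets of adjusted p-values from one list of raw p-values; it follows the textbook procedures, and the function names are hypothetical rather than ArrayTrack™'s.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    A gene is called significant at FDR level q if its adjusted value is <= q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

For these four hypothetical genes at a 0.05 cutoff, Bonferroni keeps only two (adjusted 0.04 and 0.02), while Benjamini-Hochberg keeps all four (adjusted values of at most 0.04). This illustrates why the page describes Bonferroni as the more stringent option.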
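The Correlation Matrix bullet computes R between every pair of arrays. The minimal Python sketch below assumes R is the Pearson correlation coefficient (the page does not name the coefficient) and returns the matrix that a heat map would display; the heat-map rendering is not shown, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(arrays):
    """Symmetric matrix of pairwise correlations between arrays."""
    return [[pearson(a, b) for b in arrays] for a in arrays]


for row in correlation_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]]):
    print(row)
```

Replicate arrays should give values close to 1 off the diagonal, so an array that correlates poorly with its replicates stands out in the heat map.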
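The MA Plot bullet defines M = log2(Cy5/Cy3) and A = 0.5 × log2(Cy3 × Cy5). The short Python sketch below computes those coordinates for each spot, assuming background-corrected, strictly positive intensities. The function name is hypothetical and no plotting library is assumed.

```python
from math import log2


def ma_values(cy3, cy5):
    """Return (M, A) per spot: M = log2(Cy5/Cy3), A = 0.5 * log2(Cy3 * Cy5)."""
    pairs = []
    for green, red in zip(cy3, cy5):
        if green <= 0 or red <= 0:
            raise ValueError("intensities must be positive to take logs")
        pairs.append((log2(red / green), 0.5 * log2(green * red)))
    return pairs


print(ma_values([100.0, 400.0], [400.0, 400.0]))
```

A spot with equal Cy3 and Cy5 intensity gets M = 0, so a well-balanced chip scatters symmetrically around the horizontal line M = 0 at every intensity level A.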
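K-Means Clustering assigns each sample to the nearest of k centroids and then moves each centroid to the mean of its members, repeating until assignments stop changing (Lloyd's algorithm). The minimal Python sketch below assumes Euclidean distance and, for reproducibility, initializes the centroids with the first k samples; ArrayTrack™'s actual initialization is not stated, and the function name is hypothetical.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic init (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every sample
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


print(k_means([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))
```

Because the result depends on the starting centroids, implementations usually try several random initializations and keep the tightest clustering.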
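K-Nearest Neighbors classifies a sample by a majority vote among the K closest samples of known class. The minimal Python sketch below assumes Euclidean distance in feature space and breaks vote ties in favor of the label seen first among the nearest neighbors. The function name and the "control"/"treated" labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(train_points, train_labels, query, k):
    """Majority vote among the k training samples closest to query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["control"] * 3 + ["treated"] * 3
print(knn_classify(train, labels, (0.5, 0.5), k=3))
print(knn_classify(train, labels, (5.5, 5.5), k=3))
```

Choosing an odd K for two groups avoids tied votes; in practice the expression values would usually be scaled first so that no single gene dominates the distance.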
| Collaboration Information: | Dr. Weida Tong (870-543-7142 or email@example.com) |
| Assistance Using ArrayTrack™: | Feng Qian (870-543-7290 or firstname.lastname@example.org) |
| Submit Suggestions: | Dr. Hong Fang (870-543-7538 or email@example.com) |
| Report Technical Problems: | NCTRBioinformaticsSupport@fda.hhs.gov |
ArrayTrack™ is a product designed and produced by the National Center for Toxicological Research (NCTR). FDA and NCTR retain ownership of this product.
© Copyright 2004-2013, NCTR/FDA.