Salmonella Mutagenicity E-state Descriptors
The US Food and Drug Administration (FDA), Center for Drug Evaluation and Research (CDER), Office of Pharmaceutical Science (OPS), Informatics and Computational Safety Analysis Staff (ICSAS) is an applied regulatory research unit that compiles toxicology- and safety-related databases as a toxicological information resource for the Agency. ICSAS also produces databases suitable for quantitative structure-activity relationship (QSAR) modeling and uses these transformed databases to develop toxicology prediction software and to evaluate commercial QSAR, SAR, and data mining software to meet the needs of the FDA, other regulatory agencies, and the scientific community. These efforts are accomplished through research collaborations with software developers under leveraging arrangements such as Material Transfer Agreements (MTAs) and Cooperative Research and Development Agreements (CRADAs). ICSAS’ mission is to develop a complete battery of predictive software for all of the major toxicology studies recommended by the FDA's Centers. The software can be used to: (1) improve lead compound selection by identifying and eliminating compounds with potentially significant adverse properties early in the drug discovery and development process; (2) reduce the use of animals in testing by eliminating non-critical laboratory studies; (3) facilitate and accelerate the review process by making better use of accumulated scientific knowledge (regulatory decision support); and (4) expand the role of QSAR and predictive toxicology by encouraging the development of complementary predictive software systems through collaboration with software developers and the scientific community.
In the recently published research article, Contrera, J.F., Matthews, E.J., Kruhlak, N.L., and Benz, R.D. (2005) In Silico Screening of Chemicals for Bacterial Mutagenicity Using Electrotopological E-state Indices and MDL QSAR Software, Regulatory Toxicology and Pharmacology 43:313-323, we described the relationship between the values of a molecule's electrotopological descriptors and whether the molecule is mutagenic in Salmonella. The abstract for this article is as follows:
Quantitative structure activity relationship (QSAR) software offers a rapid, cost effective means of prioritizing the mutagenic potential of chemicals. MDL® QSAR models were developed using atom-type E-state indices and nonparametric discriminant analysis. Models were developed for Salmonella typhimurium gene mutation, combining results from strains TA97, TA98, TA100, TA1535, TA1536, TA1537, and TA1538 (n = 3228), and Escherichia coli gene mutation tests WP2, WP100, and polA (n = 472). Composite microbial mutation models (n = 3338) were developed combining all Salmonella, Escherichia coli, and the Bacillus subtilis rec spot test study results. The datasets contained 74% non-pharmaceuticals and 26% pharmaceuticals. Salmonella and microbial mutagenesis external validation studies included a total of 1444 and 1485 compounds, respectively. The average specificity, sensitivity, positive predictivity, concordance, and coverage of Salmonella models was 76%, 81%, 73%, 78%, and 98%, respectively, with similar performance for the microbial mutagenesis models. MDL® QSAR and discriminant analysis provides rapid and highly automated mutagenicity screening software with good specificity, sensitivity, and coverage that is simpler and requires less user intervention than other similar software. MDL® QSAR modules for microbial mutagenicity can provide efficient and cost effective large scale screening of compounds for mutagenic potential for the chemical and pharmaceutical industry.
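The five performance statistics quoted in the abstract can be illustrated with a short sketch. It uses the conventional confusion-matrix definitions and assumes that coverage means the fraction of submitted compounds for which a prediction was returned; the article's own definitions may differ in detail. The function name and the example counts are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn, n_submitted):
    """Return screening statistics as fractions between 0 and 1.

    tp, fp, tn, fn: true/false positive/negative counts among the
        compounds that received a prediction.
    n_submitted: all compounds submitted, including any for which no
        prediction was made (assumed basis for coverage).
    """
    n_predicted = tp + fp + tn + fn
    return {
        "specificity": tn / (tn + fp),            # non-mutagens called negative
        "sensitivity": tp / (tp + fn),            # mutagens called positive
        "positive_predictivity": tp / (tp + fp),  # positive calls that are correct
        "concordance": (tp + tn) / n_predicted,   # all calls that are correct
        "coverage": n_predicted / n_submitted,    # compounds that got a call
    }


# Invented counts, for illustration only.
example = screening_metrics(tp=40, fp=10, tn=40, fn=10, n_submitted=125)
for name, value in example.items():
    print(f"{name}: {value:.0%}")
```

Keeping coverage separate from the other four statistics matters because a model can score well on the compounds it predicts while declining to predict many others.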
E-state indices combine electronic, topological, and valence state information. They encode atom type and electron accessibility, both of which are influenced by all of the structural features of a molecule. In general, atom-type E-state descriptors along with molecular connectivity indices are the most useful descriptors for QSAR modeling of biological endpoints. The ability of the simple molecular E-state and connectivity indices to categorize a set of molecular structures has been demonstrated. The training data set of molecules is partitioned to group compounds with high potential for a particular property (e.g., high mutagenic potential or risk). These compounds are more closely associated with each other than with compounds that have low mutagenic risk. The atom-type E-state structure descriptors have been shown to organize molecular structures in a chemically meaningful manner, emphasizing electronic molecular information. Based on the structure space provided by the atom-type E-state descriptors, excellent similarity searches through a chemical database have been reported.
We developed MDL® QSAR models from 46 E-state descriptors and their associated E-state atom counts (E-state, E-state_acnt). An E-state atom count is a tally of how many times a given E-state descriptor occurs in a molecule. For example, a compound with two SsCH3 E-state descriptors would have an SsCH3 atom count of 2. Essentially all of the 46 available MDL® QSAR atom-type E-state descriptors are represented by molecules in the databases and all were used to create mutagenicity models; however, most individual compounds in the databases are described by fewer than 12 E-state descriptors.
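The relationship between an atom-type descriptor and its atom count can be sketched as follows. The sketch assumes the usual convention that an atom-type E-state index is the sum of the E-state values of all atoms of that type, and that the count column is named by appending `_acnt` to the descriptor name. The function name and the per-atom values are invented for illustration; real values would come from the QSAR software.

```python
from collections import Counter, defaultdict


def atom_type_descriptors(atoms):
    """Aggregate per-atom E-state values into atom-type descriptors.

    atoms: iterable of (atom_type, e_state_value) pairs, one per atom.
    Returns (sums, counts): the summed E-state value per atom type and
    the number of atoms of each type, keyed as '<type>_acnt'.
    """
    sums = defaultdict(float)
    counts = Counter()
    for atom_type, value in atoms:
        sums[atom_type] += value            # atom-type E-state index (assumed: sum)
        counts[atom_type + "_acnt"] += 1    # atom count for the same type
    return dict(sums), dict(counts)


# Invented per-atom values: two methyl carbons and one carbonyl oxygen.
molecule = [("SsCH3", 2.0), ("SsCH3", 1.5), ("SdO", 10.0)]
sums, counts = atom_type_descriptors(molecule)
print(sums)    # {'SsCH3': 3.5, 'SdO': 10.0}
print(counts)  # {'SsCH3_acnt': 2, 'SdO_acnt': 1}
```

Atom types absent from a molecule simply do not appear in either dictionary, which mirrors the observation that most compounds are described by only a few of the 46 descriptors.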
The linked database presents the E-state descriptors of 367 chemicals tested for bacterial mutagenic potential.
Salmonella Mutagenicity E-state Descriptor Database (367 chemicals)
[Please note that this database is in the form of a 231KB Excel spreadsheet.]
The database contains the following fields:
- The first column lists the generic name of each chemical
- The second column shows chemical structures in Simplified Molecular Input Line Entry System (SMILES) code format (generated using MultiCASE Inc.'s MC4PC program)
- The third column gives an indication of whether the chemical has (+) or has not (-) been found to be a mutagen in Salmonella
- All remaining columns show E-state descriptor values for the chemicals
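The column layout listed above can be read programmatically. The following is a minimal sketch, assuming the spreadsheet has been exported to CSV with a single header row and columns in the order described. The header names and the two sample rows are placeholders invented for illustration; they are not taken from the database.

```python
import csv
import io

# Invented rows that only mimic the column order described above.
SAMPLE_CSV = """name,smiles,salmonella,SsCH3,SsCH3_acnt
placeholder_1,CCO,-,1.0,1
placeholder_2,CC,+,2.0,2
"""


def load_records(handle):
    """Parse rows laid out as: name, SMILES, +/- result, descriptors..."""
    reader = csv.reader(handle)
    header = next(reader)
    descriptor_names = header[3:]  # every column after the third
    records = []
    for row in reader:
        result = row[2].strip()
        if result not in ("+", "-"):
            raise ValueError(f"unexpected mutagenicity flag: {result!r}")
        descriptors = {
            name: float(cell)
            for name, cell in zip(descriptor_names, row[3:])
            if cell.strip() != ""  # skip blank cells
        }
        records.append({
            "name": row[0],
            "smiles": row[1],
            "mutagen": result == "+",
            "descriptors": descriptors,
        })
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), records[1]["mutagen"])
```

Converting the (+)/(-) flag to a boolean and keeping the descriptors in a separate dictionary makes it straightforward to pass the values to a classifier while leaving the name and SMILES fields untouched.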
All data contained in this database are non-proprietary.
Comments or corrections of these data should be sent to: R.Daniel.Benz@fda.hhs.gov.