Statistical Assessment Methodology for Diagnostics and Biomarkers
Contact
Frank Samuelson, Ph.D.
Weijie Chen, Ph.D.
Upper left: densities of reader scores; Lower left: cumulative distributions of reader scores; Right: ROC curve, a plot of one distribution against another.
This plot shows the average AUC performance of a classifier on a new data set (open symbols) and on the same data set used to train the classifier (filled symbols) as a function of the dimensionality (D) of the data and the sample size used to train the classifier.
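The optimistic bias described in this caption can be reproduced with a small simulation. The Python sketch below is illustrative only and is not the program's software: the independent Gaussian feature model, the mean-difference linear classifier, and all names and default values (`average_aucs`, `shift`, `n_test`, `trials`) are assumptions chosen for brevity.

```python
import random


def auc(neg, pos):
    """Empirical AUC: fraction of (neg, pos) score pairs ranked correctly; ties count 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sample(n, dim, shift, rng):
    """Draw n cases with independent unit-variance Gaussian features of mean `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


def train_mean_difference(neg, pos):
    """Assumed toy classifier: weights are the difference of the class mean vectors."""
    dim = len(neg[0])
    return [
        sum(c[j] for c in pos) / len(pos) - sum(c[j] for c in neg) / len(neg)
        for j in range(dim)
    ]


def score(weights, cases):
    """Linear score (dot product of weights and features) for each case."""
    return [sum(w * x for w, x in zip(weights, c)) for c in cases]


def average_aucs(dim, n_train, shift=0.3, n_test=200, trials=30, seed=0):
    """Return (mean AUC on the training data, mean AUC on new data) over `trials` repeats."""
    rng = random.Random(seed)
    train_total = test_total = 0.0
    for _ in range(trials):
        # Train on n_train cases per class, then score those same cases (filled symbols).
        neg, pos = sample(n_train, dim, 0.0, rng), sample(n_train, dim, shift, rng)
        weights = train_mean_difference(neg, pos)
        train_total += auc(score(weights, neg), score(weights, pos))
        # Score an independent data set drawn from the same populations (open symbols).
        new_neg = sample(n_test, dim, 0.0, rng)
        new_pos = sample(n_test, dim, shift, rng)
        test_total += auc(score(weights, new_neg), score(weights, new_pos))
    return train_total / trials, test_total / trials


if __name__ == "__main__":
    for dim in (2, 10, 20):
        for n_train in (10, 50):
            train_auc, test_auc = average_aucs(dim, n_train)
            print(f"D={dim:2d} n={n_train:2d} train={train_auc:.3f} new={test_auc:.3f}")
```

Calling `average_aucs(dim, n_train)` over a grid of dimensionalities and training sample sizes gives pairs of averages (training data, new data) that could be plotted like the filled and open symbols described above.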
Summary
The ability of the FDA to assess the effectiveness of imaging devices and predictive biomarkers depends strongly on the FDA’s ability to evaluate (1) study designs for these devices, and (2) the statistical significance of data collected from these studies.
The program aim is to develop mathematical and statistical methodologies for designing, analyzing, and evaluating studies of diagnostic devices in imaging, predictive biomarkers, and computer algorithms based on imaging or biomarker data. Our research includes methodologies for evaluating the effectiveness of these devices at various levels: image quality assessment with model observers; the assessment of the device’s stand-alone performance with clinical data; and, when necessary, the assessment of physicians’ performance in using the device for a diagnostic task (i.e., reader studies). This includes methods for establishing the effectiveness of biomarkers or algorithms from survival outcome data and two-class receiver operating characteristic (ROC) data, which have a long history at the FDA. One emphasis of our research is to appropriately account for various sources of variability in an evaluation study for valid statistical inference; for example, case variability and reader skill are two important sources of variability.
In addition, the program examines data from published studies and studies submitted to the FDA to guide our methodological research and to produce recommendations for future FDA submissions.
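As a concrete illustration of the two-class ROC data mentioned above, the following Python sketch builds an empirical ROC curve from two sets of scores and computes its area in two ways. It is a minimal example under stated assumptions, not the program's methodology: the function names (`roc_points`, `trapezoid_auc`, `mann_whitney_auc`) and the sample scores are hypothetical, and higher scores are assumed to indicate disease.

```python
def roc_points(neg, pos):
    """Empirical ROC operating points, from the strictest threshold to the most lenient.

    Each point is (false-positive fraction, true-positive fraction): the fraction of
    non-diseased and of diseased scores at or above the threshold.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(neg) | set(pos), reverse=True):
        fpf = sum(s >= t for s in neg) / len(neg)
        tpf = sum(s >= t for s in pos) / len(pos)
        points.append((fpf, tpf))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def mann_whitney_auc(neg, pos):
    """Fraction of (non-diseased, diseased) pairs ranked correctly; ties count 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    non_diseased = [1, 2, 3, 4]  # hypothetical reader scores
    diseased = [3, 4, 5, 6]      # hypothetical reader scores
    curve = roc_points(non_diseased, diseased)
    print(curve)
    print(trapezoid_auc(curve), mann_whitney_auc(non_diseased, diseased))
```

The trapezoidal area under the empirical curve and the pairwise (Mann-Whitney) statistic agree, which is why the AUC can be read as the probability that a randomly chosen diseased case scores higher than a randomly chosen non-diseased case. A single AUC computed this way says nothing about case or reader variability, which is what the methods described in this summary address.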
Personnel
FDA Staff:
Frank Samuelson, Ph.D.
Weijie Chen, Ph.D.
Brandon Gallas, Ph.D.
Berkman Sahiner, Ph.D.
Nick Petrick, Ph.D.
Kyle Myers, Ph.D.
ORISE Fellow:
Zhipeng Huang, Ph.D.
External collaborators:
Steve Hillis, Ph.D., University of Iowa
Craig Abbey, Ph.D., University of California Santa Barbara
Le Kang, Ph.D., Virginia Commonwealth University
Public domain software
iMRMC Software
The primary objective of the iMRMC applications is to assist investigators with analyzing and sizing multi-reader multi-case (MRMC) reader studies that compare rates of performance or agreement (binary data) or areas under the ROC curve (AUCs, from ROC data).
Assessment of Classifiers
These are software tools for the evaluation of classifiers/algorithms that predict binary or survival time outcomes.
IQmodelo: Statistical Software for Image Quality Assessment with Model Observers
Selected peer-reviewed publications
- Chen et al., The Average Receiver Operating Characteristic Curve in Multi-reader Multi-case Imaging Studies
- Samuelson et al., Inference Based on Diagnostic Measures from Studies of New Imaging Devices
- Abbey et al., Comparative statistical properties of expected utility and area under the ROC curve for laboratory studies of observer performance
- Kang et al., Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach
- Chen et al., Multireader multicase reader studies with binary agreement data: simulation, analysis, validation, and sizing
- Gallas et al., One-Shot Estimate of MRMC Variance: AUC
- Gallas et al., A Framework for Random-Effects ROC Analysis: Biases with the Bootstrap and Other Variance Estimators
- Gallas et al., Generalized Roe and Metz ROC Model: Analytic Link Between Simulated Decision Scores and Empirical AUC Variances and Covariances
- Wunderlich et al., Nonparametric estimation receiver operating characteristic analysis for performance evaluation on combined detection and estimation tasks
- Chen et al., Calibration of Medical Diagnostic Classifier Scores to the Probability of Disease