U.S. Department of Health and Human Services


Harnessing the Potential of Data Mining and Information Sharing


As noted in PCAST’s Report to the President on Health Information Technology, IT has the potential to transform healthcare and, through innovative capabilities, to improve safety and efficiency in the development of new tools for medicine, support new clinical studies of which interventions work for different patients, and transform the sharing of health and research data.

FDA currently houses the largest known repository of clinical data (all of which is de-identified to protect patients’ privacy), including all the safety, efficacy, and performance information that has been submitted to the Agency for new products, as well as an increasing volume of post-market safety surveillance data. The ability to integrate and analyze these data could revolutionize the development of new patient treatments and allow us to address fundamental scientific questions about how different types of patients respond to therapy. It would also provide enhanced knowledge of disease parameters, such as meaningful measures of disease progression and biomarkers of safety and drug response, that can only be gained by analyzing large, pooled data sets, and it would allow ineffective products to be identified earlier in the development process.

Additionally, the ability to share information in a public forum about why products fail, without compromising proprietary information, presents the potential to save companies millions of dollars by preventing duplication of failure. FDA sometimes sees applications from multiple companies for the same or similar products. Although we may have reason to believe that such a product is likely to fail or that trial design endpoints will not provide necessary information based on a previous application from another company, we are currently unable to share this information. As a result, companies may pour resources into the development of products that FDA knows could be dead ends.

To harness the potential of information sharing and data mining, FDA is rebuilding its IT and data analytic capabilities and establishing science enclaves that will allow for the analysis of large, complex datasets while maintaining proprietary data protections and protecting patients’ information.

Scientific Computing and the Science Enclaves at FDA

Historically, the vast majority of FDA de-identified clinical trial data has gone unmined because data from disparate sources could not be combined and because the computing power and tools needed for such complex analyses were lacking. However, new technologies have radically changed this situation: data can now be converted from flat files or other formats, such as paper, into flexible relational database models; supercomputing power has increased dramatically; and new mathematical tools and approaches have been developed for analyzing large integrated data sets. Furthermore, innovations in computational methods, many of them available as open source, have created an explosion of statistical and mathematical models that scientists can use to mine large, complex biological and clinical data sets in numerous ways.
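As a rough illustration of the flat-file-to-relational conversion described above, the following Python sketch loads two small delimited extracts into an in-memory relational database and joins them. It is only a sketch under stated assumptions: the table layout, column names, and sample values are hypothetical and do not reflect any actual FDA data model or system.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extracts; column names and values are illustrative only.
SUBJECTS_CSV = """subject_id,trial_id,age,sex
S1,T1,34,F
S2,T1,61,M
S3,T2,47,F
"""

EVENTS_CSV = """subject_id,event
S1,headache
S3,nausea
S3,headache
"""


def load_flat_files(subjects_csv: str, events_csv: str) -> sqlite3.Connection:
    """Load two flat files into an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subjects ("
        "subject_id TEXT PRIMARY KEY, trial_id TEXT, age INTEGER, sex TEXT)"
    )
    conn.execute(
        "CREATE TABLE events ("
        "subject_id TEXT REFERENCES subjects(subject_id), event TEXT)"
    )
    subjects = [
        (row["subject_id"], row["trial_id"], int(row["age"]), row["sex"])
        for row in csv.DictReader(io.StringIO(subjects_csv))
    ]
    events = [
        (row["subject_id"], row["event"])
        for row in csv.DictReader(io.StringIO(events_csv))
    ]
    conn.executemany("INSERT INTO subjects VALUES (?, ?, ?, ?)", subjects)
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    conn.commit()
    return conn


def events_per_trial(conn: sqlite3.Connection) -> dict:
    """Count reported events per trial by joining subjects to events."""
    rows = conn.execute(
        "SELECT s.trial_id, COUNT(e.event) "
        "FROM subjects s LEFT JOIN events e ON e.subject_id = s.subject_id "
        "GROUP BY s.trial_id ORDER BY s.trial_id"
    ).fetchall()
    return dict(rows)
```

Once the records sit in related tables, a question that spans files (here, events per trial) becomes a single join rather than a manual reconciliation of separate extracts.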

The FDA scientific computing model provides an environment where communities of scientists, known as enclaves, can come together to analyze large, integrated data sets and address important questions confronting clinical medicine. These communities will be project-based and driven by a specific set of questions to be asked of a dataset. Each enclave is defined by its participants, its datasets, and the set of interrogations to be performed on the data. Enclaves may consist of internal FDA scientists and reviewers working together, or of outside collaborators working with FDA scientists under an appropriate set of security controls to protect the sensitive data of patients and the proprietary data of sponsors. Engagement of industry sponsors as part of community building will be vigorously pursued, leveraging expertise from the companies that submitted the data in a public-private partnership model.

The scientific computing environment will also provide a dedicated infrastructure for application development and software testing for FDA scientists and reviewers. This will allow FDA staff to develop new applications to improve review, monitoring, and business processes in an environment separate from where regulatory review data is assessed. Additionally, the scientific computing environment will be used to evaluate novel software developed outside of FDA and to rapidly incorporate innovative developments in support of FDA regulatory reviews. This ability to “test drive” new applications outside the regulatory review environment has the potential to shorten traditional FDA development cycles and facilitate the adoption of new software that can enhance quality, efficiency, and accuracy of FDA regulatory reviews, as well as streamline the adaptation of new higher-powered analytical tools into FDA review and research efforts.

The ability to integrate large data sets across multiple clinical trials, post-market surveillance data, and pre-clinical data will enable FDA to generate new insights into a variety of important issues confronting medical product development and use. Examples of such insights include the identification of patient subsets who do or do not respond to a specific therapy during a clinical trial, which has the potential to drive personalized medicine; the identification of patient subsets whose safety profiles, efficacy, or side effects differ by age or gender; evaluations of standard of care; analyses of disease progression; assessment of current endpoints based on aggregated data; and the development of better endpoints and insight into placebo effects. This work, which will address broader scientific issues, is intended to impact whole product classes and therapeutic areas and will be central to driving innovations in medical product development and basic research.
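To make the idea of identifying patient subsets from pooled data concrete, here is a minimal Python sketch that combines records from more than one trial and computes a response rate per subgroup. The record layout, subgroup labels, and values are hypothetical and chosen purely for illustration; a real analysis would also need to account for differences in trial design, confounding, and statistical uncertainty.

```python
from collections import defaultdict

# Hypothetical de-identified records pooled from several trials.
# Each tuple is (trial_id, age_group, responded); values are illustrative only.
POOLED_RECORDS = [
    ("T1", "under_65", True),
    ("T1", "under_65", False),
    ("T1", "65_plus", False),
    ("T2", "under_65", True),
    ("T2", "65_plus", True),
    ("T2", "65_plus", False),
]


def response_rate_by_subgroup(records):
    """Return the fraction of responders in each subgroup across all trials."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [responders, total]
    for _trial_id, subgroup, responded in records:
        counts[subgroup][1] += 1
        if responded:
            counts[subgroup][0] += 1
    return {
        subgroup: responders / total
        for subgroup, (responders, total) in counts.items()
    }
```

Pooling matters here because a single trial may enroll too few patients in a given subgroup to show any pattern, whereas combined records give each subgroup a larger sample.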

Modernizing the FDA IT Infrastructure to Support Scientific Computing

FDA is currently embarking on a landmark IT modernization effort. Information Computing Technologies for the Twenty-First Century (ICT21) is a major initiative that lays the foundation for the modernization of the FDA’s aging IT infrastructure and core computing capabilities. The first phase of the ICT21 effort was completed in April 2011 and resulted in the consolidation of FDA’s many disparate systems into two modern data centers and the virtualization of over ninety percent of FDA data.

The first phase of the ICT21 Program anticipated the Data Center Consolidation strategy, which was subsequently outlined in White House Chief Information Officer Vivek Kundra’s 25 Point Implementation Plan to Reform Federal Information Technology Management. The purpose of the Data Center Consolidation strategy is to enable rapid migration to a Cloud First policy, using cloud computing technologies to maximize capacity. At its core, a cloud strategy reduces computing to a utility and gives the end user simplified, rapid, on-demand access to computing resources.

Building an Infrastructure for Patient-Centered Outcomes Research

FDA is also currently developing the policies, standards, infrastructure, and tools needed to enable analyses of clinical study data across multiple studies. These investments form the core infrastructure needed to build out a clinical data repository, which can then be expanded and seamlessly linked to other data sources, such as pre-clinical pharmacology and toxicology data or post-market safety surveillance clinical data. Additionally, FDA has launched the Partnerships in Comparative Effectiveness Science (PACES) program to support the development of new mathematical methods for patient-centered outcomes research. PACES provides funds to pilot the technical, infrastructure, scientific, and legal constructs that will serve as foundations for scientific computing communities involving FDA scientists and data. These activities will support scientifically sound assessments of medical interventions consistent with FDA’s public health responsibilities.


FDA Innovates
Virtual Patient

Medical device design is highly iterative, and the ability to test novel designs within computer models constructed from digital images of diseased and normal human anatomy could greatly reduce the cost, time, and risk to patients normally involved in producing a new medical device. FDA is in the process of developing a Virtual Physiological Patient: a collection of functional computer models including both normal human anatomy and diseased tissues. These models, which are being developed in partnership with stakeholders, will be made publicly available to medical device companies. Once fully developed, the Virtual Physiological Patient may allow personalization of medical devices, so that a device can be redesigned to suit an individual patient’s anatomy, physiology, and disease state.


Opportunities through Public-Private Partnerships

FDA is also creating a framework for building collaborative scientific computing communities through public-private partnerships. These public-private partnerships incorporate multiple medical product development companies and will be invaluable for generating new leads for drug discovery and development, for diagnostic and device development and refinement, and for the most efficacious use of products in the real-world environment. Additionally, building communities in which industry and FDA scientists work collaboratively to solve complex data problems will enhance communications between product sponsors and FDA staff around general product classes and scientific principles. The importance of improved communication cannot be overstated. These projects may not only result in a deeper understanding of diseases and their treatments, but may also lead to new standards that can be used for regulatory review, resulting in a reduction in scientific uncertainty. Enhanced communications can also help facilitate a better understanding of scientific thinking on both sides, thereby enhancing future sponsor-FDA review discussions. In addition, incorporating academic researchers into these data analysis communities will drive new lines of academic investigation.

Table of Contents: Driving Biomedical Innovation
