Clinical Data for Premarket Submissions
The Center for Devices and Radiological Health (CDRH) accepts and encourages the inclusion of clinical data in electronic (non-PDF) form as supporting material to a premarket (PMA or 510(k)) submission.
The answers to the following questions explain how to create and organize such supporting material. Please read all of the information carefully before submitting clinical data in electronic form.
Note: The information presented below is intended for individuals who prepare clinical data.
- Q1. What Premarket Clinical Data Can Be Submitted Electronically?
- Q2. What Are Appropriate Transport Formats for Electronic Data?
- Q3. What Are Some Appropriate Data Structures?
- Q4. How Can I Provide Electronic Data to CDRH?
Q1. What Premarket Clinical Data Can Be Submitted Electronically?
The most readily usable file, called an “analysis dataset,” is one that can be read directly into analytical software (such as spreadsheets or statistical analysis packages) and then utilized with minimal manipulation. An analysis dataset usually includes one line of data for each observation (subject, sample, visit, etc.), as specified in the clinical protocol.
For a particular premarket submission, you have likely created several analysis datasets to produce the tables and analyses in your submission (e.g., one dataset for effectiveness and a second dataset for adverse events). You may submit all or any subset of these datasets electronically in support of your application.
Here are examples of variables you might have included in your analysis datasets:
- site or laboratory identifiers
- individual components of inclusion/exclusion criteria
- identifiers of important analysis cohorts
- subject demographics (e.g., age, sex, race)
- appropriate covariates (e.g., BMI, etiology, drug regimens)
- randomization status (randomized versus roll-in or other patients)
- randomization outcomes
- follow-up times
- treatments applied (all treatments, with times applied)
- adverse events (with appropriate descriptions).
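As a rough illustration of the “one line per observation” layout, the sketch below reads a small comma-delimited analysis dataset and checks that no subject appears on more than one line. The column names, the values, and the helpers `read_analysis_dataset` and `check_one_line_per_observation` are hypothetical. A real dataset would carry the variables specified in your protocol.

```python
import csv
import io

# Hypothetical analysis dataset: one line per subject, using a few of the
# variable types listed above (site, demographics, randomization, follow-up).
ANALYSIS_CSV = """subject_id,site,age,sex,randomized,treatment,followup_days
S001,01,64,F,Y,stent,365
S002,01,58,M,Y,PTCA,350
S003,02,71,M,N,stent,180
"""


def read_analysis_dataset(text):
    """Read a comma-delimited analysis dataset into a list of dicts (one per line)."""
    return list(csv.DictReader(io.StringIO(text)))


def check_one_line_per_observation(rows, key="subject_id"):
    """Return the key values that appear on more than one line (empty if none)."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


rows = read_analysis_dataset(ANALYSIS_CSV)
print(len(rows), check_one_line_per_observation(rows))  # prints: 3 []
```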
Statistical computer instruction sets facilitate our review in complex situations
For submissions that require computer-intensive analyses (e.g., submissions using Bayesian methodology in which probabilities are calculated by simulation), FDA review of the application is facilitated if you include the statistical instruction sets that the software used to carry out the analysis. Including the instruction sets along with the analysis dataset allows FDA to replicate the data analysis exactly.
The term “statistical instruction sets” refers to the instructions that tell the analysis software exactly how to filter, manipulate, or analyze the data; we do not mean the analysis software itself (e.g., include your SAS or WinBUGS instruction sets but not the SAS or WinBUGS analytical software packages). For the Windows version of the SAS software package, this means including the files with a .sas extension (the statistical instructions) along with the files with a .sas7bdat or .sd2 extension (the data files).
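When probabilities are calculated by simulation, exact replication also depends on the random number seed being part of the instruction set. The following is a minimal sketch, not a prescribed method: it estimates the posterior probability that a treatment success rate exceeds a control rate, assuming binomial outcomes and independent uniform Beta(1, 1) priors. The function name, the default seed, and the counts in the usage line are hypothetical.

```python
import random


def posterior_prob_superior(x_trt, n_trt, x_ctl, n_ctl, draws=20_000, seed=20240101):
    """Estimate P(p_trt > p_ctl) by simulation.

    Assumes binomial outcomes and independent uniform Beta(1, 1) priors,
    so each posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)  # private generator: same seed, same draws
    wins = 0
    for _ in range(draws):
        p_trt = rng.betavariate(1 + x_trt, 1 + n_trt - x_trt)
        p_ctl = rng.betavariate(1 + x_ctl, 1 + n_ctl - x_ctl)
        if p_trt > p_ctl:
            wins += 1
    return wins / draws


# Hypothetical counts: 45/60 successes on treatment, 35/60 on control.
# Rerunning this file reproduces the estimate exactly because the seed is fixed.
print(posterior_prob_superior(45, 60, 35, 60))
```

Without the fixed seed, a reviewer rerunning the same instructions would get a slightly different Monte Carlo estimate each time.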
Q2. What Are Appropriate Transport Formats for Electronic Data?
The standard electronic transport format acceptable to all FDA centers is the public domain SAS XPORT file format, although data for medical device premarket applications to CDRH can be submitted in any file format mutually acceptable to you and CDRH. Formats most commonly used include SAS (as SAS datasets or SAS XPORT files), spreadsheets (Microsoft Excel or other), S-Plus or R files, XML files, or ASCII flat files (comma or tab-delimited). These formats are all either directly usable or easily translatable into a format that we can use. The review process is greatly facilitated when we agree in advance on a convenient and reliable format for data exchange.
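For the ASCII flat-file option, a minimal Python sketch using the standard `csv` module is shown below. The function names are hypothetical. Opening the output file with `encoding="ascii"` is one way (an assumption, not a CDRH requirement) to catch non-ASCII characters before the file is submitted.

```python
import csv
import io


def to_delimited_text(rows, fieldnames, delimiter=","):
    """Serialize a list of dicts as comma- (or tab-) delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # values containing the delimiter are quoted automatically
    return buffer.getvalue()


def write_flat_file(path, rows, fieldnames, delimiter=","):
    """Write the flat file; encoding='ascii' raises UnicodeEncodeError on non-ASCII text."""
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(to_delimited_text(rows, fieldnames, delimiter))
```

For example, `write_flat_file("efficacy.txt", rows, fields, delimiter="\t")` would produce a tab-delimited version of the same data.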
Q3. What Are Some Appropriate Data Structures?
Data structure: Effectiveness data
One common structure for data relating to device effectiveness is a data set with one line per subject/sample/visit and an appropriate number of columns containing:
- a unique identifier
- baseline covariates
- treatment information
- outcomes for that subject.
Usually every line of data in the analysis dataset includes an appropriate unique identifier for the source of the data on that line. Depending on the kind of study and the kind of device, the identifier could be constructed from one or more of the following components:
- an individual subject ID code
- a site (treatment center) or laboratory identifier (for multicenter or multilaboratory studies)
- an identifier for the visit (e.g., visit time, visit number)
- a treatment location (e.g., which eye was treated) identifier
- a sample source identifier (for diagnostic devices where samples come from a bank)
- any other identifier that is necessary to uniquely identify the source of the data on that line.
The unique identifier is often created by concatenating some combination of the above types of variables into a single character string that uniquely identifies the source of each line of data. It is often useful to retain, as separate variables in the dataset, the individual components of the unique identifier.
To preserve patient confidentiality (see 21 CFR 20.63 for some requirements on maintaining patient confidentiality), we encourage you not to use specific personal identifiers for subjects (name, SSN, etc.) when creating the unique identifier.
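The identifier construction described above might look like the following sketch, which assumes a study with site, subject, visit, and (optionally) treated-eye components. The zero-padding widths, the `-` separator, the `OD`/`OS` eye codes, and the helper names are all hypothetical choices. The subject component is a study-assigned number rather than a personal identifier, and the component columns are kept alongside the combined `uid` column.

```python
def make_unique_id(site, subject, visit, eye=None, sep="-"):
    """Concatenate identifier components into a single character string.

    Components are zero-padded so every ID has the same shape,
    e.g. site 3, subject 17, visit 6 -> '03-0017-V06'.
    """
    parts = [f"{int(site):02d}", f"{int(subject):04d}", f"V{int(visit):02d}"]
    if eye is not None:
        parts.append(eye)  # treatment location, e.g. 'OD' (right eye) or 'OS' (left eye)
    return sep.join(parts)


def add_unique_ids(rows):
    """Add a 'uid' column while keeping the component columns as separate variables."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the original components are preserved
        new_row["uid"] = make_unique_id(row["site"], row["subject"], row["visit"],
                                        row.get("eye"))
        out.append(new_row)
    uids = [r["uid"] for r in out]
    if len(uids) != len(set(uids)):
        raise ValueError("identifier components do not uniquely identify each line")
    return out
```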
Time-to-event outcomes (e.g., time until death) are usually recorded in two columns, with the time in the first column and an event code (a variable that indicates the event type, e.g., died or censored) in the second. The event code sometimes indicates cause of death. If there is more than one type of censoring, the event code often also indicates censoring type.
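A sketch of the two-column time-to-event layout follows. The event codes and the `time_to_event` helper are hypothetical; real codes would be defined in the dataset documentation. Here they distinguish two causes of death and two types of censoring.

```python
from datetime import date

# Hypothetical event codes: two causes of death and two types of censoring.
EVENT_CODES = {
    "died_cardiac": 1,
    "died_other": 2,
    "censored_lost_to_followup": 3,
    "censored_end_of_study": 4,
}


def time_to_event(start, stop, outcome):
    """Return the two time-to-event columns: (time in days, event code)."""
    if stop < start:
        raise ValueError("stop date precedes start date")
    return (stop - start).days, EVENT_CODES[outcome]


record = time_to_event(date(2020, 1, 1), date(2020, 3, 1), "died_cardiac")
# record == (60, 1): 31 days in January plus 29 in February 2020
```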
Multiple observations/measurements per subject or sample
If there are multiple observations per subject (e.g., baseline, 6 months, 1 year), or multiple measurements per sample, you may have chosen to organize the data using one of two common methods.
One method is to use multiple lines of data for each subject, corresponding to the number of observations, with an identifier for the observation. In this case, the covariate information (the measurements on variables other than treatment or outcome) can either be (a) repeated on each line; or (b) contained in a separate file with one line per subject, along with a subject identifier.
The second method is to use one line of data for each subject, with appropriately labeled columns for outcomes at each observation time. The column labels usually indicate the time of the observation in that column (e.g., outcome.00mo, outcome.06mo, outcome.12mo).
In general, the method you choose to organize the data might depend on:
- number of observations per subject
- extent of the covariate information
- specified primary analysis
- data management software you are using.
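Converting from the first layout (multiple lines per subject) to the second (one line per subject) is a simple reshape. The sketch below assumes hypothetical visit labels `00mo`, `06mo`, and `12mo`, and builds column names with underscores (`outcome_06mo`) rather than periods, since SAS variable names cannot contain periods. Visits with no observation get `None`, so missing data remain visibly missing.

```python
def long_to_wide(long_rows, visits=("00mo", "06mo", "12mo")):
    """Reshape rows with one line per observation into one line per subject.

    Each input row needs 'subject_id', 'visit', and 'outcome' keys.
    """
    wide = {}
    for row in long_rows:
        if row["visit"] not in visits:
            raise ValueError(f"unexpected visit label {row['visit']!r}")
        subject = wide.setdefault(row["subject_id"], {"subject_id": row["subject_id"]})
        column = f"outcome_{row['visit']}"
        if column in subject:
            raise ValueError(f"duplicate {column} for subject {row['subject_id']}")
        subject[column] = row["outcome"]
    # Give every subject a column for every visit; unobserved visits stay None.
    for subject in wide.values():
        for visit in visits:
            subject.setdefault(f"outcome_{visit}", None)
    return list(wide.values())
```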
Data structure: Adverse event datasets
Clinical trial data relating to adverse events is usually structured as one line per adverse event, with columns indicating:
- event type
- event description
- time and severity of the event
- the same unique identifier that can be linked back to the effectiveness data and associated covariate information.
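Because each adverse event line carries the same unique identifier as the effectiveness data, the two datasets can be joined. This sketch, with hypothetical field names and a hypothetical `link_adverse_events` helper, attaches subject-level covariates to each event. It sets aside any event whose identifier has no matching subject, which usually signals a data problem worth resolving before submission.

```python
def link_adverse_events(ae_rows, subject_rows, key="uid"):
    """Attach subject-level covariates to each adverse event line.

    Returns (linked, orphans): linked rows merge subject and event fields;
    orphans are events whose identifier matches no subject record.
    """
    subjects = {row[key]: row for row in subject_rows}
    linked, orphans = [], []
    for event in ae_rows:
        subject = subjects.get(event[key])
        if subject is None:
            orphans.append(event)
        else:
            linked.append({**subject, **event})  # event fields win on name clashes
    return linked, orphans
```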
Variable names and missing value codes
Variable names that are relatively short and free of special (non-alphanumeric) characters facilitate error-free conversion of variable names across various analytical programs. Note that headings in spreadsheets are converted into variable names by most import filters; thus special characters in spreadsheet headings can be a problem when moving from one analytical program to another.
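One way to apply this advice is to normalize spreadsheet headings before export, as in the sketch below (the function name is hypothetical). It assumes underscores are acceptable, since SAS and R both allow them in variable names, and uses a default length limit of 32 characters, the SAS maximum. SAS XPORT version 5 files limit names to 8 characters, so a smaller `max_len` may be needed for that format.

```python
import re


def clean_variable_name(heading, max_len=32):
    """Convert a spreadsheet heading into a short alphanumeric variable name."""
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", heading).strip("_")
    if not name:
        name = "var"  # heading had no usable characters at all
    elif name[0].isdigit():
        name = "v" + name  # SAS and R names cannot start with a digit
    return name[:max_len].rstrip("_")
```

For instance, the heading “Weight (kg)” becomes `Weight_kg`. Two different headings can still collapse to the same name, so it is worth checking the cleaned names for duplicates.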
Custom “error codes” (for example, codes for missing or invalid values) that differ from the missing-value codes defined by the data analysis package can cause difficulties:
- if they are not formatted in the same way as the data (e.g., a character code when valid data are all numeric)
- if they are not readily distinguishable from valid data (e.g., code “9” when valid data may be 0, 1, 2, 3, or 4).
Such custom codes can introduce errors into data conversion, analysis, or both.
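A sketch of recoding custom codes into the analysis language's own missing value follows, using the “9” example above. Python's `None` plays the role that `.` plays in SAS or `NA` in R. The function name and the default set of custom codes are hypothetical. Anything that is neither a valid value nor a known missing code raises an error instead of passing silently into the analysis.

```python
def recode_missing(values, valid, missing_codes=("9", "", "NA")):
    """Replace custom missing-value codes with None and convert valid codes to int."""
    cleaned = []
    for v in values:
        if v in missing_codes:
            cleaned.append(None)  # the language's native missing value
        elif v in valid:
            cleaned.append(int(v))
        else:
            raise ValueError(f"unexpected value {v!r}")
    return cleaned
```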
Converting data from a relational database
In most cases, the analysis datasets that you used to construct the tables and perform the analyses in the submission will be in a form that we can use. Depending on the nature of the clinical study and the complexity of the analysis plan, you may have created several different analysis datasets from the relational database. By contrast, data collected as raw tables, or the relational database produced by keying in case report forms, are not readily usable.
Documentation of datasets
A separate (“metadata”) file can be included to document each dataset. This documentation identifies what each variable field (column) in the data contains. Data without documentation are not usable. An example of useful documentation is:
The Excel version 6.0 spreadsheet: Study_1_Efficacy
Contains 84 subject records, each with 14 columns
Column 1 ......... Subject ID number
Column 2 ......... Subject weight in kg
Documentation can also include a complete key defining any coded variables, for example:
Column 6 ......... Treatment: P = PTCA, S = stent
This documentation can be included in an accompanying electronic file (word processor, PDF, spreadsheet, or ASCII file) in the same electronic folder as the data files.
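If datasets are generated programmatically, the documentation can be generated alongside them. The sketch below uses a hypothetical `describe_dataset` function to reproduce the plain-text style of the example above, including a key for coded variables.

```python
def describe_dataset(title, n_records, columns):
    """Return plain-text documentation for one dataset.

    `columns` is a list of (description, codes) pairs in column order;
    `codes` is a dict mapping each code to its meaning, or None.
    """
    lines = [
        title,
        f"Contains {n_records} subject records, each with {len(columns)} columns",
    ]
    for number, (description, codes) in enumerate(columns, start=1):
        line = f"Column {number} ......... {description}"
        if codes:
            key = ", ".join(f"{code} = {meaning}" for code, meaning in codes.items())
            line += f": {key}"
        lines.append(line)
    return "\n".join(lines) + "\n"


print(describe_dataset(
    "Study_1_Efficacy",
    84,
    [("Subject ID number", None),
     ("Subject weight in kg", None),
     ("Treatment", {"P": "PTCA", "S": "stent"})],
))
```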
Format and documentation of computer code
Computer code is most usefully formatted as standard ASCII text files. Such files can be read into almost any text editor, word processor, or analytical software for examination, printing, or execution.
Compiled code is of limited use to CDRH reviewers. In some instances we may not have the appropriate software to work with compiled code. In other cases, security settings on FDA computers may preclude our running compiled programs.
Clearly documented computer code facilitates the FDA review. For example, if there are multiple programs that need to be run in a particular order to obtain the correct analysis or to filter the data appropriately, then clear documentation of these steps will aid the FDA review. The computer code can be included in the same electronic folder as the data files.
Q4. How Can I Provide Electronic Data to CDRH?
Electronic data should be transferred to CDRH via CD or DVD.
If you also choose to submit an electronic copy of your premarket application, the preferred option is to provide both the electronic copy and the supporting data on the same CD/DVD. See the CDRH website[1] for details on how to create the electronic copy portions of the CD/DVD (including some important file and folder naming conventions).
When to provide the CD/DVD
We recommend that the CD/DVD containing the electronic copy and/or the supporting data be included with the paper copies in the initial submission.
Folder naming convention
Regardless of whether the supporting data are submitted to us on the same CD/DVD with an electronic copy of the submission or on a separate CD/DVD, use of the electronic data is facilitated if all of the data files, documentation, and computer code are contained in a single folder (directory) labeled “STATISTICAL DATA.” Having a consistent top-level folder name facilitates uploading of the data into the document repository. This main folder can optionally be divided into sub-folders as you deem appropriate. For further description of the folder (directory) structure of CDs/DVDs that also contain an electronic copy of the submission, see the CDRH website[2].
For the electronic data, documentation, and computer code portion of the CD/DVD, CDRH will accept any file naming convention that is clearly described in a README file.
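Before burning the disc, a quick layout check can catch a misnamed top-level folder or a missing README. The sketch below uses a hypothetical `check_layout` helper that works on a list of POSIX-style relative paths (for example, collected with `Path.rglob`). It assumes the README sits directly inside “STATISTICAL DATA”. The guidance above does not specify where the README goes, so adjust that check to your own convention.

```python
from pathlib import PurePosixPath

TOP_FOLDER = "STATISTICAL DATA"


def check_layout(relative_paths):
    """Return a list of layout problems for the supporting-data part of a disc.

    `relative_paths` are POSIX-style file paths relative to the disc root.
    Files outside TOP_FOLDER (e.g. the electronic copy) are ignored.
    """
    paths = [PurePosixPath(p) for p in relative_paths]
    data_files = [p for p in paths if p.parts and p.parts[0] == TOP_FOLDER]
    problems = []
    if not data_files:
        problems.append(f"no files found under '{TOP_FOLDER}'")
    # Assumption: the README describing the file naming convention is at the top level.
    has_readme = any(len(p.parts) == 2 and p.name.upper().startswith("README")
                     for p in data_files)
    if not has_readme:
        problems.append(f"no README file directly inside '{TOP_FOLDER}'")
    return problems
```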