UNITED STATES OF AMERICA
FOOD AND DRUG ADMINISTRATION
IMMUNOLOGY DEVICES PANEL OF THE
MEDICAL DEVICES ADVISORY COMMITTEE
NYMOX URINE NEURAL THREAD PROTEIN (NTP) KIT (P040010)
Friday, July 15, 2005
The hearing came to order at 8:30 a.m. in the Grand Ballroom of the Holiday Inn Gaithersburg, 2 Montgomery Village Ave, Gaithersburg, Maryland. Clive Taylor, MD, D. Phil, Presiding.
CLIVE R. TAYLOR, M.D., D. PHIL., CHAIR
SUSANNE M. GOLLIN, PH.D., VOTING MEMBER
JAMES L. GULLEY, M.D., PH.D., VOTING MEMBER
TERRANCE R. LICHTOR, M.D., PH.D., VOTING MEMBER
WILLIAM DUFFELL, JR., PH.D., INDUSTRY REPRESENTATIVE
VELIA BUTCHER, J.D., CONSUMER REPRESENTATIVE
JOSEPH PARISI, M.D., DEPUTIZED VOTING MEMBER
AVINDRA NATH, M.D., DEPUTIZED VOTING MEMBER
OSCAR L. LOPEZ, M.D., DEPUTIZED VOTING MEMBER
BRENT BLUMENSTEIN, PH.D., DEPUTIZED VOTING MEMBER
RUFINA CARLOS, B.S., EXECUTIVE SECRETARY
STEVEN GUTMAN, M.D., FDA
Call to Order and Introductions
Dr. Clive Taylor
Conflict of Interest Statement
Ms. Jenny Slaughter
Ms. Rufina Carlos
Critical Path Initiative
Dr. Sousan Altaie
Role of OSB in the Review of Postmarket
Dr. Susan Gardner
Open Public Hearing
Dr. Stephen McConnell
Dr. Paul Averback
Dr. Daniel Bloch
Dr. Patricio Reyes
Dr. Ralph Richter
Panel Questions
Dr. Robert L. Becker
Dr. Marina Kondratovich
Dr. Ranjit Mani
Panel Discussion
Question 1
Question 2
Question 3
Question 4
Open Public Hearing
Panel Deliberations and Vote
DR. TAYLOR: Good morning. My name is Clive Taylor and I am now going to call this meeting of the Immunology Panel to order. With regard to the record, the voting members present constitute a quorum required by 21 C.F.R. This is the majority of voting members. The minimum is normally seven. At this time I would like to ask each panel member sitting at the table to introduce him- or herself, and to state his or her specialty, position, title, institution, and status on the panel. We begin with Mrs. Butcher on my left.
MS. BUTCHER: Good morning, I'm Vicky Butcher -- Velia. I am the consumer rep on this panel, and I represent the ANMA, the auxiliary to the National Medical Association, as well as my non-profit Water for Children Africa.
DR. DUFFELL: Good morning, I'm Bill Duffell. I'm the industry representative on the panel. I work for Gambro BCT in Denver, Colorado. My area of expertise is clinical trials, biostatistics, behavioral sciences, and product development.
DR. GOLLIN: My name is Susanne Gollin. I'm a professor at the University of Pittsburgh Graduate School of Public Health in human genetics. My areas of expertise are public health genetics, biomarkers, and cancer.
DR. LICHTOR: My name is Terry Lichtor. I'm a neurosurgeon at Rush University in Chicago. And my research interests are in developing brain tumor vaccines.
DR. GULLEY: I'm James Gulley. I'm a medical oncologist working at the National Cancer Institute. My area of interest and expertise is in immunology and immunotherapy for cancers.
DR. TAYLOR: And I'm Clive Taylor. I'm a Professor and Chair of the Department of Pathology and Laboratory Medicine at the Keck School of Medicine, University of Southern California. In my spare time I'm Dean for Student Education. My research interests have been over the years in immunology, immunodiagnostics, and lymphoma leukemia.
MS. CARLOS: I am Rufina Carlos, and I am the Executive Secretary of the Immunology Devices Panel of the Medical Devices Advisory Committee.
DR. LOPEZ: I am Oscar Lopez. I am a Professor of Neurology at the University of Pittsburgh, and my area of expertise is Alzheimer's disease and related dementias.
DR. BLUMENSTEIN: I'm Brent Blumenstein. I'm a biostatistician working privately.
DR. NATH: I'm Avi Nath. I'm a neurologist at Johns Hopkins University, and the Director of the Division of Neuroimmunology, and I do both clinical and basic science research on multiple sclerosis as well as the neurological complications of HIV infection. A minor correction to my name tag. I'm only an M.D.
DR. PARISI: Good morning. I'm Joseph Parisi. I'm a Professor of Pathology at the Mayo Clinic. I do neuropathology. That is my specialty. And I have a special interest in Alzheimer's disease and the dementias.
DR. GUTMAN: I'm Steve Gutman. I'm the Director of the Office of In Vitro Diagnostics at FDA, which is the unit which is sponsoring this event.
DR. TAYLOR: Thank you. At this time I'd like to ask Ms. Jenny Slaughter, who is the FDA Supervisory Program Integrity Officer, to make a statement.
MS. SLAUGHTER: Thank you, Dr. Taylor. I'm here today to read the conflict of interest disclosure statement. The Food and Drug Administration is convening today's meeting of the Immunology Devices Panel of the Medical Devices Advisory Committee under the authority of the Federal Advisory Committee Act of 1972. With the exception of the industry representative, all members of the panel are special government employees -- and I will now refer to you all as SGEs -- or regular federal employees from other agencies, and are subject to the federal conflict of interest laws and regulations.
FDA has determined that members of this panel are in compliance with federal conflict of interest laws, including but not limited to 18 U.S.C. 208, and 21 U.S.C. 355. Under 18 U.S.C. 208, the regulation applicable to all government agencies, and 21 U.S.C. 355, which is applicable only to FDA, Congress has authorized FDA to grant waivers to special government employees who have financial conflicts when it's determined that the agency's need for a particular individual's services outweighs his or her potential conflict of interest. Members who are special government employees at today's meeting, including SGEs appointed as temporary voting members, have been screened for potential financial conflicts of interest of their own, as well as those imputed to them by their spouse or minor child, in relation to the discussions at today's meeting.
Based on the agenda for today's meeting, and all financial interests reported by the panel participants, it's been determined that all interests in firms regulated by the Center for Devices and Radiological Health present no actual conflict of interest, or appearance of one, for today's meeting. Today's agenda includes a premarket approval application for a laboratory assay designed to measure levels of neural thread protein in urine specimens from patients presenting with cognitive complaints, or other signs and symptoms of suspected Alzheimer's disease. Dr. Duffell, as he had just mentioned, is serving as the industry rep on the panel today, and he is employed by Gambro BCT. In the event that the discussions involve any other products or firms not already on the agenda for which an FDA participant has a financial interest, the participants are aware of the need to exclude themselves from such involvement, and their exclusions will be noted for the record.
Finally, with respect to all other participants, we ask, in the interest of fairness, that they address any current or previous financial involvement with any firm whose products they may wish to comment upon. Thank you.
DR. TAYLOR: Thank you. Ms. Rufina Carlos, who is the Executive Secretary, will now make some introductory remarks.
MS. CARLOS: Good morning again. First, some housekeeping matters. If you haven't already done so, please sign the attendance sheets that are available on the table outside the door. Information for today's agenda is at this table. Upcoming panel meetings are announced on our advisory panel website, and in the Federal Register. Finally, as a courtesy to others in the room, please turn off your cell phones during the meeting. And before I turn the meeting over to Dr. Taylor, I am required to read into the record the deputization of temporary voting members statement.
Pursuant to the authority granted under the Medical Devices Advisory Committee Charter, dated October 27, 1990, and amended August 18, 1999, I appoint the following as voting members of the Immunology Devices Panel for the duration of this meeting on July 15, 2005: Dr. Joseph Parisi, Dr. Oscar Lopez, Dr. Avindra Nath, and Dr. Brent Blumenstein. For the record, these people are special government employees, and are consultants to this panel or another panel under the Medical Devices Advisory Committee. They have undergone the customary conflict of interest review, and have reviewed the material to be considered at this meeting. Signed, Daniel Schultz, M.D., Director, Center for Devices and Radiological Health, June 29, 2005. I would now like to turn the meeting over to our chairperson, Dr. Clive Taylor.
DR. TAYLOR: Thank you. This panel is here today to discuss, make recommendations, and vote on a premarket approval application PMA P040010 for Nymox Urine Neural Thread Protein (NTP) Kit. This is a laboratory assay designed to measure levels of neural thread protein in urine specimens from patients presenting with cognitive complaints, or other signs and symptoms of suspected Alzheimer's disease. Results from this test are intended for use in conjunction with, and not in lieu of, current standard diagnostic procedures, to aid the physician in the differential diagnosis of Alzheimer's disease.
The first item on the agenda is a presentation by Dr. Sousan Altaie. She will discuss the Critical Path Initiative. Dr. Susan Gardner will then discuss the role of the Office of Surveillance and Biometrics -- that's OSB -- in the review of postmarket study designs. Dr. Altaie?
DR. ALTAIE: Thank you. On my daytime job I am the Scientific Policy Advisor for the Office of In Vitro Diagnostics, and in my spare time I advocate for Critical Path. It's an endeavor that the Center took on and is determined to see through. So let's see if I can get the presentation up. All right, we're in business.
All right. For an outline, I'll first tell you what the Critical Path is about, and then I'll go through the differences between the Critical Path in CDRH versus what's in CDER, in the drugs. And then I'll describe some of the projects that we have in the Center that we're trying to see through in fulfilling Critical Path needs. And then I'll ask you to participate and take part in this effort because it's a leveraging effort. And I'll give you information as to where to and how to participate.
Critical Path is a serious attempt to make product development more predictable and less costly. Okay, that's an overview. Obviously these slides are a bit out of order, and I apologize. Critical Path, if you look at the development of the devices, you will be looking at basic research, prototyping, preclinical and clinical development, and industrialization of the devices. The Critical Path Initiative deals with everything from prototyping all the way through marketing, but not with the basic research.
You might wonder why FDA is interested in Critical Path. Well, because we realize the significant benefit of bringing innovative products to the public faster. Because we have a unique perspective on product development, we see successes, failures, and missed opportunities. Because Critical Path will help us develop guidance and standards that foster innovations. We want to work together with the industry, academia, and patient care advocates to modernize, develop, and disseminate solutions. These are tools to address scientific hurdles in device development.
What are Critical Path tools? The tools are methods and techniques used in the regulatory three dimensions. And those are, in assessment of safety, the tools predict if potential products will be harmful. In proof of efficacy, the tools determine if a potential product will have medical benefit. And in industrialization, the tools help in manufacturing the products with consistent quality.
At the FDA, we think of Critical Path tools to be biomarkers, Bayesian statistics, animal model biomarkers, clinical trial designs, computer simulations, quality assessment protocols, postmarket reporting, and any other suggestions you might give us that might fall under these categories. We're open to your participation. In the medical devices, if you consider the vast majority of the devices we regulate, they actually have a complex -- and a whole array of complexity, starting from being scissors, surgical scissors, to a heart valve, to a glucose monitor, to a CT scan, a PET scan. So we are looking, actually, at a diverse set of products, and each one of them has a certain need. So that's why we are actually different from the drugs, and we're different by the complexity of the components of the devices, by the biocompatibility issues that we have to deal with. We are looking at durable equipment, and rapid product cycles. They get upgraded, and made into a new model very fast. We are dealing with device malfunctions, and user errors, and we're dealing with bench studies as well as clinical studies. And we do put a lot of weight on our non-clinical studies as well. As far as the regulations, we deal with the quality systems regs and ISO 9000, rather than the GMPs that the drugs deal with. So, we're totally a different beast as far as Critical Path is concerned.
And I want to just go through now some of the medical device areas that we in CDRH are interested in pursuing. Under safety tools, we look at biocompatibility databases, and effects of products on diseased or injured tissue. Under the effectiveness tools, we're looking at surrogate endpoints for cardiovascular device trials, and computer simulation modeling for important devices. Under industrialization tools, we are looking at practice guidelines for follow-up of implanted devices. And we're looking at validated training tools for devices with a known learning curve.
These are some of the projects that are being pursued in the Center. And none of these things are funded. We're trying to go through leveraging, and using the partnerships with the industry and the interested public to see these projects through. For validation of biomarkers, we're looking at blood panels to assess sensitivity and specificity. For peripheral vascular stents we are working to develop computer models of human physiology to test and predict failures before going into animal and human studies. In intrapartum fetal diagnostic devices we are developing a clear regulatory path with consensus from the obstetric community, and NIH is playing a good role there. And we are collaborating with NIH on pharmacokinetics and image-guided interventions. We are working with CDC and Johns Hopkins to develop a well-defined serum panel to test sensitivity and specificity for new hepatitis assays. We are working on pathways for the statistical validation of surrogate markers. We are working with medical specialty organizations to develop practice guidelines for appropriate permanently implanted devices. And we are determining the extent of neurotoxicity for neural tissue contacting materials. So these are some of the projects that we're dealing with.
And this is how you could get involved. There's two ways for you as an interested individual to get involved. That is that you can send in your comments to the Critical Path Initiative through our dockets, and I'll give you the docket number in the next slide. And we are also compiling a national Critical Path opportunities list that is being developed now. We're almost there, so if you have an idea of what should be an opportunity to be pursued under Critical Path you can contact us and tell us under that comment, in the docket comment as well. So these are the webpages and the docket number that I was mentioning. And you can always contact me if you need something that's related to CDRH specifically because I'm the Center rep, not the entire FDA rep.
So at this point I want to leave you with this thought. At CDRH, we believe in the total product lifecycle, and we believe that ensuring the health of the public through the total product lifecycle is everyone's business. With that, I'd like to have a question if there is one.
DR. TAYLOR: Are there any direct questions? Thank you.
DR. ALTAIE: Thank you.
DR. TAYLOR: Next will be Dr. Susan Gardner.
DR. GARDNER: Okay. Technology's working. I'm going to spend just a few minutes this morning telling you about a fairly major programmatic change in CDRH, and what that means to us, and what it might possibly mean to you. The basis of the change is the movement of the conditions of approval studies from the Office of Device Evaluation, ODE, to the Office of Surveillance and Biometrics, which is called OSB.
Briefly, the functions of OSB are that we do have a major supporting role in the premarket activities by virtue of the fact that the statisticians are in our office, and also the epidemiologists. We're also responsible for signal detection of adverse events by having a number of the major monitoring tools in our office, the Medical Device Reporting Program, and the MedSun Program. We're responsible for characterization of risk by doing analysis of these adverse event reports, and using our other monitoring tools, for coordination of the Center response to public health problems through communications to the health care professionals, and also for interpretation of the MDR regulation.
So, the condition of approval studies. The regulation, which is in 21 C.F.R. 814, etc., tells us that post approval requirements can include continuing evaluation and periodic reporting on the safety, effectiveness, and reliability of the device for its intended use. The basis of this change actually came from an internal review that we did a couple of years ago to look at our condition of approval program. We went back and we looked at all the PMAs that were approved from 1998 to the year 2000. There were 127 PMAs, and 45 of those had condition of approval studies. When we began this project, what we were really looking for was to assess the quality of the studies. We found out that we actually couldn't locate a lot of the studies, that we had a very limited process, and no standard procedures for tracking these studies. There was, as one might expect in any organization, a turnover of the lead reviewers, and that of course has also resulted in a lack of follow-up. And essentially, over in the premarket shop, they didn't really have the resources while they were doing these heavy premarket review tasks, to focus on the postmarket studies once the product had been approved. So we decided to attack this problem.
The goal of changing the program is to obtain better postmarket information as the device enters the market, which is obviously a really critical time as it moves from clinical trials into real world use. We want to better characterize the risk/benefit profile again as it moves into community practice, and of course add to our ability to make sound scientific decisions, and communicate back, obviously, anything that we find in the immediate postmarket period.
So, as I said, we officially transferred the program from ODE to OSB. That happened on January 1, although we had been doing a pilot for about two years before that time. So when we made the official transfer, we had some really good experience on how our procedures were going to work. We had developed, and in fact it's up and running, an automated tracking system so we know where every study is. We have information on the date the product was approved, what the condition of approval requirements are, and we'll track the studies, and we're going to acknowledge the receipt of mandated reports, and obviously we're going to let people know when they haven't sent in the additional information, or the required information.
We're also adding an epidemiologist to the PMA review team when we expect that there's going to be a condition of approval study. The epidemiologist is tasked with the development of a postmarket monitoring plan during the premarket review. So as the product's moving along this premarket review path, there's somebody on the team that's sort of thinking postmarket as it goes. The epidemiologist is going to have the lead in developing, and this is really important, and it's something that we felt we were also missing, really a well-formulated postmarket question. If we don't do those, these studies are not going to be important to anybody. They're going to have the lead in the design of the condition of approval study protocol, and in the evaluation of the study progress, and the results after approval. So we'll follow the studies, and again, we're going to be giving feedback on the studies. And of course, the PMA team will continue to work throughout. It's only the lead that will be switching, not the members of the team.
So we think this is going to be better for a number of reasons. First of all, I want to emphasize again that we want to be very careful when developing important postmarket questions, and a really good, solid study design, so that everybody who's involved will think that this is an important activity to be done. Second of all, acknowledgement and feedback on the work they're doing is important to industry, and it's also going to be important to us. And we think it'll be a motivating factor, again, for getting these studies done. We will be posting the study status on the CDRH website. So for industry that is doing the studies, has them on time, and is cooperating, that information will be available. But if they're not being done, that information will also be available. And also, we do have the ability through Section 522 to mandate a postmarket study if these studies are not done, and there are penalties for not doing that.
How might this impact the advisory panel? Well, during the approval process, often first of all, postmarket questions come up pretty naturally, and sometimes the epidemiologist will be speaking, and will bring up postmarket issues. As this happens, of course as always we're going to look for your advice, because what you say during the panel process about issues that you're concerned about postmarket, advice on methodologies, whatever, will be extremely important as we develop the protocol if the product is approved and we decide to do a condition of approval study. And also, we're committed to having either FDA or industry come back to the advisory panel and update you on progress of any condition of approval studies that have been instituted for approved devices. Now, let me just also mention that because Dr. Gutman's office, the Office of IVDs, is set up a little bit differently, with premarket and postmarket activities going on in the same organization, the process will be a little bit different. But we do have an epidemiologist that is what we call a shared hire, works part-time in my office, and part-time in Steve's office. And the basic principles of committing, if we are going to have condition of approval studies, to having a well designed study, to tracking them to follow-up, and interacting with industry, and giving feedback, remain the same. So again, the process may be a little bit different, but the commitment and principles are the same. Do you have any questions? Thanks.
DR. TAYLOR: Okay, thank you. Now we have an opportunity, one of two opportunities in fact, for our open public comments. And we've had one official notification of intent to comment. This is Dr. Stephen McConnell, Ph.D., from the Alzheimer's Association. Dr. McConnell?
DR. McCONNELL: My name is --
DR. TAYLOR: Could you just hold one moment?
DR. McCONNELL: Sure.
DR. TAYLOR: We're just going to first read the open public hearing statement from the Executive Secretary.
MS. CARLOS: Both the FDA and the public believe in a transparent process for information-gathering and decision-making. To ensure such transparency at the open public hearing session of the advisory committee meeting, FDA believes that it is important to understand the context of an individual's presentation. For this reason, FDA encourages you, the open public hearing speaker, to advise the committee of any financial relationship that you may have with the sponsor, its product, and if known, its direct competitors. For example, this financial information will include the sponsor's payment of your travel, lodging, or other expenses, in connection with your attendance at the meeting. Likewise, FDA encourages you at the beginning of your statements to advise the committee if you do not have any such financial relationships. If you choose not to address this issue of financial relationships at the beginning of your statement, it will not preclude you from speaking. Dr. McConnell?
DR. TAYLOR: Go ahead, sir.
DR. McCONNELL: Okay. My name is Steve McConnell. I'm Senior Vice President for Advocacy and Public Policy with the Alzheimer's Association, and I have no relationship to the sponsor, financial or otherwise.
First of all, I want to start by acknowledging the dedication and expertise of this panel, and the excellent work that you do to advise the FDA on making critical decisions about interventions, devices, and tests. Second, I'm here representing the Alzheimer's Association. I am not an expert, a deep expert, on this issue. The Alzheimer's Association is advised by an expert panel, a medical and scientific advisory council made up of scientists and clinicians who are the leaders in the field of research in Alzheimer's disease. So the statement I will read this morning has been approved by the Medical and Scientific Advisory Council, as have all public statements that we make about such issues.
The AlzheimAlert NTP Test, AD7C Urine NTP Test, marketed by the Nymox Corporation detects levels of neural thread protein in urine. The Alzheimer's Association does not recommend use of this test either for diagnosing or ruling out Alzheimer's disease. Studies supporting the validity of the test for such purposes have not been replicated by independent laboratories, or conducted in an adequate number of individuals. At this time there is no consensus among Alzheimer's experts that this test is valid or useful, and its use is not part of any recognized diagnostic guidelines developed by professional organizations.
Currently, there is still no single test for Alzheimer's disease, and a diagnosis is a multifaceted process that must be administered and evaluated by a skilled health care professional. To date the Urine NTP test has no established clinical utility in the diagnosis of Alzheimer's disease. Thank you.
DR. TAYLOR: Thank you. Are there any other speakers from the floor who have not previously notified us? There will be another opportunity in the afternoon agenda. At this point we are in fact scheduled to take a short break. We're running well ahead of schedule, but I think nonetheless we will take a 15-minute break at this point in time. That would mean reassembling at let's say 9:20. Thank you.
(Whereupon, the foregoing matter went off the record at 9:07 a.m. and went back on the record at 9:20 a.m.).
DR. TAYLOR: Thank you. We will now proceed with the Nymox presentation for their device. The first speaker according to my agenda will be Ira Goodman, the Chairman of the Department of Neurology, Orlando Regional Healthcare Systems. Please, sir.
DR. AVERBACK: First of all, I'm not Ira Goodman. The draft changed. We apologize for that.
DR. TAYLOR: That's fine.
DR. AVERBACK: Good morning. I'm Paul Averback, and I'm the CEO of Nymox. On behalf of Nymox I want to thank the panel members and the FDA for coming here today to hear about NTP. We're very happy to have this opportunity to present to you the device which we believe is entirely safe, and a very effective technology to assist physicians in the evaluation of cases of suspected Alzheimer's.
By way of introduction, Nymox is a small biotech company. We're based in Maywood, New Jersey, and Montreal, Quebec. My personal background is I'm a U.S. board certified neuropathologist. I've been in primary care for close to 30 years, emergency room, and neuropathology for 25 years.
Right at the outset we'd like to emphasize three things which we believe are important concerning the safety and effectiveness of the NTP measurement. First, NTP is not a stand-alone diagnostic. It's entirely safe, can't hurt anybody, there's no downside, it's a urine sample, there's no bodily risk, there's no radiation, there's no lumbar puncture, there's no downside to anybody. There's no decision that directly gets to anything invasive from this. But it's not a stand-alone diagnostic. It's not like a liver biopsy. It's not like an HIV test. It's a measurement that adds useful information. It's similar to a urinary dipstick for urinary tract infection. A urinary dipstick is not definitive, but it helps move the diagnostic along. It's absolutely useful, as every practicing physician knows.
Second point at the outset is that comprehensive specialist evaluation is a higher diagnostic standard of truth than primary care evaluation. I'll say that again. Comprehensive specialist evaluation is a higher standard of truth than primary care evaluation. And we are going to show you today that NTP has a very high percentage of agreement with comprehensive specialist evaluation. So if this agrees closely with a comprehensive specialist evaluation, it's going to be helpful to the primary care physician who has accuracy that's perhaps half what a specialist is.
The third point at the outset that I want to emphasize, when you consider safety and effectiveness today, each individual in this study, each subject, had only two data points. They had an NTP level, and they had a diagnosis. Two data points, NTP and diagnosis, and the diagnosis was according to conventional standards and criteria. Our thesis is twofold. First, the statistics, we believe, are very clear. We will go into this more as the talk progresses, but basically in this clinical study, 96 percent of the people with elevated NTP had disease, that being probable AD, possible AD, or MCI, and there was a 78 percent improvement in the negative predictive value by using this measurement. For the primary care physician, these are extremely compelling, and I think these are exciting improvements. The second part of our thesis is that this device is extremely safe. There is no downside for anybody. It's a urine sample. There's no bodily risk whatsoever, and it's going to provide very useful information for the primary care physician.
I don't want to put this laser beam into anybody's retina, but if I could ask you to (inaudible). This is the basic scatter plot from this study. And as you can see, 96 percent of the points here that are above the cutoff are in the disease groups. 96 percent. And 89.5 percent of the probable ADs are above the cutoff, and 91 percent of the definite non-ADs are below the cutoff. We believe that this shows lots of usefulness for the primary care physician who detects Alzheimer's, according to experts and published studies, in the 15 - 25 percent sensitivity range.
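As a rough illustration of how figures like these relate to predictive value, the sketch below applies Bayes' rule to the two percentages quoted above, treating 89.5 percent as sensitivity (probable AD above the cutoff) and 91 percent as specificity (definite non-AD below the cutoff). The prevalence values are assumptions chosen only for illustration, not data from the study, and the function name `predictive_values` is hypothetical. The 96 percent figure in the testimony counts probable AD, possible AD, and MCI together as disease, so this two-group calculation is not expected to reproduce it.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test, computed with Bayes' rule.

    All three arguments are proportions between 0 and 1.
    """
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")

    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence

    # Guard against the degenerate case where no result of a given sign can occur.
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# Percentages quoted in the testimony, read here as sensitivity and specificity.
SENSITIVITY = 0.895
SPECIFICITY = 0.91

if __name__ == "__main__":
    # Prevalence values are illustrative assumptions, not study data.
    for prevalence in (0.10, 0.25, 0.50):
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Because predictive values depend on prevalence, the same sensitivity and specificity yield a lower positive predictive value in a low-prevalence setting than in a population where disease is common.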
If I could have the next slide, please. The Division was kind enough to provide us with the questions for the panel, and we've had a chance to look at these. And we just wish to clarify that in the third question here, "Does the NTP test add certainty to the diagnosis or exclusion?", we would like to clarify that the device is not confirmatory, but it adds useful information to help in the evaluation. And in the fourth question, where it says "Is the test safe and effective for the diagnosis?" we would clarify that the device adds useful information to help in the evaluation.
Now that I've said the introductory points, I'd like to proceed into a more formal introduction. Our agenda today in our talks, I will be followed by Dr. Daniel Bloch from Stanford University who will make a couple of comments about some statistical issues. He will be followed by Dr. Patricio Reyes from the Barrow Neurological Institute who will talk about utility and design. And he will be followed by Dr. Ralph Richter from the University of Oklahoma who will describe this study and its results. And then Dr. Ira Goodman will give some illustrative case examples showing the utility of the device. Next slide, please.
There has been some evolution in the intended use in the course of this PMA, but the core concepts have not changed. We have tried to clarify things, but we haven't changed anything. I draw your attention to "The results are intended for use in conjunction with and not in lieu of current standard diagnostic procedures to aid the physician in the diagnosis of definite non-AD versus probable AD, possible AD, or MCI." I would like to emphasize that this is intended to supplement and augment existing procedures, not to replace them. Next slide, please.
Here is the kit. Urine NTP device is an indirect ELISA format assay which measures NTP in a first morning urine sample. Coated plates bind NTP in urine in competition with labeled rabbit anti-mouse IgG. Next slide, please.
We first came to the FDA in the year 2000. It's been five years that we've worked on this study. And as you can see, the Alzheimer's Association doesn't know about any of this work in the past five years. We've been working with FDA for five years on this, and the theme of the clinical design is that the test is -- to test this device in a realistic clinical context of the intended use population, with real patients in a real study. And this design, beforehand, up front, was agreed to by FDA. Next slide, please.
Going back to the beginning, what is NTP? NTP is derived from Alzheimer's disease brain, definite Alzheimer's disease brain. The cDNA was derived from Alzheimer's brain. And immunohistochemical studies have shown that NTP can be detected in the hallmark brain lesions, and it has pathophysiological relevance to tangles and plaques, and cell death, and apoptosis. Next slide, please.
In developing the device, the criteria that we were interested in were that it be simple and readily accessible. This means that no new infrastructure is needed, no great cost outlays, no new training, no new personnel, something that could be readily accessible to most laboratories. It's non-invasive and it's a urine test. It's non-invasive. And to show that it has significant effectiveness that could help the physician in the evaluation of patients with these problems. NTP has limitations. It's not a stand-alone diagnostic, and it is not something that will help in the diagnosis of mixed, incompletely understood, or controversial, difficult, neurological entities. Next.
Safety, safety, safety. No invasion. It's a urine sample, and it's not a stand-alone test. No invasive decisions result from it. It adds useful information. We will show more about this as we go on. With NTP is significantly better than without NTP. And this, I go back to the logic of the primary care physician who, using prior prevalence, and their well documented levels of accuracy, if they use NTP, they're going to be significantly improved.
So to summarize the benefit/risk, we have an urgent unmet need. We have a device that has absolutely no downside to anybody. It's non-invasive. It can't hurt anybody. And it adds extremely useful information. And the data shows 96 percent of people with elevated NTP are in the disease groups. Thank you. I'll now call on Dr. Daniel Bloch from Stanford University.
DR. BLOCH: Thank you, Paul. So my name again is Daniel Bloch. I'm a Professor of Biostatistics at Stanford. I have no financial interest in the company, but I am paid for my time on an hourly basis.
I would like to start by presenting a statistical description of what Paul referred to in terms of positive predictive accuracy in his introduction, if I could have the first slide, and how a statistician interprets positive predictive value, or negative predictive value, but I'll illustrate with positive predictive value. If someone has no knowledge at all, except that the prevalence for a person to have probable AD or a mental impairment condition, that is, to be in the probable, possible, or MCI (mild cognitive impairment) groupings, that prevalence is 78 percent. And if that's the only knowledge that a person has, and that person decides to classify everyone as having that condition, then you would be wrong 22 percent of the time. That's called the no-knowledge diagnosis. Maybe you would call it the dumb diagnosis. But the fact is just knowing the prevalence, you'll be right 78 percent of the time. And of course, you'll be wrong 22 percent of the time.
In contrast, among those patients that have NTP elevated, that is above 22 micrograms per milliliter, the probability of correctly placing a subject into probable, possible, or MCI rises from 78 percent to 95.9 percent, almost 96 percent. That is, instead of having 22 percent that are misclassified, now only 4 percent are misclassified. Notice that the interpretation of positive predictive value does depend upon the prevalence in the population. And I'll come back to that later on in my last slide that I'll present here this morning.
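The arithmetic behind the no-knowledge comparison can be checked in a few lines. This is only an illustrative sketch: the function name `misclassification_rate` is hypothetical, and the two inputs are the 78 percent prevalence and the 95.9 percent figure quoted in the testimony.

```python
def misclassification_rate(prob_correct: float) -> float:
    """Share of subjects placed in the wrong group, given the share placed correctly."""
    return 1.0 - prob_correct

# Figures quoted in the testimony (not recomputed from raw data here).
prevalence = 0.78        # probable AD, possible AD, or MCI in the study population
ppv_elevated = 0.959     # probability of disease among subjects with elevated NTP

baseline_error = misclassification_rate(prevalence)    # classify everyone as diseased
error_with_ntp = misclassification_rate(ppv_elevated)  # among elevated-NTP subjects only

print(f"no-knowledge error: {baseline_error:.1%}")
print(f"error among elevated NTP: {error_with_ntp:.1%}")
```

Note that the second figure applies only to subjects whose NTP is elevated, which is the comparison the speaker is drawing.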
Now, I would like to address some of the comments that were made in the statistical and medical reviews regarding statistics. And there are so many instances that I was somewhat dismayed with that I just chose one or two to illustrate with. And here on this slide, I'm referring to the statistical reviews, references, and quite extensive references to the Blacker paper that was published in 1994, wherein the 1984 criteria were applied to a group of patients with suspected Alzheimer's disease. The Blacker paper used the three diagnostic categories of 1984, which include probable, possible, and definite non-AD. Our study uses the modern four diagnostic categories, which include MCI. And of course, by including a fourth category, and the four categories now combined are a mutually exclusive and exhaustive partitioning of all patients, just like the three were before, but by doing so the definitions of the other groups have changed, especially the definite non-AD group has a different definition than it did in the Blacker publication. This is critical because in the statistical presentation, there's great lengths played about taking those kinds of criteria, trying to fit our data into the criteria, making assumptions about how that might be done. It's all irrelevant, really. We are not using those criteria. We're using the modern criteria which include MCI. Also in the Blacker paper the classification was based on definite AD or non-definite AD, and definite AD was possible because all patients, all patients, were autopsied. That is, all patients died and they had the autopsy diagnosis. Well, in a clinical diagnostic criteria for early-stage populations, the chronic disease, you don't kill people and get autopsies to get a definite AD classification. And in fact, with the FDA it was agreed that the gold standard evaluation would be the probable AD group.
Also, in the statistical review, reference is made to so-called best case and worst case scenarios. This is misleading, it's fabricated. And in fact, as we would know, a best case scenario I would think would mean, if it's supposed to be a best case scenario for NTP, it would mean that if you did follow everyone, and you could get autopsies, and all the other information to get a definite diagnosis, then all NTP pluses would have Alzheimer's, definite Alzheimer's. That's the best case scenario for an NTP. And the worst case scenario is that all that were negative would be found to have not Alzheimer's. Of course that's not going to be the case, and it's -- and the data that is presented is very unclear with regard to that point. We'll go to the next slide.
This, as Dr. Averback has said over and over again, is that the intended use is that this is not a stand-alone diagnostic criterion. And if we could have the next slide, I wanted to drive that home with a scenario that was presented in the medical reviewer's packet, where he hypothesized there's a middle-aged individual, say a male, who might be 40 years old I guess. It's not clear what "middle-aged" means. I guess when you're really young middle age starts at 30, but when you get a little older it's not clear when it starts. But in any case, there's a middle-aged individual, and the workup showed that with the physical and the neurological exams as well as the mental status is that the assessment's inconclusive. And the two scenarios that he presents -- so these facts are the same, represented by these three bullet points. But the two scenarios, one is where the NTP is elevated above 22, and the second scenario he presents is when it's not elevated. And the question is, well, what utility is it? Well, I would say in this particular case it's clear that with a younger person it's very unlikely, we know that the prevalence of Alzheimer's in middle-aged people is low. How low is it? Well, three possible scenarios, perhaps, are as low as 1 percent, it's probably not as high as 5 percent because the prevalence among 65-year-olds isn't even that high. But notice that that's not the prevalence in the study population. The intended use population has a prevalence which is much higher than that of having either probable or possible or MCI: 78 percent. But for this hypothetical patient, it's going to be much lower than that. And if you have an elevated NTP with the scenarios, if you could go back to the past slide again, that are bulleted here, which lead to an inconclusive diagnosis, but you have an elevated NTP, then I think that would be extremely interesting for a physician. It would probably drive him to get further tests. That is, it's not a stand-alone test.
This is a good example of where it would not be a stand-alone test, and you would want to try to find out and do more workups to see if perhaps the person had probable AD for some reason that you're not finding. On the other hand, the less than or equal to 22, there's inconclusive evidence it might be, so the primary care physician would say, well, I'm going to wait, and maybe I can only see him once a year, rather than next week. Everything's so inconclusive, the NTP is low. I think you get the point. But again, it's not a stand-alone treatment, but the strategy going forward has been influenced by NTP.
This last slide here that Jack has here, I just want to make the point that, again, positive predictive values and negative predictive values depend upon the prevalence. The sensitivities and specificities for NTP are 60 and 91 percent. The 60 percent drives home the point that Paul made, that in the primary care setting where with the information available the sensitivity is documented to be between 15 and 25 percent, with NTP it goes to 60 percent. One might say, well, 60 percent isn't very good, but you have to put it in the right perspective. And given that the sensitivity and specificity is, as we have shown with our study, been with various prevalences, for example with the middle-aged scenario, the positive predictive value is not going to be 95 percent, 96 percent. For example, if the prevalence is 1 percent, the positive predictive value will only go up to 6 percent. But the way to interpret that would be to say you get a six-fold gain knowing NTP than without it. So I think I'll just stop here with those few comments for now, and give the floor over to Dr. Patricio Reyes. And he's going to talk about the diagnosis of Alzheimer's disease.
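The dependence of positive predictive value on prevalence follows from Bayes' rule. The sketch below is illustrative only: the function name `positive_predictive_value` is hypothetical, and the inputs are the rounded 60 percent sensitivity and 91 percent specificity quoted above, evaluated at the 1 percent, 5 percent, and 78 percent prevalences mentioned in the testimony.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

SENSITIVITY = 0.60   # rounded figure quoted in the testimony
SPECIFICITY = 0.91   # rounded figure quoted in the testimony

for prevalence in (0.01, 0.05, 0.78):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    fold_gain = ppv / prevalence  # how much a positive result raises the probability
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, gain {fold_gain:.1f}x")
```

At 1 percent prevalence the positive predictive value comes out near 6 percent, which is the roughly six-fold gain the speaker describes, and at 78 percent prevalence it comes out near 96 percent.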
DR. REYES: Good morning. I'd like to take this opportunity --
DR. TAYLOR: Could you state your name, please, and your institution, etcetera?
DR. REYES: My name is Dr. Patricio F. Reyes. I am Director of the Alzheimer's disease, Cognitive Disorders, and Research Laboratory of Barrow Neurological Institute in Phoenix, Arizona. Five months ago, before I took this job, I was Professor of Neurology and Pathology in Psychiatry at Creighton University, and Director of the Center for Aging and Alzheimer's disease. Prior to that I was a Professor of Neurology and Pathology in Neuropathology at Jefferson Medical School in Philadelphia. I have served as Chair of the Medical and Scientific Advisory Board of the Alzheimer's Association in Philadelphia, and I continue to serve as a member of this board in Arizona. I feel that my role as a member of this board, as a teacher, as a caring physician, and as a researcher, my role is to encourage research, support research so that my patients who suffer from this dreadful disease could have hope and better lives in the future. I will continue to raise money for Alzheimer's Association which I think is a good partner in this endeavor. I do not receive any financial compensation -- or I do not have any financial interest in Nymox. Next slide, please.
Why do we talk about Alzheimer's today? It was Dr. Robert Katzman from Albert Einstein several years ago who really placed Alzheimer's disease in the front line of medical research. It is a disease that affects millions of Americans, and the incidence and the prevalence are supposed to increase in the years to come. It has become a major health problem in our country. Next slide.
It affects late middle age and older people. And as the population increases in this segment, we're going to have more health care problems with degenerative disorders of the nervous system such as Alzheimer's disease. Because they're older patients, we have to deal with comorbid conditions that are common in this age group. We all know that patients in this age group receive multiple medications that confound the assessment of patients with cognitive deficits. Unfortunately, to date, because of the lack of a reliable biomarker antemortem, diagnosis is only confirmed by autopsy.
In looking at the data at present, we know for a fact that primary care physicians, and even some specialists, because of limited time, training, and experience, have difficulty diagnosing Alzheimer's disease, particularly in its early stages. Another challenge that we face is that Alzheimer's disease is a neurodegenerative disorder. Its main clinical domains are three: cognition, behavior, and impaired activities of daily living. All these clinical domains are necessary to diagnose a dementing disorder such as Alzheimer's disease. This is what we assess when we follow up as we treat or follow up patients. Yet the symptoms that we deal with are not as specific. They are frequently mistaken as normal aging. And there is truly a need to develop a reliable, non-invasive biomarker for this disorder. Next slide, please.
But we have a major challenge that we face in everyday life in our clinics, and even as a neuropathologist. Many times Alzheimer's disease is difficult to distinguish from other dementing processes, such as vascular dementia, Parkinson's disease with dementia, dementia with Lewy bodies, fronto-temporal dementia, and so on. Let me tell you that several investigators, and including our own team, have now found out that vascular dementia frequently is associated with changes similar to Alzheimer's disease. In addition, what we have called Alzheimer's disease clinically, if you look at them neuropathologically, frequently have vascular lesions as well. This adds to the complexity, both in the clinical diagnosis and neuropathological confirmation of the disease. Parkinson's disease also, if accompanied with dementia, is a difficult entity to face. You could have Parkinson's disease with dementia per se, or you could have by age alone because Parkinson's disease and Alzheimer's disease affect both older individuals. The chance for having two diseases is not uncommon.
Dementia with Lewy bodies is a subject that's dear to my heart because in the mid-'80s we were one of the first to describe the clinical and neuropathological abnormalities in patients with Alzheimer's disease with Lewy bodies. To date, there is quite a bit of confusion how to call this disease. Some investigators call this diffuse Lewy body dementia. In our hands we call it, in many instances, Alzheimer's disease with Lewy bodies. And the neuropathological correlates are poorly defined as well. These are patients who usually have dementia with Parkinsonism. Not Parkinson's disease, but Parkinsonism. Who have fluctuating moods from day to day. And if you treat -- with visual hallucinations so they hallucinate visually. And if you treat them with regular medications for Parkinson's disease, many of them get worse. If you treat them with regular doses of antipsychotics, they get worse. So again, there is tremendous complexity in the diagnosis and neuropathological confirmation of this disease.
Fronto-temporal dementia is equally difficult to diagnose, although these patients are usually described as having symptoms referable to affect, changes in affect, depression, with memory loss, usually in the younger age group. But we do not have a marker. And the neuropathological correlates could vary from patient to patient. So with these challenges, the reason I mention this is because in this study, we did not include vascular dementia, Lewy body dementia, fronto-temporal dementia, because of the major controversies that exist in the clinical and neuropathological diagnosis of these different entities.
It is also true that the presence of neurofibrillary tangles, and the related plaques, which are the histological hallmarks for Alzheimer's disease, can be found in other neurodegenerative conditions. So with these challenges, we also have opportunities in understanding Alzheimer's disease. We have to use validated and standardized clinical criteria. We have to do comprehensive assessment of each patient to exclude other neurologic and/or systemic disorders. We must develop a non-invasive and reliable antemortem measure that will provide useful information and can serve as a biomarker for this disorder.
How did Nymox approach these challenges? Nymox has devised a prospective blinded study to determine the safety and efficacy of the NTP device. By that I mean that these patients who were included in these studies were newly referred patients to memory disorders clinics all over the land. The technical people who analyze the urine samples were blinded as to the diagnosis and demographics of each patient. And I think -- I strongly believe that it's a safe procedure, and as the data unfold later, I believe that it is also effective in providing useful information in diagnosing Alzheimer's disease and related disorders.
The study utilized subjects representative of the intended use population. They have employed established and qualified investigators. To my knowledge, all of the principal investigators were board certified in American boards of neurology and psychiatry. Many of them are directors of memory disorders clinics, and some of them are even chairpersons of the department of neurology. The study utilized accepted diagnostic criteria for probable, possible Alzheimer's disease, MCI, and definite non-AD. Probable Alzheimer's disease, without going into a lengthy discussion of the criteria, these are patients who are assessed comprehensively by each investigator, and they have excluded other conditions, either systemic or other neurologic conditions that could give rise to dementing symptoms. Possible dementia cases are cases who have dementia that could be due to Alzheimer's disease, but yet the patients have comorbid conditions, such as hypertension, diabetes, little stroke. MCI, on the other hand, as Dr. Bloch has referred to, is a relatively recent concept, which stands for mild cognitive impairment. These are individuals or patients who may be members of our community. They may have subjective complaints of memory loss. On careful testing they may have mild symptoms of memory loss, or even visuospatial disorientation. But they are not demented because they are able to do their activities of daily living. Their deficits do not impact on their social or professional functioning. The definite non-AD are those that do not belong to any of the three categories. The reason, again, why we do not include a specific category for fronto-temporal dementia, vascular dementia, Parkinson's disease with dementia, and Lewy body dementia, is because we have considerable controversy in the diagnosis and neuropathological aspects of these disorders. Next slide, please.
If we approach this simply by giving pathologic confirmation, it is obvious that this approach is not viable. It is important, however, if possible to do pathologic confirmation. Next slide.
So in summary, the NTP assay is a laboratory tool that is non-invasive and simple. The sample we are talking about here is readily accessible. It's urine. Assay is devoid of adverse body reactions. It is a reliable test. It has a sound scientific basis. However, it should not be used as a stand-alone test. I will now give the floor to Dr. Richter.
DR. RICHTER: Well, nice to be here with you today. I appreciate the opportunity to present results of an interesting study which I think has really good possibilities for assisting the clinician. I'm Ralph Richter --
DR. TAYLOR: Could you please -- good. Thank you.
DR. RICHTER: I formerly served as Clinical Professor of Neurology at Columbia University. For 12 years I directed the service at Harlem Hospital for Columbia, and then was part of the founding of a branch of the University of Oklahoma in Tulsa, and then have been very active in clinical affairs. I've been doing a great deal in clinical investigation in Alzheimer's disease. I have also been very active in the Alzheimer's Association. I was one of the founders in Oklahoma and served as Vice President for a number of years. That's why I was a little surprised to hear Dr. McConnell make comments that I would say judge not, lest ye be judged. Because we've contributed a great deal to that association, and to make those comments based on not even reviewing the data is not fair play. Let me put it on the table.
Now, the -- could we begin our talk. The study itself was a blinded clinical study. It was to demonstrate the effectiveness of the NTP test in the diagnosis in a realistic clinical setting. The individuals that were worked up came out of our own, like in our own clinical research unit. As they came in, we would ask, I'd see seven or eight new memory patients a week, you know, and have a team that helps work them up. We would then offer if, you know, once we'd begin the work we'd say "Would you like to participate?" So it's kind of not -- we didn't pre-select. They selected in the sense would they be willing. And there were people certainly who were not willing or able to participate, even in a limited study like this. There were only two data points: the clinical diagnosis, and the NTP measurement. Next.
So as we mentioned, the other two centers that were involved here. The other eight also included individuals who were very well qualified in neurodiagnosis, and had extensive experience so that the -- one can rely on the evaluation as being clinically efficient and effective. We gave it our best shot in terms of helping to work up these individuals. Next.
The NTP test itself was a first morning sample. And not everybody is able to give a good first morning sample. That's why some of the urines were inadequate to begin with. The laboratory determination was blinded to gender, to age, to name, to location. There were unacceptable urine samples that might contain glucose or protein. These were then not included. If the creatinine concentration was less than 50 mg/dL it was felt that this was then not a first morning specimen. Also, those that were not included were where the creatinine level was more than 225 mg/dL. As mentioned, the investigators that took part in this study are all very reliable, well trained clinicians. And the workups were then on a standard basis per protocol, but we did our own full workup as per each unit, but then included the way the protocol was set up as well. So this was not an independent -- in other words, we didn't do just this study. We were doing this to help individuals with their disease and their families. Next.
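The sample exclusion rules just described can be written as a simple check. This is a sketch under stated assumptions, not the study's actual laboratory procedure: the function name and parameters are hypothetical, and the only rules encoded are the ones named in the testimony (glucose or protein present, creatinine below 50 mg/dL, creatinine above 225 mg/dL).

```python
MIN_CREATININE_MG_DL = 50.0    # below this, not considered a first morning specimen
MAX_CREATININE_MG_DL = 225.0   # above this, the sample was not included

def is_acceptable_sample(creatinine_mg_dl: float,
                         has_glucose: bool,
                         has_protein: bool) -> bool:
    """Apply the exclusion rules stated in the testimony (hypothetical helper)."""
    if has_glucose or has_protein:
        return False
    # Values exactly at 50 or 225 mg/dL are treated as acceptable, since the
    # testimony excludes only "less than 50" and "more than 225".
    return MIN_CREATININE_MG_DL <= creatinine_mg_dl <= MAX_CREATININE_MG_DL

print(is_acceptable_sample(120.0, has_glucose=False, has_protein=False))
print(is_acceptable_sample(40.0, has_glucose=False, has_protein=False))
```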
So the comprehensive medical history, physical exam, and complete neurological exam, a psychosocial evaluation. I have the Master's psychologist who helps in interviewing the patients. The Mini-Mental State Exam was done on each as it is on all of our patients, and the imaging as listed, a blood workup which would be standard in all of our units. We're trying to exclude those individuals who have correctable neurological conditions leading to dementia. And we, by gosh we do find some as we all, those who are clinically oriented, hear. There are going to be correctable conditions that lead to dementing symptoms that are not Alzheimer's. Next.
The outcome measurement was then the urinary NTP measurement. The clinical diagnoses followed. We would then put them into categories of probable AD, possible AD, and mild cognitive impairment. The criteria were the NINCDS-ADRDA criteria for Alzheimer's as possible or probable. And the mild cognitive impairment category, actually, is some of the guidelines developed by Ron Petersen of Mayo Clinic, an old colleague of ours that was a subcommittee of the American Neurological Association. And the final category, those who were felt not to have Alzheimer's disease. That group ended up as with high Mini-Mentals who in a number of instances, nine that I can think of, actually had pseudodementia, who were depressed. And then there are other groupings of patients in that category.
The flow chart here of the urines. There were urines here that are listed that were then discarded, or that were not adequate. So there is a sampling that these patients then, the data then is not included in the 200 individuals who were included in the database. The diagnoses of probable Alzheimer's disease were found in 28.5 percent of our database, the possible AD of 28 percent, and MCI of 21 percent, and the definite non-AD as 22 percent. This kind of a spread would represent, I think, kind of community-based studies, you know, as serving all comers. I think that spread would be typical for that. So we're not pre-selecting in this study. Next.
Here again the data. The cutoff point was the NTP of greater than 22 micrograms per mL were those that we felt were significant -- this was a significant marker, and those below 22 micrograms per mL. And the dividing group as you see here, 51 out of 57 of probable AD had this elevation. And also interesting is that in the MCI patients, there was an even split. In the definite non-AD, only 4 out of 44 had an elevation of the NTP. Next.
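The counts on this slide can be turned into the percentages quoted earlier in the presentation. The sketch below is illustrative only: the function name `is_elevated` and the constant name are hypothetical, and the only inputs are the cutoff of 22 micrograms per mL and the counts 51 of 57 and 4 of 44 given in the testimony.

```python
NTP_CUTOFF_UG_PER_ML = 22.0  # cutoff stated in the testimony

def is_elevated(ntp_ug_per_ml: float) -> bool:
    """True when the NTP level is strictly above the cutoff."""
    return ntp_ug_per_ml > NTP_CUTOFF_UG_PER_ML

# Counts read from the testimony.
probable_ad_elevated, probable_ad_total = 51, 57
non_ad_elevated, non_ad_total = 4, 44

pct_probable_ad_above = 100 * probable_ad_elevated / probable_ad_total
pct_non_ad_below = 100 * (non_ad_total - non_ad_elevated) / non_ad_total

print(f"probable AD above cutoff: {pct_probable_ad_above:.1f}%")
print(f"definite non-AD at or below cutoff: {pct_non_ad_below:.1f}%")
```

These come out to 89.5 percent and 90.9 percent, matching the 89.5 and 91 percent figures cited with the scatter plot.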
Here again is the scatter plot that has been shown several times. The probable AD are clearly higher than the definite non-AD cases. And there is very little overlap between these two groups. The possible AD and MCI groups are in between, which is exactly what we would expect because these two groups contain a significant number of cases of AD and non-AD. But it is not possible to be certain on clinical grounds in any given case. Next.
This slide summarizes the prevalence of definite non-AD versus not definite non-AD, which would then be the MCI, possible AD, and probable AD. The prior prevalence for definite non-AD is 22 percent. Next. This slide shows the performance characteristics of definite non-AD versus probable AD and possible AD plus MCI. The percentage of those that are diagnosed in this category of definite non-AD with normal NTP levels is 90.9 percent. The percent of patients diagnosed as not definite non-AD, of MCI, possible AD, and probable AD is 60.3 percent. The positive predictive value here as outlined, that the -- in the group here, there is the percent of elevated NTP levels as probable Alzheimer's, or possible Alzheimer's, or MCI, is 95 percent. The prevalence here is 78 percent. The use of the elevation in the NTP, the utilization, actually raised the clinical acumen by 23 percent. The utility of the NTP measurement showed the significance as listed here that your percent of potential recognition of those with Alzheimer's who had elevations were 23 percent. Next.
The performance characteristics are outlined in this slide. The negative predictive value here is 39 percent. The prevalence, again, of 22 percent. And that compared to the prevalence without NTP there was an increase in recognition of the potential improvement in diagnosis of some 78 percent. So there is a utility of the NTP measurement with significant improvement utilizing the data derived from the NTP.
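The performance figures on these two slides can be reproduced from a two-by-two table. The sketch below is a reconstruction under stated assumptions, not the sponsor's analysis: the 200 subjects and the 4 of 44 elevated definite non-AD subjects come from the testimony, while the count of 94 elevated subjects among the other 156 is an assumption back-calculated from the quoted 60.3 percent.

```python
# Two-by-two table (diseased = probable AD, possible AD, or MCI).
non_ad_total = 44          # definite non-AD, from the testimony
non_ad_elevated = 4        # from the testimony
diseased_total = 200 - non_ad_total   # 156 of the 200 subjects in the database
diseased_elevated = 94     # ASSUMPTION: back-calculated from the quoted 60.3 percent

diseased_normal = diseased_total - diseased_elevated
non_ad_normal = non_ad_total - non_ad_elevated
total = diseased_total + non_ad_total

sensitivity = diseased_elevated / diseased_total
specificity = non_ad_normal / non_ad_total
ppv = diseased_elevated / (diseased_elevated + non_ad_elevated)
npv = non_ad_normal / (non_ad_normal + diseased_normal)

# Relative improvement over the prior prevalence of each group.
prev_diseased = diseased_total / total
prev_non_ad = non_ad_total / total
ppv_gain = (ppv - prev_diseased) / prev_diseased
npv_gain = (npv - prev_non_ad) / prev_non_ad

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
print(f"PPV {ppv:.1%} (relative gain {ppv_gain:.0%})")
print(f"NPV {npv:.1%} (relative gain {npv_gain:.0%})")
```

Under that assumed count, the table gives back the figures quoted by the speakers: about 96 percent positive predictive value with a 23 percent relative gain, and about 39 percent negative predictive value with a 78 percent relative gain.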
The clinical study shows that the NTP measurement provides useful diagnostic information to the diagnostician. It adds a measure of strength to the diagnostic workup process for suspected Alzheimer's disease. There are, then, ongoing reviews which will be taking place. We hope certainly to continue to study other individuals, and develop even a new database. And I would hope that our distinguished panel could consider, give us the opportunity to make this diagnostic test available. Many thanks. I'm going to introduce Dr. Ira Goodman, who is from Florida, and is a leading clinician and a clinical professor at the University of Florida, who has a very large database also of patients.
DR. GOODMAN: My name is Dr. Ira Goodman, and I have no financial interest in Nymox. I'm an adult neurologist, Director of the Orlando Regional Healthcare System's Memory Disorders Clinic, and clinical faculty at the University of Florida School of Medicine. We are one of 13 state-funded memory disorders clinics, part of the Alzheimer's Disease Initiative begun several years ago by the Department of Elder Affairs in an effort to support the thousands of people with cognitive impairments and their families who reside in the State of Florida. The mission of the memory disorders clinics is to provide evaluation and treatment, education, as well as research. The State of Florida has a disproportionately large number of people with cognitive impairment, secondary to a disproportionately large number of elderly residents.
Patients are referred to our clinic by primary care and other physicians, as well as referrals by local agencies, sometimes through neighborhood screenings offered by the local agencies. Patients are evaluated and treated regardless of their ability to pay, and efforts are made to reach out to minority communities. In our clinic, after the evaluation and testing are complete, each patient is then presented at conference attended by the examining physician and other physicians, social worker, neuropsychologist, psychopharmacologist, as well as geriatrician. A diagnosis utilizing the NINCDS-ADRDA criteria is then established for each patient, at which time -- the diagnosis is then established. The patients come back with their caregiver, told about the diagnosis, and then a treatment plan is formulated. We do follow the patients longitudinally over the years to monitor their course which aids in confirmation of the clinical diagnosis. In addition, each patient is offered the opportunity to enroll in the brain bank program which allows for pathological confirmation of the diagnosis at the time of death.
I would now like to briefly review several cases in which the urine neural thread protein was measured, which I hope will illustrate the utility of the test. These patients participated in the prospective blinded study which was just discussed. Because our time is limited, I'll only present the salient features. The first patient is AS, who was referred from her primary care physician. She is a 73-year-old white female who presented with a four to six month history of forgetfulness. Her husband who also accompanied her stated that she was also having some difficulties with ADLs. Past medical history was positive for hypothyroidism, and her family history was positive for memory loss in her mother. The evaluation in the clinic demonstrated an MMSE score of 27. She had an unremarkable neurological examination, other than severe chronic kyphoscoliosis. MRI revealed atrophy and small vessel changes. Her extensive laboratory screens were unremarkable, and formal neuropsych testing suggested early Alzheimer's disease with depression. The urine neural thread protein measurement was elevated. This was taken before clinical evaluation was begun. After her clinical evaluation, the diagnosis of probable AD was established. In the two years of follow-up that we have, she's demonstrated significant decline which now has led to institutionalization despite treatment with cholinesterase inhibitors and memantine. Next slide. The utility -- the value of the neural thread protein in this patient can be seen. The patient presenting with only mild symptoms, an elevated NTP level, will provide the primary care physician with additional useful information that may indicate earlier follow-up or referral for more comprehensive evaluation.
The next patient is SM. He is a 72-year-old white male who presented in November of 2002, again referred from his primary care physician. He has a one to two year history of cognitive decline. Past medical history was positive for hypertension and prostate cancer. Family history was negative for dementia. Evaluation at clinic revealed an MMSE score of 30. He had a normal neuro exam. There was no evidence of MCI or dementia on detailed neuropsych testing. His MRI was felt to be normal for age, and extensive laboratory screens were, again, normal. His urine NTP measurement was normal. After evaluation, the clinical diagnosis of normal or the probability of age-related memory impairment was established. For purposes of the study, the patient was categorized as definite non-AD.
About two years later, at the insistence of his daughter, he came back to the clinic because she felt that he was showing a clear-cut cognitive decline, and actually specifically asked that he be placed on a cholinesterase inhibitor. We went ahead and reevaluated him with a detailed exam, which was normal. Repeat MRI was normal, repeat labs were normal, and repeat detailed neuropsychological testing was normal. Next slide.
Again, the utility of the NTP measurement can be seen. This presentation of the worried well is a common scenario in the primary care physician's office. The normal NTP measurement offers useful information for the decision whether to refer the patient for specialist evaluation, or just to continue close clinical follow-up.
The next patient is MM. He is a 39-year-old white male who was referred for evaluation of about a six month history of cognitive decline. We first saw him in March of 2003. His primary care physician actually felt that he was demonstrating depression related to a divorce that he was going through after 15 years of marriage. His medical history was unremarkable. However, his family history was suggestive of early onset familial Alzheimer's disease, with his brother, his father, and paternal aunt dying in their early forties with a diagnosis of dementia. Unfortunately, we had no confirmatory family medical records to look at. Evaluation in our clinic revealed an MMSE score of 18. He had a normal neurologic examination. Detailed neuropsych testing was consistent with Alzheimer's disease and mild depression. The MRI demonstrated brain atrophy. EEG was disorganized and slow, but there were no periodic discharges, and extensive laboratory screens were unrevealing, other than a positive ApoE 4 allele and an elevated homocysteine level, which actually normalized with high-dose vitamin supplementation. His urine NTP measurement was elevated. Because of his age and his family history, we elected to proceed with a genetic analysis for the PS-1 mutation. As you know, there's a rare inherited autosomal dominant form of early Alzheimer's secondary to genetic mutations on Chromosomes 21, 14, or 1, the most common being a mutation of the presenilin gene on Chromosome 14. The patient did have a unique mutation on PS-1, and this confirmed the clinical diagnosis of probable AD.
Unfortunately, MM passed away last March, at about the same time of life as other family members, in their early forties. Next slide. The PS-1 mutation analysis provided independent confirmation of the clinical diagnosis of probable AD. The urine NTP measurement agreed with the PS-1 mutation result and the clinical diagnosis of probable AD, and the elevated NTP measurement could add useful information for the primary care physician to increase suspicion of AD in a young or middle-aged patient.
And the last patient I want to talk about is PS. He was seen about two years ago, again, referred from his primary care physician for a second opinion concerning the diagnosis of Alzheimer's disease. He is a 54-year-old white male with about a two and a half year history of progressive cognitive decline. This actually presented with -- he kept repeating himself, and it eventually led to more serious symptoms. His primary care doctor went ahead and worked him up, and got a brain imaging study revealing ventriculomegaly which was out of proportion to atrophy. He, however, on presentation had no clinical symptoms that would have suggested NPH. His past medical history was negative, social history was negative, and family history was negative.
Evaluation in our clinic demonstrated an MMSE of 24. He had a normal neurologic examination, including an entirely normal gait. His detailed neuropsych testing was consistent with AD. Extensive laboratory screens were negative. He did not have an ApoE 4 allele. Lumbar puncture was performed, which revealed a mild CSF pleocytosis. He had eight white cells, normal protein, normal glucose, negative cultures, negative cytology. And his EEG demonstrated mild slowing. His urine NTP measurement checked initially was elevated.
Now, this patient, we were following up, and he developed a severe cognitive decline, with an MMSE deterioration from 24 to 8 in one year. He also developed significant behavioral changes that required the institution of a neuroleptic agent. Subsequent to that he started developing a very, very severe gait disorder, as well as urinary incontinence. And we weren't clear whether the gait impairment was related to a central nervous system disease, or was related to his medication. Well, after a while it was elected by the family to proceed with a V-P shunt and simultaneous brain biopsy. The family wanted to establish a diagnosis as certainly as possible, and also wanted to see if there would be any clinical improvement with the placement of the V-P shunt. The V-P shunt and brain biopsy were performed in early 2004 without complication. The results of the brain biopsy confirmed definite AD, and for the purposes of the study he was categorized as a probable AD. Next slide.
This is the brain biopsy, with the tangles and plaques. He clinically did not show any improvement or response to the V-P shunt over the last year and a half or so. Next slide.
The utility of the NTP measurement in this patient: the elevated NTP measurement adds useful information for the clinician to pursue further investigations for Alzheimer's disease before proceeding to more invasive procedures.
These patients that have just been briefly presented, I believe, represent a spectrum of the types of patient that present to the primary care physician's office. Measurement of the urine NTP offers reliable, useful information that can be utilized by the clinician as part of the evaluation of patients who present with cognitive symptoms. All physicians who evaluate patients with cognitive symptoms have their own individual levels of expertise and comfort based on their training and experience. It is these variables that will determine how best the clinician will utilize the proven, accurate, added information supplied by the urine NTP measurement. The result of the urine NTP measurement alone does not prove or disprove Alzheimer's disease, nor does it indicate treatment or no treatment. It does, however, provide additional useful information for the clinician to use in the process of arriving at a clinical diagnosis. Thank you.
DR. TAYLOR: Thank you. Dr. Averback?
DR. AVERBACK: Can we have the next slide, please? I'd like to give the conclusion now. And the good news is it's a very short conclusion. I also want to thank our speakers who have presented a very good case, and thank you very much.
Our conclusion is that urine NTP adds useful information for an unmet medical need. And we have a twofold thesis that this is useful information. We've tried to show you how it's useful, and that it's entirely safe. There's no bodily risk to anybody. There's no downside. There's no conclusive invasive action that can come from it. Next slide, please. And I leave you with the scatter plot again, which we've shown frequently today, that demonstrates the difference when the NTP was done with these different diagnoses. Thank you very much.
DR. TAYLOR: Thank you. I would like to thank the Nymox presenters for concluding their presentation well within schedule. So that's laudable, right? And difficult. I'd like to ask the panel members if they have any specific questions that they would like to pose at this point in time to any of the presenters for Nymox this morning. Okay, Dr. Lopez has a couple of questions.
DR. LOPEZ: I have two questions. One is regarding the -- can you hear me?
DR. TAYLOR: Just state your name, please.
DR. LOPEZ: I'm Oscar Lopez. I'm a neurologist and Professor of Neurology at the University of Pittsburgh. My question is regarding the severity of the dementia in the cases with probable Alzheimer's disease. Have you broken down the group by severity? It's possible that all these results are driven by more severe cases. You showed before that there were some cases that have early symptoms of disease. It would be important to know if the test is positive in moderate and moderate-to-severe cases, and if it's less sensitive in the mild cases.
The other question is regarding the possible cases, the possible Alzheimer's disease cases. According to the NINCDS criteria, these cases are patients with dementia that have an atypical progression, or cases that have Alzheimer's disease and other conditions that by themselves can cause a cognitive impairment. So this is a group that's supposed to have two conditions, Alzheimer's disease plus something else. And I was expecting to see a much better sensitivity of the test in the group of possibles. And this is the group that goes to the PCP. This is the group that goes to the family doctor. This is the group that has comorbidities, and is the group that sees the PCP frequently because of all these comorbidities. So this is the trouble group. And from what I can see here, almost half of the group was identified by the test. So it would be important to clarify who were those possible AD cases.
DR. RICHTER: And Dr. Lopez, in answer to the first question, there was some sorting out by the Mini-Mental in terms of severity. So that the mean -- the ones with probable AD, if you look at their mean value on the MMSE, it was 19, in that range, versus the ones who had MCI, who were in the range of 27 to 28, so only milder ones. And the ones that were possible all had a higher Mini-Mental, in the low 20 range, so that it was sifted out on that basis.
In terms of your question relating to other diagnoses, I think we did our best to try to distinguish one from another, but you really can't unless you have pathological confirmation. All my patients had brain SPECT scans, which were of some help in diagnosis. A number of them now are getting PET scans. But even there you can be misled as to the, you know, as you know, to the findings, the validity. If their change is more frontal, they're going to be parietal-temporal to a greater extent. What if you have asymmetry? Does the disease begin more on one side in some individuals than others? You know, are there vascular components? None of them had -- there were no clear-cut stroke patients that were in the possible. Does that answer in a limited way? Thank you.
DR. REYES: Mr. Chairman, could I address that question?
DR. TAYLOR: Yes, go ahead. Please again state your name.
DR. REYES: Thank you, Dr. Lopez. Those are good questions. The thing of the --
DR. TAYLOR: Could you state your name again, please, for the record?
DR. REYES: I am Dr. Patricio Reyes.
DR. TAYLOR: Thank you.
DR. REYES: The objective of the study was to correlate NTP measurement with the diagnosis of Alzheimer's disease. The severity of the disease was not a major objective of the study, but it's a good point that you raise. So, what they did was classify patients as probable or possible according to accepted NINCDS-ADRDA criteria, and not by severity. And also MCI, because current evidence indicates that MCI patients -- this came from Mayo Clinic investigators -- a significant number of them can convert to Alzheimer's disease. About 11 to 15 percent per year. The recent physician study indicates that 33 percent of patients with MCI may convert to Alzheimer's disease after three years.
DR. TAYLOR: Thank you. Any other comments?
DR. AVERBACK: Paul Averback. I agree with Dr. Reyes, it's a very good question from Dr. Lopez. I would just point out, but I'm sure you know, that the specificity of possible AD is generally considered to only be 50 percent. So while the sensitivity is high, and I echo your sentiment, with a specificity that's low, we would expect that possibly up to half the cases would in fact not be Alzheimer's. That is to directly answer your specific question, your second question.
DR. TAYLOR: Thank you. I believe, did Dr. Blumenstein have a question? No? Dr. Parisi?
DR. PARISI: I just have a comment --
DR. TAYLOR: This is Dr. Parisi for the record.
DR. PARISI: Dr. Parisi. First of all, I can't underestimate the importance of the autopsy in confirming the diagnosis in these cases. And I wonder if any of your cases have come to postmortem exam, and what's that shown?
I also have to take a bit of an issue with Dr. Reyes' comment about the uncertainty of the pathology. I think we do a pretty admirable job of classifying these disorders at autopsy. And I think certainly the studies have shown that the clinical diagnosis is correct probably 80 or 85 percent of the time, and we often turn up with something different at postmortem. So.
DR. TAYLOR: Response?
DR. GOODMAN: I'm Dr. Ira Goodman. Concerning autopsy material, in our group, the only autopsy material that we had the correlating urine was with PS, the last patient that I showed. The patient with the presenilin mutation we did not get autopsy results. And so far, all the patients that are in the study are registered in our brain bank program, but it's only been about two or three years, and fortunately they're still -- we're still following them.
DR. TAYLOR: Was there another question? Yes. Please, again give your name.
DR. NATH: Avi Nath. And I was wondering if you have any longitudinal data on any of your patients? Do you know if as they deteriorate, does the level change, or stay the same, or what happens to it?
DR. AVERBACK: Paul Averback. Do you mean for this clinical study?
DR. NATH: In this or outside.
DR. AVERBACK: In other studies? We have done some published studies that have followed people up for longer periods. And it's an area that we're very interested in, and it's something that we are looking into quite a bit. But to say something controlled and conclusive, I would be hesitant.
DR. NATH: And the second question I had was with regard to the point made about the difficulty in diagnosis of Alzheimer's, and that a lot of these patients may have mixed diseases, which is either with Parkinson's or Lewy body disease. I was wondering why in your study design you didn't consider the possibility of maybe taking pure Parkinson's disease, or pure NPH patients, or Huntington's, just to see if they would have NTP levels detectable above the normal range or not.
DR. AVERBACK: That's a very good point. The design was supposed to be all comers to the clinics. So there could be no selection. And of course, no study is perfect. People with clear-cut strokes aren't referred. So a case that would be, for example, a definite vascular dementia, where there's a clear-cut stroke, they don't tend to get referred to studies like this to dementia clinics. So the fallout of the cases is purely consecutive new cases as available.
DR. NATH: But you understand my concern is that if you have any other neurodegenerative diseases, it's possible that maybe -- I'm wondering if this could be just it being released from the brain itself irrespective of whatever the neurodegenerative condition might be.
DR. AVERBACK: Well, we were testing it -- sorry, go ahead.
DR. NATH: And it may not be specific for Alzheimer's disease, even though you know, in your probable Alzheimer's you see a difference compared to others. And it may be that if you had a similar number of Parkinson's patients, you'd probably see similar levels.
DR. AVERBACK: Two answers to that question. It's supposed to be the realistic clinical context. So we had to take a realistic consecutive group of patients. So that's -- we had to do that. There is a study published in the Journal of Clinical Investigation in 1997 that had a neurological control group, using this marker with a different assay. But there was no significant positivity in some of the conditions that you mentioned. And there are a couple of studies that we have published that have found the same thing as well. But I wouldn't be so bold as to say that it's impossible that all conditions might not have some overlap.
DR. TAYLOR: Thank you. Were there any other? Yes.
DR. GULLEY: James Gulley. I have a question for Dr. Bloch about the performance characteristics of the test. I just was, just for my edification, where were the numbers for prevalence for both the positive predictive value and the negative predictive value obtained from?
DR. BLOCH: Well, the positive predictive value through Bayes' theorem is directly related to the sensitivity, specificity, and the prevalence of the disease. So that's not a number you get directly from the data. It has to be derived through a formula that relates sensitivity, specificity, and prevalence in a certain way, and that is the definition of positive predictive value. And so, the way you derive it, so where the numbers come from, is that there is, for example, for positive predictive value, the prevalence of 78 percent, sensitivity was 60 percent, and so on. Those are the numbers you need to plug into the formula to obtain positive predictive value.
DR. GULLEY: Right. The 78 percent, where did that come from?
DR. BLOCH: That's the study percent of patients that had possible, probable, or MCI.
DR. GULLEY: Right, okay.
DR. BLOCH: Okay.
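The relationship Dr. Bloch describes can be sketched in a few lines of Python. This is an illustration only, not the sponsor's calculation: the prevalence (78 percent) and sensitivity (60 percent) are the figures quoted in the exchange above, while the specificity of 0.90 is a hypothetical placeholder, since the value used in the study is not stated here. The function name is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and prevalence are quoted in the testimony;
# specificity = 0.90 is a HYPOTHETICAL value chosen for illustration.
ppv, npv = predictive_values(sensitivity=0.60, specificity=0.90, prevalence=0.78)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because prevalence enters both formulas directly, the same test would yield different predictive values in a population with a different proportion of affected patients.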
DR. TAYLOR: I think Dr. Gollin was next.
DR. GOLLIN: I have a more technical --
DR. TAYLOR: Okay, name please?
DR. GOLLIN: I'm Susanne Gollin. I have some more technical questions, possibly for Dr. Averback, about the protein and the antibody. Am I correct in stating that the antibody was prepared against an inferred peptide sequence from the cDNA?
DR. AVERBACK: Paul Averback. The statement we made about the cDNA was a statement that NTP was initially characterized based on cDNA that was derived from postmortem human AD brain. Not the device. This is the background research.
DR. GOLLIN: Okay.
DR. AVERBACK: We were pressed for time. I didn't want to -- we just didn't have enough time.
DR. GOLLIN: Okay, so the antibody used in the test, was that derived against a peptide inferred from that cDNA, or was it from an isolated NTP protein from normal brain?
DR. AVERBACK: The exact construction of the test is some proprietary information that we'd have to go into another room to give you answers, which we'd be delighted to do.
DR. GOLLIN: Okay. Now, I'll ask another question, and this is referring specifically to a 2005 paper in the literature. Does the cDNA sequence isolated from the Alzheimer's brains correspond to the genomic sequence of this alleged gene?
DR. AVERBACK: Jack, could you please put a slide up? We'll have an answer for you for that in a second.
DR. GOLLIN: Okay.
DR. AVERBACK: The background of NTP, it started in research done at Mass General Hospital and Harvard Medical School, and then at Rhode Island Hospital and Brown University in teams led by Dr. Jack Wands and Dr. Suzanne de la Monte. And they have been working on this for close to 15 years. And your specific question, the answer to that paper is that -- I would refer you to this paper in PNAS, which is considerably more authoritative, which explains some of the issues that you've mentioned.
The sequence that the Harvard people derived was from an Alzheimer's brain, and it was work with real test tubes. The paper that you quoted is a software analysis of the human genome. These are not strictly comparable. And we quote a couple of papers here that have worked with the same sequence that we're working with. And one in particular, the Britten paper, gives a good explanation for some of these sequential elements.
DR. GOLLIN: So have you evaluated the population frequency of the variation in the various human populations? This would be important in terms of testing the general human population. Are there polymorphisms that may lead to altered results on the test because of altered proteins?
DR. AVERBACK: That's a very good question. We have not. The percentage difference that's been hypothesized -- and we don't accept the hypothesis, nor do these other papers -- is so small in terms of the total peptide count that I would say it's extremely unlikely.
DR. GOLLIN: It wasn't clear to me the percentage of minorities in the population studied. The examples that were given that said what the race of the individual was were all Caucasians. So would there be any evidence in the population data that would lead to false negatives, or false positives, due to sequence polymorphisms as a result of race, for example?
DR. AVERBACK: Here's our race data. I mean, in this limited study of all comers, it was pretty representative in the populations that we studied. But we don't have tools to detect these 3 percent SNPs that you're talking about. And as I say, other workers don't buy into that hypothesis anyhow. So our opinion has been corroborated by American studies published in PNAS. Yours was a software study out of Germany.
DR. GOLLIN: So, do any other proteins cross react with your antibody? Is it very specific for only the neural thread protein, and not any others? Did you do precipitation studies, for example?
DR. AVERBACK: We tested the ones that are listed in the panel package. A goodly list of the usual suspects. And in this assay, there's no cross reaction. This assay does not perform well if there is too much solute. That's why the urine creatinine concentration has to be less than 225.
DR. GOLLIN: Thank you.
DR. AVERBACK: Thank you.
DR. TAYLOR: Thank you. In the interest of remaining on schedule I'm going to ask Dr. Lichtor to hold his question till after lunch, and we will proceed at this point in time with the FDA. So the first speaker, I believe, for the FDA is going to be Dr. Robert Becker, who is Director of the Division of Immunology and Hematology Devices. And then I will ask Dr. Becker to introduce the other FDA speakers in turn. Thank you.
DR. BECKER: I am Robert Becker. I'm Chief of the Division of Immunology and Hematology Devices in the Office of In Vitro Diagnostic Devices. I have no commercial affiliations.
The presentations today from FDA will be from three individuals in four parts. First, I will describe the general nature of issues identified by FDA regarding performance of the urine NTP test, given its proposed intended use. I will also give a short treatment of the challenges that exist for any Alzheimer's diagnostic test due to the natural history of the disease, and the usual unavailability of histological specimens for test validation. Last, I will discuss FDA's evaluation of pre-analytical and analytical data in the sponsor's PMA submission relevant to precision, and the cut point selected for this test. The second presentation is from Dr. Marina Kondratovich, mathematical statistician with the Division of Biostatistics in our Office of Surveillance and Biometrics. She will discuss our findings from the clinical data that the sponsor submitted for review. The third presentation will be from Dr. Ranjit Mani, a medical reviewer and practicing neurologist with the Center for Drug Evaluation and Research. He will discuss the clinical effectiveness of the urine NTP test. For the final presentation, I will summarize the earlier presentations and pose discussion points and questions for the panel to consider.
As with any in vitro diagnostic device, our evaluation begins with the intended use. The latest of several versions provided by the sponsor during the course of our review is projected here. The first notable element is the intended use population. That is, patients presenting with cognitive complaints or other signs and symptoms of suspected Alzheimer's disease. No distinction is made as to the setting in which these patients present. Therefore, we conclude that the test is meant for patients at any stage of the diagnostic workup. As the sponsor makes clear elsewhere, this includes patients seen by general practitioners and other non-specialists, as well as by neurologists or other specialists trained in the diagnosis of AD.
The second important point is that the test is for use in conjunction with, and not in lieu of, current standard diagnostic procedures. More specifically, the test is for use as an adjunctive aid for differential diagnosis. Current diagnostic procedures for the most part apply the criteria from the National Institute of Neurological and Communicative Disorders and Stroke for probable and possible AD, and the Quality Standards Subcommittee of the American Academy of Neurology for mild cognitive impairment, or MCI. For brevity, I'll call the NINCDS-ADRDA criteria just the NIH criteria for the rest of my presentation. Sometimes I'll substitute the phrase "clinical criteria" or "clinical diagnoses" for the combination of the NIH and AAN criteria for diagnoses.
The NIH criteria and the AAN criteria are applied with varying rigor and skill by differently trained health care professionals. Therefore, an important review issue is how well the urine NTP test results work in combination with the usual clinical criteria to inform both primary care and specialist diagnosticians. Characterizing the urine NTP test as adjunctive is an important element to this device's claim. An adjunctive test is subordinate to a parent diagnostic process. For Alzheimer's disease, the parent diagnostic process is the clinical diagnosis determined using the NIH and AAN criteria. The adjunctive test is applied to clarify results from the parent test. It might be used to improve on the sensitivity that is provided by the parent test alone. To be useful in this role, it must improve sensitivity without significant loss of specificity. Alternatively, the adjunctive test might be used to improve specificity while avoiding a loss of sensitivity. And sometimes, improvements in both sensitivity and specificity are sought.
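The trade-off between sensitivity and specificity when an adjunctive test is layered on a parent test can be illustrated with a small Python sketch. It rests on a strong simplifying assumption that is not made in the presentation: that the two tests err independently of each other given the patient's true disease status. All numbers and function names are hypothetical, chosen only to show the direction of the effect.

```python
def combine_or(parent, adjunct):
    """Call positive if EITHER test is positive.

    Each argument is a (sensitivity, specificity) pair.
    Assumes the tests are conditionally independent given true status.
    """
    sens = 1 - (1 - parent[0]) * (1 - adjunct[0])
    spec = parent[1] * adjunct[1]
    return sens, spec


def combine_and(parent, adjunct):
    """Call positive only if BOTH tests are positive (same assumption)."""
    sens = parent[0] * adjunct[0]
    spec = 1 - (1 - parent[1]) * (1 - adjunct[1])
    return sens, spec


# HYPOTHETICAL performance figures, for illustration only.
parent_test = (0.80, 0.70)    # clinical criteria alone
adjunct_test = (0.60, 0.90)   # adjunctive laboratory test

or_sens, or_spec = combine_or(parent_test, adjunct_test)
and_sens, and_spec = combine_and(parent_test, adjunct_test)
print(f"either positive: sens={or_sens:.2f}, spec={or_spec:.2f}")
print(f"both positive:   sens={and_sens:.2f}, spec={and_spec:.2f}")
```

Under these assumptions, the either-positive rule raises sensitivity at some cost in specificity, and the both-positive rule does the reverse. Measuring either quantity for a real combination still requires comparison against a higher standard.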
It is essential to recognize that there is only one way to establish improved sensitivity or specificity for the combination test relative to the parent test. That is to compare results from the parent test, alone and combined with the adjunctive test, to results from a higher standard, such as histology in the case of AD. Without such a comparison, it is possible to determine bounds within which the combination test performs perhaps better, perhaps the same as, or perhaps worse than the parent test alone. However, it is not possible to know which state of affairs occurs when using the combination test.
The next point to consider is that the sponsor's intended use and the indications for use define a particular distinction that the urine NTP test is to help us make. The test is designed to help distinguish patients who do not fit the criteria for possible AD, probable AD, or MCI, from patients who do fit the criteria for one of these categories. Patients not fitting any of the NIH or AAN categories are designated definite non-AD patients. It is well known that patients outside the definite non-AD class vary substantially in their cognitive state and prognosis, and so the further distinction of such patients is also important. Indeed, the original intended use put forth by the sponsor concerned distinction of probable AD patients from those meeting criteria for possible AD, for MCI, or for none of the clinically defined AD-related classes. However, the currently proposed labeling aims only at distinguishing definite non-AD from not definite non-AD. And our remarks mainly address this distinction.
The last point we consider from the proposed indications for use concerns the refinement of diagnostic workups. This is in wording that, quote, "An elevated urine NTP level may help the clinician's decision for the need of further diagnostic workups, such as specialist consultations, imaging, in depth neuropsychological testing, EEG, and other testing procedures." Since the urine NTP test can be used by general practitioners and specialists alike, FDA takes this wording to mean that test results will help the general practitioner to decide whether referral to a specialist is needed. In addition, specialists themselves might use the test to help select advanced diagnostic studies for particular patients. FDA will be asking the panel's opinion and recommendations on whether the studies under review support such a claim. If so, what labeling caveats, if any, would be appropriate? If not, what further studies would be needed?
In summary, FDA will be requesting the panel's input to determine whether the settings and studies reported in the sponsor's PMA submission are sufficient to support use of the urine NTP test as an adjunct to clinical testing, or to prompt diagnostic or management changes. FDA will also be addressing issues pertinent to use of the NTP results to exclude patients from the two specific categories of disease targeted in this submission: definite non-Alzheimer's disease and probable Alzheimer's disease, given the diagnostic continuum of affected patients. We will address an issue of intra-patient variability in NTP results over time, and we will also pose questions concerning adequate sampling of the intended use population.
It is worthwhile to consider the challenges confronting any in vitro test aimed at diagnosing AD. The field has near consensus on the reliability of histological criteria for diagnosis of AD, and results from histology are treated as a gold standard against which other testing modalities can be compared. However, the tissue needed to make such comparisons is rarely available early in the course of AD. Even postmortem studies of tissue from advanced cases require substantial time and effort to acquire a large number of cases. But without histology, there is substantial risk for confusion of AD with other dementing disorders, or for failure to recognize AD in the presence of other disorders.
In practice, the NIH criteria for probable AD and possible AD, and the AAN criteria for MCI, help to systematize the workup of patients presenting with complaints consistent with AD. However, it is not realistic to treat the categories defined by these criteria as complete diagnoses in their own right. This is in part self-evident from the NIH terminology, which defines categories only according to the likelihood of AD. The difficulties are given sharper focus by studies that compare results of category assignment by NIH criteria with histological results for patients known to be demented at death. Though about 90 percent of patients classified as probable AD showed its histological signs, only about 75 percent of possible AD patients had positive histology. For patients classified as non-AD, about one-third nevertheless had a histological diagnosis of AD. The puzzle of MCI further complicates matters because patients fitting this class, in which overt dementia is absent by definition, can indeed show histological signs of AD, and they go on to clinical diagnosis of AD at a substantial rate. It appears that each category mapped by the NIH and AAN criteria contains patients in a variety of states, some with Alzheimer's disease at variously advanced stages, some with AD coexisting with another dementing disorder, and some without AD at all.
Despite their deficiencies, the clinical criteria have been used effectively in cohort studies, and they have been used as the best available endpoint for important therapeutic studies. As insights from the pathophysiology of AD accumulate, and with the ever-increasing ability to define and measure molecular features, emphasis has shifted to the use of biochemical markers, or biomarkers, as our best hope for diagnosis and perhaps intervention against AD. There is significant ambiguity as to what constitutes a valid biomarker. Here, we're concerned with diagnostic effectiveness. There is at least one consensus publication addressing this issue, from a working group convened by the Ronald and Nancy Reagan Research Institute of the Alzheimer's Association and the National Institute on Aging. Their findings were that an effective biomarker should detect a fundamental feature of AD neuropathology, and be validated in neuropathologically confirmed cases, that it should approach 85 percent sensitivity when compared with histology, and that it should have a specificity approaching or exceeding 75 to 85 percent.
OIVD emphasizes the importance of the working performance characteristics, sensitivity, specificity, positive predictive value, and negative predictive value more than the pathophysiologic explanatory power of the biomarker. Still, we retain demonstration of strong diagnostic performance vetted against the pathologic features of AD as an essential requirement to establish a diagnostic claim for this disease. This requirement bears directly on the establishment of safety and effectiveness for an AD diagnostic test. We recognize that a morning void urine test for neural thread protein poses no direct risk to the patient. The remaining important aspect of safety is that reliance on the test should not put more patients at risk for misdirected management than benefit from improved management. Making a direct improvement in the accuracy of AD diagnosis, that is improving the accuracy of diagnosis through improved sensitivity and specificity of the tests that underlie them, is a highly desirable outcome. Proof of such performance might be impractical in many circumstances so long as histological truth is essential. We note again, however, that establishment of a less ambitious claim, such as improved efficiency or overall consistency in application of the current clinical diagnostic criteria might provide a significant public health benefit. In evaluating such a claim, FDA also has a keen interest in the reliability with which health care providers can use the test for the benefit of individual patients.
I'll now turn to a brief description of two pre-analytical and analytical issues that are of concern to FDA from review of the PMA submission. The first issue concerns the patient dropout rate in the study due to inability to collect a valid first morning void specimen. There were 133 patients out of the 366 enrolled who were excluded from the study for this reason. The protocol's procedures for obtaining urine specimens were suitably detailed, and it appears that multiple attempts to obtain a valid specimen were made for many of the excluded patients. Yet, 36 percent of patients could not comply. We cannot know whether the numerous dropouts imposed a bias either for or against the agreement of NTP results with the clinical categories. It is plausible that dropouts were more common with more advanced dementia, and that advanced dementia might therefore be underrepresented in the submitted data set, versus the selected intended use population. Altered disease prevalence would affect the predictive values that are reported by the sponsor.
The second issue arises from figures on the intra-subject biological variability of urine NTP results. During our review, we asked the sponsor to investigate whether the degree of patient hydration affected urine NTP results. We were concerned that variations in hydration might produce an extraneous, that is not AD-related, variation in NTP results. The sponsor replied that hydration issues were of unknown importance, but that they were adequately and most practically addressed through the requirement for first morning voided specimens, plus the urine creatinine check known, or meant, to corroborate first void status. At the same time, the sponsor provided multiple specimen data for 14 patients, nine of whom had reportable urine NTP results spaced no more than one month apart. This is a short interval compared to the time course of AD development. The results for these patients showed variation averaging 5.3 micrograms per milliliter for the six patients in whom the change could be fully calculated. The other three patients, each of whom either overshot 60 micrograms per milliliter or undershot 8 micrograms per milliliter for one specimen, generally showed variations larger than 5.3 micrograms per milliliter. None of the nine reported patients had values straddling the 22 micrograms per milliliter cutoff that divides normal from elevated urine NTP levels. However, several of the patients had NTP values that were near the cutoff value.
FDA's concern is that the degree of intra-patient variability observed for the small number of patients reported by the sponsor calls into question the reliability of the urine NTP test for characterization of individual patients. In this scatter plot furnished by the sponsor, I've added a bar representing the intra-run standard deviation of the assay result for single samples. That's 2 to 3 micrograms per milliliter, using in this case 2.5 micrograms per milliliter as an estimate. And that's the bar on the left. I've also added a bar representing the average intra-patient variation, using 5 micrograms per milliliter as the estimate for samples obtained no more than one month apart. For one quarter, that is 50 out of 200 patients, the urine NTP result reported for the clinical study was distanced from the cutoff value by no more than one-half of the average intra-patient variation. We also noted from the analytical data that the intra-run standard deviations for medium and low NTP specimens analyzed at the sponsor site were one-half to one-third of the standard deviations observed at remote sites. The sponsor's intra-run standard deviation was about one-half to nine-tenths of the remote sites' SD for high NTP specimens. This raises an issue of training and experience for new users that might benefit from attention by the sponsor. No inter-laboratory reproducibility data have been provided by the sponsor.
We turn now to detailed analysis of the clinical study data provided by the sponsor with their PMA submission. The study design and data set were as follows. Nine study centers staffed by specialists in memory disorders obtained urine specimens and worked up 200 patients according to NIH and AAN criteria. The FDA expectation and intent according to the study protocol was that all patients would be undiagnosed and newly referred for evaluation. During the course of review, it became clear that patients were enrolled at disparate points in their evaluation and care. The effect of this patient heterogeneity on the observed concordances between NTP results and clinical categories remains undetermined. The same is true for the effect of patient heterogeneity on characterization of test performance for particular intended use patients, such as patients first presenting to family practitioners versus patients much farther along in their diagnostic workup.
This concludes my introductory presentation. We'll continue now with Dr. Kondratovich, who will present the statistical analysis of the clinical data set.
DR. KONDRATOVICH: Good morning. My name is Marina Kondratovich. I am a statistician from the Division of Biostatistics, Center for Devices and Radiological Health. In my statistical presentation, I will touch on some methodological issues: the diagnostic accuracy of a medical test, reporting the performance of the NTP test for only two categories, probable AD and definite non-AD, and evaluating a test based on a chain of comparisons. Then I will make some statistical interpretation of the NTP test as an adjunctive aid and as a stand-alone test. Finally, I will make some comments on selection of the cutoff.
Let me consider the basic concept in evaluation of a medical test. Diagnostic accuracy. What is diagnostic accuracy? Diagnostic accuracy of a test refers to the ability of a test to identify a target condition, or another term, condition of interest. Target condition refers to particular disease, a disease stage, a health status, or to any other identifiable condition within a patient. Here, target condition is Alzheimer's disease.
In a diagnostic accuracy study, the test under investigation, with results positive or negative, is applied to a representative sample from the intended use population. The obtained results are compared with the results of the reference gold standard obtained in the same subjects. The reference gold standard is considered to be the best available method for establishing the presence or absence of the target condition. Diagnostic accuracy for a medical test can be expressed by a sensitivity-specificity pair, or by positive and negative predictive values for some prevalence.
In this study, the NTP test was compared to the four diagnostic categories: probable AD, possible AD, mild cognitive impairment, and definite non-AD, not to the pathological diagnosis. It is well known that these diagnostic categories have some false positive and false negative results. Indeed, in the paper by Blacker, the following table provides the performance of NIH criteria for Alzheimer's disease. I would like to emphasize that these numbers do not present the exact performance of the NINCDS criteria, but give us the message that the NIH criteria have some false positive and false negative results. Even for the combined categories probable AD and possible AD, sensitivity is only 81 percent, and specificity 73 percent.
So, the diagnostic categories in the clinical study, probable AD, possible AD, mild cognitive impairment, definite non-AD, cannot be considered as the reference gold standard. Because the NTP test was compared to the four diagnostic categories, not to the reference gold standard, the diagnostic accuracy of the NTP test was not evaluated in this study. Therefore, the use of the terms like "sensitivity," "specificity" in this clinical study can be misleading. For this clinical study, we will use terms like "positive percent agreement" and "negative percent agreement."
The sponsor selected a cutoff of 22 units for defining positive NTP results and negative NTP results. After establishing the cutoff, the continuous data can be presented by this table with eight numbers. So this table presents all the data from the clinical study: for every clinical diagnostic category, the number of NTP positive results and the number of NTP negative results. Performance of the NTP test is described by the proportion of NTP positive results within each clinical diagnostic category. For the diagnostic category probable AD, the percent of NTP positive results is 89 percent, 51 divided by the total number. For possible AD, the percent of positive NTP results is 38 percent, 21 divided by the total number of subjects in this category. For the category MCI, the percent of NTP positive results is 51 percent, even larger than for possible AD, 22 divided by the total number. And for definite non-AD, the percent of positive results is 9 percent, four divided by the total number.
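The per-category percentages just described can be reproduced with a short sketch. The NTP-positive counts (51, 21, 22, 4) and the definite non-AD total of 44 are quoted in the testimony; the other category totals (57, 56, 43) are not stated and are inferred here from the quoted percentages, the combined probable-plus-possible total of 113, and the 200 subjects overall, so they should be read as assumptions.

```python
# NTP-positive counts per clinical category, as quoted in the testimony.
positives = {"probable AD": 51, "possible AD": 21, "MCI": 22, "definite non-AD": 4}

# Category totals. Only 44 is quoted directly; 57, 56 and 43 are inferred
# (assumption) so that 57 + 56 = 113 and all four totals sum to 200.
totals = {"probable AD": 57, "possible AD": 56, "MCI": 43, "definite non-AD": 44}


def percent_positive(pos, tot):
    """Percent of NTP-positive results within each clinical category."""
    return {cat: 100.0 * pos[cat] / tot[cat] for cat in pos}


rates = percent_positive(positives, totals)
for cat, rate in rates.items():
    print(f"{cat}: {positives[cat]}/{totals[cat]} = {rate:.1f}% NTP positive")
```

Under these assumed totals, the rounded percentages agree with the 89, 38, 51, and 9 percent figures given in the presentation.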
In your package you saw the sponsor's statistical analysis, which compares NTP values for the patients from the extremes of the NIH classes, probable AD and definite non-AD, by discarding 50 percent of the intended use population, the subjects from the possible AD and MCI categories. It can appear that the percent of positive NTP results among probable AD patients, and the percent of negative NTP results among definite non-AD patients, can provide some information about the performance of the NTP test with regard to the reference gold standard. However, these estimates are invalid for the following reasons. First, not all probable AD subjects are subjects with an AD-positive pathological diagnosis, and not all definite non-AD subjects are subjects with a negative pathological diagnosis. Second, probable AD subjects do not represent a random sample of the subjects with an AD-positive pathological diagnosis, and therefore spectrum bias can occur. What does spectrum bias mean? It means that the probable AD and definite non-AD categories are more likely to exclude difficult or complex diagnostic cases, an unknown number of which may have been assigned to the possible AD or MCI categories. So reporting the performance of the NTP test for only two categories, probable AD and definite non-AD, may significantly overstate the performance of the test.
Let me consider another problem. We have a reference gold standard G. Test A was compared to this gold standard, and we know the estimates of the diagnostic accuracy of test A, sensitivity A and specificity A. Also, we compare test B to test A, and estimate the measures of agreement between test A and test B, positive percent agreement and negative percent agreement. It can appear that the diagnostic accuracy of test B, where accuracy here means relative to the gold standard G, can sometimes be obtained through this chain of comparisons.
Let me consider this hypothetical scenario. I would like to emphasize that I use the results from the Blacker paper only to demonstrate the idea. It's not the purpose to fit exactly the data which was in this study or in the Blacker paper. So, for example, we have the combined category probable AD and possible AD, and we have 113 subjects. For definite non-AD, we have 44 subjects. From the estimation of positive percent agreement and negative percent agreement, this is the comparison of test B, here the NTP test, relative to test A. We know that the number of NTP positive subjects is 72, and we know the number of NTP negative subjects in the same way. Because we know the negative percent agreement, we know the number of NTP positive subjects and NTP negative subjects for this category. It is easy to show that, based on information about the sensitivity and specificity of test A -- here is test A -- the 113 subjects from this category are split into 106 true AD-positive subjects and 7 true AD-negative subjects. And the 44 subjects can be split into 24 true AD subjects and 20 true AD-negative subjects. So the numbers in the marginal cells are known, but there is uncertainty in the cell values, because there are a lot of possible combinations for these numbers that give the same sums in the rows and in the columns.
So we can consider, for example, a best case scenario and a worst case scenario. We can put the maximum allowable numbers in the correct cells, and obtain sensitivity and specificity, like sensitivity around 58 percent, specificity 100 percent. Or we can consider the worst case scenario, when we put the maximum numbers in the wrong cells, off the diagonal. Then sensitivity will be only 50 percent, and specificity 59 percent. So, even if we know the performance of test A relative to the gold standard, and the performance of test B relative to test A, there is a big uncertainty about the performance of test B relative to the gold standard. So, we really don't know what the performance of test B is. Here test B is the NTP test. Again, I would like to emphasize that this is only an example to demonstrate the idea, because, for example, we are making the assumption that the performance of the NIH criteria in this study, which we really don't know, is the same as in the Blacker paper. So this is only to demonstrate the idea that even if we know this chain of comparisons, it's difficult to come back through the chain and really obtain the diagnostic accuracy of test B.
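The best case and worst case bounds in this hypothetical scenario follow from the fixed row and column sums. A minimal sketch is given below, with some inputs assumed rather than quoted. The 41 NTP-negative subjects in the combined category are inferred as 113 minus 72. The 7 true AD-negative subjects are inferred as 113 minus 106. The 4 positive and 40 negative results for definite non-AD come from the clinical table. The function name `accuracy_bounds` is hypothetical.

```python
def accuracy_bounds(cells):
    """Bound sensitivity and specificity of test B against the gold standard.

    Each entry of `cells` describes one category of test A as a tuple
    (ntp_pos, ntp_neg, true_ad, true_non_ad), where only these margins are
    known and ntp_pos + ntp_neg == true_ad + true_non_ad.
    """
    total_ad = sum(c[2] for c in cells)
    total_non_ad = sum(c[3] for c in cells)
    best_tp = best_tn = worst_tp = worst_tn = 0
    for ntp_pos, ntp_neg, true_ad, true_non_ad in cells:
        # Best case: NTP positives are true AD wherever the margins allow.
        best_tp += min(ntp_pos, true_ad)
        best_tn += min(ntp_neg, true_non_ad)
        # Worst case: non-AD subjects fill the NTP-positive cell first,
        # and AD subjects fill the NTP-negative cell first.
        worst_tp += max(0, ntp_pos - true_non_ad)
        worst_tn += max(0, ntp_neg - true_ad)
    return {
        "best": (best_tp / total_ad, best_tn / total_non_ad),
        "worst": (worst_tp / total_ad, worst_tn / total_non_ad),
    }


# Hypothetical scenario from the testimony (partly inferred, see lead-in).
cells = [
    (72, 41, 106, 7),   # probable AD + possible AD: 113 subjects
    (4, 40, 24, 20),    # definite non-AD: 44 subjects
]
bounds = accuracy_bounds(cells)
print("best case  (sensitivity, specificity):", bounds["best"])
print("worst case (sensitivity, specificity):", bounds["worst"])
```

With these inputs the sketch gives the quoted range: sensitivity between 50 and about 58 percent, and specificity between about 59 and 100 percent.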
Let me consider intended use/indication for use. This is some citation. "Results from urine NTP kit are intended for use in conjunction with current standard diagnostic procedures. Urine NTP measurement can be used as part of diagnostic risk assessment for presence or absence of definite non-AD." So, according to the indication for use, the NTP test should be used as an adjunctive aid. Here is the table of clinical data. NTP test as adjunctive aid implies that the NTP test can provide additional assurance that a patient diagnosed to have definite non-AD after undergoing a standard assessment does in fact have this diagnosis. The only means of establishing that is to confirm the diagnosis further by other means. Note that from this data it is impossible to know whether patients with, for example, clinical category definite non-AD and negative NTP results have a lower probability of definitive AD by histology reference compared to patients with clinical category definite non-AD and positive NTP results. In this study, this information is not available.
Let me consider some example. When a clinical impression is definite non-AD, how much certainty is added to the diagnosis by negative NTP results, when the negative NTP results can occur 61 percent of the time for the patient other than definite non-AD? Should the clinician discard the clinical diagnosis of definite non-AD if the NTP test is positive? From the submitted data, it is impossible to evaluate the significance of this disagreement because there is no comparison to the higher order standard.
Let me consider another example. When the clinical impression is probable AD, how much certainty is added by positive NTP results when clinicians know that the positive NTP results can occur 48 percent of the time for the subjects with other than probable AD categories? Should the clinician discard the clinical impression of probable AD if the NTP test is negative? The significance of this disagreement was not evaluated, and so information is not available.
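The two percentages in these examples, 61 percent and 48 percent, can be checked against the clinical table. In the sketch below the NTP-positive counts are quoted in the testimony, while the NTP-negative counts for probable AD, possible AD, and MCI (6, 35, 21) are inferred from assumed category totals of 57, 56, and 43, and are therefore assumptions.

```python
# NTP results by clinical category. Positive counts are quoted in the
# testimony; negative counts other than 40 are inferred (assumption).
ntp_positive = {"probable AD": 51, "possible AD": 21, "MCI": 22, "definite non-AD": 4}
ntp_negative = {"probable AD": 6, "possible AD": 35, "MCI": 21, "definite non-AD": 40}

n_pos = sum(ntp_positive.values())  # all NTP-positive subjects
n_neg = sum(ntp_negative.values())  # all NTP-negative subjects

# Share of NTP-negative results falling outside definite non-AD.
neg_outside_non_ad = (n_neg - ntp_negative["definite non-AD"]) / n_neg

# Share of NTP-positive results falling outside probable AD.
pos_outside_probable = (n_pos - ntp_positive["probable AD"]) / n_pos

print(f"NTP negative, not definite non-AD: {100 * neg_outside_non_ad:.0f}%")
print(f"NTP positive, not probable AD: {100 * pos_outside_probable:.0f}%")
```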
Let me consider right now the statistical analysis which was presented by the sponsor, and which you have in your review package. In your review package, you have two statistical analyses. One statistical analysis was of the combined group, probable AD, possible AD, and MCI, with all these subjects considered as one group, versus definite non-AD. So we can calculate positive percent agreement, an analog of sensitivity, and it was about 60 percent. And negative percent agreement, an analog of specificity, for definite non-AD. Also, we can calculate positive and negative predictive value. As was mentioned, for calculation of positive and negative predictive value you can use Bayes' theorem, using prevalence and sensitivity and specificity. But there is an easier way to calculate positive and negative predictive value. By definition, what is the positive predictive value? This is the probability that the subject belongs to this class, the positive class, if their NTP test is positive. So, we have 98 positive subjects, whose NTP results are positive. What is the probability that, among these subjects, a person really belongs to this class? You need to divide the sum of these three numbers by 98, and it will be the positive predictive value. The same for the negative predictive value. By definition, the negative predictive value is the probability that the subject belongs to the negative class, here's our negative class, if the NTP results are negative. So, 40 divided by 102, because 102 is the total number of negative subjects, and 40 subjects really belong to this negative class.
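Both routes to the predictive values described here, direct counting and Bayes' theorem, give the same numbers when the prevalence is taken from the study itself. The sketch below illustrates this for the combined class (probable AD, possible AD, MCI) versus definite non-AD. The combined-class total of 156 is inferred as 200 minus 44 and is an assumption; the remaining counts are quoted in the testimony.

```python
# Combined class (probable AD + possible AD + MCI) versus definite non-AD.
pos_in_class = 51 + 21 + 22   # NTP positive and in the combined class
pos_total = 98                # all NTP-positive subjects
neg_in_non_ad = 40            # NTP negative and definite non-AD
neg_total = 102               # all NTP-negative subjects
class_total = 200 - 44        # inferred size of the combined class (assumption)
non_ad_total = 44

# Direct calculation from the table, as described in the testimony.
ppv_direct = pos_in_class / pos_total
npv_direct = neg_in_non_ad / neg_total

# Same quantities through Bayes' theorem, using the agreement measures
# (analogs of sensitivity and specificity) and the study prevalence.
ppa = pos_in_class / class_total     # positive percent agreement
npa = neg_in_non_ad / non_ad_total   # negative percent agreement
prevalence = class_total / 200

ppv_bayes = ppa * prevalence / (ppa * prevalence + (1 - npa) * (1 - prevalence))
npv_bayes = npa * (1 - prevalence) / (
    npa * (1 - prevalence) + (1 - ppa) * prevalence
)

print(f"PPA = {ppa:.3f}, NPA = {npa:.3f}, prevalence = {prevalence:.2f}")
print(f"PPV direct = {ppv_direct:.3f}, via Bayes = {ppv_bayes:.3f}")
print(f"NPV direct = {npv_direct:.3f}, via Bayes = {npv_bayes:.3f}")
```

The agreement of the two routes holds only because the prevalence is the one observed in this table; with a different prevalence, only the Bayes form applies.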
Prevalence also can be calculated from this table. Prevalence is, for example, for this class, the prevalence for probable AD, possible AD, and MCI. You need to add these three numbers, and then, in order to calculate the percent, divide by 200. Then it will be 78 percent. Another type of statistical analysis which you have in your review package was how the NTP test can distinguish between probable AD and another group, the group of possible AD, MCI, and definite non-AD. For demonstrating effectiveness of the NTP test, characteristics of NTP performance which are analogs of positive and negative predictive value were compared by the sponsor to the corresponding prevalence. In this exercise, the NTP test was treated as a stand-alone test. Indeed, we have in this study that the prevalence of this class, probable AD, possible AD, and MCI, is 78 percent, and the prevalence of definite non-AD is 22 percent. Suppose that one decided to call a patient positive if a tossed coin lands heads, and negative if the tossed coin lands tails. For this test -- so all these subjects right now are positive by the toss of the coin. What is the percent of this class among all these subjects? It's easy to see that it is the same, 78 percent. Similarly, for the subjects which are negative by the toss of the coin, what is the percent of definite non-AD? Among them the percent will be the same as the prevalence, 22 percent. So, in the sponsor's analysis, the NTP test, which they named "with NTP test," was compared with the performance of a random test, which was named "without NTP test." Random test means like the toss of a coin, the toss of a die. So it is a completely uninformative test. In the sponsor's statistical analysis, the NTP test was treated as a stand-alone test, in contrast to the adjunctive use which was proposed by the sponsor in the intended use/indications for use.
So, let me consider the performance characteristics of the NTP test for distinction between the class probable AD, possible AD, MCI versus definite non-AD. For NTP test positive, the probability that the patient belonged to this class was 95.9 percent, with the low bound of the 95 percent confidence interval 90 percent. For the random test, driven by prevalence, if the random test is positive then this probability is equal to the prevalence. So the difference is 17.9 percent, with the low bound of the 95 percent confidence interval 12 percent. Ninety minus this. For NTP negative test results, the probability that a patient belonged to definite non-AD was 39.2 percent, with the low bound of the 95 percent confidence interval 30.3 percent. If the random test is negative, then this is the prevalence, 22 percent. The difference was 39.2 percent minus 22, 17.2 percent, with the low bound of the 95 percent confidence interval around 8 percent. In the sponsor's presentation, you saw relative differences. Relative difference means that the 17.9 percent was divided by the prevalence, and then you obtain 23 percent. Or you saw the relative difference 78 percent, which means that this difference was considered as a percent of 22 percent. Then the relative difference is 78 percent. In my presentation, I will use the absolute difference, because the relative difference can have some ambiguous meaning.
So, let me consider the statistical interpretation of the NTP results based on the sponsor's statistical analysis. And I would like again to emphasize that this is the consideration of the NTP test as a stand-alone test. So, for NTP positive results, the probability that the patient belongs to the class probable AD, possible AD, and MCI equals 95.9 percent. So all these numbers should be added and divided by 98. So it's like absence of definite non-AD. Rule out of definite non-AD. Because with probability around 96 percent the patient belongs to the class probable AD, possible AD, or MCI, there is a need for further diagnostic workup. Let us consider the NTP negative results. For NTP negative results, the probability of the class possible AD, MCI, and definite non-AD equals 94.1 percent. For this, you need to add all these numbers and divide by 102. So it's like absence of probable AD. I would like to emphasize that absence of probable AD does not mean presence of definite non-AD. Absence of probable AD means presence of this whole class, possible AD, MCI, definite non-AD. Because with probability of 94 percent the patient belongs to this combined class, where possible AD or MCI are not excluded, the clinical decision is further diagnostic workup. So, for both outcomes of the NTP test, NTP positive and NTP negative, the medical decisions are the same. Further diagnostic workup.
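The comparison against an uninformative "random test" reduces to subtracting the prevalence from the corresponding predictive value. The sketch below computes both the absolute and the relative differences from the counts in the clinical table. Computed from those counts, the first probability is 94/98, about 95.9 percent, which is the figure used in the statistical interpretation that follows. The helper name `differences` is hypothetical.

```python
def differences(predictive_value, prevalence):
    """Absolute and relative gain of a test over a coin-toss test.

    For a completely uninformative test the predictive value simply
    equals the prevalence, so the prevalence is the baseline.
    """
    absolute = predictive_value - prevalence
    relative = absolute / prevalence
    return absolute, relative


# NTP positive: probability of the class probable AD, possible AD, MCI.
abs_pos, rel_pos = differences(94 / 98, 156 / 200)

# NTP negative: probability of definite non-AD.
abs_neg, rel_neg = differences(40 / 102, 44 / 200)

print(f"NTP positive: absolute {100 * abs_pos:.1f}%, relative {100 * rel_pos:.0f}%")
print(f"NTP negative: absolute {100 * abs_neg:.1f}%, relative {100 * rel_neg:.0f}%")
```

The two absolute differences are close to each other (about 17 to 18 percentage points), while the relative differences are about 23 and 78 percent because the baselines differ. This illustrates why the relative figures can be ambiguous.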
Let me consider also the confidence interval for these probabilities, because the confidence interval is a very important piece of information for generalization of the results of the study to the intended use population. In this study, the probability of this class was 95.9 percent, with the low bound of the 95 percent confidence interval 90 percent. It means that if the diagnosis definite non-AD will be ruled out based on NTP positive results, the incorrect decision will be made up to 10 percent of the time, 100 minus 90. In this study, the probability of the class possible AD, MCI, definite non-AD was 94 percent, but the 95 percent confidence interval low bound was 87.8 percent. It means that if the diagnosis probable AD will be ruled out based on negative NTP results, the incorrect decision will be made up to 12.2 percent of the time, 100 minus this percent. Again, I would like to emphasize that for negative NTP results, absence of probable AD does not mean presence of definite non-AD, because the probability of definite non-AD if the subject is negative is only 39 percent, 40 divided by 102. For example, the probability of possible AD is around 34 percent, even though the subject belongs to this whole class. Also, I would like to mention that the absence of definite non-AD does not mean presence of probable AD. Absence of definite non-AD means only presence of something from this whole class.
Finally, let me make a few comments on selection of the cutoff. For obtaining the cutoff with a relatively high negative percent agreement, an analog of specificity, 44 subjects with definite non-AD diagnosis were used. This is too small a sample to assure a representative sample of all subjects with definite non-AD. This may lead to biased estimates of performance. More details about this problem you will see in the clinical presentation by Dr. Ranjit Mani.
The cutoff of 22 units was drawn post hoc from the study itself based on some criteria of optimality. The sponsor checked cutoff values from 20 units to 30 units. So the current data set was used to find the optimal cutoff, and then the same data set was used to estimate the performance of the NTP test for this optimal cutoff. It is the well-known problem of training and testing on the same data set. The cutoff of 22 units was not validated on an independent data set. So estimates of the performance of the NTP test are overstated to an unknown degree. Thank you very much. Clinical issues will be presented by Dr. Ranjit Mani.
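The testimony does not say which method produced the confidence bounds. As an assumption, the sketch below uses the Wilson score interval, which happens to reproduce the quoted lower bounds of about 90, 87.8, and 30.3 percent from the table counts. Other methods, such as the exact Clopper-Pearson interval, would give slightly different values. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# NTP positive: 94 of 98 in the class probable AD, possible AD, MCI.
low_pos, _ = wilson_interval(94, 98)
# NTP negative: 96 of 102 in the class possible AD, MCI, definite non-AD.
low_neg, _ = wilson_interval(96, 102)
# NTP negative: 40 of 102 with definite non-AD.
low_non_ad, _ = wilson_interval(40, 102)

for label, low in [
    ("P(combined class | NTP positive)", low_pos),
    ("P(not probable AD | NTP negative)", low_neg),
    ("P(definite non-AD | NTP negative)", low_non_ad),
]:
    print(f"{label}: lower 95% bound = {100 * low:.1f}%")
```

Reading 100 minus the lower bound as the worst plausible error rate gives the "up to 10 percent" and "up to 12.2 percent" figures quoted in the testimony.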
DR. TAYLOR: Dr. Becker? The FDA presentation is scheduled to finish at 12:00. Is that okay?
DR. BECKER: Fine.
DR. MANI: Good morning. My name is Ranjit Mani. I'm a neurologist and a medical reviewer in the Division of Neuropharmacological Drug Products at the Center for Drug Evaluation and Research.
What I would like to try and accomplish in about the next 15 minutes is to address two questions. The major focus of my presentation will be on whether the urine NTP test is of clinical value. I will then very briefly touch upon the question of whether the assignment of study subjects to the four protocol-designated categories was appropriate. I will take the liberty of addressing the study data from two perspectives. The first is that of a medical reviewer at the FDA who has worked in the Alzheimer's disease field for the last eight years, mainly on clinical drug development, but also to an extent in evaluating diagnostic tests for that condition. The second is that of a clinical neurologist who has taken care of patients with dementia for almost three decades, and continues to do so.
Let us start by looking at this table which summarizes the results of the key study conducted by the sponsor, and which you have seen before. If one assumes that the conduct of the study was sound, and the methods of analysis appropriate, several conclusions are possible from the results in the table. The vast majority of, but not all, subjects in the probable Alzheimer's disease category had a urine NTP level greater than 22 micrograms per mL. The vast majority of, but not all, subjects in the definite non-Alzheimer's disease category had a urine NTP level equal to or less than 22 micrograms per mL. However, significant proportions of patients in the possible Alzheimer's disease and mild cognitive impairment categories had urine NTP levels on both sides of the cutoff, indicating that the test may have little value in delineating each of those groups separately from each of the other three groups.
Now, based on the summary data presented in the table, the study results do not run counter to the sponsor's contention contained in the most recent submission that the clinical study results clearly demonstrate the ability of NTP measurement to discriminate between probable AD and definite non-AD, using the device's cutoff of 22 micrograms per mL. However, if one then views the sponsor's scatter plot, which you have already seen repeatedly, and which shows the individual urine NTP levels in each treatment group, an additional concern arises. A substantial proportion of the individual values in each treatment group is clustered relatively close to the cutoff of 22 micrograms per mL, with a range that is within perhaps 10 points of the cutoff on either side. As Dr. Becker had indicated earlier, the urine NTP test may have an inherent biological intra-patient variability extending to 7 or more micrograms per mL that may largely encompass the clustered scores that I've just alluded to. Thus the discriminating power of the urine NTP test at a cutoff of 22 micrograms per mL may be even less sharp than has been suggested by the first table that I showed you.
If, however, you agree based on the table that I showed you earlier that the urine NTP test can to a large extent discriminate between those with probable Alzheimer's disease and definite non-Alzheimer's disease, then the question arises as to whether such a test has diagnostic value in clinical practice. And to explore this question further, it may be useful yet again to try and understand better what these categories mean.
First, let's address the entity of probable Alzheimer's disease. The NINCDS-ADRDA criteria continue to be perhaps the most widely used diagnostic criteria for Alzheimer's disease. These criteria have the ability to delineate what is arguably the purest form of Alzheimer's disease that can be diagnosed without obtaining histopathological confirmation, and the criteria include core elements, supporting elements, features that are consistent with the diagnosis, and features that make the diagnosis unlikely, or uncertain. But based on the criteria that I just showed you, there are three elements to the diagnosis of probable Alzheimer's disease, three key elements: one, the presence of dementia; two, evidence that cognition has progressively worsened; and three, the exclusion of brain and systemic diseases other than Alzheimer's disease that could explain the cognitive changes. And therefore the steps involved in making the diagnosis of probable Alzheimer's disease are generally as follows. Step 1 consists of confirming the presence of dementia and of progressive worsening of cognition, and Step 2, excluding other causes of dementia by blood tests, brain imaging, and other appropriate procedures. And for all intents and purposes, probable Alzheimer's disease is still a diagnosis of exclusion.
There are several important considerations when one looks at the relationship of probable Alzheimer's disease to possible Alzheimer's disease and mild cognitive impairment. These entities may not be pathologically distinct. The core pathological elements of Alzheimer's disease are present in a high proportion of those diagnosed with possible Alzheimer's disease, and probably in a high proportion of those diagnosed to have mild cognitive impairment using the criteria stipulated in this study. And the majority of those with mild cognitive impairment so defined may progress to overt Alzheimer's disease over a number of years.
The stepwise assessments used in making a diagnosis of possible Alzheimer's disease and mild cognitive impairment is generally similar to that used in making a diagnosis of probable Alzheimer's disease. And indeed the treatment of probable Alzheimer's disease, possible Alzheimer's disease, and mild cognitive impairment may be similar, now or at some point in the future. So again, there isn't a sharp distinction between these entities.
Now, let us look at what actual conditions the definite non-Alzheimer's disease category included. The individual diagnosis for each subject in this broad category was entered in that subject's case report form. And this table is based on those entries. And what this table, which is on two slides, indicates is that a small majority of individuals in this category was considered to be normal, and that the remaining individual entities were each quite uncommon. In fact, if one groups all so-called psychiatric entities together, there is a total of only 11 such subjects in this study. Thus, in conclusion, the definite non-Alzheimer's disease category does not represent a single clinical entity, a valid diagnostic term, or a term that most clinicians would be familiar with. It is an artificial construct created solely for the purposes of this study. That category represents a diverse group of separate conditions, the majority of which are individually only minimally represented in the study cohort. Many of these conditions are unrelated to each other. In clinical practice, it is these individual conditions, like age-associated memory impairment, or depression, or anxiety, that are usually diagnosed, not definite non-Alzheimer's disease. A clinician would most likely want to make a distinction between Alzheimer's disease and the individual entities in this group, but not between Alzheimer's disease and the broad category of definite non-Alzheimer's disease. And lastly, how much confidence would a clinician have in making a distinction between probable Alzheimer's disease on the one hand, and age-associated memory impairment, or any psychiatric disorder on the other, based on the results of a study which had only three subjects with age-associated memory impairment, and 11 subjects considered to have any psychiatric disorder?
Now, suppose we still agreed that the study results do help in discriminating between the probable Alzheimer's disease and the definite non-Alzheimer's disease categories. A further question is whether it is actually common in clinical practice to have difficulty making a distinction between individuals who conform to the profile for both these entities. And further, once a diagnosis of probable Alzheimer's disease has been made, does it not by definition indicate that most, if not all, the entities in the definite non-Alzheimer's disease category should have been excluded, assuming that the criteria were correctly applied? Both the published literature and personal experience suggest that once a diagnosis of probable Alzheimer's disease has been made, pathological entities that may still not be excluded include those shown on this slide, among others. This list is not all-inclusive, and these conditions include fronto-temporal dementia, dementia with Lewy bodies, vascular dementia, and normal pressure hydrocephalus. Some of the entities in this list may indeed represent mixed forms of dementia, and entities that are controversial. Importantly, the study results contain no data at all which indicate that the urine NTP test can help in making these distinctions.
Let us now shift to how the sponsor envisages that the urine NTP test can be used in practice. And I will be showing you a succession of statements taken from the most recent submission. Based on these statements, the test design was anticipated to have potential value in evaluating patients with cognitive symptoms and signs, including those suspected of having Alzheimer's disease, confirming the presence or absence of the definite non-Alzheimer's disease category, and distinguishing that category from probable and possible Alzheimer's disease, as well as mild cognitive impairment, helping the medical professional who makes the initial evaluation, for example, a primary care physician, decide about the need for proceeding further to additional diagnostic testing or specialist referral. In other words, a test that is used early in the diagnostic process. The test is intended to be used in conjunction with standard diagnostic procedures, and not as a stand-alone test. Now, let us look at a couple of hypothetical scenarios in which the test may potentially be applied. These scenarios have been alluded to earlier, and criticized. There are obviously a number of other scenarios which could be explored, but I have every reason to believe that the scenarios that I'm about to describe are quite common in clinical practice.
The first scenario is that of a middle-aged man or woman. And my concept of the term "middle-aged" is somewhat different from that of Dr. Bloch, colored somewhat by my own age. This is an individual who is about 55 years old, which I think most of you would agree is middle-aged rather than elderly. Anyway, the first scenario is that of a middle-aged man or woman who complains of poor memory to a primary care physician. The history has been unhelpful in making a diagnosis, and there is no clear abnormality on physical and neurological examination or in a brief assessment of the patient's mental status. The primary care physician considers the assessment inconclusive. Possible diagnoses that the primary care physician wishes to consider include early Alzheimer's disease or another type of dementia, age-related symptoms, depression, anxiety, a busy schedule, etcetera. The urine NTP concentration is equal to or less than 22 micrograms per mL. Let's take 16 micrograms per mL for simplicity's sake. Based on the results of the study, the possibilities include definite non-Alzheimer's disease, possible Alzheimer's disease, and mild cognitive impairment. Probable Alzheimer's disease is unlikely, and the question arises as to whether the urine NTP test will truly help in deciding whether to proceed to further diagnostic evaluation, including referral and more testing, in this scenario. Could a clinician really forgo further testing based on these results? The same scatter plot that has been shown to you before may help further in addressing this dilemma with this particular patient's results falling roughly there.
The second scenario is identical to the first until the urine NTP test is done. The urine NTP concentration is greater than 22 micrograms per mL, let us say 28 micrograms per mL. Based on the results of the study, the possibilities include probable Alzheimer's disease, possible Alzheimer's disease, and mild cognitive impairment. The definite non-Alzheimer's disease category appears unlikely. The question therefore again arises as to whether this urine NTP test would help in deciding whether to proceed to further diagnostic evaluation, specialist referral, etcetera. The decision may be somewhat easier than in the first scenario, again based on the study results. But more importantly, how confident can the primary care physician be that most of the individual entities in the definite non-Alzheimer's disease category have been excluded when most of these entities are minimally represented? And the scatter plot, again, shows you roughly where this patient may end up, somewhere here.
Let's now briefly talk about the sponsor's view of the limitations of the urine NTP test. Based on this statement taken from the briefing document, the sponsor appears to believe that the test is not of value in diagnosing what are termed rare diseases, mixed pathology states, or other incompletely understood complex or controversial entities. This view of the limitations of the test leads to further questions. First, for instance, how does a medical professional such as a primary care physician who requests the urine NTP test reliably determine in advance that the patient has a diagnosis for which the test would not be appropriate? Secondly, how useful is a diagnostic test that is only applicable in those instances where a diagnosis is already easy to make, in other words, in purer forms of the disorder? And it is worth noting that mixed pathological states and incompletely understood or controversial entities are not uncommon in a population with dementia or impaired cognition.
Lastly, I will briefly address the procedures used to assign study subjects to the four diagnostic categories. It remains somewhat troubling that many key study assessments that form the basis for assigning subjects to the respective categories were not performed concurrent with the study. Some assessments were done as much as three years prior to study entry, whereas others were done after the study was completed, and in fact after this application was first formally submitted. Further, a number of subjects entering the study were not newly diagnosed, raising a question as to whether the study results are as a whole applicable to newly diagnosed subjects. In addition, the methods used by the investigators to make individual diagnoses for subjects in the definite non-Alzheimer's disease category are unexplained, and we have residual concerns about the adequacy of the evaluations, and the uniformity of diagnostic methods.
So, in summary, this agency continues to have significant concerns about the results of the sponsor's study. One, were study subjects credentialed appropriately? And two, is the urine NTP test of clinical value? Thank you very much.
DR. TAYLOR: You have about five minutes, Dr. Becker.
DR. BECKER: I'll conclude today's FDA presentations with a summary and restatement of key points. The urine NTP test is intended for use as an adjunctive test, adding to the information obtained from other tests that underlie the NIH and AAN criteria for AD and MCI. In particular, the sponsor combines the patients who do not fit the definite non-AD criteria, and claims to distinguish them from definite non-AD patients with 91 percent specificity. Dr. Kondratovich has demonstrated that a 91 percent negative agreement between NTP results and the NINCDS classification does not imply 91 percent diagnostic specificity. Indeed, we do not know the specificity or the sensitivity of the NTP test used alone, or in combination with clinical criteria for distinguishing patients who truly have AD from those who do not. The combination might perform substantially worse than, the same as, or better than the clinical criteria used alone. We cannot know what the true answer is without comparing results from the clinical criteria and from the urine NTP test to a higher standard.
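The distinction drawn here between negative agreement and diagnostic specificity can be illustrated with a minimal sketch. It assumes, purely for illustration, that the test and the clinical reference classification err independently of each other given the true disease state; every number below is hypothetical and none comes from the sponsor's study.

```python
def negative_agreement(prevalence, test_sens, test_spec, ref_sens, ref_spec):
    """P(test negative | reference negative).

    Assumption (not from the transcript): the test and the reference
    standard make their errors independently given the true disease state.
    """
    # Reference-negative subjects who truly have the disease (missed by the reference)
    diseased_ref_neg = prevalence * (1 - ref_sens)
    # Reference-negative subjects who are truly disease-free
    healthy_ref_neg = (1 - prevalence) * ref_spec
    # Subjects negative on both the test and the reference
    both_negative = (diseased_ref_neg * (1 - test_sens)
                     + healthy_ref_neg * test_spec)
    return both_negative / (diseased_ref_neg + healthy_ref_neg)


# Hypothetical inputs: true test specificity is 0.85, reference is imperfect
npa = negative_agreement(prevalence=0.5, test_sens=0.8, test_spec=0.85,
                         ref_sens=0.8, ref_spec=0.9)
print(f"negative agreement = {npa:.3f}, true specificity = 0.850")
```

With a perfect reference (`ref_sens = ref_spec = 1`) the function returns the true specificity exactly; with an imperfect reference the two quantities diverge. If the errors of the test and the reference were correlated rather than independent, agreement could instead overstate the true specificity, which is why a higher standard is needed to settle the question.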
The sponsor asserts that the urine NTP test improves the positive predictive value and the negative predictive value of patient classifications compared to the prior probability, the PPV and NPV characteristics. We note that the prior probability performance accounts for only the natural distribution of patients across the four classes. That is, the sponsor's comparison treats the urine NTP test as a stand-alone test, and indeed, as a stand-alone test that is examined for its ability to emulate the imperfect NIH and AAN criteria, but not histologic reference criteria. It is a comparison in which the diagnostician's skills in applying the clinical criteria are factored out of the process. On the contrary, the diagnostician, whether trained in primary care or a specialty surely brings skills to the process. The diagnostic efforts of the physician, whether he's a primary care or a specialist physician, are not worthless. We need to know how well the NTP test complements those skills. FDA seeks the panel's opinion as to how well the data submitted by the sponsor answered this critical question.
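The comparison of predictive values against a prior probability can be sketched with Bayes' rule. The sensitivity, specificity, and prior probabilities below are hypothetical placeholders, not figures from the submission; the sketch only shows how the apparent gain over the prior depends on which prior is used.

```python
def predictive_values(prior, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    tp = prior * sensitivity              # true positives
    fp = (1 - prior) * (1 - specificity)  # false positives
    tn = (1 - prior) * specificity        # true negatives
    fn = prior * (1 - sensitivity)        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics (assumed for illustration only)
SENS, SPEC = 0.8, 0.9

# A low prior stands in for the raw distribution of patients across classes;
# a high prior stands in for a clinician's estimate after a skilled workup.
for prior in (0.3, 0.9):
    ppv, npv = predictive_values(prior, SENS, SPEC)
    print(f"prior={prior:.1f}  PPV={ppv:.3f}  gain over prior={ppv - prior:.3f}")
```

Under these assumed numbers the gain in PPV over the prior is large when the prior is the population base rate and much smaller when the prior already reflects the diagnostician's assessment, which is the sense in which a stand-alone comparison factors the clinician out.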
The sponsor highlights comparisons for which we seek the panel's opinion concerning clinical significance. These include distinguishing definite non-AD from not definite non-AD, and probable AD from not probable AD. Scenarios have been described already, but this area is worth another visit. Imagine that a skilled evaluator, using the NIH and AAN criteria, favors definite non-AD for a patient, and then obtains a urine NTP test that is negative. Should he conclude that the likelihood of definite non-AD is now increased? That is, are there test performance data to support such a view? Furthermore, the clinician knows that a negative NTP result is certainly consistent with MCI and with possible AD, two classes that include a substantial number of patients with AD pathology. Should he alter the course of the patient's workup in any way, given this distribution? Imagine another physician, solidly convinced of definite non-AD for her patient, who nevertheless sends a routine urine NTP test, and gets a positive result. Should she consider this to be a save? Has she avoided a terrible mistake? This would perhaps be an easy scenario if the physician already had doubts about the truth of a definite non-AD hunch, since there may be enough divergence of urine NTP results among MCI patients to justify going ahead with a workup, whatever the NTP result, when MCI is in the picture. But this physician was solidly convinced that her patient is definite non-AD until the urine NTP test came back. Should she now line up a referral at a regional AD research center? Or should she stand on her clinical skills and dismiss that pesky lab result? Is the lab result a signal? Is it a red herring? Does it have any known significance at all? The answer is we don't know.
I'll say a few more words about within-patient variation of test results. We found that 25 percent of the test results can be bracketed within an interval centered on the cutoff value, and representing a conservative estimate of the average within-patient test variation. We recognize that only a small number of patients were multiply tested by the sponsor. Does the panel believe that the sponsor should pay careful attention to eliminating or compensating for short-term within-patient variation of test results?
The last topic concerns whether intended uses that do not require histological validation can be framed for the urine NTP test, and other AD tests on the horizon. We believe the answer is yes, and that these uses share some aspects of intended uses that were brought forward for the PMA under consideration. There might be room, given well validated IVD tests, to improve medical practice, even with the imperfect clinical diagnostic criteria now in use. One opportunity, as noted by the sponsor, is in more rapid and appropriate referral of patients from primary care facilities. Another might be in helping to refine the appropriate selection of advanced tests by experts. Establishing such intended uses will require carefully designed studies with well defined hypotheses and well defined success criteria. We seek the panel's opinion concerning the feasibility of such studies. Thank you.
DR. TAYLOR: Thank you. In view of time constraints, and the FDA has used up their time commitment, we will break a little early for lunch. We will give the latitude of a whole extra 15 minutes for the lunch break. So we will reconvene at 1:00, at which point in time the panel should be prepared to discuss these issues, and at that point, may refer questions not only to the FDA, but also to the sponsor. So at this point could we break? I would remind the panel that these issues are not to be discussed over the lunch break. Thank you.
(Whereupon, the foregoing matter went off the record at 12:03 p.m. and went back on the record at 1:04 p.m.).
DR. TAYLOR: Okay. At this point we're going to recommence the panel discussion, and I'd like to call the meeting back to order, and would remind public observers of the meeting that while this portion of the meeting is open to the public, public attendees may not participate unless specifically requested to do so by the chair.
We did finish the morning session without an opportunity to specifically question the FDA and their particular presenters as members of the panel, so I would now invite the panel if they have specific questions relating to the presentations in the FDA segment to address those questions now. This is Dr. Duffell.
DR. DUFFELL: Bill Duffell, industry rep. Question is for FDA. I noticed in the panel material that the product is already in use for CLIA-approved labs. And I just thought maybe someone from the agency could kind of explain the significance of our decision today concerning this product for PMA approval versus its current availability in labs.
DR. TAYLOR: Is that something you can address, Dr. Gutman?
DR. GUTMAN: Yes, that would be appropriate for me to comment on. There actually are two mechanisms for laboratory tests to enter the U.S. marketplace. One mechanism is for a sponsor to make a commercial kit or system, and then to sell it at multiple sites. And in order for that marketing practice to occur, there are requirements for FDA premarket review, and there are requirements for companies to follow quality system regs and reporting requirements.
There is, however, an alternative mechanism to enter the marketplace. Individual labs do have the opportunity to create what are called in-house or home-brewed tests, or laboratory testing services. There is actually regulation for those. That is under CLIA. It's a very different regulatory construct than FDA. That would be an operation at a single site so that you couldn't export the test to multiple sites, although you would be allowed to obtain samples from multiple sites. So samples can flow to the lab. The CLIA certification can be direct through CLIA or through one of the deemed status inspection groups like the College of American Pathologists that operates on CLIA's behalf. And there are differences in that CLIA is looking at analytical performance and the quality of the lab system, and as you would gather from FDA's review, we're looking at analytical performance and clinical performance. CLIA tends to have a more systemic approach, and FDA takes a more device-specific approach. So there are differences. The company, as I understand it, and they can comment if I've got this wrong, is a CLIA certified lab and so it is permitted to market at this current time through their high-complexity CLIA lab. The FDA approval would allow them to export that product to other labs for use at other sites.
DR. TAYLOR: Is that helpful, Dr. Duffell? Any comment from the sponsor? Okay. Any other questions regarding the presentation this morning? Panel members?
DR. GOLLIN: I have a question. Do you have a question?
DR. LICHTOR: Oh yes, actually I do have a question.
DR. TAYLOR: Addressed to?
DR. LICHTOR: To the sponsors.
DR. TAYLOR: Okay. I'll deal with the FDA questions first, and then we'll come back to the sponsor questions. Any other questions for the FDA? I would like to remind the panel that we're now going to have a discussion concerning the data, the presentation that we've heard from both the sponsor, and from the FDA, and the information that we received in our packets relating to this meeting.
Any member of the panel can address questions then from this point for about the next hour or as long as it takes to the sponsor, or to the FDA. When we've had this general discussion, there are some particular FDA-related questions that have been addressed to the sponsor, and we will then address those as a panel one by one. There will then be a second opportunity for the public to ask questions, a second open public hearing, and then we will finish with summations from the FDA and from the sponsor. The panel will then conclude their deliberations by voting on the recommendation that the panel wishes to make to the FDA concerning this PMA. So members of the panel, do you have any comments or questions for either the sponsor now or for the FDA? I know you do, Dr. Lichtor.
DR. LICHTOR: Okay, I'm Terry Lichtor and I'm a neurosurgeon, but I was curious if the sponsors have any data regarding what stage of Alzheimer's disease the NTP becomes positive? Is it something that is early on, or something -- a late sort of conversion, and in particular, has any work been done in animal models such as like transgenic mice. And the main reason why I ask that is because, having done some work with them, they get human basic Alzheimer's disease at a very precise time, so you could sort of ask the question when in that time course does it become positive.
DR. AVERBACK: This is Paul Averback. Those are both very good questions, and I could answer them in 97 hours, or I'll try to do it in under a minute. Your second question is particularly fascinating because there is a genetic transfer model for NTP in the mouse. You won't find it in the peer review, but it is in the patent literature, and it's a fascinating finding because these mice get amyloid and phosphorylated tau accumulations, and they lose a lot of cells, and they act funny. And we're extremely excited about this, because it's a wonderful model for looking for therapeutics.
As far as evolution over time, multiple time-point samples are problematic. There does seem to be a trend over time, but it's not -- we don't have enough data to be definitive on that. But it does remind me that I should answer something that was stated this morning about these variabilities of multiple time-point samples. And the agency put up a slide of our multiple time-point samples and pointed out some coefficients of variation, etcetera. What they didn't point out, and I think is extremely exciting, is that out of these 14 cases, and they had tests at one month, three months, and up to two years, not a single case went from positive to negative, or from negative to positive. So on a science test, this is phenomenal. And if you look at the tests we all use, like the article in the New England Journal of Medicine last year about PSA. I know it's not the greatest test, but PSA, 50 percent of them flip-flop after three months. So to talk about standard deviations is one thing, but biologically, none of these cases have changed over. There was one case in the table that actually went from normal to elevated, and in fact this patient had gone from clinically normal to become AD.
DR. TAYLOR: Does that answer your question, Dr. Lichtor? Dr. Nath?
DR. NATH: I have a couple of questions.
DR. TAYLOR: Okay, state your name again, please, for the record?
DR. NATH: Avi Nath. And one is that I noticed that the concentrations of NTP in the urine were in microgram quantities, while the data for one of your publications on cerebrospinal fluid are in nanograms per mL. And so I was wondering why does it get so concentrated in the urine if this is a brain-derived protein, or is it coming from other organ systems? And is there a correlation between CSF, blood, and urine levels in single individuals?
DR. AVERBACK: Also a very good question. The nanograms per mL was based on a sandwich ELISA that used monoclonal antibodies. The current assay is a different format, and it is much more sensitive. And this is not unusual. As better tools are developed, we find that there's more signal.
DR. NATH: So would you revise the CSF then to say if you were to use a different assay system, would it be micrograms instead of nanograms?
DR. AVERBACK: No. The CSF data was done with previous generation tests.
DR. NATH: No, the question was then why is the concentration so low in the CSF compared to urine. And does the high concentration in the urine reflect that maybe the protein is coming from non-CNS organs. Maybe it's coming from some other place.
DR. AVERBACK: No, excuse me. The standard that was used to quantify it in the older work was based on recombinant protein estimates of quantity. Whereas we have a different assay now that uses a synthetic standard and has -- it's a totally different system, so it detects more. I'm not saying that the signal is necessarily different. We don't use this format on CSF.
DR. NATH: If it's a brain-derived protein, and it goes into the CSF, you say it gets diluted there. If it goes into the plasma from the brain, from CSF to plasma, then it would get further diluted, right?
DR. AVERBACK: Not necessarily. A lot of analytes concentrate in urine. There are many analytes --
DR. NATH: That's the question. So you think it's getting concentrated, and it is brain-derived, it's not coming from some other organ.
DR. AVERBACK: There's no evidence that it's coming from any other organ. There was a lot of work done on that.
DR. TAYLOR: Dr. Averback, if you could remain, I have a couple of questions that are a little bit on the same theme. This is Dr. Taylor. The biological variability data that the FDA showed, the 14 patients. There was a little bit of variation, and it was two days to one month was that interval. Do you have any data showing repeating the assay on the same patient within a few days as to whether there's any short-term variation?
DR. AVERBACK: Well, we don't have within-day because we used first morning. And we've done some preliminary work with consecutive days. But it wasn't done as a specific study. These are real examples that we have.
DR. TAYLOR: Okay, so let me extend the question from there to analytical variability. From reading the study, all of the assays were referred to the central lab, which I assume is the same lab that's done the 1,500 specimens under CLIA specification. Is that correct?
DR. AVERBACK: Yes.
DR. TAYLOR: And so all the data is in one lab. I'm not sure, are there data as to reliability, or reproducibility of performance of this assay in other labs, or in other hands?
DR. AVERBACK: Yes, there is. In our briefing document there will be a summary there of the precision study that was done at four sites.
DR. TAYLOR: Okay. The FDA gave a table saying that there was an -- in trial run variation of two to three micrograms in other hands for NTP measurements using different assays, I guess. And within your hands, in your CLIA lab, the variation, or the SD was one microgram. Does that reflect the data?
DR. AVERBACK: I'd have to ask my -- is that correct?
DR. LEVY: Yes, it was correct. My name is Dr. Levy. It was correct.
DR. TAYLOR: Okay. So my question then extends to the labeling that you have for your test. It's an ELISA test based upon an alkaline phosphatase color intensity, optical density measurement, right?
DR. AVERBACK: Yes.
DR. TAYLOR: And so you're running controls, and you're running standards. From looking at your labeling, there's a little bit of latitude allowed in terms of optical density measurement for the standard or the control to qualify as a valid control or standard. And that's fine, because that's the way optical density works. But that's going to affect your standard curve. And what I really want to know is if you run the same specimen, say, four, five times, then, one, what sort of variation do you see in the standards when you run it five times, and then what sort of variation do you see in the actual result if you run it four or five times.
DR. AVERBACK: Yes, we have done these studies. It's extremely low variation. I can't quote the exact number, but I can run upstairs and get it for you.
DR. TAYLOR: Is that where the one microgram SD number comes from?
DR. LEVY: The one microgram -- comes from running twenty times the same specimen for five days in terms of --
DR. TAYLOR: Okay.
DR. LEVY: So you have many data points.
DR. TAYLOR: Okay, good, that's the sort of answer I was looking for. Thank you.
DR. LEVY: Yes. And that was giving you the total -- variation.
DR. TAYLOR: Fine. So the 22 microgram cutoff level, the concern that that still raises that I think you should have an opportunity to address, when you look at your scattergram plot of where the values fall in relationship to the 22 number, if it's a one microgram variation and it goes from 22 to 23, or from 22 to 21, that actually makes a fairly major change in the numbers that are called positive or negative with regard to the threshold. And I'm a little concerned, and I'd like to give you a chance to address how that's dealt with, or what data you have concerning that. For example, the changes, and I apologize that I've done this off the top of my head so it's not -- I don't claim any statistical validity to this, and I hope no one gets up and has a go at me for it. But if you change this, if you, for example, if it just goes up one microgram, and the cutoff then is 23 instead of 22, you in fact change the number of probable cases from positive -- from 51 down to 44. That's in the probable AD category. And likewise, if you take the converse of that and drop the number down to 21 as being the positive, then your definite non-ADs change from 40 to about 30. And that of course changes the specificity, the sensitivity, the predictive value, and everything else. So I'd just like to have you respond to that, and how you think that that can be dealt with.
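The sensitivity of the positive and negative calls to a one-microgram shift in the cutoff can be sketched as follows. The result values are invented for illustration and are not the study data; the only detail taken from the record is the convention that a result equal to or less than the cutoff is treated as negative.

```python
def count_positive(values, cutoff):
    """Number of results called positive, i.e. strictly above the cutoff."""
    return sum(v > cutoff for v in values)


def fraction_near_cutoff(values, cutoff, half_width):
    """Fraction of results lying within +/- half_width of the cutoff."""
    return sum(abs(v - cutoff) <= half_width for v in values) / len(values)


# Hypothetical urine NTP results in micrograms per mL (not from the study)
results = [14.0, 18.5, 20.9, 21.4, 21.8, 22.3, 22.6, 23.1, 24.7, 27.0, 31.2, 35.5]

for cutoff in (21, 22, 23):
    print(f"cutoff {cutoff}: {count_positive(results, cutoff)} called positive")

print(f"within 1 of cutoff 22: {fraction_near_cutoff(results, 22, 1.0):.2f}")
```

When many results cluster near the threshold, as in this made-up sample, a shift of one unit in either direction reclassifies a noticeable share of them, and every downstream figure computed from the counts moves with it.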
DR. AVERBACK: I think that's a very legitimate and good question. We have several answers to that. ROC curves compare sensitivity and specificity, and the area under the curve no matter how you look at this is extremely high. I won't run through too many slides of that, but in our submission we took cutoffs everywhere from 20, 22, 25, 30. No matter where you put the cutoff, you're still going to get the same neighborhood of results. And in the CLIA lab, our cutoff has been in the same zone for several years. And I can sense that there might be some insinuation that if you just kind of move these dots a little bit this way or that way, the relationship would go away, but no matter where you draw the line there, you still see the same relationship, within orders of magnitude that is to say. So we have at least one slide. We'll call it up shortly.
DR. TAYLOR: Sure, please.
DR. AVERBACK: I apologize for the delay. Here's the -- this is the scatter plot I was referring to. So here's the cutoff at 25. And it really doesn't seriously impact the relationships we've been describing.
DR. TAYLOR: Well, it actually does move a lot of probable AD cases though below the line, that were above the line before.
DR. AVERBACK: To some extent. I mean, there's no denying that there are cases that cluster, like in any scatter plot, but again, I just say the ROC curve is so right angled that it's -- you're getting a gain in the specificity. Because there's only -- before there were four points above the cutoff on the definite non-AD, now there's three. That's a gain of 25 percent. So you'd have to do a lot of dots on there on the left to make 25 percent.
DR. TAYLOR: Now, I understand you would, but on the left, that's the probable AD column. You've gone from six to 20 below the line.
DR. AVERBACK: Six to 20?
DR. TAYLOR: Well, from numbers of six that were below the line when it was 22, to maybe 20 dots below the line now it's 25. On the left-hand column.
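The counts quoted in this exchange can be put side by side with a short calculation. It uses only the figures stated above (four versus three definite non-AD points above the line, and Dr. Taylor's rough estimate of six versus about 20 probable AD points below it); the function name is a hypothetical helper.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Counts as quoted in the discussion when the cutoff moves from 22 to 25
non_ad_above_before, non_ad_above_after = 4, 3    # definite non-AD above the line
prob_ad_below_before, prob_ad_below_after = 6, 20  # probable AD below the line (rough)

print(f"definite non-AD above cutoff: {relative_change(4, 3):+.0%}")
print(f"probable AD below cutoff:     {relative_change(6, 20):+.0%}")

# Net change in discordant calls across the two columns
net = (non_ad_above_after + prob_ad_below_after) - (non_ad_above_before + prob_ad_below_before)
print(f"net change in discordant results: {net:+d}")
```

On these quoted numbers, one fewer definite non-AD result falls above the line while roughly 14 more probable AD results fall below it, so the 25 percent figure describes only one side of the trade.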
DR. AVERBACK: Well, I mean those are values near the cutoff.
DR. TAYLOR: Yes.
DR. AVERBACK: This is not a stand-alone test. We do not say to make a decision based on that cutoff. We're showing that it has useful information because it closely agrees with diagnosis. But it's not a cutoff that you use to say 'I'm going to start a drug' or 'I'm going to do some surgery' or anything like that.
DR. TAYLOR: Point taken, that's why I asked the question as to what -- how accurate that one microgram variation was. And I received a satisfactory answer to that, so thank you. Anyone else have questions? Yes, Dr. Blumenstein.
DR. BLUMENSTEIN: What was the planned size of the study that has been presented to us today?
DR. AVERBACK: The plan was to have 350 patients who would be initially eligible, and we expected to get to 200. And the study was closed when we got to 200 that fulfilled inclusion criteria. And -- go ahead.
DR. BLUMENSTEIN: I have a protocol here, but I don't find anything about that.
DR. AVERBACK: It's in the pre-IDE document.
DR. BLUMENSTEIN: Is there also a statistical analysis plan that was created at that time?
DR. AVERBACK: Yes.
DR. BLUMENSTEIN: Do you have other ongoing clinical studies at this moment?
DR. AVERBACK: For this marker?
DR. BLUMENSTEIN: Yes.
DR. AVERBACK: Other ongoing studies. Not of this variety.
DR. BLUMENSTEIN: Which variety?
DR. AVERBACK: Not of a prospective, multi-center, multi-year category.
DR. BLUMENSTEIN: Do you have an autopsy study ongoing?
DR. AVERBACK: We try to follow up every case that we can that -- every case that comes to the reference lab is followed up. We do follow-up periodically with every single case. So we have these 1,500 approximate cases. Anytime we can do follow-up we do. We contact the doctors as often as we can.
DR. BLUMENSTEIN: Do you have any results from that?
DR. AVERBACK: Yes. We published one study of that sort a few years ago. And it had excellent agreement.
DR. BLUMENSTEIN: With autopsy diagnosis of Alzheimer's?
DR. AVERBACK: No, with follow-up. Follow-up. Autopsy for these cases, it's 10 to 20 years. Because these are newly presenting cases in the realistic scenario. These are not institutionalized. It's real patients. Most of these cases were ambulatory.
DR. BLUMENSTEIN: Have you pursued any patients who might be closer to death in testing and so on?
DR. AVERBACK: Yes, our earlier studies were on different types of populations, institutionalized ones. We have lots of data from late-stage Alzheimer's.
DR. BLUMENSTEIN: Thank you.
DR. AVERBACK: If I may, on the same topic, because it brings up a question that we didn't get a chance to respond to this morning, and I wouldn't want to let this issue just pass. There was some criticism, or shall we say maybe query about why we only had 200, why were there 166 people that didn't qualify. And this is a very legitimate question, and a very good question. Why is that, because that does superficially seem to be quite a large dropout, doesn't it? But I know that some of the panel members who work with demented patients will know that in clinical trials with demented patients, it's not easy to get a first morning urine sample. And in fact it's a lot harder than it might look. These are people who are not perfectly in control of their lives, and they're forgetful, and for many reasons as is well known to people in the field, it's hard to get good compliance from them. So 166 dropouts, while it may seem a little funny, is absolutely typical for a study like this.
DR. BLUMENSTEIN: I'd like to ask a follow-up to that. That implies that maybe the dropout is greater in some categories than others?
DR. AVERBACK: Do you have a slide for that, Jack, on the dropout cases? It will take a while to pull it up. There didn't seem to be any differences in those populations with the available data. We did analyze them.
DR. NATH: While he's pulling up the slide, along the same lines, I mean your definite non-AD patients or almost half of them were normals. So did you have problems in normals also getting the morning urine sample?
DR. AVERBACK: No, the dropouts weren't -- they were the more demented.
DR. NATH: They were only in the ?.
DR. AVERBACK: I wouldn't say only. I'd have to look. But for the most part. And it was more predominant in females, as you would expect also.
DR. TAYLOR: Okay, the chart's up now.
DR. AVERBACK: No, that's not the one.
DR. TAYLOR: No?
DR. AVERBACK: Okay, no. Sorry, we don't have the slide.
DR. TAYLOR: Okay. Dr. Gollin has a question.
DR. GOLLIN: Yes. It's on the same subject. So you say in the clinical trial that they should not void for six hours before the early morning urine for the test. But this doesn't seem to be stated in the labeling. And so I was wondering about that.
And second, in your clinical trial, as you said, about a third of the specimens were excluded because of an unacceptable urine specimen. And clearly, as people get older, they have to go to the bathroom during the night. And so how feasible will this test be if patients have to submit three or four specimens before they can be tested, or if some patients just cannot comply?
DR. AVERBACK: Those are both good points. It's a fact of life. You're absolutely right. I can't argue with you on that. I would add one comment to that, and this was data that was filed with FDA in I think 2001. There is a difference between people in the trial, and people in the community who actually want the test. In our own reference lab, with people who want to have the results and do the test, our own statistics were that we eventually could get a sample in over 90 percent of people when they were motivated. In the trial, it was not the same motivation.
DR. GOLLIN: Could I ask another question, please?
DR. TAYLOR: Go ahead.
DR. GOLLIN: Okay, I have another question. You say that there's no downside to the test. Does not a positive result on the test raise concern or perhaps label somebody as probable AD? And might a false positive be deleterious to a non-AD individual? I know it's an adjunctive test, but still, might this label someone? In the real world. Not in a controlled trial.
DR. AVERBACK: Very good question, and also, we prefer to say conjunctive test. I know "adjunctive" has been floating around, but it's -- if you go, I mean I worked in the emergency room for 30 years. I've seen people come in who swallow bottles of pills, and swallow bottles of antifreeze, and they do funny things. The label says don't do that, and they do it. If a doctor is irresponsible, and doesn't follow the labeling, there's only so much we can do. But we are very motivated to make sure that whatever labeling is as tight as possible. And the language that we always like best includes very standard statements that it's always the doctor's decision based on all the clinical information. So you've pointed out a phenomenon that's unavoidable for any product if it's misused.
DR. TAYLOR: Are there further questions? Dr. Parisi, Dr. Nath.
DR. PARISI: I was curious if you could tell us a little bit more about this protein. You know, it sounds like it's related to neurons obviously. It's deposited in the neuronal cell body, and in the threads. And it seems to be, certainly the early literature suggests that it's higher in CSF -- in presumably Alzheimer's patients. But could you tell us a little bit about the processing of the protein? Is that known, or the sequencing? Is the sequence of what's in the urine identical to what you've isolated later from the brain? Have those kind of studies been done?
DR. AVERBACK: Yes. More good questions. The urine protein has been identified immunologically to be identical, at least in immunological terms, with the brain studies using the same monoclonal antibodies on urine blots and urine gels and this sort of thing. In terms of the sequence, the putative sequence is published in the Journal of Clinical Investigation. It's in the briefing document. And a lot of work has been done on different epitopes and peritopes, and with analysis using subsequences. And a lot is known. But not, to answer I think your second or your third question there, the exact intracellular pathway is not known. But a lot of information is known. It seems to be very intimately tied into the insulin-signaling pathway, and you can take the NTP gene and put it into neuronal cells in culture. And a huge amount of work has been done on that. And the apoptosis-related markers all shoot up. The neurons sprout, and I mentioned the genetic transfer experiments. And it seems to be related to apoptotic pathways with insulin receptors, and there is a relation to oxidative damage as well.
DR. TAYLOR: Dr. Nath?
DR. NATH: Yes, so I have two questions. One is a biochemical question, and then a clinical one. So if you could just clarify for me the biochemical nature of this protein with regards to two things. One is, so you said you immunologically characterized it, but have you ever sequenced that 41-kilodalton protein just off of a gel through internal sequencing, or peptic digest to see if that truly is the same protein that you have the cDNA sequence on?
DR. AVERBACK: Have we sequenced the actual protein, isolated and purified from urine?
DR. NATH: Yes.
DR. AVERBACK: No.
DR. NATH: The 41-kilodalton protein. From anywhere I guess.
DR. AVERBACK: No. The work has been done on the messenger RNA, and then the cDNA, and then the recombinant protein. And then there have been probes made to each of these. And then different protocols have been done to make fractions that are done on gels like you say, and labeled in different ways.
DR. NATH: Okay. Then a related question was, and if you could clarify for me, I was just trying to figure out, I don't follow this literature that closely, but from your paper in JNEN which was 1996, there were several different forms of this protein that you characterized. I think four different forms. And then here it says that the 21-kilodalton NTP species is of particular interest because it is over-expressed and accumulated in brains with Alzheimer's disease. But then subsequent publications all relate to the 41-kilodalton protein. So could you explain those differences to me?
DR. AVERBACK: Yes. First of all, as much as I'd like to claim authorship of that paper, I can't. So when you say it's our paper, these are researchers at Harvard. And they're extremely careful. They have usually about a five-year lag from when they do the work to when they submit it for publication, which sometimes frustrates us, but it also reassures us. At the time, they had information based on gels where there were different bands. And it appears that the 21 kD is just part of the larger molecule. The larger molecule is based on the actual mRNA derived from the actual brain, whereas the earlier work, the '96 work that you're quoting was using antibodies as reagents.
DR. NATH: So is that a cleavage product of the 41-kilodalton?
DR. AVERBACK: Believed to be, yes.
DR. NATH: I see.
DR. AVERBACK: At one time it was thought to be a dimer, but cleavage is a better way to say it.
DR. NATH: And then the clinical question I had was that you had MRI scans on some of these patients, maybe not all, and certainly some of those cases that were presented earlier it did appear like there was some kind of correlation between the urine testing and the MRI findings for atrophy. I was wondering if you have analyzed this clinical trial for whatever scans you have available to see if there is any correlation between MRI findings and the levels of NTP?
DR. AVERBACK: Well, the MRI findings in this trial have been a long and winding road. We've looked at them quite extensively. The problem is that they were read independently by different radiologists at different sites, and that's -- I mean, studies often allow for that, and sometimes they don't. It depends how rigorous the study is for the MRI. We didn't seem to see the best of agreements, and I'm not criticizing MRI. Far be it from me to do that. As you know, structural imaging in AD diagnosis supposedly, according to convention, is to identify other lesions, such as subdural, or tumor. And the relations, I'm sure you know as well as anybody. If you look through the literature, the correlations between them in all different types of parameters, be it neuropathology, or be it clinical features, there is a wide swing in it. But to answer your question, yes we've looked at them. It's not really that good. We did find other subsets of that that are of interest, but I don't want to go too far into that now.
DR. PARISI: There actually are --
DR. TAYLOR: This is Dr. Parisi.
DR. PARISI: Parisi. There are several studies that show relatively selective loss of hippocampal volume. And that's been shown in some studies actually to correlate quite well. I was curious, did you find infarcts? I mean, this is an elderly population. I was surprised at one of the comments in your report that said there was nothing added by the MR scans. And at least our experience is that these are oftentimes mixed pathologies, and small infarcts are very, very common, or severe white matter disease, for example, is a very prevalent finding in these elderly patients. Just curious about your experience.
DR. AVERBACK: Yes, I agree with you. In fact, I used to work with Leslie Iverson who was in the OPTIMA Group, who were the first to actually do the serial hippocampal measurements at Oxford with David Smith. We have in our possible AD group, and it's not a large enough group to be dogmatic about it, so we haven't made it part of our formal presentation, but in the possible AD group, if you dichotomize them according to NTP elevated or NTP normal, on a 2 x 2, versus vascular lesions reported by the radiologist or no vascular lesions reported, there's quite a good relationship. The NTP negative possible ADs have a lot more vascular changes. That's one thing that we found. It's retrospective, it wasn't prospective, and I wouldn't proffer it as reality, but it's extremely intriguing.
DR. TAYLOR: Okay. Dr. Blumenstein, Dr. Lopez, any other questions at this point?
DR. LOPEZ: Yes, a question for Dr. Kondratovich.
DR. TAYLOR: Dr. Lopez.
DR. LOPEZ: If you can go back to explain why this study design does not support the claim that this is an adjunctive test.
DR. KONDRATOVICH: Because we need to show that there are additional information which are related to the test which already performed, NIH criteria. You apply another test, and there are additional information in this test. So you need to have some kind of like high order standard in order to understand do you really add more information, because if there are disagreement, then there is not any resolving of this disagreement, because we have only NIH criteria and NTP test. And also some examples I give. So like adjunctive test. If you already know NIH criteria of categorization, and then you add NTP test, you cannot evaluate performance of that combination because by definition this combination should increase, for example, sensitivity, and there are no big loss in specificity -- or that improvement in specificity, there are no big loss in sensitivity. But in order to do this, we need to have high order references, high order standard. And this data is not available.
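The arithmetic behind the remark that a combination "by definition" raises sensitivity can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes an "either test positive" combination rule, which is one common reading and is not spelled out in the testimony, and the eight patient records are invented. The sketch also depends on a separate reference column, which is the higher-order standard the speaker says is not available in this study.

```python
# Hypothetical records: (reference_positive, first_test_positive, second_test_positive).
# None of these values come from the study; they exist only to show the arithmetic.
patients = [
    (True, True, False),
    (True, False, True),
    (True, False, False),
    (True, True, True),
    (False, False, False),
    (False, True, False),
    (False, False, True),
    (False, False, False),
]

def sens_spec(rows, call):
    """Sensitivity and specificity of a decision rule against the reference column."""
    tp = sum(1 for ref, a, b in rows if ref and call(a, b))
    fn = sum(1 for ref, a, b in rows if ref and not call(a, b))
    tn = sum(1 for ref, a, b in rows if not ref and not call(a, b))
    fp = sum(1 for ref, a, b in rows if not ref and call(a, b))
    return tp / (tp + fn), tn / (tn + fp)

first_only = sens_spec(patients, lambda a, b: a)   # first test alone
either = sens_spec(patients, lambda a, b: a or b)  # positive if either test is positive

print("first test alone (sens, spec):", first_only)
print("either positive  (sens, spec):", either)
```

Under an either-positive rule the sensitivity can only stay the same or rise, and the specificity can only stay the same or fall, so the interesting question is the size of the trade, which cannot be measured without the independent reference.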
DR. TAYLOR: Yes, please. Would you like to respond? Absolutely. Please, again give your name.
DR. BLOCH: I'm Dan Bloch, and I'd like to respond. Keep my thoughts straight here. First, it's a misconception that this is an adjunctive test in the sense that there's a step-up procedure here where there's a hierarchy of things that are done, first, the classical workup, and then only after that is NTP applied. That's not the case. It's an aid at the same time with everything else that's being done. Remember, the claim here is in the primary care setting. The specialist hasn't come into play yet. This is an aid among other things that are done concurrently. That's one thing.
A second thing is if it were, and it's not, but there are examples, many of them in medicine, where the true positives are a two-stage process, where you first apply something, and then you see if in addition an independent thing is also positive. Then if the two are positive, this is super super diagnostic plus. And you have four possibilities, both of the methods are plus, one's plus, one's minus, the other way one minus, one plus, and both are negative, with both negative being super-negative. Those would be the proper statistics in the scenario that you just heard from the FDA statistician. We did not -- this is clearly not that kind of a statistical workup. Our statistics were totally inappropriate for the kind of uses that she's described. So again, I just want to make it really clear that in our intended use, this is an aid, and it's not intended to be used only after another diagnosis is made.
DR. TAYLOR: Okay, thank you. Dr. Becker, did you wish to comment?
DR. BECKER: My name's Bob Becker. The term "adjunctive" does appear in parts of the labeling, although we can go back and forth as to exactly whether that's the only way in which the test would be used. The central feature is that new information is to be contributed. And there needs to be a manner by which the presence of that new information can be ascertained. The only way that we are aware of by which that can be accomplished is by having an external standard of some kind that this information brings you closer to than does the workup without the information that the test putatively provides.
DR. TAYLOR: Thank you. You may respond again.
DR. BLOCH: You know, what additional information is --
DR. TAYLOR: Name again for the record.
DR. BLOCH: I'm sorry, my name is Dan Bloch again. I'm the consulting statistician for Nymox from Stanford.
Again, the question is, as I understand it from the FDA, and the way they're posing it is saying you must have an external standard, or some other kind of standard to judge this. Because what is the added gain that you get from this. And I would remind you that, and you know this, is that without NTP, in the primary care setting, the primary care physician will be correct 15 to 25 percent of the time. That's their sensitivity. With NTP positive, their sensitivity will be increased to 60 percent. Now, those are just numbers, those are statistics, but they have meaning. And this is actually part of the data. To have a different gold standard, a higher standard based on autopsy is clearly not -- I mean, I don't know how to answer that. The higher standard, if it's going to be based on pathology means you have to kill early onset patients. Or you follow them for 30 years. That's clearly not -- if you asked for that you will never have anybody -- you will never get the diagnostic set for early Alzheimer's, if it has to be based on pathology. So I don't understand that posture.
DR. TAYLOR: Did you have a question, Dr. Parisi?
DR. PARISI: This is Joe Parisi. I beg to differ with that. Many patients die in MCI and early AD. I mean, we have a large base of pathologic specimens from those patients.
DR. BLOCH: In our sample of 200 patients, two have died, and have been autopsied. Both were NTP positive, and both had definite AD. So, from the autopsy experience that we actually do have from those that were followed, we have 100 percent accuracy with the NTP.
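For scale, the uncertainty around a result of 2 out of 2 can be worked out directly. The sketch below is an illustration, not part of the testimony: it assumes the standard two-sided exact (Clopper-Pearson) binomial interval, and the function name is made up for this example. When every one of n cases is a success, the lower bound solves p**n = alpha/2.

```python
def exact_lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """Two-sided Clopper-Pearson lower confidence bound when all n trials succeed.

    With x = n successes, the bound p satisfies p**n = alpha / 2.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return (alpha / 2) ** (1 / n)

# Two autopsied patients, both NTP positive with definite AD (figures from the testimony).
lower = exact_lower_bound_all_successes(2)
print(f"2 of 2: observed 100%, 95% exact lower bound = {lower:.1%}")
```

With only two autopsied cases the exact 95 percent interval runs from roughly 16 percent to 100 percent, so the observed agreement is compatible with a wide range of underlying accuracy.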
DR. LOPEZ: I have one last question.
DR. TAYLOR: Dr. Lopez.
DR. LOPEZ: If you compare the possibles with the non-AD, what's the sensitivity and the specificity?
DR. BLOCH: I did not. This is background. The consulting statistician to Nymox was John Kennedy, a Ph.D. statistician, well known, and was in practice for himself. He died of a heart attack earlier this year. Three or four months ago Nymox asked if I could consult for them. I have not performed any of the analysis, with one exception, and that was at the FDA's request. We performed a bootstrap analysis to validate that the cutoff of 22 did not introduce bias. And the bias-corrected and accelerated estimate, and I think statisticians here would understand this, showed that the bias of using the cutoff from the sample that we derived it from was about 1 percent for all performance characteristics. In other words, there was no bias that was introduced. I'm saying this because, again, in the earlier presentation, it sounded as if we just simply derived the value from the data that was obtained, and then we optimized things. But the FDA actually asked us directly to do a bootstrap analysis, to do a bias correction, which we did do, and it showed no bias. That's the whole story.
DR. TAYLOR: Any other questions from the panel? Mrs. Butcher has a question. Okay, please respond and then we'll come to Mrs. Butcher. Please. Again give your name.
DR. KONDRATOVICH: I would like to make comment about bootstrap analysis. We ask sponsor maybe sponsor can apply bootstrap analysis in order to correct biases which usually can be seen if you use the same training set for, like testing set. A bootstrap analysis conducted by the sponsor was absolutely irrelevant. It was more for evaluation of confidence interval for proportion, which can be obtained by standard software like StatXact. So bootstrap does not have any relationship with really some kind of statistical procedures like leave-one-out method or bootstrap .632 plus. But I don't like to put some technical details. And also, all this bootstrap approaches, it's not completely recognized in statistical community. Usually, independent data set required to validate cutoff. So sponsor bootstrap was absolutely different entity about the confidence interval for the usual binomial proportion, how to calculate confidence intervals for positive and negative predictive value.
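The distinction drawn here, between bootstrapping a confidence interval for a proportion and bootstrapping to correct the bias of a cutoff chosen on the same data, can be illustrated with a small sketch. It is neither the sponsor's analysis nor the FDA's: it uses the plain optimism bootstrap rather than the .632+ variant mentioned above, the cutoff is chosen by maximizing the Youden index (an assumption made for this example), and every value is simulated.

```python
import random

def youden(values, labels, cutoff):
    """Sensitivity + specificity - 1 when 'value >= cutoff' is called positive."""
    tp = sum(1 for v, y in zip(values, labels) if y and v >= cutoff)
    fn = sum(1 for v, y in zip(values, labels) if y and v < cutoff)
    tn = sum(1 for v, y in zip(values, labels) if not y and v < cutoff)
    fp = sum(1 for v, y in zip(values, labels) if not y and v >= cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def best_cutoff(values, labels):
    """Cutoff among the observed values that maximizes the Youden index."""
    return max(sorted(set(values)), key=lambda c: youden(values, labels, c))

def optimism(values, labels, n_boot=100, seed=0):
    """Average amount by which a cutoff tuned on a resample flatters itself."""
    rng = random.Random(seed)
    n = len(values)
    total, done = 0.0, 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bv = [values[i] for i in idx]
        bl = [labels[i] for i in idx]
        if all(bl) or not any(bl):
            continue  # a resample needs both classes to compute the index
        c = best_cutoff(bv, bl)  # the cutoff is re-derived inside every resample
        total += youden(bv, bl, c) - youden(values, labels, c)
        done += 1
    return total / n_boot

# Simulated marker values only: 30 "cases" and 30 "controls", not study data.
data_rng = random.Random(1)
values = [data_rng.gauss(26, 8) for _ in range(30)] + [data_rng.gauss(18, 8) for _ in range(30)]
labels = [True] * 30 + [False] * 30

apparent = youden(values, labels, best_cutoff(values, labels))
corrected = apparent - optimism(values, labels)
print(f"apparent Youden index: {apparent:.3f}, optimism-corrected: {corrected:.3f}")
```

The key design point is that the cutoff search is repeated inside each resample. Resampling only the final proportions at a fixed cutoff yields a confidence interval but says nothing about the bias from having tuned the cutoff on the same patients.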
DR. TAYLOR: Thank you. Mrs. Butcher had a question.
MS. BUTCHER: Thank you. I guess my concern, or what comes to my mind as a consumer is since the FDA has been integrally involved with this study from the beginning, and now we're at the end, and we're asking do we -- does this study add additional information, could that not have been presented in the beginning of the study, and had the extra standard applied in the beginning of the study, or the design made around having the higher standard, or a different standard. And as a consumer, I come to the table and say does it help. Does it serve a purpose. And it appears from listening that a primary care physician would have a patient present, do this test, and then it would help in that scenario. And if that's the early diagnosis, or the early look that we're having at the AD, then it appears to help. So my question is if the FDA, or since the FDA has been involved from the beginning, and now we're looking at a requirement for an extra standard, or does this test add additional information, how could that not have been put into the mix in the very, very beginning?
DR. TAYLOR: I assume you're addressing that question to the FDA?
MS. BUTCHER: Yes, I am.
DR. TAYLOR: So I think Dr. Gutman you are on the hot seat.
DR. GUTMAN: It's okay. The company's correct, there were very complex negotiations. They were nuanced. There was some congruence between what they wanted and what we wanted. There have been changes in the intended use that have complicated the use of the gold standard. We do think for this particular claim that we've arrived at at the end it exactly captures, and your challenge is to decide how it captures, or whether it captures, it captures Dr. Becker's concern about whether for the claim on the table you have enough insight based on the study that was done to convince that there is this incremental value. The claim is a little bit different than the claim that we started with. The clinical studies are not actually congruent from the starting point in the original pre-IDE, so there was some drift, and some changes, and some changes in data collection, some changes in intended use that I think have made this a -- let's say a colorful journey.
DR. TAYLOR: Does that answer your question, Mrs. Butcher?
MS. BUTCHER: Yes. I would like to follow up with another question for the sponsor.
DR. TAYLOR: Please follow up.
MS. BUTCHER: And that is dealing with the diversity of the sample, and the people that were involved in the study, very small numbers of other than white people, very small numbers of African-Americans, Native Indians. Has there been any distinction in response for the diverse population?
DR. TAYLOR: This would be for someone from Nymox.
MS. BUTCHER: Yes.
DR. TAYLOR: I think you did show one slide earlier this morning.
MS. BUTCHER: They did show one slide, and that's the one that I was referring to.
DR. AVERBACK: We were constrained by the trial design that required it to be consecutive unselected patients as they presented. So we couldn't do any cherry-picking of patients. So it is a representative sample of the different clinics that were involved. The exact numbers --
MS. BUTCHER: That was the slide.
DR. AVERBACK: I mean, if you look at the -- if you add together all the non-whites, it's not that far off from the overall population.
MS. BUTCHER: My question was was there any difference in the response to the test at all.
DR. AVERBACK: No, we didn't see any difference.
DR. TAYLOR: Dr. Gollin has a question for whom?
DR. GOLLIN: For Dr. Averback. It's a question from both of us, I think. And because he just said he was going to ask the same question, or a similar one.
DR. TAYLOR: "He" is Dr. Lichtor for the record.
DR. GOLLIN: Dr. Lichtor. I'm Dr. Gollin. Would Huntington disease patients, or other neurodegenerative disorder patients be defined as non-AD? I mean, ones without AD. Okay, so say Huntington's patients, okay. We'll make a very specific category for which testing is available. Have you tested Huntington's, or other neurodegeneration patients to be sure that their NTP value is less than 22 micrograms per mL?
DR. AVERBACK: Yes. This is -- we discussed this this morning as well.
DR. GOLLIN: I didn't feel like I got a good enough answer. I'm sorry.
DR. AVERBACK: No, that's fine, I'm just recognizing it. It's not to say that it's not a bad question. It's a good question, because in the interval I actually looked in the briefing document, and it's actually right in the briefing document. So if you look in that JCI paper, in one of the tables there you will see there was a cohort of Parkinson patients, and multiple sclerosis patients in the sandwich assay that was in use at the time for NTP. So there was quite a lot of effort to make sure it wasn't a non-specific neuro finding.
But again, I have to emphasize that our intended use for this is for the primary care physician in the realistic context to help them evaluate Alzheimer's. Someone who has a clear-cut stroke, and who's demented, just is not that much of a diagnostic dilemma. And to be a definite vascular dementia. And a Huntington's, depending on who the clinician is, may not be somebody who would fall into the capture of the intended use population.
DR. TAYLOR: Does that satisfy the dual questioners?
DR. GOLLIN: Yes.
DR. TAYLOR: Okay.
DR. AVERBACK: In fact, here's a quote from Dr. Parisi, which basically I think says it better than I said it. "Patients with stroke who become demented may be less likely to be evaluated in dementia clinics, and less likely to be given the diagnostic label of dementia. This may explain why vascular dementia is less common in referral-based series compared with population-based series."
DR. TAYLOR: Would you like to argue with that, Dr. Parisi?
DR. TAYLOR: Good. If there are no further general questions, at this point in the procedures we really need to move to the specific questions that the FDA has raised, and provide an opportunity for the panel to address each of those issues or questions. I understand that there are four questions, Dr. Becker, is that correct? And we have an hour allowed for this. So I would propose that we try to move through these about 15 minutes per question, although I recognize that the questions do have overlap and are somewhat intertwined, and it might not be as simple as that sounds. So I think we could perhaps proceed by putting up the questions, and I would point out that copies of the questions are in the folders and were available on the sign-up table outside of the room. So is this the first one, Dr. Becker? Okay, then I'd like to really ask the panel to begin discussion, and is there anyone on the panel that feels they'd like to start? Dr. Blumenstein.
DR. BLUMENSTEIN: I've been sitting here struggling with this. And I'm a fan of the show Twilight Zone, those of you who are old enough to remember that show, or if you have access to the SciFi channel now. I feel like when I hear the sponsor talk I've entered into the Twilight Zone, whereas when I hear the FDA talk things are all right. With the sponsor, I don't see that the study design matches the indication, and there has been changes in the study design, and it's quite confusing, and so forth. I also find that the analysis does not match the study design. And I want to now kind of explain. The FDA has already presented a lot of the reasons why I feel that way, and I don't want to be redundant and go back into that. But it comes down to that I don't see a hypothesis. I don't see any way to assess a specific diagnostic setting. I see this as being throw a net out there, get some data, and then see what the data say about the performance of this device.
The question was brought up by Dr. Butcher over there about whether this actually adds value and so forth. Well, on the surface it appears that way, but I don't believe the analyses. I don't believe that the numbers that are represented are actually what they say they are. And that has to do with the fact that these -- the kind of thing that the FDA has already said, that it's inappropriate to talk about sensitivity and specificity, predictive value of a positive test, predictive value of a negative test, and those sorts of quantities that are traditionally used to evaluate diagnostic tests. It's inappropriate to use those numbers in this setting. And now that can be a little bit controversial, because it has to do with whether you feel that there's a gold standard. And sometimes gold standards have some imperfections in them. But in this case the gold standard is false gold.
There's a great deal of squishiness in the categorization, the partitioning of the patients into the diagnostic categories. The ROC curves that are presented, where one category, the most extreme categories are presented to me are just totally nonsense. And yet you see a very high area under the curve, and that's represented as being something meaningful. It doesn't mean anything. It's based on, first of all, sensitivity and specificity, which don't mean anything, and so on.
What it really comes down to is that whether you talk about this as adjunctive, or conjunctive, or whatever language you want to use, this test tells you something. And if it told you something about whether there was a pathologic condition in the brain, then it seems like to me it would be of extreme value. But we don't know what it tells us. We know it tells us that there is some kind of relationship to these four diagnostic categories. There seems to be kind of a trend there. And admittedly that trend exists. But when I try to figure out, well, how is that going to be used in a diagnosis, if I believe that -- if I take the word "sensitivity" and I take the number that's represented as being an estimate of sensitivity, I think wow, that seems pretty good. But then I have to stop and think, well what does that number actually mean? First of all, it's not a sensitivity. And so as a result of all that, I've come to the point where I can't feel that this test actually measures something that is clinically meaningful.
DR. TAYLOR: Other panel members? No further comment?
DR. LOPEZ: I always have the question about who are the --
DR. TAYLOR: This is Dr. Lopez.
DR. LOPEZ: Who are these possible AD cases, possible Alzheimer's disease cases. Because you follow the criteria, these are supposed to be people with Alzheimer's disease plus some other condition. And the NTP was able to identify half of those cases. We follow the definition proposed by the NINCDS criteria that these are cases with multiple comorbidities, this is the population that is usually seen by PCPs. This is the person that goes to a PCP every two months, or every month, to be assessed, and that's the person who will complain of having a memory problem. It could be for Alzheimer's disease, or it could be for something else. So, in terms of getting a benefit, if half of that population is going to be negative, but may have Alzheimer's disease. So I don't know how the consumer is going to benefit at that point. So I think that this group, this possible AD group needs a better definition.
DR. TAYLOR: Anyone else on the panel with Discussion Point Number 1?
DR. NATH: I have very little to add, but was just going to state that I share the same concern as Dr. Lopez. And additionally, if the intended use of this test is to separate probable from non-Alzheimer's patients, then the concern really is the non-Alzheimer's group because more than half of them are normals, and the rest of them were diagnoses with just sample sizes of one, two, or three. So it becomes a really, as the FDA pointed out earlier, was an artificial control. And having distinct -- enough number of populations with each of those distinct diagnoses would really be helpful to know if you can really separate one from the other or not.
DR. TAYLOR: Dr. Duffell?
DR. DUFFELL: I guess I'm struggling with some of the comments I've just heard, because sounds like we're trying to say it's black or white so to speak. And if I understood the sponsor correctly, they are just saying it's just another piece of data in the bag, so to speak, the medical bag, to make that diagnosis. It's really not drawing a line to be on one side or the other. So it's not conclusive, it's just a little bit more information. And yet, what I've seen presented seems to suggest to me that there is a higher probability with this test, it's not absolute, but a higher probability that you will come to a right overall conclusion. So I don't know, maybe I misunderstood your last remarks, or some of the others, but you know, it's just going through my mind. I just wanted to share it with the rest of the panel. So you know, just keeping it in mind. Because I do see that there is value to this test. It may not be absolute, it may not draw a line in the sand, but it's just another piece of data in the clinic to make a decision of how we follow and treat this patient moving forward.
DR. TAYLOR: Anyone want to comment on that? Dr. Blumenstein.
DR. BLUMENSTEIN: Well, and I agree that there's a probability of a positive test giving a higher -- I mean, more likely to be probable AD. But it's not a sensitivity. Can't be labeled as such. And it would be misleading to label it as such. And then -- comes the issue then of how it is that one behaves given that all you have is a probability that it's probable AD. And so that's really what it comes down to.
DR. DUFFELL: May I follow up?
DR. TAYLOR: Yes, please. This is Dr. Duffell again.
DR. DUFFELL: Okay. That helps me. I appreciate that clarification, because I think if it's kind of a labeling issue, then that's something where the company needs to work with the FDA to make sure that there is adequate disclosure in the contents of the labeling about what type of an interpretation one can make. So.
DR. TAYLOR: I see no further comments from the panel on this. Oh, I'm sorry, I didn't see your hand behind the microphone. This is Mrs. Butcher.
MS. BUTCHER: Yes. And I guess I'm looking again at the populations. Are there important populations that require more study to establish effectiveness for the urine NTP test within them? And the answer is we don't know. If they're all coming out the same, then we don't know if there are other populations that we need to address.
DR. TAYLOR: Correct, with the numbers available we don't know the answer to that. Correct.
MS. BUTCHER: That's right. So we just don't know. We can't answer that.
DR. TAYLOR: Yes. At this point I'm going to ask Dr. Gutman if he feels that the panel have given any sort of adequacy of an answer?
DR. GUTMAN: Yes, I think it's been very helpful.
DR. TAYLOR: Then could we move to Question Number 2? And I'd like to ask Dr. Becker to display that on the screen such that everyone can readily see it. Okay, this is Question Number 2. Do we have anyone on the panel that wishes to initiate the discussion? Dr. Duffell.
DR. DUFFELL: Yes, I think this question actually speaks to the point I was trying to make earlier. You know, what I'm highlighting in my mind is the word "certainty" and "exclusion." That's black and white again, and I don't think that's what the sponsor put forth.
DR. TAYLOR: Okay, you'd like to find a word other than certainty, would you?
DR. DUFFELL: I'd like the statisticians to remark on probability. I mean, that's probably what we're talking about. But again, I think that's probably something for the company to work out with FDA from a labeling standpoint.
DR. TAYLOR: So that's three probables right there. So maybe that's really where we are. I agree, and as we pointed out at the beginning, these questions are obviously interwoven, and answering, or asking, or discussing one is obviously spilled into the other. And this in fact has partly been discussed with the previous question. Does it raise any other issues for the panel? Dr. Nath?
DR. NATH: Well, I guess the issue it raises for the company was that -- what they had brought up earlier was that they're not really trying to increase the certainty of the diagnosis of Alzheimer's beyond what the clinical criteria are suggesting, at least who have been examined by a specialist. So, the question really is, is this a fair question to ask.
DR. TAYLOR: Yes. That's the fair question. Nothing more from the panel on this question? Dr. Gutman, has that been adequate?
DR. GUTMAN: Well, I think what, you know, perhaps this question wasn't phrased as precisely or crisply as we might have done a better job. Because I think the issue, at least the issue that resonates throughout FDA, and that we're asking for input is, is the certainty added by this diagnosis one that will actually demonstrate effectiveness. And our concern that was very much on the table is will this test actually change diagnostic decision-making in a positive way. So is there enough certainty being put on the table that more intelligent decision-making at the expert level or at the primary care level will actually occur? I mean, the bottom line is does it make a difference.
DR. TAYLOR: Okay. So, let's quickly try to rephrase that. Does the test add sufficient clinical value that it would make a difference to patient care at the intended site of use. So if that's now the question, then Dr. Blumenstein has an answer.
DR. BLUMENSTEIN: Not an answer. I don't think that's been studied, and I think the way that that could be studied is that if you did a randomized trial, and used the test in one group, and didn't use the test in the other group.
DR. TAYLOR: Expand on that, could you?
DR. BLUMENSTEIN: You would obviously have to have some measure of outcome that would talk about, let's see, how does it add certainty to the diagnosis, or something along those lines. Some outcome measure. But you would then study what happens within each group of patients, and see if it, quote, "improved things" according to the selected outcome measure, whatever that would be.
DR. TAYLOR: Are you feeling helped, Dr. Gutman?
DR. GUTMAN: Well, I find that beguiling. I'm not sure -- is that the only choice? Are there other choices?
DR. BLUMENSTEIN: You're the one that asked the question.
DR. GUTMAN: Well, that's one answer, so yes, I mean I respect that answer, sure.
DR. TAYLOR: Dr. Lopez next, and then --
DR. LOPEZ: Well, I think there are two type of groups here. One is the expert. I mean, that won't change much. I mean, the expert will be able to make the diagnosis without any problems. The expert does not need the test. The question is whether the PCP working in the middle of Pennsylvania, seeing 90 patients per day can get help of this test. And probably yes. But it has to be demonstrated. It is positive. If the test is positive. Because you always have the biomarker temptation. Say I'm doing a urine test, it was negative. You may not have Alzheimer's disease. So you always, you have this conflict of I hate to come back to the word adjunctive test, and a biomarker. And that's the risk that you have in that physician, working alone, very busy, that really needs help.
DR. GUTMAN: Okay, but I think the question we'd like, and you don't have to answer it this round. You can answer it the next round, or you can answer it when you finally vote, because both Dr. Kondratovich and Dr. Mani put it on the table is, is the signal strong enough that it will allow you to make clinically meaningful, based on the data at hand, will it do what everyone wants it to do, which is help make better decisions at -- well, at the primary care level.
DR. NATH: So, I agree that --
DR. TAYLOR: Dr. Nath, and then Mrs. Butcher.
DR. NATH: Sorry. So I agree that the concerns are the same as we discussed with the previous question, but I don't have a problem with a test being done at a primary care clinic, and then being confirmed by a specialist. And for example, the thing that comes to my mind is an ANA test for lupus. I mean, most often we'll do a screening test, and we know we get a lot of false positives. And then we just send them to the rheumatologist, who will most often tell us your patient does not have lupus. And so what is important about this test is the degree of false positivity and false negativity that we get with this test. And unfortunately we just don't have enough data to make those kind of conclusions at the present time.
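(Illustration, not part of the hearing record.) Dr. Nath's point about screening tests and false positives can be sketched with Bayes' rule: how often a positive result is correct depends on the false-positive and false-negative rates and on how common the condition is in the clinic where the test is used. The sensitivity, specificity, and prevalence figures below are hypothetical values chosen only for illustration; they are not results from the sponsor's study, and the panel noted that the study's positive rate could not be labeled a sensitivity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(condition | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no condition | negative test)
    return ppv, npv


# Hypothetical test characteristics (assumptions, not study data).
SENS, SPEC = 0.90, 0.90

# Same test, two hypothetical settings: low prevalence versus high prevalence.
ppv_low, npv_low = predictive_values(SENS, SPEC, prevalence=0.05)
ppv_high, npv_high = predictive_values(SENS, SPEC, prevalence=0.50)

print(f"prevalence 5%:  PPV={ppv_low:.2f}  NPV={npv_low:.2f}")
print(f"prevalence 50%: PPV={ppv_high:.2f}  NPV={npv_high:.2f}")
```

Under these assumed numbers the same test yields many more false positives per true positive in the low-prevalence setting, which is the pattern Dr. Nath describes for ANA screening.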
DR. TAYLOR: Okay. Mrs. Butcher, you've been wishing to ask a question?
MS. BUTCHER: Well, as a consumer I was going down the same path, and my whole thought about it is when I go to a doctor as a consumer, he's only going to have so many minutes for me. And if he has this tool in his toolkit, and can pop it out, and this doesn't mean that he will leave his clinical analysis and all of his experience behind. He takes that into the room with him as well. But this is just another tool to help him to do better what he needs to do. So I look at it really as just an additional tool in the toolkit of a primary care physician to move people in the path to get what they need. And the help, if it's available, should be given. Because they're on the front line.
DR. TAYLOR: Okay. I think Dr. Gulley has a question.
DR. GULLEY: Yes. So, as a --
DR. TAYLOR: This is Dr. Gulley.
DR. GULLEY: Yes, Dr. Gulley. As a primary care physician, they may see a positive test and then be able to refer them on, and expect that many of those patients would have either probable, possible, or MCI. However, if they had a negative test, that really does not add, in my mind, any significant utility. It does not mean that there's an exclusion of one of those categories.
DR. TAYLOR: So would you like to see that addressed as a labeling issue?
DR. GULLEY: That is one option.
DR. TAYLOR: Dr. Blumenstein?
DR. BLUMENSTEIN: Well, one of the things that a randomized trial -- I hate to sound like an advocate of a randomized trial because that seems like a dirty word these days, but I am. But one of the things that might happen in the study where you would look at how this test was actually used by primary care physicians is that you might find that they become over-reliant on it, and don't do the other things that they're supposed to be doing, and then finding the negative test, and so forth. So that's why it has to be, you know, an outcome that's comprehensive and actually sees what happens to the disposition of all patients randomized.
DR. TAYLOR: That's an interesting point, but on a philosophical issue, you know, you'd rule out half of medicine if you took that attitude because tests are there and people do get reliant upon them. Good or bad, that's just the way it is. Dr. Gutman? Joe?
DR. PARISI: I thought the -- this is Parisi talking again. I thought the cases that Dr. Mani used illustrated very nicely that a positive result or a negative result still left you in the same quandary. So I'm not sure how it really helps. And I think it might be more upsetting to a patient to be labeled as potentially having something where they may not have it, and more upsetting to their family. So I think this is a very, this is a very crucial decision, and I'm not sure we have enough information to really make that -- be able to say that we have -- that we can say with some certainty that it has meaning or doesn't have meaning.
DR. TAYLOR: So if we asked the question differently, and say it was a positive test may mean something, but a negative test doesn't.
DR. PARISI: Perhaps. Have to think about that.
DR. TAYLOR: Dr. Gulley.
DR. GULLEY: Yes, James Gulley. I think that in either case for the primary care physician, a positive test would result in a referral, but a negative test should perhaps result in a referral too because -- so you're left in the same quandary, because the number of patients with the negative test that have other diseases going on that would be outside of the perhaps scope of practice of the family care doctor are -- comprise a large portion of that population.
DR. TAYLOR: I think I saw another hand. Yes, Dr. Duffell.
DR. DUFFELL: Yes, I was just going to comment because we've said it twice. I know you've said it once, and you just said it about labeling the patient. I don't think that's what we're talking about doing with this test of labeling, because again, that gets to the objection I had before about certainty and all that. It's just another piece of data. It may ultimately lead to a label, but that would be decided by the specialist in that case, referred on from the primary care.
DR. PARISI: I guess one other comment I would have is that ideally a biomarker should be able to detect something that you can confirm pathologically, if you want to make it a marker for disease. I'm not sure we've done that with this agent.
DR. TAYLOR: Okay. Dr. Gutman, have we done Question 2?
DR. GUTMAN: Yes, I think you've had a lively discussion on the essence of what this question was intended to evoke.
DR. TAYLOR: Dr. Becker, could we move to Question 3? I'll give everyone a moment to read it. Again, this seems to overlap with some of the discussion we've had, but Joe, it really relates to your point in a sense. You have any comment? This is Dr. Parisi.
DR. PARISI: Well, again, I think the data we've seen for the diagnosis of definite non-AD versus one of the other three, there certainly seems to be a trend in those two groups. But I'm not sure you could use it for a diagnosis, for exclusion. I don't think we have that much information.
DR. TAYLOR: Dr. Nath?
DR. NATH: I agree. The concerns are still the same. I mean, if you are going to differentiate probable Alzheimer's from non-Alzheimer's then you've got to characterize the non-Alzheimer's group a lot better than what we have currently in this trial.
DR. TAYLOR: Anyone else?
DR. GULLEY: I would say that --
DR. TAYLOR: This is Dr. Gulley.
DR. GULLEY: Yes. I would say that a positive test only means that definite non-AD is likely excluded, and that doesn't appear to be terribly clinically meaningful in the primary care practice setting. And a negative test only means that it is unlikely that the patient has probable Alzheimer's disease, but may still have mixed disease. And so a negative test appeared to have little clinical utility.
DR. TAYLOR: Again, the discussion does really overlap the previous two questions. So are there any other new thoughts, or new discussion issues from the panel? That seems not to have been wonderfully helpful, Dr. Gutman.
DR. GUTMAN: No, that's fine.
DR. TAYLOR: Could we go to Number 4 then, Dr. Becker? This question again has overlap, but it does raise the specific issue of whether the setting in which this test is used is defined, and I know that's been alluded to by both the sponsor and the FDA, and been touched on by the panel. Is there further discussion about whether the setting can be defined?
DR. NATH: I guess the setting -- one thing we know for sure --
DR. TAYLOR: Dr. Nath is speaking.
DR. NATH: So, one thing we know for sure is that it's not helpful in a specialist clinic, and it may be, the intended use is really in a primary care physician's clinic. And we're assuming that, you know, they're not going to be able to do a Mini-Mental Status evaluation, and a decent neurological evaluation. So the -- and the concerns with being able to use the test adequately in that setting, the primary care setting, overlaps with the previous questions, with the previous comments made on the previous questions I think. So the concerns still remain the same in the primary care setting, but I think it can be clearly said that if it's going to be of any benefit for future use, it would only be in a primary care setting.
DR. TAYLOR: Dr. Duffell?
DR. DUFFELL: Yes, I'd just like to echo your remark. I think the usefulness is in the primary care because the treatment decisions won't be made at this level. That'll be made by the specialist level, is at least how I would interpret a response to this.
DR. TAYLOR: So, yes go ahead. Dr. Lopez, and then Dr. Nath.
DR. LOPEZ: In general, PCPs prescribe medication, and they -- usually when somebody complains of having memory problems they get a prescription for a cholinesterase inhibitor. So they initiate treatment. So my question is if the clinical symptomatology indicates Alzheimer's disease, and the test is negative, are they going to change the prescription. In an expert clinic, probably not. In a PCP, it may. You have somebody who is complaining I have to fill out three prescriptions for high blood pressure, one for diabetes, and one for Alzheimer's disease. So probably at the moment of prescribing medication, the PCP is going to say, well, since this test is negative, I'm going to cut down the cholinesterase inhibitor. So it can go either way. So that's why you have to be very careful with that. But in general, PCPs initiate treatment.
DR. NATH: So I'm just trying to put my -- Avi Nath again. So I was just trying to put myself in the shoes of a primary care physician, and I realized my practice is anything but, is that if you were a primary care physician, and you see a patient who complains of memory dysfunction, and the test comes back negative. So what will you do? If you don't have a conclusive diagnosis for him, you're going to refer him to some specialist, anyhow, whether it be a psychologist or a neurologist, to try and figure that out. If the test is positive, then he's going to refer the patient anyhow. So my guess is that what it's going to determine at the primary care physician, correct me if I'm wrong, is not what the test shows, but rather what the patient complains of. And if what the patient complains of is memory dysfunction, and he can't really figure that out, he's going to send the patient off irrespective of what the test shows.
DR. TAYLOR: Did I see someone else on the panel? Dr. Gulley.
DR. GULLEY: Just one brief follow-up to that. Yes, James Gulley. Unless the primary care physician chooses to treat the patient for Alzheimer's if he had a positive test, as was mentioned previously. He may choose to initiate treatment for Alzheimer's disease, and may not.
DR. NATH: Well, the treatment of Alzheimer's is not just, you know, giving them some Aricept. I mean, there is lots of things that have to be done for treating the patient properly. And which involves social work. I mean, lots of things that go along with it. So the care, I don't think that the primary care physician is going to be able to handle these kinds of patients very well. He's going to need help from some specialist clinic to help him.
DR. TAYLOR: Does the panel feel there's a difference in a sense as to how the primary care physician would handle patients that are probable AD versus the other two categories that have a lot of positives in them? Because that's really what we're talking about, right? With possible AD and MCI, we've got, you know, not quite a 50/50 split of positives and negatives. With probable AD it's pretty good, it's 51 to 6 out of 57. So can the primary care physician make that discrimination as to which patients get referred and which ones don't? And does this test help? That's really the question, right?
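(Illustration, not part of the hearing record.) The figure Dr. Taylor cites, 51 positives among 57 probable AD patients, can be turned into a proportion with an uncertainty range. The sketch below uses the Wilson score interval, which is one common choice for a binomial proportion and is an assumption here; the record does not say which interval method, if any, the sponsor or FDA used. Following Dr. Blumenstein's earlier caution, the quantity is described as the positive fraction within the probable AD group rather than as a sensitivity.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts quoted by Dr. Taylor for the probable AD group.
positives, total = 51, 57

fraction = positives / total
low, high = wilson_interval(positives, total)

print(f"positive fraction: {fraction:.3f}")
print(f"approx. 95% Wilson interval: {low:.3f} to {high:.3f}")
```

With only 57 patients the interval is fairly wide, which bears on the panel's repeated comments about sample size.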
DR. NATH: Yes, I mean that is the question, and what I was trying to struggle with is that is he really going to base his decision on the test, or base his decision on the complaint of the patient. Because if the patient complains of a memory dysfunction, and even if the test is negative, he's still faced with trying to tell the patient, either he says, well you know, you're just faking it, go home. The patient is not going to be satisfied. He's going to go end up in a specialist clinic anyway. Right? He's going to get a second opinion. Or he says, okay, I can't figure it out, I'm going to send you to somebody else. And he's going to seek help from a specialist. Maybe I'm wrong, but that's the way it appears to me.
DR. TAYLOR: Yes, Dr. Lichtor has a comment.
DR. LICHTOR: Well, since you bring that up, all the -- although I don't do primary care, but all the primary care physicians I know, when they have patients with memory problems, they refer them all to neurology, and they don't really treat them anyway. So, that makes this point a non issue to me. I mean, I think all the patients are going to get referred to neurology anyway. So, I don't see how it's really going to help them, because they're going to go, as you say, based on their clinical diagnosis, not whether the test is positive or negative. I don't know any primary care physicians who try and manage these patients.
DR. TAYLOR: This is Dr. Lopez.
DR. LOPEZ: If you are in a city where you have a big university hospital where there are many specialists in town, probably they are going to be referred to a specialist. The problem is when you're in those small towns where you don't have access to a specialist, or the specialist is two hours away from the clinic, or three hours from the PCP office, or six months, the appointment's in six months. So, that's why the PCP is important. It's more in the rural areas than in the city areas.
DR. GUTMAN: There's just one additional nuance --
DR. TAYLOR: This is Dr. Gutman.
DR. GUTMAN: -- that might have been missed here because it wasn't as clearly phrased as we might have done, which is that the notion of individual management versus group is sitting here because the groups do seem to segregate in a certain way, and there is some concern about the variation around the cut point. I don't know if anyone would dare to express an opinion on whether that's overwrought, or well taken, or something in between. But that was an inherent at least subtext of this question.
DR. TAYLOR: Well, I think we had a little discussion earlier with the sponsor regarding, first of all, why the cut point was 22, and second, given that it's 22, what the analytical precision of the test is, and whether it can hit 22 every time on the patient. And some variable data was provided by the sponsor, and for all I know you have that data in your voluminous files. I didn't see it.
DR. GUTMAN: We tried to share everything relevant with you.
DR. TAYLOR: So, it seems to me as though the analytical accuracy of the test is perhaps okay, although a one microgram variation does push patients either way, and in some ways it would be good to see a greater number of patients, and see if they continue to segregate. They have done another 1,500 patients, and it would seem to me that in terms of just comparing analytical value, there might be some utility in looking at how reproducible it is to re-run some of those 1,500 to aggregate a larger number. I mean, we just saw 14 patients that were repeated here. They might have a larger number where the clinical outcome's not the question here for clinical comparison, it's the analytical comparison. So it doesn't really matter which group they're in, it's just whether the analytical comparison works. There might be some value to that. And it would seem to me, too, that it would be good to have an evaluation of the reliability of the test in other labs. And again, I've been told that that exists, but that the variation is even greater there. It may be, although I'm not sure from the data that I've seen whether I'm interpreting that correctly, and perhaps I could ask the sponsor that question. If it is looked at in different labs, are you seeing different variation? And maybe you could address that when you sum up at the end. Does that help, Steve? Okay. I ask you now then, Dr. Gutman, have we addressed all four items, or are there still residual issues?
DR. GUTMAN: No, I think you've done fine.
DR. TAYLOR: Okay. Yes, I'm sorry, I didn't see you Dr. Duffell.
DR. DUFFELL: It's just an afterthought on your comment. Those other results that the sponsor has I'm sure weren't collected under the setting of a clinical trial. So you've probably got consent issues I would imagine as to whether or not you could go back to some of these patients and use earlier results and things of that sort. I don't know if that really would be something that they could do or not.
DR. TAYLOR: Well, I'm not sure whether you want to use earlier results. I'm just wondering whether they've got banked specimens. I know that you throw away any frozen urine, but on the other hand, once you've processed the urine, you can freeze it and do repeat testing if I understand it correctly. I mean, the controls are frozen and thawed.
DR. DUFFELL: I mean, I don't even know what FDA would think about going retrospectively like that to that kind of data or not. But anyway, I just thought I'd bring it up, because it may not be as easy as, you know, to the lay person it sounds like, oh, well you've got all those samples, let's just go grab them and try this. It may not be quite that straightforward is all.
DR. TAYLOR: Well, nothing's ever that straightforward, and I wouldn't presume to speak for the FDA. I'm not sure any individual can, but Dr. Gutman's as near as we can get to it today. So you should perhaps address that question to Dr. Gutman.
DR. GUTMAN: You should recommend good science and we'll try and work with the company on whatever the recommendations are.
DR. TAYLOR: Thank you. Okay, at this point then I -- I'm sorry, Joe.
DR. PARISI: One more concern that we've talked about a bit, but in the data that we were provided, patients that had sequential exams, Patient 7 is very interesting. A 77-year-old lady, she started off at 20.7 at initial. One year later it was 10.1. So she dropped by a factor of half. At two years she's 11, so she's hovering around 10, and then she's up to 19 again at another two-year point. I'm not sure what the interval between those two-year points was, but that seems like an enormous swing, to my mind. And you know, if you catch her early on when she's 20.7 and she's a little cognitively impaired, even though it's below the threshold, you might be worried about some kind of cognitive problem. On the other hand, if you catch her a year later when she's 10.1, you probably wouldn't have any concern at all. So I just want to emphasize, you know I think this variability is a problem.
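(Illustration, not part of the hearing record.) The serial values Dr. Parisi reads out for Patient 7 can be laid against the cut point of 22 that Dr. Taylor mentioned earlier. The sketch below uses only those quoted numbers; the visit labels are paraphrases, and treating a value at or above the cut point as positive is an assumption, since the record here does not state how a value exactly at 22 is classified.

```python
# Cut point mentioned by Dr. Taylor; ">= cut point means positive" is assumed.
CUT_POINT = 22.0

# Patient 7 values as quoted by Dr. Parisi (visit labels are paraphrased).
patient_7 = [
    ("initial", 20.7),
    ("one year", 10.1),
    ("two years", 11.0),
    ("later two-year point", 19.0),
]

values = [value for _, value in patient_7]

fold_swing = max(values) / min(values)              # highest vs. lowest reading
crossed = any(value >= CUT_POINT for value in values)
margin = CUT_POINT - max(values)                    # closest approach to cut point

for visit, value in patient_7:
    status = "positive" if value >= CUT_POINT else "negative"
    print(f"{visit:>22}: {value:5.1f}  ({status})")

print(f"fold swing (max/min): {fold_swing:.2f}")
print(f"ever at or above cut point: {crossed}")
print(f"smallest margin below cut point: {margin:.1f}")
```

On these four readings the patient never reaches the cut point, yet the highest value is about twice the lowest and sits close to the threshold, which is the variability concern Dr. Parisi raises.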
DR. TAYLOR: Yes, I mean this is really rather alluding to, and we did have a brief discussion with Dr. Averback earlier about that. And he made the point that in the 14 patients none had switched, and that's absolutely true. But the issue is whether there's a big enough number there, and whether they systematically looked at whether patients do switch, and whether if you run the same patient on five consecutive days, although I agree that's going to be really hard to do in many of these patients. But what sort of variation do you get in the test. And then if you run exactly the same split specimen on five consecutive days, we've heard an answer to that, and that's reassuring. But I think that is a little bit of a concern considering how close the clusters are to the cutoff point on either side, both above and below. You're sitting in the laboratory, and then having a clinician call and say what does this mean, you say well, there's the reference standard, and then they quote statistics to you, and then you're lost, you know?
I think it's been a reasonable open discussion. We maybe have touched on most of the issues here. Anything we've missed as a group? So the form here then at this point is that we are now at -- okay, we're now at 2:40. We normally would have a 15-minute break, and so we will do that, at which point we will return at let's say 3:00. And we would then have the second public session, at which point the public can speak, and we then go on to have a summation from the FDA, and then a final summation from the sponsor. So let us reassemble in 20 minutes. Thank you.
(Whereupon, the foregoing matter went off the record at 2:40 p.m. and went back on the record at 3:00 p.m.).
DR. TAYLOR: Okay. We have reached the point where we hold the second open public session. I'd wish to ask now whether there are any individuals who wish to address the panel. If so, would you please raise your hand and identify yourself at this time? Seeing no one, I wish now to ask the panel if there are any other questions they wish to address before we proceed to the final summations, either to the sponsor or to the FDA. Are there any final questions?
Okay, seeing none, before we move to the panel's recommendation and the panel's vote, are there any further comments or clarifications from the FDA?
DR. GUTMAN: No, we think you've covered all of the points we wanted you to.
DR. TAYLOR: Then we can move to ask the sponsor whether they have any final issues or clarifications that they would like to present to the panel. A summary, yes. Please. There's normally about 15 minutes allocated to this, Dr. Averback.
DR. AVERBACK: Okay.
DR. TAYLOR: That's okay?
DR. AVERBACK: We'll try to be as brief as we can. Again, want to thank the panel for listening to our presentation today, and asking some very good questions, and very constructive comments. There are a number of issues that we would like to address directly that we would like to now respond to.
Our thesis is that this measurement is absolutely safe. There's no downside in our view. It can't hurt anybody. There's no definitive action taken from it that can lead to danger, and there's no bodily risk, and it adds useful information. A lot of the debate that we listened to in the last half hour struck us as debate about the practice of medicine. And different doctors have different patient populations. In fact, according to the literature, 70 percent, or 65 percent of Alzheimer's cases are managed and treated by primary care physicians. And they do not refer every case. We've seen a heterogeneity in your comments, which is very typical of the practice of medicine, and it's part of the problems that we're addressing here today. And we think that this device may actually help to answer a lot of those basic questions about the heterogeneity of practice. I know I'm at one extreme, a neuropathologist, and at the other end, an ER doc. I see the two wide extremes. Some people think that's a little too extreme, but there's a lot of swath in between. And we think this device will be helpful that way. And we're not claiming that it'll make decisions.
And a lot of the discussion we heard, it was disappointing to us because there was discussion of actions being taken with the device. I'll get a positive, I'll do this. I'll get a negative, I'll do this. We would like to stress to you that overall, and I'm speaking statistically, these people can be referred or not referred, they can come back for follow-up or not come back for follow-up, or come back earlier, or come back later. They can have other tests done. They can have -- it's infinite the number of choices in the practice of medicine. Any active doctor knows this. So we are not saying that you get a result and you stereotypically do this or do that. And the discussion seemed to go that way, and I would just like to again say we're only trying to say that it adds useful information in the toolbox, and that that will overall be helpful.
A couple of specific points we would like to touch on. The comment about the outcomes trial from Dr. Blumenstein. Just in defense of our little company, this trial was a blinded trial. So in order to establish the veracity of this, it had to be blinded. The doctors in this trial couldn't use the results to show different outcomes. This is blinded. And I'm told, I'm not an expert in this, but I'm told by experts that the standard of approval is will the device influence diagnostic decision-making. And that's the standard of approval. Will it add some useful information, in other words. We are not saying that it replaces a doctor, or that it leads to concrete decisions.
And I think all of the issues that were raised are pertinent and good, and you know, we would obviously be willing to work with FDA to tighten up the labeling as the case may be, or do postmarketing surveillance to try to answer any and all of the issues that have been raised. Thank you very much.
DR. TAYLOR: Thank you. At this point the panel is now ready to vote the recommendation to the FDA for this PMA. I would remind the panel that the industry representative and consumer representative do not vote, and that as chair, I vote only should there be a tie. I'm going to ask Ms. Rufina Carlos, the Executive Secretary, to now read the panel recommendations options for premarket approval applications. Ms. Carlos?
MS. CARLOS: The Medical Device Amendments to the federal Food, Drug, and Cosmetic Act as amended by the Safe Medical Devices Act of 1990 allows the Food and Drug Administration to obtain a recommendation from an expert advisory panel on designated medical device premarket approval applications (PMAs) that are filed with the agency. The PMA must stand on its own merits, and your recommendation must be supported by safety and effectiveness data in the application, or by applicable publicly available information. Safety and effectiveness are defined as: safety, according to 21 C.F.R. 860.7(d)(1), there is reasonable assurance that the device is safe when it can be determined based upon valid scientific evidence that the probable benefits to health from use of the device for its intended uses and conditions of use when accompanied by adequate directions and warnings against unsafe use outweigh any probable risks. Effectiveness, 21 C.F.R. 860.7(e)(1), there is reasonable assurance that the device is effective when it can be determined based upon valid scientific evidence that in a significant portion of the target population, the use of the device for its intended uses and conditions of use when accompanied by adequate directions for use and warnings against unsafe use will provide clinically significant results. And valid scientific evidence, 21 C.F.R. 860.7(c)(2), valid scientific evidence is evidence from well controlled investigations, partially controlled studies, studies and objective trials without matched controls, well documented case histories conducted by qualified experts, and reports of significant experience with marketed device from which it can fairly and responsibly be concluded by qualified experts that there is reasonable assurance of the safety and effectiveness of a device under its conditions of use.
Isolated case reports, random experience, reports lacking sufficient details to permit scientific evaluation, and unsubstantiated opinions are not regarded as valid scientific evidence to show safety or effectiveness.
Your recommendation options for the vote are as follows. Approvable, if there are no conditions attached. Approvable with conditions. The panel may recommend that the PMA be found approvable subject to specified conditions, such as physician or patient education, labeling changes, or further analysis of existing data. Prior to voting, all of the conditions should be discussed by the panel. Not approvable. The panel may recommend that the PMA is not approvable if the data do not provide a reasonable assurance that the device is safe, or the data do not provide reasonable assurance that the device is effective under the conditions of use prescribed, recommended, or suggested in the proposed labeling. If the vote is for not approvable, the panel should indicate what steps the sponsor may take to make the device approvable. Dr. Taylor?
DR. TAYLOR: Are there any questions from the panel to Ms. Carlos regarding these voting options before I ask for a main motion? Are there any questions? In that case, is there anyone on the panel who wishes to make a motion? Dr. Blumenstein.
DR. BLUMENSTEIN: I move not approvable.
DR. TAYLOR: Is there a second for the motion? Dr. Parisi. The motion is open for discussion. Any discussion from the panel? Last call, is there any discussion from the panel? In that instance, I will then ask the panel to vote, reminding the panel that the procedure will be to raise your hand, and that following the vote I shall ask each member of the panel briefly to state their reason for voting the way in which they did. The motion then is for non-approvable. All those in favor, please raise their hand. So I'll read the names for the record. It's Dr. Parisi, Dr. Nath, Dr. Blumenstein, Dr. Gulley, and Dr. Gollin. Thank you. Those against? Dr. Lichtor, and Dr. Lopez have voted against. That leaves, I believe, no abstentions. Correct. I will now ask each member of the panel to briefly state their reason for voting the way that they did. We'll begin with Dr. Parisi and move around the table.
DR. PARISI: Thank you. I think the issue is whether or not NTP really serves as a reliable biomarker of disease. And I think the conditions for identification of a biomarker, its ability to detect a feature of the Alzheimer's neuropathology, and I don't think that's been addressed or demonstrated. It needs to be validated in neuropathologically confirmed AD cases, and that hasn't happened. It needs very high sensitivity and specificity, and we've heard a lot of data to that point. Some of the data are conflicting, but -- so I don't think we have enough information to really come to a conclusion about the specificity or the sensitivity of the -- of NTP.
Ideally, a marker ought to be biologically -- have some kind of biological relationship to disease pathogenesis, and I'm not sure we've seen -- I don't think we've had data to support that point either. There was a consensus conference at NIA that Dr. Becker mentioned briefly, a consensus report of the working group on molecular and biochemical markers of Alzheimer's disease from the Ronald Reagan Research Institute and the NIA working group that was published. And I think a lot of the issues that we've discussed today actually come out in this paper. One of the recommendations of that paper was actually that the marker, the putative marker, be confirmed by at least two independent studies conducted by qualified investigators with results published in peer-reviewed journals. And I guess I would encourage the sponsors of NTP to possibly pursue that venue of validation.
DR. TAYLOR: Thank you. Again, I'll ask the panel to make any comments that are in addition to or differ from those made by a former member. So Dr. Nath, could you comment next?
DR. NATH: So while I agree with the comments made, there's no doubt that the test itself is very safe. And I don't think I would argue against that. The questions were regarding the efficacy, and I wasn't entirely convinced that the current clinical study is adequate in order to really justify it. I felt that the sample sizes were too small, conclusive diagnosis could not be made in between the groups, and no longitudinal data was available from the patient samples themselves. And in the few examples provided, it seemed like the cutoff really is too close for differentiating the non-Alzheimer's from the Alzheimer's patients. And in the few examples of longitudinal data, the fluctuation is also of concern. So these pieces of information influence my decision.
DR. TAYLOR: Dr. Blumenstein?
DR. BLUMENSTEIN: It may seem odd for me to say this, but I'm reminded of the difficulty I had when I was asked to be on the panel for the silicone gel breast implant. The issue to me is characterization of efficacy and safety. First of all, I don't know that I agree that the device -- the diagnostic is perfectly safe, because I think that it will change people's lives to see a result in a clinical setting.
The characterization of the performance of the device is not, as I've tried to discuss before, is not adequate. The wrong kind of language is used, the wrong kind of comparator was used, and all sorts of problems like that. I don't, and I'm not saying that the device isn't effective in the sense that it does seem to correlate with something that seems to be related to a diagnostic behavior. But it does not -- I don't think it could be characterized as it has currently been studied or represented in a manner in which I feel comfortable with the labeling.
DR. TAYLOR: Dr. Lopez?
DR. LOPEZ: Okay. One thing that I, as a neurologist, and as somebody who works in the field of dementia, I believe that anything that increases awareness of the disease is positive and is important. So I think that would be very important to have something in the community, and the PCPs to have a tool that can increase their awareness of the disease. The problem that I have with the study is that it's not -- I'm not convinced that it works in Alzheimer's disease. I would be convinced that it works in Alzheimer's disease if the probables and possibles were similar. And I would go back to the charts, and I would review all those possible AD cases. It may be the case that you have here cases where they don't have Alzheimer's disease, or they have other dementias. And see if that can improve the sensitivity in that group. And that would be probably one step towards the approval.
DR. TAYLOR: Dr. Gulley?
DR. GULLEY: Yes. So, I'd like to echo the remarks of Dr. Nath and previous members. I don't think that effectiveness was demonstrated. I think that perhaps safety can be assumed with this. However, based on presented data we are unable to assess whether there are false positives, the true number of false positives or false negatives, and therefore you cannot define the patients at risk to look at the risk/benefit ratio.
As far as the effectiveness, in addition to the previously mentioned comments, there appear to be good correlations with two of the disease categories, probable Alzheimer's disease and definite non-Alzheimer's disease. However, there didn't appear to be good correlation with the possible Alzheimer's disease or MCI, and certainly there was no correlation with the neuropathologic gold standard. Furthermore, there are some concerns with the patient variability in the testing, as well as the use of the training set and the testing set as the same group.
DR. TAYLOR: Okay. Dr. Lichtor?
DR. LICHTOR: Okay. As a neurosurgeon who manages these patients to a certain extent, to me there's really two major issues. A, it's not really in identifying patients with Alzheimer's disease. It's more B, which is the help and management of patients with dementia who do not have Alzheimer's disease. And that's more of what I see. But I feel that this test does add some information, and only time will tell whether or not this will pan out to be helpful.
I think the downside of the test is really nothing, so although -- so there's really no risk that I see. I think many tests in medicine are not 90 percent reliable, or 98 percent reliable. I mean, we wish they were, but whether you realize it or not, a lot of times we make surgical decisions saying maybe there's only a 70 or 80 percent chance our surgery is going to help. And just be honest with the patients and tell them that. And I think it seems like you're being up front about what the efficacy, or at least the reliability of this test is.
I also say that although you say that the test is really geared for primary care people, and it may help them some, but I think also many of my neurology experts are frequently not sure of the diagnosis in these patients. And I think this test may provide some additional information, but obviously we don't have enough data to tell. But I think only time will tell if this test will help in management of these patients, but I think there's a possibility that it would, and as long as there's a possibility that it would I don't see any reason why it can't be added to the number of other tests that we order for these patients.
DR. TAYLOR: Dr. Gollin?
DR. GOLLIN: I have a number of concerns about the effectiveness of the test. I don't feel that we've been given reasonable assurance that the device is effective under the conditions of use that have been prescribed, recommended, or even suggested in the labeling. First, in terms of whether adequate specimens can be collected on the pre-analytic side, I'm concerned that the test can't be used in many of the patients who may really need it, because they may not be able to give specimens.
I have concerns, number two, whether the analyte is constant from sample to sample, and from laboratory to laboratory, and so on. I'm concerned about study design, and whether we can interpret the results that we have, and to me the numbers still are small. I don't have sufficient data to be convinced that a patient with a negative result will be helped by the test, and even whether a patient with a positive result will be helped by being treated by a primary care physician rather than referred to a neurologist which might otherwise happen. And so for all these reasons, I don't think the test is useful in refining the process of physician decision-making in terms of the diagnostic workup for Alzheimer's.
DR. TAYLOR: Thank you. I would like to summarize then the voting outcome for the recommendation. The motion on the table was for non-approval. Voting in favor of the motion were Dr. Gollin, Dr. Gulley, Dr. Parisi, Dr. Nath, and Dr. Blumenstein. Voting against the motion were Dr. Lichtor and Dr. Lopez. And you've heard each of the panel members give a brief explanation for the reason they voted, and the motion therefore is passed of a recommendation for non-approval.
As a final issue, I'd like to ask each panel member if they have any comment or recommendation for the sponsor as to what they believe may make the test approvable in the future. And in this instance, I'd like Mrs. Butcher to open the comments.
MS. BUTCHER: Again, I think the numbers may help, if there were larger numbers in the study. And secondly, to work closer with the FDA so that you're not at the end of the process saying that there's not a fit. And perhaps to design the whole study so that it marches hand in hand, and when you get to the end, you come out with a result that you're both -- that the FDA and the sponsor is aware of and ready to go forward with.
DR. TAYLOR: Dr. Duffell?
DR. DUFFELL: I'll pass, but I would like to make a closing remark when we're done with comments on approvability.
DR. TAYLOR: Do it now.
DR. DUFFELL: Okay. I think it's, you know, I wanted to make remarks to both FDA as well as the sponsor. I mean, this is a long, exhausting process of many years. They bring them to this table, as well as a lot of hours by FDA in analyzing and working with the sponsor over the years. I think what we've seen here today is an ongoing thing that I see where there's always room when you're dealing with science and medical data for reasonable men and women to differ on a conclusion. Obviously the sponsor felt as though they had an approvable product, and FDA clearly had some questions about that. I think what is most important at this particular juncture, and I'm sure FDA will work with the sponsor in doing this, but just to make it clear is I think, being on the receiving end of this before, is to get the comments quickly back to them so that they can work constructively together to try to either resolve questions that can be answered within the existing data set, or be about evaluating whether or not new studies are needed to satisfy information that we didn't have today. So I would urge FDA to work with them, and likewise the sponsor to work constructively with FDA in getting those things under way.
DR. TAYLOR: Thank you. Dr. Gollin?
DR. GOLLIN: I didn't feel like the data convinced me. Therefore, I would urge the sponsor and the FDA to work together to get data, or interpret -- or analyze the data such that the panel can be convinced that the test is an effective test.
DR. TAYLOR: Dr. Lichtor, do you have any suggestions as to what might be done?
DR. LICHTOR: Well, just a few brief suggestions. And one, I guess everyone else said, I'd like to see bigger numbers. But two, I think if you can show how management of patients is changed by this study. For example, do you have patients that, say, further tests which may be expensive weren't ordered because of the results of this study. Or could this study somehow expedite the management of these patients with dementia who -- where the diagnosis is not clear. I think that's the more challenging group.
DR. TAYLOR: Dr. Gulley?
DR. GULLEY: There may be several different options here. One perhaps would be repeating a test using neuropath as a gold standard. However, realizing that that is a long process, perhaps a test in the primary care setting, if this is the intended -- because the neuropath gold standard would be needed to show if the test were to be better than the currently accepted NIH criteria, then perhaps you could do the test in the primary care setting to see if that helped aid in the diagnosis, or referral pattern, or changes in management as was previously mentioned by Dr. Lichtor, but decreased the number of expensive tests needed to get the diagnosis. I don't know the exact endpoints of that trial, but that could be done in discussion with the FDA.
DR. TAYLOR: Dr. Lopez?
DR. LOPEZ: Well, I'm just going to repeat what they said before, except I believe that the test should show that it can identify people with Alzheimer's disease in general. Probable and possible. And in this case, it may require to go back to the possible AD and review the charts. It is -- maybe you have there some people with other dementias.
I'm still, I'm not sure if the MCI group helps here. It may, it may not help. We don't have -- the criteria for MCI are constantly changing. And the criteria for MCI is going in a direction which is very different to the one proposed by the American Academy of Neurology. So I don't know if it helps here.
The other thing that would help is if you can show that you can identify people at different stages, in mild, moderate, and severe. And you can do the -- you have the Mini-Mental State Examination score here. You can use people -- you can dichotomize the score in plus/minus 20. So those with the scores higher than 20 would be mild, scores less than 20 would be moderate and severe. And you can show that this test can identify people in early stages, and in the moderate and severe stages. People in the -- you don't need the test in people in the moderate and severe stages. You don't need to do the test. They have the dementia, it's pretty much advanced, the diagnosis is there. The target group is a group with mild dementia. Because you don't know in which way that group will go. And if the diagnosis will be different down the road, it will change the diagnosis.
So I think that there are two issues here. One is what I would like to see, is that the test works in Alzheimer's disease in general, probable and possible, and that the test is sensitive to detect Alzheimer's disease in mild stages. It may be that what you are picking up here is just a more advanced process. And it may be that the test is sensitive only to Alzheimer's disease in moderate to severe stages. But it would be important to know that you can pick it up through the whole natural history of the disease. But -- and I'm not sure if including cases with MCI is useful.
DR. TAYLOR: Dr. Blumenstein?
DR. BLUMENSTEIN: I'd like to echo what Dr. Lopez said, especially the issues about longitudinal studies, and studies that would give you some idea of the time course. Treatment monitoring also comes to mind. I'm not sure how many potential treatments are being evaluated, but that would be an excellent place to get repeated test data. And I would have felt a lot more comfortable had this test -- the data that we have -- been presented to us as more like a biomarker, and if we had seen things like associations with the components used to make the diagnosis in addition to the overall partitioning of the patients into those squishy categories.
DR. NATH: I also feel that a longitudinal study would resolve a lot of the questions that were raised by a number of individuals on the panel, because it would not only validate the test for reliability and variability over a period of time, and collection, and all those other kinds of things, but it would also help resolve the diagnoses of these patients over a period of time, because the patients in the possible category will declare their diagnoses if you follow them for a finite period of time. And all those -- even if you did not have autopsy data on those patients, you have a better certainty of what the diagnosis really is.
I would also strongly suggest, as I have done previously, is that consider using other groups of neurological diagnoses in sufficient numbers to give us a feel for whether this really is something specific for Alzheimer's, or would it be just as specific for any neurodegenerative disease.
And if a longitudinal study is going to be designed, then some idea of correlation with severity of the disease is important. And as Dr. Lopez said, that recognizing the mild patients is probably going to be clinically the most relevant. But if you're going to do correlation with severity, and I know it costs -- these studies can be quite prohibitive, but if possible, a standardized protocol for doing MRI testing might actually be very beneficial because that will resolve the issue of vascular dementia versus, you know, other forms of dementia, and frontotemporal dementia. But it will also give you the ability to quantitatively actually measure atrophy, whole brain atrophy, or hippocampal atrophy, and really show that, yes, the protein levels correlate with some objective measure besides the clinical evaluation. Because we know that you can have a lot of plaques and tangles in the brain and still do really well on the MMSE if you're very highly educated. You know, so there are limitations to the clinical exam, and I'm hoping that the MRI might add another dimension to it.
DR. TAYLOR: Dr. Parisi?
DR. PARISI: I think we're all struck with the potential very exciting observation that NTP may have a real role in an understanding of pathogenesis of Alzheimer's, or may have a role in the disease itself. But the validity of the assay, it needs to be established against carefully studied cohorts of ideally longitudinally followed patients and controls. Patients with dementia, including other non-Alzheimer's type dementias.
One thought that comes to mind is that the NIA sponsors many Alzheimer's disease centers around the country, and they all have large cohorts of patients, and perhaps partnering with some of these may in fact enhance your ability to do some of these studies.
DR. TAYLOR: Thank you. I think that summarizes the panel comments. As chair, sitting watching the proceedings, I would agree with many of the thoughts of other members of the panel. I think there's partly an issue of claims versus labeling, and obviously the claims then reflect the stringency with which the FDA look at the data and measure the data. So it would seem to me there might be a basis for the sponsor Nymox to sit down with the FDA and see if they can negotiate a set of claims and a set of labeling requirements that allow the design of a study that's actually feasible. Clearly it's not feasible to do a study where the outcome is the current gold standard, which is histopathologic diagnosis. That's not going to work. So you're left with these surrogate standards for diagnosis that themselves are hardly gold standard. They are sort of something below bronze. And it does make it very difficult to design a study when the thing you're measuring against is itself not really quantifiable. And that's the challenge I think you have. It's almost as though you're looking for -- in another environment, if the test were classified as a 1 or 2, as substantial equivalence to something. And I don't know whether you can evolve in that direction with the FDA. I think we all feel more data is required, both horizontal in terms of numbers of patients, and longitudinal. I personally found the prospect of the test very exciting, and it would truly be disappointing for all of us if there is not something here that can be useful.
With that, I believe that the panel comments are closed, and there's a final housekeeping issue from Rufina who's going to remind us, I think, to destroy all of this material.
MS. CARLOS: Before we adjourn for the day, I would like to remind the panel members that they are required to return or destroy all of the confidential materials they were sent pertaining to this meeting. Materials you have with you may be left at your table, and any others may be sent back to me at the FDA or shredded as soon as possible.
DR. TAYLOR: At this point then, since there is no further business, I would like to adjourn this meeting of the Immunology Devices Panel. I thank you all for your attendance, and for your courtesy during the course of the meeting. Thank you.
(Whereupon, the foregoing matter was concluded at 3:40 p.m.)