fresh frozen plasma, and we're asking the same questions now in red cells.
So in terms of red cells as a whole, we've talked about the potential for benefit, the life-saving element, but also possible harm, and I think there are several studies that have looked at this.
This is actually a very large study looking at almost 80,000 patients with myocardial infarction, and the curves there, it's a very small, detailed graph, but effectively what it's showing is that mortality, or long-term survival after a myocardial infarction, is highly associated with the presenting hematocrit.
In fact, in this 80,000-patient dataset, there were almost 3,700 patients who received a transfusion during their care, and I think this is a very interesting, though again retrospective, study that looks at the association of transfusion with benefit, or the potential for benefit.
And if you look here, on the table, you can see that the presenting hematocrit relates very significantly to the association of mortality with transfusion.
So, in fact, if you presented with a hematocrit below 24, then with transfusion your survival was in fact fivefold that of your peers who didn't receive a transfusion. That associated benefit, if you call it a benefit, continues up to a hematocrit of approximately 33.
When your presenting hematocrit reaches 33 to 36, the outcome is pretty even, whereas if you go above that, there's in fact a statistical association with a worsened outcome.
So this is one of the studies that suggests an association of transfusion with both life-saving benefit and, potentially, a threatened outcome. There are several other studies in the kinds of populations I mentioned, after cardiac surgery and in critically ill patients, and the Hébert study, which was a prospective randomized controlled trial with restrictive and liberal transfusion targets, is also a good example. While the overall outcome was neutral in that study, in some subsets, particularly the younger patients and the patients with lower APACHE scores, lower critical illness scores, a post hoc analysis showed a p value of less than .05 for the association of transfusion strategy with outcome.
So there is some evidence, mostly retrospective but some prospective, suggesting that there is potential for both harm and benefit from red cell transfusion in general.
Well, with this in mind, the next question becomes, if you accept that there is potential harm from a transfusion, then one of the elements we look at is the age of transfused blood. And what you're looking at here is the distribution of the age of blood when it's transfused to heart surgery patients at our institution.
Notably, the median age here is 14 days, and that again is very similar to the Koch study that's been referred to before, and I think that is probably the reason they chose 14 days as their cutoff, as much as any other consideration.
Notably, in the population of red cells that are transfused, where it's been looked at, at least 20 percent of red cells are transfused after their fourth week of storage, and, depending on the month you look at, up to 38 percent of packed red cells can be transfused within two days of expiry. It's a highly variable variable.
So we've referred to the storage lesion and it's not something I'm going to go into in great depth, but very well-described, in many facets. But I'm going to move on and look at some of the clinical studies that are out there.
There have been a whole slew of studies, many of them relatively small, as few as 30 patients, that have shown associations of the age of blood, with older blood being associated with a worse outcome in these populations, and as referred to, there are also several studies that have shown no association.
Again, what's common to all of these is the size of the datasets; they are all relatively small, mostly less than about 500 patients, which is why, I think, this publication that came out several weeks ago from Colleen Koch at the Cleveland Clinic is so interesting: it's the first time we've had a very large dataset with which to assess the issue of blood storage age.
And what they did was take a period of time and a relatively homogeneous group of cardiac surgery patients at a single institution, and divide them at the median age of transfused blood.
And they selected patients who, by chance alone, had received all of their blood transfusions at either less than or equal to 14 days, or greater than 14 days, and you can see the breakdown there. And they used a propensity-adjusted, multivariable logistic regression analysis to assess the association of storage age with postoperative mortality within 30 days. And then they used a Kaplan-Meier approach for long-term survival.
This is the table showing the outcomes, and in-hospital death was associated with a p value of .004 for increased mortality with older blood, and, as referred to by the previous speaker, there were specific major organ complications also associated, potentially implicating certain pathophysiologies in this association.
And notably, other organ outcomes were not associated; in particular, the neurologic outcome and the myocardial outcome did not seem to be related to this association with mortality.
They also, within the dataset that they looked at, had a composite outcome of organ complications and mortality, and analyzed it in terms of the oldest unit each patient had received.
So even among the patients in the older-blood group, they found that patients who had older blood tended to have a higher complication score. And I also want you to note the shape of this figure, because I'll be showing you some data from our institution which you can compare with this.
And long-term mortality, as we've already seen, was associated with a worse outcome when the blood was older, an effect which seemed to approximately resolve after about half a year.
So if we just move on now to work at our institution, this was actually work we published prior to the publication of the Koch data, so I'll show you some data we've done subsequently to compare our outcomes with theirs, to see if they are similar, and you can see the data there.
And these are the numbers of units the patients were receiving, on average, and you can see that two units is by far the most prevalent. We did not include within this period of time the patients who did not receive any blood transfusions. These were the blood storage solutions in use for that period of time: you can see they were AS-1 and AS-3.
And in contrast to the average age of the units transfused, we looked specifically at the oldest unit that any patient received, and you can see that, in contrast to the median age of all transfusions being 14 days, if you look at patients receiving blood with heart surgery, the oldest unit tends to have an older median age. Probably the larger explanation for that is that many of these patients receive more than one unit, and of course each time you receive another unit, you have another chance to get an older unit.
So the median age was 19 days. And apologies for the numbers here being small, but these are post hoc numbers, just to give you a comparison to the Koch data. The first numbers there are essentially a reproduction of their analysis, and you can see the 30-day mortality: 1.68 percent in the patients who received only blood less than 14 days old, and 2.38 percent in those who received older blood.
Those are actually very similar numbers to those in the Koch database, and the p value, while only trending, was .14. As I say, this was not the primary analysis.
When we did a similar analysis looking just at the oldest unit, we found the p value was .003, and again, if we used the median value of the oldest unit, less than or more than 19 days, the p value was .01 in terms of statistical significance.
But, in fact, the primary analysis we were looking at was the storage age of the oldest unit, and postoperative mortality, and similar to the Koch study, the long-term survival.
What we found, in fact, was that the oldest unit was a strong predictor and highly associated with mortality risk, even when the number of units transfused, and the Hammond score, which is a well-validated preoperative mortality risk score, were included in the model, and the associated risk was an increase of 20 percent for every increase of seven days in the age of the blood.
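As a rough illustration of that effect size (my arithmetic, assuming the 20 percent increase compounds multiplicatively per 7-day increment, which the underlying model need not do):

```python
# Sketch: compounding a 20% mortality-risk increase per 7 days of additional
# storage. The 20%-per-7-days figure is from the talk; the multiplicative
# compounding form is an illustrative assumption.

def relative_risk(extra_days, increase_per_week=0.20):
    """Relative risk for blood stored `extra_days` longer than a comparator."""
    return (1 + increase_per_week) ** (extra_days / 7)

if __name__ == "__main__":
    for days in (7, 14, 28):
        print(f"{days} extra days -> relative risk {relative_risk(days):.2f}")
```

Under that assumption, a unit two weeks older than another would carry roughly a 44 percent higher associated risk.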
But what we were interested in, in addition to evaluating for the presence of an association, was whether there was a curvilinear pattern to this association: a shelf life, if you wish. Was there a best-before date, or anything of that nature?
So we added in a cubic spline analysis to this initial analysis, and--excuse me. This, again, essentially just shows the same data I just presented in another format, showing that the effect was similar in all groups relative to their preoperative predicted risk of mortality.
But this is our figure stratified by score, so by patient risk, and in fact we see very similar patterns, again very similar to the pattern in the Koch study, showing, or suggesting possibly, that there is a period after which this association changes in the steepness of the curve, if you wish.
Similarly, if we stratify by the number of units patients received, again, a very similar curve shape, and again, approximately 28 days, we see a change in the slope of the curve.
Also similar to the Koch data, we found an association, though not as strong as the Koch data showed, with the sicker and more heavily transfused patients showing a statistical increase in their mortality risk when their oldest unit was an older blood unit.
So leukoreduction was raised, actually, as another variable, and it turns out that in our dataset, leukoreduction was introduced approximately halfway through the study, and so we were actually able to introduce this as a question.
Mortality, in hospital and long term, has actually already been looked at with leukoreduction. In fact, one meta-analysis of leukoreduction trials found no association with outcome.
Interestingly, though, in the three trials with cardiac surgery subgroups, one of the more critically-ill groups evaluated in this study, they did see an association in three randomized controlled trials combined, of leukoreduction with reduced mortality in this period of time, 22 to 66 months.
As pointed out, the issue of leukoreduction potentially has some relevance to the age of stored blood, because if there are white cells lingering, then they have longer to show their displeasure and potentially release cytokines and other such substances.
And in fact one study of post-cardiac surgery patients has looked at the association of leukoreduction with age of blood, and in fact demonstrated an increased incidence of pneumonia in the patients whose blood had not been leukoreduced.
Interestingly, in our study we did not find any association between leukoreduction and patient outcome.
So, in conclusion, I think red cells can contribute to complications, in addition to their life-saving potential, and certainly in the clinical arena I think there's been a move towards using them as a last resort, once we've exhausted all other, safer methods of avoiding extreme anemia. But prolonged storage of blood is relatively consistently associated with higher complication rates, including mortality in some patient datasets.
And the role of leukoreduction certainly remains to be determined. And overall, my conclusion is that effectiveness trials, which are essentially a lot of what we've done in the past, are going to need to be supplemented by post-marketing surveillance and safety follow-up.
So, as to how this relates: if you accept that the retrospective data are useful, then I would say they suggest that we shouldn't take steps backwards before we know that we're in the right place in terms of the approach to storage of blood. Thank you. Questions?
DR. SIEGAL: Okay. Thank you very much. Are there questions?
DR. SZYMANSKI: Thank you. I was impressed that you brought leukoreduction into this picture, since right now we have been talking only about the age of storage, the length of storage, as a possible problem.
But we have not questioned what other features might be involved here, because as red cells age, many changes happen: nonviable cells increase, the ability to deliver oxygen decreases, and that's another possible parameter that can be involved, as well as the release of all these cytokines, and maybe the change in cell deformability, plus free hemoglobin may be a problem.
And I think it would be really nice if some kind of prospective study could be done that would analyze the various factors that might be responsible during the aging of stored red cells. I think that would be very important, because up to now we have only talked about viability. We haven't talked about function and the other characteristics which could be harmful in a clinical situation.
DR. STAFFORD-SMITH: My comment is that, as I think we've referred to before, there's one study by Hebert and colleagues that looked at a small dataset, approximately 60 patients, examining the feasibility of adjusting the age of blood that people receive, to look at whether there is a difference in outcome.
I think the potential limitations, or the challenges that are going to occur in designing the ultimate study, are, firstly, and we in our institution are beginning this process, the position of equipoise: is there sufficient equipoise to justify giving one person, or one group of people, a particularly old unit of blood?
And to demonstrate the associations that are being demonstrated, you really have to move out towards the significantly older blood. Even the Hebert study, for example, was looking at four days old versus 19 days old, on average, and one would expect that that is not really where the signal is, if there is a signal.
Now the only other variable I--well, the only other thing I'd like to mention is we were highly concerned, obviously, with the retrospective nature, and trying to do everything we could to evaluate the dataset.
One thing that is certain is that when one is in the operating room and reaching for blood, one doesn't look at the date of expiry while one's looking at the patient characteristics. But we did look at date of expiry against patient characteristics for many of the variables that we had available, and there were none where we were able to show any major deviation from an essentially standard, random distribution of units to patients with various characteristics.
DR. SZYMANSKI: Were these units given during the surgical or perioperative period?
DR. STAFFORD-SMITH: I beg your pardon?
DR. SZYMANSKI: When were these units transfused, during surgery or--
DR. STAFFORD-SMITH: For our study, they were from the start of the surgery to the discharge from hospital. Or death.
DR. CRYER: It looked like almost half your patients received two units or less. Was there a difference in mortality in that particular group? Because you wouldn't think that those patients would be dying from some complication of the operation; it would more likely have to be the blood.
DR. STAFFORD-SMITH: Right. We broke down--we did a subanalysis of patients, in fact, with one unit or one and two units, and, in fact, in that dataset, this association was not present.
Now having said that, that's a relatively small total number of patients also.
DR. FINNEGAN: I have two questions for you, actually. The first one's a little rude. Do you know, either in your study or in the
DR. STAFFORD-SMITH: How do you mean, the processing? You mean the target hematocrit?
DR. FINNEGAN: Yes; for the 42 days. In other words, what we're looking at now, do you know if you fell in the 67 percent threshold or in the 75 percent threshold in the processing of your red cells?
DR. STAFFORD-SMITH: Right. I apologize. I don't know that.
DR. FINNEGAN: My second question is that Mark Gladwin has done some work in sickle cell patients and found that the native serum hemoglobin, that is, free hemoglobin in the serum, causes significant problems with nitric oxide scavenging. And so my question would be: Do you think that some of the problem you're seeing with the older cells is in fact related to the free hemoglobin?
DR. STAFFORD-SMITH: That is very possible. There are lots of other circumstances that I can relate to clinically where you see that. For example, when we've done some of the trials with free hemoglobin, attempts to replace blood with blood substitutes, one of the major problems is hypertension, for example, and acute kidney injury, both of which are related, presumably, to nitric oxide scavenging by the free hemoglobin.
DR. FINNEGAN: So then, obviously, the higher the cell survival rate, the lower the free hemoglobin you're going to have, and the better the outcome?
DR. STAFFORD-SMITH: Along that rationale, that would be true; yes.
DR. KATZ: In the recent New England Journal paper, or in what you presented, were you able to control for the operative team doing these surgeries in these mortality analyses? The surgeon and--
DR. STAFFORD-SMITH: No, I understand what you're saying. We didn't make a specific attempt, but again, I guess if I were to try to defend the analysis, I would say that our desire to pick younger or older blood wasn't affected by the surgical team.
In fact, I think we were pretty much random in terms of which blood we picked, other than probably taking the one at the top of the pile under the ice bag, and again that was pretty much random across every variable we looked at.
We didn't specifically look at surgical team.
DR. SIEGAL: Dr. Rentas.
DR. RENTAS: I just happened to read the paper that you mentioned last night, and actually it was mentioned by Dr. Davey as well. You showed Table 2, but Table 1 was never shown, and it seems to me that the preexisting conditions in some of the patients who were given older blood were increased compared to the patients who were given fresher blood.
However, when you look at the discussion, the authors really didn't go into any detail about that. Is there something you can say about that?
DR. STAFFORD-SMITH: Well, they actually also have a figure which attempts to demonstrate a very similar pattern of transfusion among the older and the younger patient groups in terms of blood.
I'm not sure that I can explain why those patients had differences in the older and younger patients.
DR. DI BISCEGLIE: This may be related. The technical thing in the conduct of all of these studies. You had said something like the more blood the patient needs, the more likely they are to --
DR. STAFFORD-SMITH: Well, the more opportunity, because it's sort of like throwing a die: with more throws, there are more chances you'll get a higher number.
DR. DI BISCEGLIE: I'm sorry. I couldn't --
DR. STAFFORD-SMITH: Sorry. Each time you throw a die, you get another chance to get a high number, so with more throws you take more chances to get a high number, or an old unit of blood, in fact. It's a potential bias: if you take a patient with many transfusions, they just have many more chances to get an older unit.
DR. DI BISCEGLIE: -- that the blood bank is giving you the oldest, and then the next?
DR. STAFFORD-SMITH: No.
DR. DI BISCEGLIE: Under a policy of first -- because if that's correct, the next time you throw the dice, then --
DR. STAFFORD-SMITH: Well, at the risk of bringing out something that's hard to explain, and maybe even harder to understand, we were as concerned, as maybe you are about this. So we tried to find a way in which we could assess for the potential for bias because of the number of units a patient had received.
And so what we did was take each blood unit's number, take the last digit of each unit, and consider that a random number generator.
And if you give a patient more units, sure enough, the highest of those numbers they receive, on average, goes up. It gradually climbs. So what we wanted to see in our analysis was, if we used this random number generator and controlled for the number of transfusions, did an association between mortality and increased transfusions disappear?
In other words, was it adequately accounted for by controlling in our model for the number of transfusions? And in fact the effect disappeared when the random number generator was used, whereas the age of transfusion was a robust finding and didn't disappear. I don't know if that is clear.
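The intuition behind that last-digit check can be sketched numerically: if each unit's last digit behaves like a uniform draw from 0 to 9, the expected highest digit a patient sees necessarily climbs with the number of units, just as the expected age of the oldest unit does. This is an illustrative reconstruction, not the study's actual code:

```python
# Sketch of the "last digit as random number generator" check: the expected
# maximum of n uniform draws from 0..9 rises with n, mimicking how the age of
# the oldest unit rises with the number of units transfused.

def expected_max_digit(n_units):
    """Exact E[max] of n_units independent uniform draws from {0,...,9}."""
    # P(max <= m) = ((m+1)/10)^n, so E[max] = sum over m of
    # m * (P(max <= m) - P(max <= m-1)).
    return sum(
        m * (((m + 1) / 10) ** n_units - (m / 10) ** n_units)
        for m in range(10)
    )

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} units -> expected highest digit {expected_max_digit(n):.2f}")
```

So a spurious "oldest digit" effect would appear if the model failed to control for the number of units; the fact that it vanished under this control, while the age effect did not, is the point of the check.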
DR. SIEGAL: Any more questions?
DR. SIEGAL: All right.
Then let's go on. Now we come to Larry Dumont, director of the Cell Labeling Laboratory, assistant professor.
DR. DUMONT: Members of the committee, Dr. Epstein and the FDA, and those of you beyond the barrier, good afternoon.
It's been a "fun" time. Dr. AuBuchon sends his regrets. He really wanted to be here. He knew it'd be a lot of fun in the discussion.
I wanted to point out, for some of you who probably don't know, what the BEST Collaborative is. It's a group of manufacturing members and scientific members, and we're an independent group; we meet a couple of times a year, and we run lots of self-initiated, independent studies looking at improvements in blood safety.
This paper, I think you all have reprints of. It should appear this month in Transfusion.
My conflicts of interest have not changed in the last couple of hours. And what we want to talk about is the red cell performance criteria, of course, that FDA has on the slate. Now when they evaluate red cells, they of course look at several in vitro characteristics, like hemolysis, maintenance of ATP, etcetera. There is also this autologous 24-hour recovery, which is the main point of discussion today, and in some cases they could go to clinical outcome or safety trials.
But again, we're going to look specifically at this one outcome, and continue to look at it. I'm going to give you my own rendition of the history of this. In 1947, we started out with a required mean 24-hour recovery of autologous radiolabeled red cells of greater than 70 percent. That morphed into a greater than 75 percent recovery in 1985, then the addition of a standard deviation requirement in 1998, and that has continued to morph into what we have today: mean recovery greater than 75 percent, the standard deviation criterion, and this business that the one-sided 95 percent lower confidence limit for the population proportion of successes has to be greater than 70 percent.
And this is where that 75 percent comes into play; I call that the success threshold, and that's where the business of at least 21 successes out of 24 trials, or at least 18 successes out of 20 trials, comes from.
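Those cutoffs follow from the binomial arithmetic: a trial passes if the one-sided 95 percent lower confidence limit on the proportion of successes (individual recoveries of at least 75 percent) exceeds 70 percent, and with an exact binomial bound that works out to at least 21 of 24, or 18 of 20. A sketch of that derivation (my reconstruction, assuming the exact one-sided binomial formulation):

```python
# Sketch: derive the 21-of-24 and 18-of-20 cutoffs from the requirement that
# the one-sided 95% lower confidence limit on the success proportion exceed
# 70%. With an exact bound, p_lower > 0.70 iff P(X >= x | n, p=0.70) < 0.05.
from math import comb

def tail_prob(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def min_successes(n, p0=0.70, alpha=0.05):
    """Smallest x such that observing x of n successes excludes p0 at level alpha."""
    for x in range(n + 1):
        if tail_prob(x, n, p0) < alpha:
            return x

if __name__ == "__main__":
    print(min_successes(24))  # -> 21 successes required out of 24
    print(min_successes(20))  # -> 18 successes required out of 20
```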
And I think the major discussion is not that value. It's that value, right there.
Well, as you already know, we're talking about lots of percentages, and it gets confusing, at best, but I think I can tell you it's going to be okay, we'll get through all these things.
To try to help with that, I've actually constructed some cartoons, and we're all fairly comfortable, I think, looking at Gaussian curves. These are cartoons, because these distributions are not actually strictly Gaussian, and so my apologies to the horde of biostatisticians in the room, but I think it'll make the point.
So in 1970, a distribution that would look like this, with a mean of 70 percent, or higher, that would be okay. And this, down here, is the 24 hour red cell recovery.
Well, in about 1985, that was moved up to a minimum of 75 percent for the mean. Then in the 1990's, there was the addition of the standard deviation criteria.
Well, about 2004, there was a decision made that anything less than 75 percent is an unacceptable individual recovery. So anything down here is bad. One more little apology for the cartoon: Microsoft won the battle the day I was making these. This curve is actually supposed to come right down to 75; I couldn't figure that out. But I think you get the idea.
Well, the implication of this, from a practical standpoint, is that a test red cell, or a new red cell product, a new bag, a new solution, a new machine, really must have at least a 90.3 percent success rate to have an 80 percent chance of passing an in vivo recovery trial, something that I would run in my laboratory. So that looks like this.
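That 90.3 percent figure can be checked against the 21-of-24 rule alone: if each unit independently clears the 75 percent recovery threshold with probability p, the chance of at least 21 successes in 24 first reaches 80 percent near p = 0.903. A sketch (this considers only the success-count criterion, not the mean and SD requirements):

```python
# Sketch: probability of passing the 21-of-24 success criterion as a function
# of the true per-unit success rate p (mean and SD criteria ignored here).
from math import comb

def pass_prob(p, n=24, need=21):
    """P(at least `need` of `n` units succeed) for per-unit success rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

if __name__ == "__main__":
    print(f"p = 0.903 -> pass probability {pass_prob(0.903):.3f}")  # about 0.80
```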
So for an 80 percent probability of success in running a trial, for a manufacturer that would come to me and want to contract with my research lab, we would hope that they would be up at about 87 percent, with that standard deviation criterion, to have a reasonable chance of passing the current criteria.
Well, we looked at that and we asked a couple questions which have been asked today already.
The first one was: what is the clinical evidence that this number down here, this 24-hour recovery number, has anything to say about the reactions that we see in patients, which are real? How does this number help us with that?
And when we looked at that, we kind of got a blank piece of paper out of it. And just one small comment about the epidemiological studies that have been talked about today. Certainly, I'm a big proponent of those. I like them. I think they're great. But we need to realize, one example: red cells are not issued at random. They absolutely are not.
I was reading the New England Journal paper from Cleveland one night while my wife, who's a blood banker, was making dinner. I got to the table that described the characteristics of the early and the late groups, and it showed ABO type. And I said, "Hey, Deb," and briefly described the study for her, and asked, "What do you think the distribution of ABO types for the newer and older units was?" She filled in the table.
I mean, every blood banker knows that. So that's a real fallacy of some of those studies. So we came up with a blank sheet.
So our next question was: what is the capability of current red cell products to meet these criteria? So that's where we went. And our conclusion, which we'll go through in more detail, is that a success threshold of 67 to 70 percent, instead of 75, will provide a reasonable probability of passing the FDA-proposed criteria for red cell products in current use.
And that would look, again on this cartoon, a distribution like this, that's clearly moving up but doesn't meet this previous high standard that I showed you.
So the objective of our study was to define the ability of currently available red cell collection and storage systems to satisfy these criteria of in vivo recovery proposed by FDA for approval of red cell systems.
And the way we went about it is we looked for data that was available for all approved, cleared methods between 1990 and 2006. We went directly to the laboratories that conducted these studies and we ended up with data from 11 laboratories, and they sent the data into us and we put it in a central database, all coded in secret, so nobody knew what it was.
Of course we had to do data review and cleaning, to find the obvious mistakes, with double-checks, etcetera.
And then we also, after that, we went back to the sponsors of these old studies, and we said please dig out your records and verify that we have the right data that you had submitted to FDA. So we went through that verification.
Then we ended up with a database that you already heard that we also shared at the request of FDA with them, and they were actually very helpful in reviewing it in detail, and so we got a pretty solid database.
We then stratified into three types of products. Liquid stored for 42 days. So these are things that are in AS-1, AS-3, AS-5, stored in the refrigerator. Gamma irradiated products. These are stored 28 days, post-irradiation. And then products that have been frozen and deglycerolized, stored 15 to 30 days in the freezer.
Then we approached it from a sampling standpoint, where we went into each of these groups, drew samples, sets of 24 each, and repeated that for a total of 5,000 times. Some people might call this a bootstrap.
So we repeated that for each of these groups. Thank God for computers. And out of 34 studies, we had some leukocyte-reduced products and some non-leukocyte-reduced. We had some automated collections, some manual collections. Of course we had the frozen/thawed, and the liquid-stored gamma-irradiated. In total, we had 941 evaluable recoveries from this dataset.
And we'll now go through a descriptor of each of these groups.
This shows for the liquid-stored products, there were 641 of them, and this is the 24 hour recovery, and this is a frequency histogram, and with the lowest recovery in this report of 36 percent.
The frequency of recoveries less than 75 percent in this group was 11.7 percent. Remember now, these are called failures, and if we just did a binomial expansion of this descriptor, looking at a sample size of 24, the probability of having greater than or equal to 21 successes by sampling out of this was 69.3 percent.
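That binomial expansion is easy to reproduce: treating the observed 11.7 percent as the per-unit failure rate, the chance of at least 21 of 24 successes comes out near the quoted 69.3 percent. A quick sketch of that calculation:

```python
# Sketch: the quoted 69.3% from a plain binomial expansion, taking the
# observed 11.7% of recoveries below 75% as the per-unit failure rate.
from math import comb

def prob_at_least(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))

if __name__ == "__main__":
    p_success = 1 - 0.117  # 11.7% of recoveries fell below 75%
    print(f"P(>= 21 of 24 successes) = {prob_at_least(21, 24, p_success):.3f}")  # ~0.693
```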
But we did this, and then also the resampling exercise. The gamma-irradiated products had 31 percent that were less than 75 percent, and you can see the means and standard deviations over here.
And the probability of having a successful experiment against this population was exceedingly small, less than 4 percent.
The frozen/thawed products were--actually, they looked the best of them all. We had 5.6 percent less than 75 percent, with 95.7 percent chance of passing the criteria according to the binomial expansion.
So how does this work? Well, this is just an example of one sample of 24 that we took. This was the 257th replicate of the 5,000 that we did like this. We had a lab identifier, we knew what storage solution each unit was in, and we had a recovery value.
And in this case we were evaluating against the less than 70 percent recovery criterion. So you can see that this one is 64, this one is 67, this one was 58, and that one was 40.
So those were failures at the 70 percent criterion, and this sample of 24 did not pass any of the current criteria: the mean was less than 75 percent, it had a greater than 9 percent standard deviation, and we had four out of 24 less than 70 percent.
So again, we repeated this; we had 5,000 groups like this, with these descriptors for each of the populations.
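The resampling loop can be sketched roughly as follows; this is an illustrative reconstruction, not the actual study code, and the recovery population here is synthetic:

```python
# Sketch of the resampling exercise: repeatedly draw sets of 24 recoveries
# (with replacement) from the pooled data and score each set against the
# current criteria: mean > 75%, SD < 9%, and at least 21 of 24 individual
# recoveries at or above the 75% success threshold.
import random
import statistics

def passes(sample, threshold=75.0):
    """Score one set of recoveries against the current criteria."""
    return (
        statistics.mean(sample) > 75.0
        and statistics.stdev(sample) < 9.0
        and sum(r >= threshold for r in sample) >= 21
    )

def pass_rate(recoveries, n_reps=5000, set_size=24, seed=0):
    """Fraction of resampled sets that pass all criteria."""
    rng = random.Random(seed)
    hits = sum(passes(rng.choices(recoveries, k=set_size)) for _ in range(n_reps))
    return hits / n_reps

if __name__ == "__main__":
    # Synthetic population loosely shaped like the liquid-stored group.
    rng = random.Random(1)
    population = [min(100.0, rng.gauss(82.0, 7.0)) for _ in range(641)]
    print(f"estimated chance of passing: {pass_rate(population):.2f}")
```

The exact criteria applied per set, and the shape of the population, are assumptions for illustration; the point is the mechanism of scoring 5,000 resampled sets.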
And then we reevaluated at different cut levels, success thresholds, 75, 70, and 67 percent.
So here, with the resampling, this is the 42-day liquid-stored products. This shows the number of recoveries greater than 75 percent in each sample; the samples over here are all failures, and the number of successes that we would have in doing this study against this population is about 67 percent.
They all passed the mean criterion, and 95 percent passed the standard deviation criterion.
With the gamma-irradiated product, just as the binomial predicted, it was a miserable failure against that criterion; 3.5 percent passed. On the mean greater than 75 percent, 96 percent of them passed, but they didn't do so well on the SD criterion.
Frozen/thawed products did the best of them all, with 95 percent passing the 75 percent success threshold.
So our initial conclusion, based on that failure rate and on the concern that we don't have a good correlation between 24-hour recovery and solid clinical outcomes, was that this was an unacceptable number in our eyes.
And we felt that the general clinical performance of these products is certainly adequate as proved over years of clinical practice, and it represents the state of the art.
So we asked ourselves, what's the sensitivity to the success threshold? This shows the chance of passing for the liquid-stored products, the gamma-irradiated products, and the frozen products, at success thresholds of 75 percent, 70 percent, and 67 percent, and you can see that it's quite sensitive to the success threshold between 75 and 70 percent.
That made a big difference, and a little bit more if we dropped to 67. The paper describes how I got the 67 percent; we won't go into that today. And of course the gamma-irradiated product is very sensitive in this range, while it doesn't have much effect for the frozen/thawed.
So the summary for the chance of passing these criteria--42-day liquid stored, gamma, frozen--shows that for the 42-day liquid stored product, 100 percent of them passed the mean criterion and 95 percent the standard deviation criterion. With a sample size of 24, 69 percent passed the 75 percent threshold. With a sample size of 20, as we would expect, it gets worse: 58 percent. If we modify the success threshold to 70 percent, you can see how that probability, that power, improves, and that's what it looks like for 67 percent.
So some other key observations that I wanted to show the committee.
One is that current red cell products are not different from the study population that we examined, and number two, there are differences between laboratories and/or study subjects.
So this is the liquid-stored products--the graph I showed you a minute ago--where 11.7 percent of these products were below 75 percent recovery, and 4.5 percent were below 70 percent.
Now here's some recent data. These are from two studies that are ongoing in two different laboratories. These are control products. This is a product that's being transfused this afternoon in our blood banks, in our operating rooms, and out of 36 products, right here, four out of 36, or 11 percent, are less than 75 percent, and 2.8 percent are less than 70 percent.
So, to me, it looks like there are differences between laboratories, and probably study subjects. So this shows you the same recent data from two different laboratories, to be unnamed. This shows 24-hour recovery under exactly the same conditions, and you can see the distribution in the laboratory that I call number one and the distribution in laboratory number two.
So we have some differences between the labs and we clearly have differences between subjects.
We have some other observations that we haven't had a chance to evaluate yet, but we're generating hypotheses about whether this might be caused by effects specific to the individual subject. So there's a whole host of unanswered questions, I think, in this assay method.
So our conclusion is that the FDA-proposed success threshold of 75 percent is not validated against currently-approved red cell products available in the
Based on actual in vivo recovery performance, a success threshold of 67 to 70 percent would provide a reasonable probability of passing FDA-proposed criteria for new products, and it might look something like this, where the one-sided 95 percent lower confidence limit for the population proportion of successes has to be greater than 70 percent, with a success threshold of 67 percent.
In my view, if we're going to use this kind of criteria, the mean and standard deviation are helpful, but I don't think they should be in the criteria, because I think this takes care of the whole issue. And the other problem is that none of these distributions meet normality assumptions, so it's a real problem trying to make inferences with means and standard deviations.
Once again, we would suggest not to make it unnecessarily burdensome for new innovations to enter the market, and would suggest that a distribution that looks something like this with--this is shown with a 70 percent cutoff--would be a reasonable approach for a criteria for new products.
And I wanted to acknowledge these are the study laboratories where the studies were done. The sponsors of these studies are shown here. The individuals that worked really hard to pull out the old dusty records are shown right here.
Thank you very much. I'll take questions.
DR. SIEGAL: Questions for Dr. Dumont?
DR. FLEMING: Dr. Dumont, you had a key introductory slide, about six slides in, that raised two critical questions, what's the clinical evidence and what's the capability of current RBC products. I don't know if we can put that up here while we speak.
AUDIENCE MEMBER: We can't hear you.
DR. FLEMING: Oh. So we were asking for his slide that--it comes right after this, I think. Okay; there you go. Thank you.
So there are two key issues here. One of them is what I might call clinical relevance issues, and another is more statistical power issues, and surprisingly, maybe, for a statistician, I really want to focus more, right now, on the clinical relevance issues.
We will come back to these power issues after the next presentation, when I think there's even more data to address what is the likelihood of success.
But you spoke--and I understand--you spoke with some concern about the reliability or validity of the type of data that we've seen, indicating that these measures of success could truly be relevant or related to what we really care about, which is the risk of clinically-relevant outcomes--ventilatory support, renal failure, sepsis, multiple organ failure, mortality, etcetera--and we spoke a lot about mortality in the previous presentation.
I understand your point. There is certainly valid reason to be concerned about lack of randomization, etcetera. There still is, however, a considerable amount of evidence there that raised some concern, even though you can validly question the reliability of that concern.
But I haven't heard what you've provided as the evidence for why the shift doesn't matter. Specifically, what's not shown here in your slide presentation, but is shown in your paper, is that the median recovery for frozen products is 88 percent, and when it's that high--when your distribution has shifted over here--that's the reason you have a high likelihood of meeting FDA criteria.
The median recovery for gamma-irradiated is about 79 percent. You've shifted this distribution over considerably, and as a result, you correctly noted, you have a low probability under the FDA criteria, that those interventions would be approved.
In essence, while you're questioning the evidence that shifting from here to here is in fact putting you at greater harm, what evidence are you giving us that it's not putting you at greater harm? That's a very substantial shift here. Are you saying that it's perfectly okay to have a 78 or 79 percent average recovery, rather than an 88 percent average recovery, and that those two differences don't matter clinically?
What is the evidence that you're--so you're contesting the evidence that says it does matter clinically, but you're not giving us any evidence that says it doesn't matter clinically, and that's a pretty substantial shift between the average recovery for the frozen versus gamma-irradiated.
DR. DUMONT: So you're looking at the right side, I'm looking at the left side, and I would submit that when we were here, without the shaded area in there, we had products--and in fact we have products today that are being used--that do have some kind of risk profile associated with them.
I submit that putting this mark at 75 percent is strictly arbitrary, and that it is not demonstrated anywhere, that I'm aware of, that it is associated with any of the negative events that we see in the clinic. And in fact there would probably be an even stronger association if we would look at other parameters such as 2,3-DPG level.
I mean, we can just pick one--and, you know, there are fifty of them, and we use this one for good reason--but my suggestion to the committee to consider is that this imposes an unnecessary burden for new innovative products.
DR. FLEMING: But what you're suggesting, that I agree with, is the further you require this distribution to be shifted to the right, the higher the burden it is for a product to achieve the criterion.
But the fundamental argument for where that should be shouldn't rest on a statistical power calculation. It should rest on clinical relevance. There have been data put forward that say when you go from this region over toward the left, you're going to be at higher risk for clinical outcomes of concern.
You're contesting the reliability of that data but you're not providing us any evidence that in fact reassures us, that when you allow this distribution to shift substantially to the left, that it's not going to be harmful.
It sounds a bit like absence of evidence is evidence of absence. We don't have data that it's a problem. Therefore, it's not a problem.
DR. DUMONT: Well, okay. I get it. Can I answer? All right.
DR. DI BISCEGLIE: May I clarify a question--I guess the answer to Dr. Fleming's question. As you say, the evidence that he's looking for is in fact that most of the approved products already shift that curve to the left now, and so we have the clinical outcomes that we have today. Isn't that the evidence that he's asking for? No?
DR. DUMONT: I believe, in my view, that's the only evidence we have. I believe the other data that--where we say younger red cells have a higher recovery, and younger red cells may have better clinical outcomes, that may be an example of true, true and unrelated, because we have no data that says that this particular measurement is causal in clinical outcomes.
DR. VOSTAL: If I could just make one point. The criteria that we're talking about today really apply to liquid stored red cells; the other conditions--gamma-irradiated cells and frozen red cells--are special cases, and we'll discuss those at some other time. But for today's discussion, it's only liquid stored red cells.
DR. SIEGAL: Dr. Cryer.
DR. CRYER: I'd like to ask: in the FDA presentation, there were three different graphs they put up, and I assume those were all liquid red cell products. Okay. Were those three all in your study as well? Do you know? Because one had a huge variation and eight of them below the line, and--
DR. DUMONT: I think they were; yes.
DR. CRYER: They were all over the place. And another one was really tight.
DR. DUMONT: Those data, I believe, are included in this dataset. There's the additional 94, but you're talking about the ABC slide?
DR. CRYER: Yes. The ABC. Yes. The ABC slide.
DR. DUMONT: Where you had the--
DR. CRYER: And I guess the problem I have is if that's true, A and--I think it was A and C, I can't remember. But one of them--whatever--there was one bad one and there was one tight one, and I'm having a little trouble, why you would think, using your statistical analysis here, that those two products were in any way similar.
You're saying they were both safe and fine, basically is what I'm hearing you say. And I wouldn't want one. The other one looked okay.
DR. DUMONT: Well I'm saying we don't have the clinical outcomes to answer that question.
DR. CRYER: I would agree with that, but you're measuring a process. You're not measuring clinical outcome. You're measuring a process. This whole thing measures a process--
DR. DUMONT: Absolutely.
DR. CRYER: --of how reliable the survival of red cells is after a process. That's what it measures.
DR. DUMONT: I'm sorry. I can't address that any further.
DR. FLEMING: Before we lose this slide, I want to make sure we keep our eye on the target here, because it doesn't matter that it's liquid or not liquid. Suppose we are just focusing on liquid.
What this slide is saying is if you have liquid, where the distribution is here, centered around ninety, that's going to be a product that's going to be just fine with the current FDA criteria.
If you have another product where it's centered around 78, make it liquid, it's not going to do just fine under the FDA criterion that's currently in place.
But if you soften the criterion, it too will do just fine, and so it doesn't--these issues are not specific to whether it's liquid or frozen. The point is whatever the formulation is, are we saying that if you have a recovery that is normally distributed around 90, that's great. We're all agreeing that's great.
But if you have one normally distributed around 78, that's just fine too. If you believe that, then we should make these changes, and that's going to get those products on to the market just as--or very readily. But what's the scientific clinical rel--this isn't the statistical--clinical relevance, that when you get to that much lower a recovery, it is in fact just as good as when you had 90 percent recovery.
DR. ZIMRIN: I guess I'm a little bit naive, but I'm used to scientific presentations that actually try to present all the data, and a balanced view of the data. I find this a little bothersome, that we hear about two studies that suggest one thing, and we don't hear about the whole host that don't.
I mean--and I'm sorry, I've forgotten his name--but a speaker implied that there are a bunch of small studies and then the New England Journal study came along.
But there was actually a study looking at more than 2000 patients in the
So I feel a little bit frustrated here because I would like to have a scientific, thoughtful analysis going on, and it seems that we've gotten sort of--I mean, this has gotten sort of polarized in this, and I just find that disturbing.
So when you haven't seen the data--but I don't think we've been presented with the data, actually, at least all the data that's out there.
DR. VOSTAL: I'd just like to make a comment about the European study. It didn't show a difference. However, those are red cells stored in a different storage solution. It's called SAGM, which includes mannitol. So it's not the same storage solution we use
DR. KATZ: Larry, it seems like a lot of your argument hinges on the ability to get something important to market, and it might help, particularly nonblood bankers in the group, to have an idea of what it is that we're having trouble getting, or will have trouble getting as a result of more stringent criteria.
DR. DUMONT: You want an example of what kind of product might be--there could be a new product for the processing of whole blood into multiple components. That would be one example.
There could be a new type of blood bag that would not use DEHP plasticizer. There could be a treatment process to inactivate pathogens in the blood product. Those are types of things that would be subjected to this test. Is that what you were going for?
DR. CRYER: Yes. Can I ask one more? Maybe, Tom, you can help me with the statistical part, but it seems like, if you're looking for better accuracy in a process, with the criterion that the FDA put down of having the lower confidence interval be above seventy, you can achieve that by having a higher mean with the same variability, or you can achieve it by having the same mean with less variability.
And it seems to me that what we're asking for is more consistency in a process and less variability, and this really only addresses fixing it by moving the mean.
DR. FLEMING: You are correct, and Dr. Dumont, I think, acknowledged when he was presenting--and I appreciate what he was trying to do--that he was trying to take some complicated issues and simplify them, assuming that you have normal distributions, and you're absolutely right: you don't necessarily have normal distributions. The essence of what the FDA criterion indicates is that you want the area under the curve that falls to the left of 75 percent to be rather low.
And you're exactly right. You can get that by either shifting the distribution to the right, with considerable variability, or tightening the variability, getting more precision around having maybe not a higher mean but a mean that's sufficiently above 70, 75 percent, that it's sufficiently precisely estimated that you have a low probability of being below seventy-five. So you're right.
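Dr. Cryer's point can be put in numbers. Under a normality assumption (which, as noted in this discussion, real recovery data do not satisfy, so this is purely illustrative, and the means and SDs below are invented), the fraction of units falling below the 75 percent threshold can be driven down either by raising the mean or by shrinking the variability:

```python
from math import erf, sqrt

def frac_below(threshold, mean, sd):
    """Fraction of a normal(mean, sd) recovery distribution below the
    threshold. Illustrative only: real recovery data are not normal,
    as noted elsewhere in this discussion."""
    return 0.5 * (1 + erf((threshold - mean) / (sd * sqrt(2))))

# Two hypothetical recovery distributions (numbers invented for
# illustration) with nearly the same fraction failing the 75 percent
# threshold: one via a higher mean, one via tighter variability.
print(f"mean 84, SD 7.0: {frac_below(75, 84, 7.0):.3f} below 75 percent")
print(f"mean 79, SD 3.1: {frac_below(75, 79, 3.1):.3f} below 75 percent")
```

Both hypothetical products leave roughly the same tail below 75 percent, which is why a criterion on that tail does not distinguish the two routes.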
DR. SZYMANSKI: These studies are really done on a technical level, and there are technical variabilities; you know, you have to take that into account. That doesn't mean that they are the absolute truth.
For instance, if you measure the red cell mass with different methods, you will get different results, and right now, most of the red cell masses in these studies are measured with technetium, and that gives higher values than chromium.
And when I compared red cell mass results measured with chromium or with technetium, I found a 10 to 15 percent overestimation with technetium. I presented that at the ASH meeting in 1996. So, you know, this also is one variable: one has to consider what is used to measure these various values, and there can be variations between different labs, depending on their methodology.
And then again I want to bring up this donor-recipient variability, because that is a biological variability which is hard to make, you know, totally uniform, so that there is no variation. You can't have a Gaussian curve with a very, very, you know, precise, narrow area.
And so, I mean, you can decide whatever you want--you can have perfect, you know, very high levels and very high thresholds--but then when you go to labs and you actually measure these things, you might really not get them, and then you really have difficulty obtaining, you know, validation. And if you apply the methodology as we have used it in the past, these are the variabilities that there are. I mean, it would be lovely if it were much, much better. But those happen.
FDA can set high values, expected to produce wonderful clinical outcomes. They might; but you might never be able to measure that.
DR. DUMONT: Mr. Chairman--oh. Sorry.
DR. CRYER: One more methodological question that addresses the variability issue. Do the labs that do this sort of testing ever use a paired design, so that it'd be the same person that got the control product one week, and then a week later you did it with the new test product on the same person, in an attempt to get rid of the variability between subjects?
DR. DUMONT: We do use paired designs at times. However, this criterion is not based on a paired design. This is an absolute criterion. So I agree that a paired design would resolve a lot of that issue.
DR. VOSTAL: If I could comment to that. When companies come to us and talk about the design of these studies, we always suggest to them they should use a control arm in their studies, so they can identify individuals who do have poor recoveries, and those can then be excluded from the final analysis. But it's up to the companies to make that choice. We don't require that they run a control arm since we do have a standard, a cutoff standard.
DR. DUMONT: One of the questions that I had from a regulatory standpoint: if we're going to make the leap to say that this axis, right here, relates in some loose way to clinical efficacy, then if company XYZ comes out with a new product, and they compare it to red cells in ASX that are in current use, and they show that they're superior to that, then are they going to be able to get a claim in the market that they have a more efficacious red cell?
DR. FLEMING: That doesn't follow. In cholesterol-lowering agents, we use LDL changes to approve new agents, and you have statins now that have a 30 percent reduction in cholesterol, and they're approved based on that. Fortunately, we didn't approve lipid-lowering agents when they had a 10 percent lowering of cholesterol; it didn't provide a benefit. But if you have a new cholesterol-lowering agent that gives a 50 percent reduction, it doesn't prove that you are better than one that has a 30 percent reduction.
Let's think of Dr. Szymanski's concern about the variability -- lab variability--the conclusion that I draw from that is it makes me more worried about using the surrogate at all. I wouldn't push that argument too hard, because the alternative to using a surrogate is to do large-scale thousands of person trials with non-inferiority analyses ruling out that this new formulation isn't unacceptably worse in terms of what we really care about, which are the clinical outcomes.
So if we don't believe in the surrogate at all, or we have considerable concerns about it, the conclusion isn't to make it even weaker. The conclusion is to turn to something else. So I'm not of the mindset, even though I'm a critic of surrogates, that we have that level of concern, unless my colleagues persuade me that the issue here is if we can still believe in this surrogate, what is a level of rigor that we need to have, in order to be confident that we are protecting the public, that we don't have a meaningfully less-effective agent when you're using these measures?
And if the argument is being given that this standard isn't going to be met by large fractions of current state-of-the-art agents, I'd like to just defer until Dr. Kim's presentation, because I think that isn't true either.
DR. RENTAS: If I could say really quickly, I think the numbers presented by Dr. He speak for themselves. Even when you apply the 95/70 rule back to 1998 to 2003, 17 out of 19 will meet that criterion. I just think that speaks for itself.
DR. SIEGAL: Last comment? That's it.
DR. FINNEGAN: My comment was could you please take a break.
DR. SIEGAL: Yes. Well, all right. Let's hear from Dr. Kim and then take our break. And those who need to take a break now are excused.
DR. KIM: Good afternoon. My name is Jessica Kim from FDA, and I'm a biostatistician in the Division of Biostatistics. I'm holding a cough drop because I have very severe coughing, so that's why my voice is a little weird.
My presentation's title is Statistical Methods in the Evaluation of Red Blood Cell Products (In Vivo Study), and I'm going to present statistical ways of understanding the current FDA acceptance criteria for RBC products that the Agency has accepted since 2004.
Here is the outline of my presentation. First, I'm going to go over the timeline of when each element of the current acceptance criteria was adopted.
And then the detailed statistical procedures of the current acceptance criteria will be discussed, and during that discussion two items will be the focus.
One is the criterion that emphasizes the viability of the individual RBC products, and the second one is the statistical power of a study, which played an important role in analyzing the historical data. And then, briefly, a summary of the BEST data will be provided, followed by FDA's analysis of the BEST and FDA-combined data, and then I will summarize my presentation.
Now here's the timeline. Up until 1997, 75 percent RBC survival was used as the acceptance criterion for RBC products. In the period 1998 to 2003, more specific statistical criteria were settled on for acceptance: a mean RBC survival of at least 75 percent, a standard deviation of at most 9 percent, and at least 20 units at two sites.
And then after 2004, the one-sided 95 percent lower confidence limit for the population proportion of successes needs to be greater than 70 percent, where a success is defined as an RBC survival of at least 75 percent. That element was added to the current acceptance criteria for RBC products.
Now, in summary, we can see that there are two parts to the current acceptance criteria, and let me look at the second part first: the sample mean at least 75 percent, the sample standard deviation at most 9 percent, and at least 20 units in total at at least two sites.
These criteria are mostly about the sample data, about the study result. They do imply something about the population proportion, but we do not make that connection using these criteria; they apply only to the sample data. The first part of the current acceptance criteria is more about emphasizing individual units' viability, and it connects to the population distribution.
Now, for the in vivo study, the one-sided 95 percent lower confidence limit criterion is equivalent to testing that the population proportion of successes is greater than 70 percent, versus less than or equal to 70 percent.
And for the corresponding procedure to test such a hypothesis, a couple of things need to be prespecified ahead of time.
One is the definition of success, and the other is the determination of the study size, with the significance level and the desired power.
And here the individual success is defined as an in vivo RBC survival of at least 75 percent, and the significance level--in other words, the false positive rate in this case--is defined as one-sided at 5 percent. And I would like to point out one thing about this criterion. In a traditional clinical trial under FDA regulation, a one-sided 2.5 percent level is used, with 5 percent two-sided, which means this criterion actually has a little higher false positive rate than the traditional clinical trial under FDA regulation.
Now, before I go to the actual statistical procedure for testing such a hypothesis, this slide shows the graphic interpretation of the in vivo study hypothesis. This histogram is constructed using the BEST data. If the population distribution of in vivo RBC survival percent is given as in this graph on slide six, then using the threshold value of 75 percent to categorize each individual as a failure or a success, the area under the histogram to the right of that threshold gives you the proportion of successes. And for the in vivo study hypothesis, we want to make this area as large as possible, and we actually set this value at at least 70 percent.
And you can also notice that for the data from the BEST study, I checked the normality assumption. Unfortunately, the normality assumption was rejected, and I believe this distribution curve is not symmetric; we also have some extreme values that violate the normality assumption.
So I want to make sure it's understood that the mean of 75 percent and the standard deviation of 9 percent are about the sample information, not about the population distribution.
So we are not saying that the population distribution is centered at 75 percent.
Okay. Now, the next question in testing such a hypothesis is about the sample size. To meet the minimum acceptable proportion of successes, and to take care of the limited resources for conducting such studies, FDA agreed on and recommended at least 20 units at at least two sites. And this table shows various study sizes and the number of allowable failures out of a study of that specific size to meet this one-sided 95/70 rule.
So, for example, if a study is conducted with a size of 24, and 21 of them are greater than or equal to 75 percent, then that study will meet the 95/70 rule.
And the next row: if a study is conducted with a study size of 28, and 24 of them meet at least 75 percent, that study will meet the 95/70 rule.
And I want to point out from this table that as the study size increases, the number of allowable failures--the number of individual units that did not meet the 75 percent--can increase, and the study can still meet the 95/70 rule.
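The allowable-failure table can be reproduced with an exact one-sided binomial test, using only the standard library. A sketch of the computation, where the n = 24 and n = 28 rows match the values quoted above:

```python
from math import comb

def min_successes(n, p0=0.70, alpha=0.05):
    """Smallest s such that an exact one-sided binomial test rejects
    H0: p <= p0 at level alpha, i.e. P(X >= s | n, p0) <= alpha.
    Reaching s successes is then equivalent to the one-sided 95 percent
    lower confidence limit on the proportion of successes exceeding p0."""
    for s in range(n + 1):
        tail = sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
                   for k in range(s, n + 1))
        if tail <= alpha:
            return s
    return None  # the rule cannot be met at this study size

for n in (19, 24, 28, 33):
    s = min_successes(n)
    print(f"study size {n}: need {s} successes ({n - s} allowable failures)")
```

For n = 24 this yields 21 required successes (3 allowable failures), and for n = 28 it yields 24 (4 allowable failures), in line with the rows just described.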
And that's partially related to the previous question: if you shift the mean response of the population distribution, or the variation gets smaller, then with a larger sample size you will still have a chance to meet the 95/70 rule.
Now, the next slide explains the relation between the sample size and the statistical power. Here's the technical definition: statistical power is the likelihood of achieving a statistically significant result if your research hypothesis is actually true.
What that means, under our situation: if, in fact, the population proportion of successes is greater than 70 percent, what is the probability, the likelihood, that an RBC recovery study will meet the acceptance criteria?
Now that's the technical definition of statistical power, and that's the practical application of the statistical power to our situation.
Now the big thing that I want you to notice in this definition of statistical power is the assumption. How much confidence, how strong evidence, do we have about the population distribution? Depending on that information, your likelihood, your probability, your statistical power will be different.
And this next statement simply states the relationship between sample size and statistical power.
Now, if the likelihood is good--if the chance that the conducted study will satisfy the current acceptance criteria is good, in a sense at least 80 percent--then your sample size would be considered adequate.
In other words, to state the negation: if the likelihood is not good, if you have a low chance of meeting the current acceptance criteria, then your sample size would not be considered adequate. To take care of that issue, you can increase the sample size to get higher power, a higher chance of meeting the acceptance criteria.
So this is the technical definition and the relationship of sample size and statistical power, and in the next couple of slides I will show numerical ways of looking at the statistical power in relation to the sample size, under the big assumption, the "if" part.
Now, in this table, I calculated the powers for different study sizes. So this number, 14/19, indicates the study size, and the parenthesized value is the number of allowable failures to meet the 95/70 rule.
And here, the first column is the assumed true rate. If we had prior knowledge of the population distribution, you can use .75 as what you believe about the population proportion of successes, and under that assumption you can calculate the power.
If you have prior knowledge that the population proportion of successes is .85, you can use that assumption and calculate the power.
Now I want you to pay attention: if you have a strong belief that the population proportion of successes is .9, then as the sample size increases, the power increases. The likelihood that that particular study will meet the acceptance criteria increases.
And under a fixed sample size, as the assumed population proportion of successes increases, the power increases. The true rate increasing means it's farther away from the hypothesized testing value, which is the 70 percent.
So what I want to emphasize from this table is that power depends on the study size and also on the assumed true rate, the population proportion of successes. Increasing the sample size to get high power is obvious, but the question becomes: how much information do we have about the population proportion of successes?
How much statistical or clinical evidence do we have? Which number can we use to evaluate the likelihood that a particular study will meet the acceptance criteria?
So I'm going to talk about this estimation of the population proportion of successes, using the FDA and BEST data.
Now this graph shows, for selected study sizes--33, 28, and 24--what the power curves look like, and again, the power increases as the true rate increases, and the power increases as the study size increases.
So if you have strong evidence that the true population proportion of successes is 0.875--that's the vertical line--then the larger the sample size (this upper curve is for sample size 33), the stronger the likelihood that you will meet the criteria.
Okay. The next two slides are a brief summary of the BEST data. This table is quoted from Dr. Dumont's paper, and FDA investigated and verified that all the information is correct.
Now there were 42 liquid-stored studies, 641 liquid-stored data values, and the percentage of individual RBC recoveries meeting at least 75 percent was 88.3 percent. At the 70 percent threshold value, 95.5 percent of them met that threshold.
And 98.1 percent of them met the 67 percent threshold; the mean response was 82.1, and the standard deviation was 6.71.
And so, using those proportions meeting the different threshold values from the BEST data, we can calculate the power, the likelihood of meeting the current acceptance criteria with a specific sample size of 24, and it was 0.693, 0.979, and 0.999. As I emphasized before, the power depends on the estimate of the true rate, the population proportion of successes; this is just for a sample size of 24, and if you increase the sample size, the likelihood will increase.
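[Reading the criterion as at most 3 failures in 24 subjects, as described later in the discussion, the three power values just quoted can be reproduced directly from the BEST proportions; 0.883, 0.955, and 0.981 are the observed fractions of individual recoveries clearing the 75, 70, and 67 percent thresholds:]

```python
from math import comb

def power(n, min_successes, p):
    """P(at least min_successes successes out of n), X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min_successes, n + 1))

# BEST-data proportions of individual recoveries meeting each threshold.
for threshold, p in ((75, 0.883), (70, 0.955), (67, 0.981)):
    print(f"threshold {threshold}%: power {power(24, 21, p):.3f}")
# -> 0.693, 0.979, 0.999, matching the values quoted from the slide
```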
Now this table summarizes the BEST data by the year each particular product was approved and by whether the particular product satisfies the current acceptance criteria. So, for example, there were two studies, retrospectively collected, approved in the year 2000, and those two studies met the 95/70 rule.
And in the year 2001, there were two studies collected, and one of them didn't meet the 95/70 rule. In the BEST data there were four studies approved in 2004 and 2006, and FDA was able to collect four additional approved studies from the years 2004 to 2007, and those additional approved studies were added in FDA's analysis.
Now here's the scatter plot of all the data; the axis indicates the year the study was approved, with a decimal-point extension indicating the study number.
So in 1990, study number nine was approved, and this is the distribution of the data values of the 24-hour RBC recovery percentage. The horizontal line indicates the individual success threshold value for the individual RBC recovery percentage.
And the circled studies are from the four additional FDA data.
The next graph shows the proportion of successes for each study; again, the x axis categorizes the year the study was approved and the study number, and the vertical axis indicates the observed proportion of successes.
And the circled locations show the studies that didn't meet the current criteria. As you can see, except in year 2001 with study number 31 and in year 2003 with study number 21, all of them met the current criteria, even before the 95/70 rule was adopted.
So overall, this graph shows the improvement of the product as year goes by.
Now, using the BEST and FDA combined data, I found the proportion of individuals who met at least 75 percent, and the proportions who met the 74 and 73 percent thresholds, and then with different study sizes I calculated the power.
Again, I want to emphasize that the power, meaning the likelihood that your particular study will satisfy the current acceptance criteria, increases as the sample size increases.
And again, with a fixed study size, if the true rate assumed for the population increases, the likelihood increases.
So the key point is, again, the same question that I raised before: how can we verify the information about the true rate, the true population proportion of successes?
So one way to have a better estimate of the population proportion of successes is to categorize the data into different time periods. Now this table shows 1990 through 1997, and then 1998 through 2007, and the significance of the year 1998 is that it is the year when the standard deviation criterion was adopted.
And as you can see, the success rate from the data in the first period was 0.836, and the success rate in the second period was 0.898, and because of the higher success rate, the power, the likelihood that a particular study will meet the current acceptance criteria, is much, much higher, significantly higher.
And the corresponding graph for this table shows the evidence more plainly. Here, from the first time period to the second, the power increases significantly for each threshold-value group.
Now in the next table, I categorized the data into another two time periods, and in this table the significance of the year 2004 is that it is the year when the current acceptance criteria were adopted. Using the same threshold values for individual success, the success rate was estimated for the population proportion of successes, and the power was calculated with a fixed study size of 24. Again, in the corresponding graph for this table, for each threshold-value group, the power, the likelihood or probability that a particular study will meet the current criterion, the 95/70 rule, increased significantly over the time periods.
Now the next table categorizes the data into three periods: 1998 is when the standard deviation criterion was adopted, and 2004 is when the current acceptance criteria were adopted. With the threshold value for an individual success of 75 percent, the estimated population proportion of successes increases from 0.836 to 0.883 and then to 0.931, and the power significantly increases.
And the corresponding graph for this table is given on this slide, for the different success threshold-value groups.
And as you can see, the power increases over time, and it confirms the previous line graph: over the years the product quality has improved, and the manufacturers have been able to produce better-quality RBC products.
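[Under the same reading of the criterion, at most 3 failures among 24 subjects, the power implied by each period's estimated success rate at the 75 percent threshold can be checked; the period boundaries follow the table just described, and the exact figures on the slide may differ slightly in rounding:]

```python
from math import comb

def power(n, min_successes, p):
    """P(at least min_successes successes out of n), X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min_successes, n + 1))

# Estimated population proportion of successes in each time period.
periods = {"1990-1997": 0.836, "1998-2003": 0.883, "2004 on": 0.931}
for period, p in periods.items():
    print(f"{period}: power {power(24, 21, p):.2f}")
```

[The power roughly doubles from the earliest period, about 0.43, to the most recent, about 0.92, which is the improvement being described.]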
So here's my summary. FDA's current acceptance criterion is critically important to ensure that each recipient receives a highly viable RBC product. The statistical power to meet FDA's 95/70 rule depends on the study size as well as on the estimate for the population proportion.
So to have a high likelihood, a high power, one way to answer that question is to increase the study size, and we can also estimate the true rate under the current acceptance criteria and/or from a more recent time period, which is more relevant for estimating the population proportion of successes.
In traditional clinical trials under FDA's regulation, 80 percent power has been typical. The data clearly show that the manufacturers are able to produce better products over time, and the Agency believes the current acceptance criteria serve well for the purpose of regulating viable RBC products.
Thank you; that's all for my presentation. Do we have any questions?
DR. SIEGAL: Are there questions for Dr. Kim?
DR. FLEMING: I'd like to just quickly step through slides 21, 17, and 9, and in reverse order, could you go back to 17, just to make sure we're drilling down with the same common understanding.
So let's go to 21 first. My apologies. Twenty-one, first.
DR. KIM: I can't see this here. Which one is 21?
DR. FLEMING: Go forward, I think. So it's the fourth slide from the end.
DR. KIM: This one?
DR. FLEMING: Yes.
DR. KIM: Okay.
DR. FLEMING: Okay. So just to break this down a little bit and try to put it into simple terms. If you take these eight studies, the eight products that have been approved in the last four years under the current criteria: when you pool together all of the 173 people, in 93 percent they successfully hit the 75 percent criterion. And so, in fact, if your new product is just average relative to those eight--that is, in essence, as Dr. Kim was talking about, from a hypothesis perspective, if we say our product, in truth, is just the same as the average, not the best of the eight, not the worst of the eight, just average--then we would have a 92 percent chance of getting that product approved according to the current criteria.
Now if you say, well, I'm not going to be that stringent, I'm going to go back to the 19 products that have been approved since 1998, including a couple products that don't meet this criterion, should they have been approved or not? We could debate that one.
But let's assume they're good enough. I'm going to include them in my average.
So if I go back to the 19, by the way, what we know is that 17 of the 19 do meet this current FDA criterion. So the argument that these products are having trouble meeting this criterion is hard to defend when, of the products that are approved, 17 of 19 hit this criterion.
But if in fact we say it's good enough to be the same as the average of all of these products, which weighs more heavily how things were in the late '90s compared to where they are now, then go back to slide 17: when you pool those together, what you have is a 90 percent success rate; 90 percent of all of those 549 people achieve the 75 percent recovery.
And if you, in truth, have a 90 percent success rate, you've got just under an 80 percent chance that your product will be approved.
So last slide; go back to number nine.
DR. KIM: Number nine?
DR. FLEMING: Slide nine. So you're back about eight slides. It's the one that says at the top--it's the table for power. It's one before this. There you go.
So essentially, what's happening is if you use the current FDA criterion based on a sample size of 24, which means you can only have three or fewer failures, then where are you in this true rate?
Well, in the last eight approvals, their true rate, in truth, is 93. Their true success rate is 93, so 92 percent of those products will get approved.
If you just say I don't have to be as good as the products in the last four years. I'll be as good as the products, the 19 products over the last decade, including two that didn't, in fact, meet these criteria, then that success rate is 90, and so you'd still have almost an 80 percent chance of getting the product approved.
So this is where truth is in the last four years. This is where truth is when you look at the average over the last decade. Now under these criteria, you're still going to let half the products through that have only an 85 percent success rate.
You're going to let a quarter of the products through that have only an 80 percent success rate. You're going to let one in eight products through that have only a 75 percent success rate.
Remember, today, the successful products are 93. If anything, the issue is we're still, under the FDA criteria, letting a fair number of products that look a lot less effective through.
But if your view is 75 percent's good enough, 75 percent is like 95, then let's weaken the FDA criteria so we let more than one in eight of these products through. We can let six or seven of eight of these products through.
So this column is telling you exactly what the current success rate is, according to these truths, and where truth is for the successful products in the last four years is here, at 93, and even if you go back to the last decade, it's here, at 90; you're seeing a high approval rate, and you're even seeing some approvals for products that are discernibly less successful at achieving 75 percent.
But if you want these products, if you want products with a true rate of 75 rather than 93 to get through, then let's weaken the FDA criterion.
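[Dr. Fleming's half, quarter, and one-in-eight figures follow from the same binomial tail, again assuming the criterion allows at most 3 failures among 24 subjects:]

```python
from math import comb

def approval_prob(n, min_successes, p):
    """Chance a product with true per-subject success rate p passes the study."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min_successes, n + 1))

for p in (0.93, 0.90, 0.85, 0.80, 0.75):
    print(f"true success rate {p:.2f}: P(approval) {approval_prob(24, 21, p):.2f}")
```

[A product with a true 85 percent rate passes about half the time, an 80 percent product about a quarter of the time, and a 75 percent product about one time in eight, as stated, while a 93 percent product passes about 92 percent of the time.]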
DR. DI BISCEGLIE: A quick one, if I may. Just to understand the regulatory process. Is it possible for a sponsor, finding that they've got one too many failures, to go out and recruit another six patients to increase their sample size?
DR. EPSTEIN: The statistician should answer, but the answer is yes, but you pay a statistical price. In other words, the number of additional subjects you have to study is increased over the number you would have had to study, had you first selected a larger study cohort.
But the answer is yes, you can go back and study more and get a more accurate answer. But there is a statistical correction or price you pay.
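[The "statistical price" can be illustrated with a hypothetical two-stage version of the rule. Assume the pass cutoff at any study size n is the smallest success count whose one-sided binomial test rejects p <= 0.70 at the 5 percent level, which reproduces the familiar 21-of-24 (at most 3 failures), and suppose a sponsor who just misses at n = 24 naively adds 6 more subjects and applies the unadjusted n = 30 cutoff. The pass probability when the true rate sits exactly at 70 percent, which the criterion is meant to cap at 5 percent, creeps above that cap; this extension rule and its numbers are illustrative, not FDA's actual correction:]

```python
from math import comb

def tail(n, s, p):
    """P(X >= s) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s, n + 1))

def cutoff(n, p0=0.70, alpha=0.05):
    """Smallest success count rejecting p <= p0 at one-sided level alpha."""
    return min(s for s in range(n + 1) if tail(n, s, p0) <= alpha)

s24, s30 = cutoff(24), cutoff(30)      # 21 of 24, and the count needed for 30
single = tail(24, s24, 0.70)           # pass rate of a one-stage study at p=0.70
# Naive extension: fail at 24, add 6 subjects, then apply the n=30 cutoff.
naive = single + sum(comb(24, k) * 0.70**k * 0.30**(24 - k) * tail(6, s30 - k, 0.70)
                     for k in range(s24) if s30 - k <= 6)
print(f"cutoffs: {s24}/24, {s30}/30")
print(f"one-stage pass rate at p=0.70: {single:.4f}")
print(f"naive two-stage pass rate:     {naive:.4f}")
```

[Even this mild inflation, roughly 0.0505 versus the nominal 0.05, is why an extended study must be judged against an adjusted, stricter boundary, which in turn costs more subjects than if the larger study had been planned up front.]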
DR. CRYER: But that shouldn't hurt you if your product really is good.
DR. EPSTEIN: That's correct. But I think we alluded to this, and I just want to make one point clear, and Larry can comment on it if he wishes.
I think part of the underlying problem here is a business case, because what hasn't been made explicit is that the cost per individual patient of doing a radiolabeled red cell recovery study is very high--I think it's in the region of 8- or $10,000 per subject. And so the problem, from the business point of view, is that if FDA were to say, well, 80 percent power is adequate because it's the typical standard of a drug trial, a company is saying, well, I don't want to run a 20 percent risk that my product, which complies with the FDA standard, will fail in a trial that's going to cost me around $240,000.
And I think that that is part of the driver for wanting a higher level of assurance that the trial will succeed.
But, you know, FDA's point of view isn't to look at the cost of the trial. The trials are feasible and our goal is to have the highest quality products that are achievable with the current technology.
But I think that there's been an undercurrent which has gone unstated, and Larry, if you want to address it, I think it'd be helpful.
DR. DUMONT: It's true that these are very expensive studies to run. The other thing, when you're considering cost from a business case, is the calendar time it takes, and for people that develop products--I don't do that anymore--to do, for example, a paired trial, that's at least a six-month endeavor, if not longer, because you have to wait for 42 days, you have to do the study, etcetera.
But the other piece that doesn't have to do with the money is actually the number of subjects that you expose to radioactivity.
And so if you drive up the numbers, I mean, it's easy to work the numbers, but the numbers are actually people that you expose to radioactivity. So the higher that goes, the more people are put at risk, and I think that is in the purview of FDA.
I had a quick question for Dr. Kim while she's standing there, if I could.
I was just curious: for the last group of studies, from 2004 onwards, where there's actually been so much focus on this, how do you know that those subjects weren't selected subjects? That, oh, we know that Mary Jean gives low recovery, so we're not going to enter her in the study; we're going to pick these people because they always give us high recoveries.
How do you know that?
DR. KIM: That's not something that we can answer. That goes to the integrity of the sponsors, how they conduct their study, and if we had that kind of problem in regulatory submissions, it would ruin the trust between the sponsor and FDA.
If you suspect there is a case of that kind, then I think you have to report it to FDA.
DR. ZIMRIN: I find it truly amazing that two different groups could look at the same dataset, or a very similar dataset, and come up with conclusions that are so amazingly different.
Could you explain--and this is probably a futile request--but explain in terms that a nonmathematician could understand--how that could come about?
DR. KIM: The simple answer is that the BEST study group focused on estimating the true population proportion rate by combining all the data, and the FDA looked at separate time periods. So we didn't change or manipulate anything; we used the same data. The difference is in estimating the true population proportion using everything all together versus by time period, and the time periods are not randomly separated; they depend on when new criteria were adopted. For example, 1998 was when the significant change, the standard deviation criterion, was adopted, so we believe there is some kind of difference between the years 1990 to 1997 and then 1998 to 2003.
And in 2004 is the year when the current acceptance criteria was adopted. So that's the difference.
DR. ZIMRIN: One more question. Of the current additives and things that are commonly in use, can you give me some sense of when they were approved.
DR. KIM: When they were--
DR. ZIMRIN: I mean these ones that we're talking about, these awful ones, that only four or eight would be approved today. Do these include things that we commonly use today?
DR. KIM: The dataset that was used. Doctor--
DR. ZIMRIN: No, no, no. The product.
DR. VOSTAL: I can't really give you an exact date those were approved, but they were approved a long time ago. But there have been changes in the products from the early times to now, and some of that was already discussed in terms of leukoreduction, you know, with more of the products being leukoreduced, and also there are more apheresis products on the market. So there is a subtle change over time.
DR. FLEMING: Actually, I don't think the two analyses are different, in the following sense. If you go back to slide 21 again, if you could; it's the fourth slide from the end. The BEST trial, as I understand it--the fourth slide from the end. Okay; keep going. Okay.
When I was talking through this, I focused first on what we've seen in the last four years, and then I expanded to what you would see over the last decade, and when you bring in this success rate and dilute this one, then you end up with a 90 percent success rate on an individual basis, and that gives you 80 percent power.
If you bring in these eight studies as well, then the success rate goes down and the power goes down, and that's what BEST is looking at, plus I believe BEST only had four of these trials.
And so that's the only difference between what BEST was doing and what the FDA is doing, and again, to me, slide nine is really the key issue because if you--one more time, if you could, go back to slide nine. And so the FDA is looking not only at four trials in the last four years, they're looking at eight trials, and over those eight trials the success rate is 93 percent and the power would be 92 for somebody who's just the average, similar to those eight.
If you then add back the 11 studies from '98 to 2003, your success rate dilutes to 90 percent, but you still have almost an 80 percent success rate.
If you dilute further back, to those studies that go back to 1990, then your success rate is going to drop down into the area of 88 percent, 87 percent, and the power's going to drop to 67 percent.
So again, the fundamental issue is, if you have a product that's just average for what you've had approved over the last four years, you've got a 92 percent chance of meeting the FDA criteria.
If you have a product that's just average, over the last decade you have an 80 percent chance of getting the product approved.
But if you want to allow products to be approved here, that have only a 75 percent, when we can get 93 today, at least 90 today--but if 75 is good enough, then, yes, that product only has a one in eight chance of succeeding.
But if we weaken the FDA criteria, we can get a much greater chance of getting approved.
DR. SIEGAL: One last point.
DR. DUMONT: I just want to address that since that was my study. I think the difficulty is we looked at the data as the best representation for what's being used today in the blood bank.
Dr. Kim looked at the data differently. She was looking at more of an instantaneous change in time. So there is a difference in the way the data were--the data didn't change but there was a difference in the way it was looked. And if you remember that last slide that I showed, those were current red cell products that are made in the blood bank, they're leukocyte-reduced, they look just like the total population that we sampled from.
So that, in my view, is the difference between the two.
DR. SIEGAL: Is there any more discussion at this point?
DR. CRYER: I have one last question on that. What percent of the patients that would not meet the criteria,